How do we make 18th-century search results accessible to a 21st-century audience?

In recent posts, we’ve been giving you a backstage look at how we are building this online edition of Samuel Johnson’s A Dictionary of the English Language. In this post, we’ll explain how the computer calls up words when you search for them—and why the search results look the way they do.

Briefly, our results display follows modern dictionary conventions, which differ somewhat from the conventions that Johnson followed.

We created “labels” for our search results

When you search for a word (or click the “Random Word” button multiple times), the site returns a list of words that runs down the left side of the screen.

Search results display a list of words descending down the left side of the screen.
Search returns a list of words

Internally, we call these words labels because each one identifies a particular entry. When you click on a word/label, the online dictionary displays the entry for that word in Johnson’s Dictionary.

This concept seems straightforward enough . . . until you realize that the words in the list don’t actually appear in that form anywhere in the dictionary entry! For example, here’s the entry for rhombick, adj.:

Facsimile image of 1755 entry for "rhombick, adj." https://johnsonsdictionaryonline.com/1755/rhombick_adj
Facsimile of 1755 entry for rhombick, adj.

Although the terms rhombick and adj appear in the entry, they do not appear in this form. For one thing, the word rhombick is written with a combination of regular and small caps, and it has an accent mark after the o. Also, a peek inside the XML reveals a lot of markup around these terms that wouldn’t belong in a results list:

excerpt from XML file for rhombick, adj. that shows markup within the entry
Excerpt from XML file for rhombick, adj.

Because nothing in the transcribed entry was easy to grab and turn into a label, we decided to supply the labels ourselves. Our editorial philosophy has been to align every XML file with Johnson’s text and to keep our modern additions separate where possible. Accordingly, we store the word labels in their own database. Here’s a small piece:

Excerpt from database of headword labels; shows fields for folder, filename, label, search term for the words flammability through flamy
Excerpt from internal database of headword labels

When someone types a word into the search box, the computer looks for matches in the search term column, finds the corresponding files in the filename column, and displays the corresponding labels from the label column.
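
The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the site’s actual code: the `LabelRow` type, the `search` function, the exact-match rule, and the sample rows (with filenames modeled on the entry URLs) are all assumptions.

```python
from typing import List, NamedTuple, Tuple


class LabelRow(NamedTuple):
    folder: str       # edition folder, e.g. "1755" (assumed value)
    filename: str     # entry file the label points to (assumed naming)
    label: str        # modernized label shown in the results list
    search_term: str  # text that is matched against the user's query


# Illustrative rows only; the real table is not reproduced here.
LABELS = [
    LabelRow("1755", "abstract_ns", "abstract, n.s.", "abstract"),
    LabelRow("1755", "abstract_va", "abstract, v.a.", "abstract"),
    LabelRow("1755", "rhombick_adj", "rhombick, adj.", "rhombick"),
]


def search(query: str, rows: List[LabelRow] = LABELS) -> List[Tuple[str, str]]:
    """Return (label, filename) pairs whose search term equals the query."""
    wanted = query.strip().lower()
    return [(row.label, row.filename) for row in rows if row.search_term == wanted]
```

In this sketch, `search("abstract")` returns two pairs, one for each entry, so the results list would show both labels and each label would link to its own file.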

Our labels follow modern conventions

In keeping with our goal of making Johnson’s text as accessible as a contemporary dictionary, our word labels follow modern conventions, which differ from Johnson’s conventions in several ways. All these differences are intended to make the word labels easier to read:

We omit stress marks

Johnson indicates a word’s primary stress by putting an accent mark in the stressed syllable. Stress is the relative emphasis with which the syllables of a word are spoken: compare the word “abstract” pronounced AB-stract with the same word pronounced ab-STRACT.

facsimile image of 1755 entry for abstract, n.s.; https://johnsonsdictionaryonline.com/1755/abstract_ns
Abstract, n.s. puts primary stress on the first syllable
facsimile image of 1755 entry for abstract, v.a.; https://johnsonsdictionaryonline.com/1755/abstract_va
Abstract, v.a. puts primary stress on the second syllable

The accent mark gives information about pronouncing the word, but the mark is not part of the word’s normal spelling. Our transcriptions preserve Johnson’s stress marks, but our word labels omit them.

screenshot of search for abstract, showing word labels and part of the transcription for abstract, v.a. in the 1755 edition; https://johnsonsdictionaryonline.org/views/search.php?term=abstract
The transcription (right) preserves Johnson’s stress marks, but the word labels (left) omit them

We modernize capitalization

Johnson developed a format for headword capitalization that provides information about each word’s origins. Most headwords appear in ALL CAPS. Headwords that Johnson considered to be derived from another English word are presented in Small Capital Letters (which, unfortunately, WordPress will not display; see the image of the 1755 entry for fumingly, below, for an example). And headwords that Johnson considered to be more foreign than English appear in ITALICIZED CAPITAL LETTERS.

facsimile image of Johnson's 1755 entry for fumid, adj. https://johnsonsdictionaryonline.com/1755/fumid_adj
Johnson’s 1755 entry for fumid, adj. puts the headword in ALL CAPS
facsimile image of Johnson's 1755 entry for fumingly, adv. https://johnsonsdictionaryonline.com/1755/fumingly_adv
Johnson’s 1755 entry for fumingly, adv. puts the headword in Small Capital Letters
facsimile image of Johnson's 1755 entry for fumette, n.s. https://johnsonsdictionaryonline.com/1755/fumette_ns
Johnson’s 1755 entry for fumette, n.s. puts the headword in ITALICIZED CAPITAL LETTERS

Our entry transcriptions preserve Johnson’s headword formatting. Our labels, however, follow modern lexicographic practice: words appear in lowercase letters unless modern convention requires capitalization, as with proper nouns. See, for example, this entry for Hippocrates’s sleeve, n.s.:

facsimile image of Johnson's 1755 entry for Hippocrates's sleeve: https://johnsonsdictionaryonline.com/1755/Hippocratess_sleeve_ns
Johnson’s 1755 entry for Hippocrates’s sleeve, n.s.
Transcription of Johnson’s 1755 entry for Hippocrates’s sleeve, n.s. preserves his capitalization, but our word label follows modern conventions, capitalizing only the proper noun

We omit articles and particles

Johnson’s headwords are often preceded by an article (A, An, The) or particle (To), a practice that was common at the time. Modern dictionaries omit these words, and so do our word labels, though we preserve these words in the transcription.

screenshot of label for "flaw, v.a." next to the transcription of the 1755 entry for To Flaw. v.a.
The particle To is preserved in the transcription but omitted from the word label.
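
For a single headword, dropping the leading article or particle might look like the sketch below. It is a hypothetical helper, not the project’s method (many labels were made by hand); the function name and the rule of splitting on the first space are assumptions.

```python
# Articles and the particle named in the text, compared case-insensitively.
LEADING_WORDS = ("a", "an", "the", "to")


def strip_leading_word(headword: str) -> str:
    """Drop one leading article or particle, if something follows it."""
    text = headword.strip()
    first, _, rest = text.partition(" ")
    # Only strip when a word remains, so a one-word headword is kept whole.
    if rest and first.lower() in LEADING_WORDS:
        return rest.strip()
    return text
```

Here `strip_leading_word("To Flaw")` gives `"Flaw"`. Lowercasing is deliberately left out, because deciding which words are proper nouns takes editorial judgment.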

We standardize parts of speech

Johnson notes the parts of speech of many headwords, but, as is typical for the eighteenth century, he does not consistently employ the same tidy abbreviation every time. His part of speech designations range from the brief n.s. to the more elaborate “this is a kind of substantive, being, according to its signification, singular or plural.” Some of his abbreviations are clear, if variable, such as part. / partic. / particip. / participle. Others are more opaque, such as ad., which could signify either adjective or adverb. At times, Johnson does not list any part of speech.

Our entry transcriptions preserve Johnson’s part of speech designations as he provided them. Our labels, however, standardize Johnson’s abbreviations in order to make search results easier to read. For example, the labels use the abbreviation part. for participle no matter how Johnson abbreviated the term in the entry.

It’s important to note that our labels present Johnson’s designations in abbreviated form. Our labels do not attempt to interpret Johnson’s designations. Where Johnson’s part of speech designation is opaque, as with ad., we change nothing. And where Johnson’s part of speech designation is too long to fit on a label, we omit it from the label but do not attempt to substitute something else.
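
The three rules above (standardize known variants, leave opaque designations alone, omit ones that are too long) can be expressed as a small function. This is a sketch under stated assumptions: the function name, the variant set (taken from the participle example), and especially the `MAX_POS_LENGTH` cutoff are hypothetical, since the post does not say how long is too long.

```python
from typing import Optional

# Variants of "participle" listed in the text, all shown as "part." on labels.
PARTICIPLE_VARIANTS = {"part.", "partic.", "particip.", "participle"}

# Hypothetical cutoff for what fits on a label.
MAX_POS_LENGTH = 12


def label_pos(designation: Optional[str]) -> str:
    """Return the part of speech to show on a label, or "" to show none."""
    if not designation:
        return ""  # Johnson listed no part of speech
    text = designation.strip()
    if text.lower() in PARTICIPLE_VARIANTS:
        return "part."  # standardize the abbreviation
    if len(text) > MAX_POS_LENGTH:
        return ""  # too long for a label: omit, do not substitute
    return text  # anything else, including opaque "ad.", is left unchanged
```

Note that the function never guesses: `label_pos("ad.")` returns `"ad."` rather than choosing between adjective and adverb.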

We number homographs

Often, multiple headwords have identical spellings; in other words, they are homographs. At times, different headwords share not only a spelling but also a part of speech, which means that their labels would be identical too. Unfortunately, a list of identical labels can look like an error:

screenshot of search results for "button"; three identical labels for "button, n.s." falsely appear to be an error
Johnson defined three separate homographs for button, n.s., but identical word labels could falsely appear to be an error

To distinguish these labels, we number them:

screenshot of search results for "button" with identical labels numbered; https://johnsonsdictionaryonline.org/views/search.php?term=button
We number otherwise-identical labels in order to distinguish homographs
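
Numbering of this kind is easy to illustrate in code. The sketch below is an assumption-laden example rather than the site’s implementation: the function name and the `(1)`, `(2)` suffix format are hypothetical, and the site may display the numbers differently.

```python
from collections import Counter
from typing import List


def number_homographs(labels: List[str]) -> List[str]:
    """Append a running number to every label that occurs more than once."""
    totals = Counter(labels)  # how often each label occurs overall
    seen: Counter = Counter()  # how many copies we have emitted so far
    numbered = []
    for label in labels:
        if totals[label] > 1:
            seen[label] += 1
            numbered.append(f"{label} ({seen[label]})")
        else:
            numbered.append(label)  # unique labels stay untouched
    return numbered
```

Given three copies of `"button, n.s."`, the function returns them with the suffixes `(1)`, `(2)`, and `(3)`, in their original order, and leaves any unique label as it was.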

Yes, we did this work by hand

We were able to automate some of the label-making process. For example, we were able to extract a list of (most) headwords and their parts of speech from our XML files. But a great deal of this work has been carried out by actual people, looking at words one at a time.

Yes, we’re still working on the labels (you can help)

We’re still working hard to improve the site infrastructure so that our labels display as intended. Sometimes we discover that a particular label has a typo, or that a search finds the correct entry but displays the wrong label, or that a search yields unexpected results.

Please help by alerting us to any surprising or confusing search results you encounter!