Right now, the only digitally searchable information in the menus is the descriptive data created for each item when it was cataloged. This includes useful things like the name of the restaurant, its geographical location, and the date. But the actual contents of the menus (all the dishes and wines once upon a time offered to customers as they pondered the options for their meal) are only accessible through good old-fashioned sifting.
How will this information be used?
Researchers who use the collection (be they historians, chefs, nutritional scientists, or novelists looking for a juicy period detail) often have very specific questions they're trying to answer. Where were oysters served in 19th-century New York, and how did their varieties and cost change over time? When did apple pie first appear on the Library's menus? What about pizza? What was the price of a cup of coffee in 1907? To find out these sorts of things more easily, the Library needs to extract all the delicious data frozen as pixels inside these digital menu photos. The best way to do this is transcription.
So just transcribe, and presto?
Well, the data will need some additional cleanup in order for the search engine to handle synonyms, spelling variants, faceting, all that good stuff, but hopefully you'll start to get a palpable sense right away of what you're helping to build. Every transcribed item instantly becomes part of a searchable index, which lets you trace dishes, ingredients, and prices across the collection much more nimbly. NYPL will be blogging and tweeting about interesting discoveries that come up along the way. They hope eventually to offer some fun visualizations of the data.
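To make the idea concrete, here is a minimal sketch, in Python, of the kind of cleanup and indexing described above. This is not NYPL's actual pipeline; the variant list, field names, and menu IDs are invented for the example.

```python
from collections import defaultdict
import re

# Hypothetical spelling-variant map; a real one would be far longer.
VARIANTS = {"oysters": "oyster", "oisters": "oyster", "coffe": "coffee"}

def normalize(dish_name: str) -> str:
    """Lowercase, strip punctuation, and fold known spelling variants."""
    words = re.sub(r"[^a-z\s]", "", dish_name.lower()).split()
    return " ".join(VARIANTS.get(w, w) for w in words)

def build_index(transcriptions):
    """transcriptions: iterable of (menu_id, dish_name, price) tuples."""
    index = defaultdict(list)
    for menu_id, dish_name, price in transcriptions:
        index[normalize(dish_name)].append((menu_id, price))
    return index

# Example: trace the price of oysters across two (made-up) menus.
index = build_index([
    ("menu-1907-042", "Blue Point Oysters", 0.40),
    ("menu-1912-118", "Blue point oisters", 0.55),
])
print(index["blue point oyster"])  # [('menu-1907-042', 0.4), ('menu-1912-118', 0.55)]
```

Because spelling variants collapse to one normalized name, a single lookup finds the dish on both menus, which is exactly what makes tracing prices over time possible.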
Why not just use OCR?
While the Library could get decent OCR output from some of the clearer printed menus, many others are handwritten, or use fanciful typography and idiosyncratic layouts that would yield little more than alphabet soup from machine transcription. A more compelling reason is that they're interested in unpacking specific types of information that are highly relevant to researchers: dishes and prices (and eventually menu sections, geographical locations, and perhaps other data). Even with crystal-clear OCR text, a human being would still need to go through and identify each individual dish, price, and section (appetizers, entrees, wines, etc.). They're building a database of dishes.
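As a rough illustration, one record in such a database of dishes might look something like the sketch below. The field names are hypothetical, not the Library's actual schema; the point is that each dish is tied to its price, its menu section, and its source menu, which is structure that raw OCR text simply doesn't have.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DishRecord:
    menu_id: str            # which digitized menu the dish came from
    section: str            # e.g. "appetizers", "entrees", "wines"
    name: str               # the dish as printed on the menu
    price: Optional[float]  # in dollars; None if no price is listed

# One entry in the hypothetical database of dishes:
record = DishRecord(menu_id="menu-1907-042", section="appetizers",
                    name="Blue Point Oysters", price=0.40)
```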
Plus, as a library, the NYPL knows that the more people use a collection, the more the staff collectively learns about it. Their hunch is that there is a lot to be gained by inviting the public to help them go through these fascinating artifacts with careful attention, menu by menu, dish by dish. The Library also hopes that by doing so, they'll stoke people's appetite (so to speak) to explore the collection further.
What’s OCR?
Optical Character Recognition. Basically, it's the process by which the Library extracts usable, searchable text from scanned pages. It's how Google Books and HathiTrust power their search. Wikipedia has a good explanation.
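For the curious, here is a minimal OCR sketch using the open-source Tesseract engine via the pytesseract wrapper. This illustrates the general technique, not the Library's own tooling, and the filename is a made-up placeholder; running it on a handwritten or ornately typeset menu would produce exactly the alphabet soup described above.

```python
# Requires the Tesseract binary plus the pytesseract and Pillow packages.
from PIL import Image
import pytesseract

image = Image.open("menu_page.png")  # hypothetical scanned-menu file
text = pytesseract.image_to_string(image)
print(text)  # raw, unstructured text: no dishes, prices, or sections identified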
More Articles
- Kings, Queens, and Courtiers: Royalty on Paper
- CultureWatch Books: The Hemlock Cup and Train Dreams
- Argo, the Movie and Wired Magazine: How the CIA Used a Fake Sci-Fi Flick to Rescue Americans From Tehran, by Joshuah Bearman
- Transfigured by the Magic of Light and Shade: Impressionism and Fashion
- The Endeavour Day: FDR's Evolving Approach to Fiscal Policy in Times of Crisis
- Colors of the Universe: Chinese Hardstone Carvings
- Black Lamb and Grey Falcon, Part 1
- The Horse: From Arabia to Royal Ascot
- The Century of the Child: Contributions of women as architects, designers, teachers, critics, and social activists
- National Archives Nationwide Network and Attachments: Faces and Stories from America’s Gates