Right now, the only information that is digitally searchable in the menus is the descriptive data created for each item when it was cataloged. This includes useful things like the name of the restaurant, its geographical location, and its date. But the actual menu contents — all the dishes and wines once upon a time offered to customers as they pondered the options for their meal — are only accessible through good old-fashioned sifting.
How will this information be used?
Researchers who use the collection — be they historians, chefs, nutritional scientists, or novelists looking for a juicy period detail — often have very specific questions they’re trying to answer. Where were oysters served in 19th-century New York, and how did their varieties and cost change over time? When did apple pie first appear on the Library’s menus? What about pizza? What was the price of a cup of coffee in 1907? To answer these sorts of questions more easily, the Library needs to extract all the delicious data frozen as pixels inside these digital menu photos. The best way to do this is transcription.
So just transcribe, and presto?
Well, the data will need some additional cleanup for the search engine to handle synonyms, spelling variants, faceting, all that good stuff, but hopefully you’ll start to get a palpable sense right away of what you're helping to build. Every transcribed item instantly becomes part of a searchable index, which lets you trace dishes, ingredients, and prices across the collection much more nimbly. NYPL will be blogging and tweeting about interesting discoveries that come up along the way, and they hope eventually to offer some fun visualizations of the data.
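The searchable index described above can be sketched as a tiny inverted index in Python. The menu items, prices, and field layout below are made up for illustration; they are not drawn from the actual collection or NYPL's real schema:

```python
from collections import defaultdict

# Hypothetical transcribed menu items: (menu_id, dish, price in cents).
# Each new transcription would be appended to a list like this.
transcriptions = [
    (1, "Blue Point Oysters", 40),
    (1, "Apple Pie", 10),
    (2, "Fried Oysters", 35),
    (2, "Cup of Coffee", 5),
]

# Build an inverted index: each word in a dish name maps to every
# menu item containing it, so a dish can be traced across menus.
index = defaultdict(list)
for menu_id, dish, price in transcriptions:
    for word in dish.lower().split():
        index[word].append((menu_id, dish, price))

# Every hit for "oysters" across the collection, with menu and price.
print(index["oysters"])
# [(1, 'Blue Point Oysters', 40), (2, 'Fried Oysters', 35)]
```

A real search engine would add the synonym handling, spelling variants, and faceting mentioned above, but the core idea — each transcribed dish immediately becoming a queryable record — is the same.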
Why not just use OCR?
While the Library could get decent OCR output from some of the clearer printed menus, many others are handwritten, or use fanciful typography and idiosyncratic layouts that would yield little more than alphabet soup from automated methods. A more compelling reason is that they're interested in unpacking specific types of information that are highly relevant to researchers: dishes and prices (and eventually menu sections, geographical locations, and perhaps other data). Even with crystal-clear OCR text, a human being would still need to go through and identify each individual dish, price, section (appetizers, entrees, wines, etc.), and so on. They're building a database of dishes.
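As a rough illustration of the structured extraction a transcriber performs, here is a minimal Python sketch that splits transcribed lines into dish and price pairs. The sample lines and the regular expression are assumptions for the example, not NYPL's actual pipeline — real menus are messy enough that this is exactly the step that needs human eyes:

```python
import re

# Hypothetical transcribed menu lines in the common "dish price" form.
lines = [
    "Consomme Julienne .25",
    "Roast Prime Ribs of Beef 1.50",
    "Apple Pie .10",
]

# Match a trailing price like ".25" or "1.50"; everything before it
# is treated as the dish name.
pattern = re.compile(r"^(?P<dish>.+?)\s+(?P<price>\d*\.\d{2})$")

parsed = []
for line in lines:
    m = pattern.match(line)
    if m:
        parsed.append((m.group("dish"), float(m.group("price"))))

print(parsed)
# [('Consomme Julienne', 0.25), ('Roast Prime Ribs of Beef', 1.5), ('Apple Pie', 0.1)]
```

A pattern like this breaks down immediately on handwritten menus, dishes with no price, or decorative layouts, which is why the project asks people, not programs, to do the identifying.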
Plus, as a library, the NYPL knows that the more people use a collection, the more the staff collectively learns about it. Their hunch is that there is a lot to be gained by inviting the public to help them go through these fascinating artifacts with careful attention, menu by menu, dish by dish. The Library also hopes that, in doing so, they'll stoke people’s appetite (so to speak) to explore the collection further.
What’s OCR?
Optical Character Recognition. Basically, it’s the process by which the Library extracts usable, searchable text from scanned pages. It’s how Google Books and Hathi Trust do their search. Wikipedia has a good explanation.