NEH Invites Proposals that Respond to Historical and Multilingual OCR Report

February 7, 2019

The Office of Digital Humanities (ODH) is excited to announce the publication of an important new report titled "A Research Agenda for Historical and Multilingual Optical Character Recognition." The report, funded by The Andrew W. Mellon Foundation and authored by David Smith and Ryan Cordell of Northeastern University, outlines nine recommendations for improving historical and multilingual OCR. The full report is available online: https://ocr.northeastern.edu/report/.

The idea for this report came about several years ago when staff in ODH noticed that a large number of ODH-funded projects working with textual materials were stymied or slowed by poor-quality OCR. This observation led to discussions with grantees and with staff at both the Mellon Foundation and the Library of Congress. Because Mellon staff were already exploring ways to improve the OCR of digitized texts in Arabic and other connected scripts, and LC was seeking greater accuracy in the OCR of its large digitized collection of historical newspapers, we all agreed that a report was needed assessing the state of the art in OCR and identifying key research tasks that might help advance the quality of OCR for a variety of textual materials.

The report is the culmination of about two years of research, surveys, conversations, and in-depth interviews with scholars who work on OCR and rely on OCR'd texts to do their work, with computer and information scientists working toward improving OCR, with librarians who manage digital collections, and with funders who support projects that use and refine OCR methods. The recommendations in the report range from developing methods for improving statistical analysis of OCR output to exploiting existing digital editions for training and test data to convening OCR institutes in critical research areas.

We in ODH invite scholars to consider tackling one or more of these recommendations through our standing grant programs: Digital Humanities Advancement Grants and Institutes for Advanced Topics in the Digital Humanities. In addition, OCR-related projects might be a great fit for the Division of Preservation and Access’ Research and Development program. These programs are listed below along with their next deadlines. If you have questions about whether your proposed project fits our grant programs, please get in touch with us at @email.

Grants                                                     Deadlines
Digital Humanities Advancement Grants                      January or June
Institutes for Advanced Topics in the Digital Humanities   March
Research and Development                                   May

Northeastern University press release: https://ocr.northeastern.edu/