A Brief History of TypeWright
TypeWright is a tool for correcting the machine-generated text version of a document produced from scanned page images.
These text versions are essential to scholarship in the digital age: they enable full-text searching and datamining of historically important documents, as well as the preservation and curation of their texts. Philip Lord and colleagues define data curation as “the activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and re-use” (Philip Lord, Alison Macdonald, Liz Lyon, David Giaretta, From Data Deluge to Data Curation, 2004). Since our ultimate goal is to be able to search across vast numbers of textual archives, those archives, and ideally the materials within them, must be digitally encoded in similar ways for them to “play nicely with each other.”
Collex gives these documents the needed common code.
Optical character recognition (OCR) programs have difficulty reading pre-modern type because of uneven lines and inking, noise, unusual typefaces, ligatures, the long ‘s’, and similar features. Type-founding and typesetting practices were not modernized until the 19th century (from 1820 to 1850). One can see the process very clearly in Google searches. Right now, the text running behind the page images of these 18th-century texts has been mechanically “read” and typed by an OCR program, leaving behind errors that need to be corrected by human eyes and hands.
TypeWright is our tool for turning OCR-generated “dirty” text into digital text that can be fully accessible to searches, datamining, and other digital interactions and analysis.
TypeWright was created thanks to a grant to Miami University by the Andrew W. Mellon Foundation. It was built by Performant Software following the lead of the Australian Newspaper Digitisation Program (Rose Holley, “Many Hands Make Light Work: Public Collaborative OCR Correction in Australian Historic Newspapers,” National Library of Australia [March 2009]).
Video Table of Contents
- “Overview” 0:00:00
- “Navigation: Toolbars” 0:00:49
- “The Red Box” 0:02:13
- “The Text Correction Area” 0:04:26
- “Interacting with the Text” 0:06:23
- “Deleting an Empty Box” 0:08:21
- “Navigation: Arrow Buttons” 0:10:20
- “Report this page” 0:10:53
- “Resize Red Box” 0:12:42
- “Toolbars: Insert” 0:14:10
- “Mark Document Complete” 0:15:00
- “Instructions Section” 0:15:23
- “Further Help” 0:16:32
Having trouble viewing the above video in your browser? Visit our “TypeWright Features” playlist on YouTube for this walkthrough and more.
If you have any questions, please email us at technologies@18thConnect.org.