Major Contributors

Gale-Cengage

Gale-Cengage Learning and 18thConnect have joined forces to undertake a major initiative benefitting scholars and improving the digital archive for future generations.

Gale’s ECCO catalog, Eighteenth-Century Collections Online, contains page images for 182,000 texts, some of them as lengthy as Clarissa. Because the process of creating such a set of images has taken decades of work, some of the page images are not readable enough to be transformed into typed texts by computer programs designed for this work. 18thConnect.org is a community of scholars and an open-source online finding aid hosted by the University of Virginia. It has received a grant from the Mellon Foundation, as well as NEH support from the NCSA and I-CHASS (the National Center for Supercomputing Applications and the Institute for Computing in the Humanities, Arts, and Social Sciences at the University of Illinois), in order to develop a new, open-source software program that, after being trained on the ECCO catalog, will itself be available for public use.

18thConnect will re-run the ECCO page images through this new program in order to generate, if possible, cleaner text than Gale has been able to produce commercially. Next, 18thConnect will provide a window for users (anyone who wishes to register with an email address) to correct the typing of these texts. That these texts are correctly typed is, we believe, crucial for searching, data-mining, and making them findable and comprehensible to future generations. Users who wish to correct whole texts will receive, in compensation for their work, access to the fully typed text. 18thConnect and NINES hold workshops each summer to demonstrate to scholars how to build library-quality scholarly editions from plain typed texts. Once properly constructed, these editions can be submitted to 18thConnect for peer review. The editorial board at 18thConnect is composed of top scholars in the field, and acceptance letters are designed to indicate the value of these editions to Promotion and Tenure Committees. Further, library-quality scholarly editions are eligible to become MLA Electronic Scholarly Editions. Positively reviewed editions are first accepted into the 18thConnect online finding aid. If a scholar’s edition has been accepted (positively peer-reviewed), Gale Cengage may choose to publish the edition along with the page images as a print-on-demand edition.

In addition, because of this mutually beneficial collaboration between Gale Cengage Learning and 18thConnect.org, the ECCO catalog is now completely searchable on the freely available 18thConnect site. Everyone may search through the bibliographical information in the Gale catalog. If you or your institution subscribes to Gale, clicking on the link returned with an entry will take you into the ECCO text collection and directly to that particular text. (You must be on a work computer at a subscribing institution or on a proxy server.) If you do not subscribe, you may do one of two things: find out the holding libraries for that text through the ESTC (English Short Title Catalogue), also online at 18thConnect.org, or simply click on “correct” to put the text in your correction queue. Once corrected, the typed text belongs to you to do with as you wish, and again, we recommend creating a scholarly edition. All the work of correction will go back into the ECCO collection to improve this valuable resource for posterity.

Text Creation Partnership

It was precisely the desire to improve for posterity the ECCO collection (Eighteenth-Century Collections Online, owned by Gale Cengage) that motivated the Text Creation Partnership at the University of Michigan to undertake the TCP-ECCO project. The TCP has produced hand-typed transcriptions of page images of texts in the EEBO collection (Early English Books Online, owned by ProQuest) because documents printed before 1700 are generally impossible to type mechanically. “Mechanical typing” requires having an OCR (Optical Character Recognition) program run through the page images and “read” them, turning the lines of print visible in the image file or PDF into typed letters. When you use the “find” function to search a PDF file, it is actually a typed version of the image that you are searching, and that version has been generated by OCR. But standard OCR engines will not work on page images of texts that were printed before about 1820, and no OCR engine at all will work on texts printed before about 1720.

18thConnect
