My18th

EEBO now in TypeWright

We are pleased to announce that the Mellon-funded Early Modern OCR Project (eMOP) has finished running Optical Character Recognition (OCR) software on the 138,538 documents in ProQuest’s Early English Books Online (EEBO), and we are now making almost all of them available in 18thConnect.org for OCR correction. Some document images were too poor to run through the software, but we have loaded the resulting “dirty OCR” for 113,909 documents into the TypeWright tool at 18thConnect.org for crowd-sourced correction (http://www.18thconnect.org/typewright/documents). We were able to get an excellent contract with both ProQuest and Gale covering all the documents that are loaded into TypeWright, from both EEBO and Eighteenth-Century Collections Online (ECCO): any scholar or student who corrects a document gets to keep it and do whatever they wish with it, ideally to create an online digital edition such as the one you can see here, created by an undergraduate student of Stephen Gregg’s.

Once you have corrected a document, 18thConnect will send it to you in both plain-text and TEI-encoded formats. Additionally, the corrected text will then be full-text searchable in both ProQuest’s EEBO and Gale’s ECCO, and in 18thConnect.org. When you search the latter, 18thConnect gives search returns in the form of links to the texts in EEBO or ECCO, but, for those who use 18thConnect without subscriptions to those databases, we also provide information about holding libraries. Moreover, for those who DO subscribe to these catalogs, our research capacities will have been increased by working on the data we care about. Please note that these catalogs are being sold to libraries just as they are: in correcting the data, we are NOT increasing the profits of these companies, only our own research capacities (please see Mandell and Grumbach, “The Business of Digital Humanities: Capitalism and Enlightenment,” Scholarly and Research Communication 6.4 [2015]).

A word about search: although all of Gale’s ECCO is searchable by word, OCR errors diminish the number of results one gets. A forthcoming article by Mandell demonstrates that the error rate in searching for bigrams (two-word phrases) is 50 to 60%; that is, one is missing over half the results one might otherwise get. In the case of EEBO, only those texts that have been typed by the Text Creation Partnership are searched by word when you search EEBO, as you can see on the EEBO search page, in the drop-down box describing what is searchable:

EEBOSearch
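
To see why OCR errors hit two-word searches so hard, here is a minimal sketch in Python. It is not the method of the forthcoming article; it rests on a simplifying assumption of our own, namely that each word is recognized correctly with the same probability and independently of its neighbor. The function name `bigram_miss_rate` is hypothetical. Under that assumption, a bigram search only matches when both words are correct, so the miss rate is one minus the square of the word accuracy.

```python
def bigram_miss_rate(word_accuracy: float) -> float:
    """Estimate the share of two-word phrase matches lost to OCR errors.

    Assumption (ours, for illustration only): every word is recognized
    correctly with probability `word_accuracy`, independently of the
    word next to it. A bigram search then succeeds only if BOTH words
    were recognized correctly.
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    both_words_correct = word_accuracy ** 2
    return 1.0 - both_words_correct


if __name__ == "__main__":
    # Show how quickly phrase searching degrades as word accuracy drops.
    for accuracy in (0.9, 0.8, 0.7):
        missed = bigram_miss_rate(accuracy)
        print(f"word accuracy {accuracy:.0%}: about {missed:.0%} of bigram hits missed")
```

Under this simple model, a word accuracy of about 70% is already enough to lose roughly half of all two-word matches, which is why correcting the underlying text matters so much for phrase searching.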

We sincerely hope that professors and students can work together to make sure that these untranscribed and poorly mechanically transcribed documents (the 85,200 documents so far not available to search as full text) do not become part of a “dark archive,” but can be fully searchable by future generations of scholars, both inside and outside the academy. [Note: This paragraph was slightly edited from its original version on March 11th, 2016.]

You can access the EEBO documents at 18thConnect.org, using the TypeWright tab, “Advanced Search,” or the Search Tab and selecting “TypeWright Enabled Documents”; in both cases, also select “EEBO” under “Other Collections.”

In addition to the instructions for using TypeWright available on the site itself once you begin editing a document, we have an introductory video available. We also have a few short videos available on a playlist on YouTube (and below) that introduce TypeWright features one by one, including a video about editing EEBO texts specifically, which pose their own kinds of problems.

 

 

Also, feel free to contact us with questions or concerns at technologies@18thConnect.org.

ASECS 2015 Pre-Conference Workshop: Liberate the Text

Come to the Liberate the Text ASECS Pre-Conference Workshop, beginning at 8:00 am on Wednesday, March 18, at the Westin Bonaventure, in the San Fernando Room.

The workshop agenda is available here: http://idhmc.tamu.edu/asecs/2015-workshop-agenda/

If you can’t make it for the whole day, do stop by to see Danielle Spratt and Tonya Howe show us how to teach using TypeWright, from 2:30 to 4:00:

“Teaching students to create digital editions using TypeWright”
How can we use 18thConnect and TypeWright as a way of helping engage students more directly in understanding the literature, culture, and history of the eighteenth century? This session will be devoted to considering multiple effective models for using TypeWright in the classroom, ranging from designing a single DH-based assignment to a semester-long assignment that resulted in a digital edition. Throughout the discussion, we will focus on elements of effective project management models to offer tips and tools to help maximize the use of 18thConnect and TypeWright in addition to other resources (Scalar, Classroom Salon, etc.). The majority of the session will focus on offering hands-on time working with TypeWright to help participants think through and begin to create assignments tailored to serve the needs of faculty and their students.

From 18thConnect to TAPAS: My Digital Editing Journey

Philadelphia’s yellow fever epidemic of 1793 tested the civic and religious leaders of a city trying out its role as forerunner of liberty in the new Republic. Striving printer and publisher Mathew Carey became the self-appointed narrator of the epidemic, reporting what he witnessed, remediating anecdotes, and including lists of the recently deceased in his multiple editions of A Short Account of the Malignant Fever, Lately Prevalent in Philadelphia. Religious leaders Richard Allen and Absalom Jones offered a vitriolic retort to Carey’s depictions of the African American community. In A Narrative of the Proceedings of the Black People (1794), they criticized Carey for his racist depictions of African Americans’ efforts to aid the sick and the city generally in its time of great need and accounted for the many services their congregation at the Bethel AME Church had offered.

Image Courtesy of the American Antiquarian Society

It is difficult to overstate the importance of Jones and Allen’s pamphlet: scholars such as Richard Newman, Patrick Rael, and Phillip Lapsansky have deemed it the first piece of African American protest literature. It was also the first time that two formerly enslaved persons claimed copyright in the United States, and Jones and Allen’s identities as authors are partially mediated through print. This pamphlet war, then, shows how battles tied to books, in both legal and cultural landscapes, refract those fought over the agency that comes with citizenship and with establishing oneself as a viable force in an international marketplace created by empire.

 

The yellow fever pamphlets (Carey’s, Jones and Allen’s, and Dr. Benjamin Rush’s, just to name a few) have received considerable attention from scholars of the early Republic looking to understand civil, racial, and national identity in the first decade of the new nation. These pamphlets also had an international appeal, though little to no attention has been paid to the transatlantic reprints (please see my recent article in Book History on the reprinting of Carey’s pamphlet in London and Dublin). The scholarship on A Narrative of the Proceedings of the Black People almost entirely ignores the London reprint of the pamphlet, thereby overlooking it as an early example of international abolitionist literature. The London publishing firm Darton and Harvey, Quakers with a bookshop on Gracechurch Street, just a few doors down from the Meeting House, reproduced A Narrative of the Proceedings of the Black People in the same year that it was printed in Philadelphia, which is why I decided to create a digital edition of the London edition of the pamphlet.

My first step was to get access to an encoded version of the Philadelphia edition. 18thConnect made this possible through TypeWright; I corrected the text version of a document that Eighteenth-Century Collections Online (ECCO) had run through its optical character recognition but that had not been checked for machine errors. After I spent a few days correcting the OCR mistakes in A Narrative of the Proceedings of the Black People, Laura Mandell and her team at 18thConnect obtained for me the machine-encoded XML file of the Philadelphia edition. With the help of the TEI (Text Encoding Initiative) classes offered by Julia Flanders and Syd Bauman at the NEH-sponsored Women Writers’ Project and the Digital Humanities Summer Institute (DHSI) (many thanks to 18thConnect for helping pay the tuition through its institutional partnership), I changed the encoding so that it reflected not the Philadelphia edition but the London edition. I also began to incorporate my own apparatuses. My edition then became one of the inaugural projects on TAPAS: TEI Archive, Publishing, and Access Service. In addition to offering the first digital surrogate of the London edition of this pamphlet, I have also done substantial research into who the people mentioned in the pamphlet are. Using city directories for 1791, 1793, 1794, and 1795 (I could not locate a Philadelphia directory for 1792), I have created a personography that charts what these people did and where they worked (often the same as where they lived). I then transformed that XML into a CSV, which I incorporated into CartoDB, through which I created a layered map.
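
For readers curious about what an XML-to-CSV step like this can look like, here is a minimal sketch in Python using only the standard library. It is not the script used for this edition. The sample data are invented placeholders, and the choice of the TEI elements `listPerson`, `person`, `persName`, `occupation`, and `residence`, along with the helper names `personography_to_rows` and `rows_to_csv`, are assumptions made for illustration; an actual personography may be structured differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# TEI documents use this namespace; xml:id lives in the XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
FIELDS = ["id", "name", "occupation", "residence"]

# Invented placeholder data, NOT taken from the actual edition.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <listPerson>
      <person xml:id="p1">
        <persName>Example Person One</persName>
        <occupation>printer</occupation>
        <residence>Example Street</residence>
      </person>
      <person xml:id="p2">
        <persName>Example Person Two</persName>
        <occupation>minister</occupation>
      </person>
    </listPerson>
  </body></text>
</TEI>"""


def _child_text(person, tag):
    """Return the whitespace-normalized text of a child element, or ''."""
    node = person.find(f"tei:{tag}", TEI_NS)
    if node is None or node.text is None:
        return ""
    return " ".join(node.text.split())


def personography_to_rows(tei_xml):
    """Turn each <person> in a TEI <listPerson> into a flat dictionary."""
    root = ET.fromstring(tei_xml)
    rows = []
    for person in root.iterfind(".//tei:listPerson/tei:person", TEI_NS):
        rows.append({
            "id": person.get(XML_ID, ""),
            "name": _child_text(person, "persName"),
            "occupation": _child_text(person, "occupation"),
            "residence": _child_text(person, "residence"),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(personography_to_rows(SAMPLE_TEI)))
```

A flat file of this kind, with one row per person, is the sort of table that mapping tools can import; to place people on a map, columns for coordinates or geocodable addresses would need to be added.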

This project grew out of the research I did for my dissertation, and when I embarked on a digital edition in early 2011, I was a complete novice in using digital editing tools. My commitment to the pamphlet motivated me to learn what I needed to make this digital edition a reality. My motivation would have come to naught had I not had the institutional support of 18thConnect and the TEI initiatives at the Women Writers’ Project and the DHSI. I am grateful for their support throughout this project. The next phase for me is to submit the project to 18thConnect for review, and I would love any feedback before then, so please be in touch with me (mhardy@mwa.org) if you have any questions or comments on the project.

 

After many requests… Group Editing in TypeWright!

Numerous TypeWright users, many from the ranks of teachers and collaborating scholars, have asked us how to manage a group of people editing a single document in TypeWright.  Therefore, the team decided to produce a “How To.”

How to Create an 18thConnect Group for Editing a Document in TypeWright

A screenshot of the newest page.

Believe us, there are many, many other ways to do this! We chose to publish this method to give any one of you a place to start, from which you can develop methods of your own. And if you are a solo editor and just want to create a new community of interest, the first two sections of this guide give basic instructions that will work for creating any group.

You can find this new guide as part of the “TypeWright” section in the “What is 18thConnect?” pages, along with the other new TypeWright documentation we have released this year, including an introductory video and a TypeWright FAQs page. We also hope that group leaders (and group members) will join the 18thConnect group “TypeWright Users.” This group provides a forum in which to share and discuss TypeWrighters’ ideas and experiences, and to share how completed documents have been used for digital editions.

We want you to enjoy using TypeWright as much as we do, so please be sure to fill out the Survey linked from the bottom of the editing page!  Your responses will help chart the course for future developments in the capabilities and in the editable document collections that we add to TypeWright.

Happy TypeWrighting from me and all of us on the TypeWright Team!

 

Update:  The name of the Guide has been changed to “Creating a Group for Editing.”  The illustration still reflects the previous name.

The 18th Century Dilettante,

Anne Arundel

In Honor of Those Individuals Finishing the Semester: Lectures from 1797

Idle Schoolboy

Spring semesters around the world are drawing to a close, and the endings of academic quarters will soon follow. Soon we will once again have time for frolicking, and for TypeWrighting!

May children

To honor all students in their current fervor, and to keep their TypeWright skills from eroding over the summer, I offer these Lectures on Logic and Belles Lettres, printed at the University of Glasgow in 1797.

New TypeWright Help Pages in 18thConnect!

The TypeWright team has developed three new pages for 18thConnect, which will provide users with more information about our TypeWright tool.

 

What is TypeWright?

First is a general information page with a brief history of TypeWright development, followed by a video that walks the user through the editing interface. A page of TypeWright FAQs follows this general information. Please let us know if you’d like to see any other TypeWright questions answered, and we’d be happy to add them to the FAQ.

 

For our users who aim to utilize the crowd-sourced correction capabilities of TypeWright as the first step in making a scholarly digital edition, we have an exciting announcement! Our 18thConnect team has carefully discussed and decided upon a set of Optional Markup Guidelines. These guidelines will allow our users to add Text Encoding Initiative (TEI) markup to TypeWright documents. Please consult the Guidelines, as we are only able to accommodate a subset of the available TEI elements in TypeWright at this time. These TEI elements will be output into a TEI/XML document when users finish correcting a text in TypeWright, for use in further digital work.

 

We are very pleased to announce this new, detailed documentation and functionality for our TypeWright tool, and we aim to continue developing features to serve the dynamic and enthusiastic eighteenth-century studies community.

 


The release of these three new pages coincided with the 2014 ASECS meeting in Williamsburg, VA and the “Liberate the Text!” workshop, organized by our 18thConnect and ARC Director, Dr. Laura Mandell.

 

Spring is Come! And with it a “Triumph of Wit” from 1712

I will not speak of the weather, in hopes that we may induce Spring to stay with us for a long visit. In honor of her latest triumph over the Polar Vortex, I present our new TypeWright Featured Text: “The Triumph of Wit,” a 1712 collection of poems by John Shirley on miscellaneous topics.

This text provides us with many opportunities for improving the plain text that underlies the image! The first (pictured, left) and second pages are images that do not require correction, so skip them and any other pages mostly filled with illustrations. On page three (pictured, lower right) the text begins; notice that this document has been printed within a surrounding border, which the OCR engine has read and then attempted to type, making many lines of type that can be deleted with little thought. Occasionally an odd issue arises: unnecessary red boxes whose text one encounters at the bottom of the page’s text, but which appear in the area of the uppermost border on the page image. Some small boxes also appear in the middle of lines that already have a box for the whole, or most, of the line. For all of these erroneous red boxes, use the red-X button to delete the text, but remember that the box will remain on the page with nothing in the text correction box.

Triumph

The scanned pages show uneven inking, making the underlying text similarly uneven in accuracy. Sometimes the ink bleeds through from the other side of the printed page, creating additional red boxes where there is no text. This bleed-through and uneven inking also mean that many letters and words cannot be read, even by our human eyes, with any certainty. Please remember not to guess at illegible words in the text, but to replace the unreadable with the @ symbol!

Headers and footers are printed in this document, as they are in many other 18th-century documents. The OCR engine generally ignores the titles and page numbers printed at the top of the page, but sometimes they are read, boxed, and included in the text. The scholars who use the texts will have their own opinions about keeping these in or excluding them from the plain text file upon which they will base their digital scholarly edition. Because our goal here is a general crowd-sourced edit of this and all featured texts, let us split the difference: neither add nor delete red boxes for header material, but do correct the text within whatever header boxes have been generated. On the lower side of the page, the OCR engine does seem to consistently find, box in, read, and type the catchwords and folio signatures printed as the last line on pages. Please check and correct such footers as one single line.

You may have noticed some new pages added to the “What is 18thConnect?” section of the 18thConnect website.  I will describe these fully in my next post, but in the meantime, the material on them may help you if you have any questions about using TypeWright that are not fully covered in the instructional bits below the text correction area on the TypeWright correction page.  And as always, feel free to use the “Contact us” link on every page within 18thConnect.

With all the above in mind, Happy TypeWrighting!

Your 18th Century dilettante,

Anne Arundel