TypeWright Correction Questions and Answers

Posted by Locker.Thaddeus on Oct 10, 2013 02:20PM

As a beginner in TypeWright, I have lots of questions!  As a folklore scholar, I have questions about the conventions and accepted/best practices for correcting these printed-text items. 
Let us use this as a forum to discuss things like "Is it more helpful for the '@' to replace each garbled letter, or to substitute for the entire word?"  In fact, look for that question as the first thread!
This comment was modified on Nov 07, 2013 02:24PM
This post is protected by an Attribution Non-Commercial Share Alike Creative Commons License.
Replies to this topic (18)

Posted by Locker.Thaddeus on Nov 13, 2013 07:28PM

Question #2 from Angela Vietto:
Q- When is the text "done"--are we having two or three people check a line before it's "100%"? I found a few lines that had been checked by someone but still needed obvious corrections.
A- There is a three-phase workflow for "completed" documents:
Step 1: Once each page has been significantly edited, and a user reaches the last page in a document, a "mark this document complete" button will appear in the editing interface. To TypeWright, the text is "done" when a user reaches the end of the document and marks the text "complete."
Step 2: Marking the document complete notifies our 18thConnect team. We then have our 18thConnect admin editors "check" the document by reviewing the fully corrected text document.
Step 3: If the document is indeed complete, the 18thConnect admin editor notifies the user and offers a text and XML version of the document. If the document is not complete, the 18thConnect admin editor marks the document "not complete," and the document is available for crowd-sourced editing once again.
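For readers who find it easier to follow as a state diagram, the three steps above can be sketched roughly as follows. This is only an illustration of the workflow as described in this answer, not TypeWright's actual code: the names Status, Document, mark_complete, and admin_review are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    # Hypothetical state names; they are not taken from TypeWright itself.
    OPEN_FOR_EDITING = auto()  # available for crowd-sourced correction
    MARKED_COMPLETE = auto()   # a user pressed "mark this document complete"
    ACCEPTED = auto()          # an admin editor agreed; text/XML is offered


@dataclass
class Document:
    title: str
    status: Status = Status.OPEN_FOR_EDITING


def mark_complete(doc: Document) -> None:
    """Step 1: a user who has reached the last page marks the document complete."""
    if doc.status is not Status.OPEN_FOR_EDITING:
        raise ValueError("only an open document can be marked complete")
    doc.status = Status.MARKED_COMPLETE


def admin_review(doc: Document, is_complete: bool) -> None:
    """Steps 2 and 3: an admin editor checks the document and decides."""
    if doc.status is not Status.MARKED_COMPLETE:
        raise ValueError("only a document marked complete can be reviewed")
    # A rejected document goes back to crowd-sourced editing.
    doc.status = Status.ACCEPTED if is_complete else Status.OPEN_FOR_EDITING


doc = Document("Example text")
mark_complete(doc)
admin_review(doc, is_complete=False)
print(doc.status.name)  # OPEN_FOR_EDITING
```

The point of the sketch is that a document marked "not complete" does not sit in a separate rejected state: it simply returns to the pool for further correction.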
This comment was modified on Nov 13, 2013 07:29PM

Posted by Locker.Thaddeus on Nov 13, 2013 07:36PM

Question #1 from Angela Vietto:
Q- Will the crowd-sourced corrected texts make their way back to ECCO, or wherever they came from, or will they only be available here?
A- Yes, the crowd-sourced corrected texts will make their way back to ECCO, improving ECCO's full text search capability. The corrected text will also be indexed by 18thConnect, and so improve our full text search capability, too. In this way, TypeWright work helps illuminate the dark corners that scholars were previously unable to interact with digitally.

Remember that you, the scholar working on the correction, will be offered digital versions of the text once it is deemed complete!

Posted by Locker.Thaddeus on Nov 14, 2013 03:15PM

Question #7 from Angela Vietto:
Q-  What do we do with pages featuring illustrations?
A-  There are two actions that should be taken:
Step 1: Delete the "OCR" lines that TypeWright has identified in the image. Because TypeWright red boxes are determined by the Gale OCR output we received from ECCO, and because this determination is mechanical, parts of images are often read as "lines."  (Please note that the "deleted" red box will remain on the page, but the "text" will be deleted from the line in the OCR output.)
Step 2: Report the page - this information will make it to the 18thConnect team, who will mark the document for further analysis by the eMOP team.  This page can then be considered in the eMOP team's efforts to "teach" OCR machines how to identify images as images.
This comment was modified on Nov 14, 2013 03:25PM

Posted by Locker.Thaddeus on Nov 14, 2013 04:18PM

Questions #3, #4, #5, and #6 from Angela Vietto:
Q- What conventions are we following? How do we decide whether or not to honor extra spaces? Are we keeping or deleting page numbers, page titles, catch-words at the bottom of the page? How do we deal with signature markings?
A- Many scholars have at least one idea about how to answer these questions! We hope to use the "TypeWright Correction Questions and Answers" discussion to exchange ideas and reach a consensus on many of these questions of style. (Click here to join the discussion!)
In the meantime, keep these two things in mind:
--Keep it simple, keep it searchable! TypeWright is meant to make texts "fully searchable" as well as to contribute to the preservation of our cultural heritage. Would a scholar be searching 18thConnect or ECCO for catchwords or page numbers? Would putting a space between punctuation and the end of a sentence hurt searchability?
--The "editor" chooses. Because TypeWright is, in one conception, a precursor to your development of a digital edition, you can choose which conventions to follow; that choice is part of your responsibility as editor. On the other hand, if you are just lending your "human eyes" and hands to a project, then look to see how the other major contributors to the corrections have answered your questions in their corrections.
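One way to think about the "keep it searchable" question is to ask whether two spellings of a line produce the same search terms. The toy example below assumes a search index that lowercases text and splits it into runs of letters and digits, ignoring punctuation. Whether 18thConnect or ECCO tokenize text this way is not stated in this thread, and search_tokens is a made-up helper, so treat this as a thought experiment rather than a description of either system.

```python
import re


def search_tokens(text: str) -> list[str]:
    """Lowercase the text and keep only runs of letters and digits.

    This is an assumed, simplified tokenizer, not the one used by any
    real search service.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


tight = search_tokens("The end of the sentence.")
spaced = search_tokens("The end of the sentence .")
print(tight == spaced)  # True
```

Under that assumption, a stray space before punctuation changes nothing a searcher would notice, while a garbled letter inside a word would change the term itself.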

Posted by geremyc on Dec 03, 2013 07:11PM

I have a question about how long it takes for the TypeWright team to "check" a text after it is completed and to send the person(s) who worked on it the text/XML version of the document. I am planning a 2014 January-Term class, and I'd like to make correcting a TypeWright text and preparing a (simple) digital edition of it part of the coursework. But it's only a three-week course, so I am concerned about whether we will receive the XML document in time to work on the digital edition.

Posted by Locker.Thaddeus on Dec 11, 2013 05:22PM

Reply to geremyc:
How exciting that you want your students to use TypeWright for their projects! We are in the midst of developing a formal workflow for evaluating and processing the completed documents from TypeWright. The current draft of this workflow places the turn-around time (from declared complete to decision) at one week. When we determine that an honest effort has been made to correct the OCR text, the major contributors will be offered the corrected text within that week in an e-mail which will ask in what format (text or XML) to forward the text. If only minimal or no corrections have been made, then the text will be returned to "TypeWright enabled" status for further correction, again within the week, and the contributor/corrector will receive an e-mail to that effect.

I do hope that this timeline will allow the incorporation of TypeWright into your January term assignment. And please remember TypeWright for the classes you teach during the long terms!

Posted by joechill on Jan 02, 2016 03:40PM

Any consensus on keeping or deleting the line numbers?

Posted by mkvande on Mar 22, 2016 07:20PM

Has the question about signature marks (and catchwords) been answered?