Q- Will the crowd-sourced corrected texts make their way back to ECCO, or wherever they came from, or will they only be available here?
A- Yes, the crowd-sourced corrected texts will make their way back to ECCO, improving ECCO’s full-text search capability. The corrected text will also be indexed by 18thConnect, and so improve our full-text search capability, too. In this way, TypeWright work helps illuminate the dark corners that scholars were previously unable to interact with digitally.
In addition, once the 18thConnect team confirms that a TypeWright document is complete, its scholar-editor(s) receive their own plain-text and XML versions of the document. The scholar-editor can then use these versions to make a digital edition, to mine data, or to work with other digital humanities tools. Digital editions made with TypeWright-corrected texts can also be submitted to 18thConnect for peer review.
Q- When is the text “done”–are we having two or three people check a line before it’s “100%”? I found a few lines that had been checked by someone but still needed obvious corrections.
A- There is a three-phase workflow for “completed” documents:
Step 1: Once each page has been significantly edited, and a user reaches the last page in a document, a “mark this document complete” button will appear in the editing interface. To TypeWright, the text is “done” when a user reaches the end of the document and marks the text “complete.”
Step 2: Marking the document complete will notify our 18thConnect team. We then have our 18thConnect admin editors “check” the document by reviewing the fully corrected text document.
Step 3: If the document is indeed complete, the 18thConnect admin editor notifies the user and offers a text and XML version of the document. However, not all “marked complete” texts are perfectly corrected. If the 18thConnect team determines that a document is not complete, the team re-releases the text for further editing by the public.
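The three steps above can be pictured as a small state machine. The Python sketch below is only an illustration of the workflow as described here. The names `DocStatus`, `mark_complete`, and `admin_review` are hypothetical and do not come from TypeWright or 18thConnect.

```python
from enum import Enum


class DocStatus(Enum):
    """Hypothetical states for a TypeWright document (illustrative names)."""
    IN_PROGRESS = "in progress"          # open for editing by the public
    MARKED_COMPLETE = "marked complete"  # Step 1: a user marked it complete
    CONFIRMED = "confirmed complete"     # Step 3: an admin editor agreed


def mark_complete(status: DocStatus) -> DocStatus:
    """Step 1: a user marks the document complete, which notifies the team."""
    if status is not DocStatus.IN_PROGRESS:
        raise ValueError(f"cannot mark complete from state: {status.value}")
    return DocStatus.MARKED_COMPLETE


def admin_review(status: DocStatus, is_complete: bool) -> DocStatus:
    """Steps 2 and 3: an admin editor checks the text, then either
    confirms it or re-releases it for further public editing."""
    if status is not DocStatus.MARKED_COMPLETE:
        raise ValueError(f"nothing to review in state: {status.value}")
    return DocStatus.CONFIRMED if is_complete else DocStatus.IN_PROGRESS
```

In this sketch a document that fails review simply returns to the "in progress" state, which mirrors the re-release described in Step 3.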
Q- What do we do with pages featuring illustrations?
A- There are two actions that should be taken:
Step 1: Delete the “OCR” lines that TypeWright has identified in the image. Because TypeWright red boxes are determined by the Gale OCR output we received from ECCO, and because this determination is mechanical, parts of images are often read as “lines.” (Please note that the “deleted” red box will remain on the page, but the “text” will be deleted from the line in the OCR output.)
Step 2: Report the page – this information will make it to the 18thConnect team, who will mark the document for further analysis by the Early Modern OCR Project (eMOP) team. This page can then be considered in the eMOP team’s efforts to “teach” OCR machines how to identify images as images.
Q- Does the red box remain on the page after I have deleted the line?
A- Yes. Each red box retains the history of the adjustments to the line, so the red box stays on the page even after you “delete” the line. The text in that line should be struck through when you first delete the line, and subsequent returns to that page should show no text associated with the deleted red box.
Q- What conventions are we following? How do we decide whether or not to honor extra spaces? Are we keeping or deleting page numbers, page titles, catch-words at the bottom of the page? How do we deal with signature markings?
A- Many scholars have at least one idea about how to answer these questions! We hope to use the “TypeWright Correction Questions and Answers” discussion to exchange ideas and reach a consensus on many of these questions of style. (Click here to join the discussion!)
In the meantime, keep these two things in mind.
- Keep it simple, keep it searchable! TypeWright is meant to make texts “fully searchable” as well as to contribute to the preservation of our cultural heritage. Would a scholar be searching 18thConnect or ECCO for catchwords or page numbers? Would putting a space between punctuation and the end of a sentence hurt searchability?
- The “editor” chooses. Because TypeWright is, in one conception, a precursor to your development of a digital edition, you should choose which conventions to follow; that choice is part of your responsibility as editor. On the other hand, if you are just lending your “human eyes” and hands to a project, look at how the other major contributors have answered these questions in their own corrections.
With that said, there are some minimal editing requirements that TypeWright admins look for when reviewing a “completed” document. These guidelines (below) can also be found in the TypeWright editing interface.
- If a word or portion of a word is illegible, type “@” in its place; please do not make any guesses about what a word might be.
- Copy original spelling and punctuation, typing what you see on the page, except in the case of the long ‘s’: use ‘s’ and not ‘f’ when ‘s’ is called for.
- Include end-of-line hyphens, preserving the syllables as they occur on each line.
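For anyone post-processing their own plain-text export, the guidelines above can be illustrated with a short Python sketch. It is not part of TypeWright; the function names are hypothetical. It also assumes the long ‘s’ appears as the literal character ‘ſ’ (U+017F). Where OCR has misread a long ‘s’ as ‘f’, only a human reading the page image can tell the difference.

```python
LONG_S = "\u017f"  # the long s character


def apply_long_s_rule(line: str) -> str:
    """Replace the long s with a plain 's'; leave spelling, punctuation,
    '@' placeholders, and end-of-line hyphens exactly as typed."""
    return line.replace(LONG_S, "s")


def count_illegible(line: str) -> int:
    """Count '@' placeholders, each standing for an illegible word or part."""
    return line.count("@")


def ends_with_hyphen(line: str) -> bool:
    """True when the line keeps an end-of-line hyphen."""
    return line.rstrip().endswith("-")


# Two consecutive lines as an editor might have typed them (made-up text).
lines = ["The pa\u017f-", "\u017fage was @ by rain."]
corrected = [apply_long_s_rule(text) for text in lines]
print(corrected)  # ['The pas-', 'sage was @ by rain.']
```

Note that the sketch never joins “pas-” and “sage” into one word, because the guideline asks that syllables be preserved as they occur on each line.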
If you have any other questions, please email us: email@example.com