Chapter 9. General Encoding Guidelines

This chapter of our documentation is still in beta. We welcome feedback, corrections, and questions while we finalize the page in our 2024–2025 work cycle.

Introduction to General Encoding Guidelines

Rationale

Many encoding practices are common to all types of LEMDO XML files. Subsequent chapters deal with particular types of files (modern texts, annotations, etc.), but you will want to return to this chapter regularly to remind yourself of general encoding principles and practices.

Caveats and Warnings

Avoid Copying From Word Files

Word files from any era, and especially from the early days of word processing, contain hidden characters that will break the LEMDO build. Do not copy and paste directly from a Word file (a file with a .doc or a .docx extension) into an XML file. If you must copy content, paste it into a plain-text file in a text editor first. All PCs have a text editor already installed, and many others are available. Common ones are Notepad (Windows OS), Text Editor (Linux Ubuntu OS), Atom, BBEdit, and SimpleText (Mac OS). Check the extension of the file from which you are copying. It must be .txt.

Always Open Oxygen Project File First

It is imperative that you begin every work session by opening the lemdo-all.xpr project file. This file contains important scripts that control the behaviour of Oxygen and give you access to tools that we have built especially for LEMDO users. The XPR file also ensures that you are validating your XML files against the schema and Schematron that determine how you are supposed to encode your play; the schema and Schematron catch your mistakes and prompt you to correct them. See LEMDO Oxygen Project.

Validate Your Files Before Committing

Any file that you commit to the repository must be valid. It must follow the rules that are embodied in the schema and Schematron. An invalid file can break the build process. If you open the lemdo-all.xpr file first before you open any other file, you can be sure that you are validating against the LEMDO schema and Schematron. See Validate Files.

Name Files: Naming Conventions

This document explains LEMDOʼs naming conventions. Note that the principles set out in this document apply only to TEI/XML documents and related source materials; other naming conventions are covered in LEMDO Programming Principles.
When you create a new file for the project, even if it is just a word-processor file with notes in it, remember to give it a descriptive file name, and do not use punctuation or spaces in the file name. SVN can handle such characters in file and folder names, but files with these types of names can cause problems. For example, instead of Fred’s notes on documentation.odt, use Fred_notes_on_documentation.odt.
LEMDO has three different naming groups:

Infrastructure IDs

Infrastructure IDs are the IDs of the following files and of the entries inside them: PERS1, BIBL1, ORGS1, TAXO1, and LOCA1. Every ID must be unique and must be structured as follows: XXXX#, where each X is a letter and # is a number consisting of one or more digits. The letters and numbers are not arbitrary. Use the first four meaningful letters of the title of the file. If the file you are naming uses a particular four-letter combination for the first time in our database, the number will be 1. Otherwise, check the A–Z Index for the next available number for that four-letter combination. For example, when we created the personography file, the first four letters were PERS. Because this file was the first instance of that letter combination, the number was 1: PERS1.xml. Always check your ID against the A–Z Index before finalizing and committing it. The A–Z Index text file is available from the Resources menu on the lemdo-dev site.

Texts and Anthologies

When you add a new text to the repository, consult DREʼs list of play IDs, and look for the title of the play you are adding. You will notice that the convention for plays differs from the convention for infrastructure files. The difference is due to scholarly conventions for play naming. IDs for Shakespeare plays have long been standardized; they are even listed in the Chicago Manual of Style and the MLA Handbook. DRE extends the implied principles for naming Shakespeare plays to all other plays. If you cannot find the title you are looking for, consult with the LEMDO Director, who will in turn consult with the DRE Coordinating Editors.

Facsimiles and Performance

The conventions for facsimiles and performances include the ID of the text to which they are connected. When you name a facsimile, follow this pattern: facs, followed by an underscore (_), followed by the document ID (xml:id), followed by an underscore (_), followed by the acronym of the library from which the facsimile is taken. If you are not sure about the library acronym, consult with the LEMDO Director. For example, the facsimile corresponding to The Honest Whore, Part 1, Quarto 1 is named facs_1HWQ_Q1_Folger.
When you name a performance file, follow this pattern: perf, followed by an underscore (_), followed by the document ID (xml:id) of the text to which the performance belongs, followed by an underscore (_), followed by the ID of the anthology to which the performance belongs. For example, the files pertaining to the QME production of Friar Bacon and Friar Bungay all begin with perf_FBFB_QME.

File Extensions

Never capitalize file extensions (e.g., use .jpg and .png, not .JPG and .PNG).

Create a Unique Value for an xml:id Attribute

LEMDO uses @xml:id attributes across the project to give unique identifiers to various entities: people, productions, sources, projects, organizations, characters, divisions in documents, paragraphs in documents, and anchors in documents. We have thousands of @xml:id attributes in the LEMDO project, and the value (id) of each one of them must be unique across the entire project. Duplicate ids will break the build.
For edition files and identifiers within edition files, we meet the need for uniqueness by insisting that every @xml:id value begin with emd followed by the abbreviation of the play. For example, every @xml:id value created for the edition of Famous Victories begins with emdFV.
For anthology pages, we meet the need for uniqueness by insisting that every @xml:id value begin with the abbreviation for the anthology. For example, the About page for QME has the @xml:id value of qme_about, which is also the name of the file (qme_about.xml). The About page for DRE has the @xml:id value of dre_about. (Note that when we build your final anthology, we remove the qme_ and dre_ portion of the file names to keep your final, public-facing URLs lightweight.)
For anchors, we meet the need for uniqueness by insisting that every @xml:id value begin with the full name of the file. Every anchor in the modern text of Famous Victories (emdFV_M) begins with emdFV_M_.
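The following sketch shows how these prefixes might fit together in the modern text of Famous Victories. Only the emdFV_M file ID and the emdFV_M_ prefix come from the conventions above; the anc_1 and anc_2 suffixes and the surrounding content are hypothetical and serve only as illustration.

```xml
<!-- File name: emdFV_M.xml (the file name matches the xml:id on the root element) -->
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="emdFV_M">
  <!-- teiHeader omitted -->
  <text>
    <body>
      <div>
        <!-- Hypothetical anchor ids: only the emdFV_M_ prefix is prescribed above -->
        <p>Some text<anchor xml:id="emdFV_M_anc_1"/> with two anchors<anchor xml:id="emdFV_M_anc_2"/> in it.</p>
      </div>
    </body>
  </text>
</TEI>
```

Because every id in the file begins with the file name, an id created here cannot collide with an id in any other file.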

Practice: Create Unique IDs for Entities in Sitewide Databases

This section pertains mainly to LEMDO Team members at UVic: When you are creating an @xml:id value for an entity in one of the sitewide database files (PERS1, PROS1, GLOSS1, HAND1, BIBL1, PROD1, ORGS1, or TAXO1), you must check the complete list of LEMDO @xml:id values to ensure that you are creating a new value. These ids are generally four uppercase letters followed by a number (JENS1, SHAK1, ADAM1, LEMD3, and so on).
We have two ways of viewing all the @xml:id values in the project (except for the anchor, speech, and paragraph ids). Note that you must be on the Jenkins site looking at the most recent build. When you begin working for LEMDO, we will give you the links to these pages.
Generated .txt list: This page loads in your browser and lists only the ids. It is a lightweight page that loads quickly and completely. Because it does not give any details about the entity, the page is useful mainly for determining whether or not an id has been used already. Search the page with a simple Ctrl + F.
Generated HTML page: This page loads in your browser as a table. The three columns of the table list:
The id with a hyperlink to the item
Item type
Title (if applicable—not all entities have titles)
This page is huge and slow to load. Some browsers will not completely load a table as long as this one. Before you search the page, ensure that it has completely loaded. The table loads alphabetically by default, so the final entries will be ids beginning with Z.
Assign the next available id. If you want to create an id beginning with ADAM, search the page for the string ADAM. If your search shows that ADAM1, ADAM2, and ADAM3 have been used, then the next available id is ADAM4.

Encode Document Status

Rationale

The status of a document determines a number of things:
The schema and Schematron requirements that the document must meet to be valid
Whether we can publish the document
What date we record as the date of publication
Whether LEMDOʼs processing should add a label indicating that the document is peer reviewed
As a document moves through the stages of remediation, encoding, peer review, proofing, and publishing, we track its movement via dated <change> elements in the <revisionDesc>. This information is also useful to the LEMDO team for tracking our progress.

Practice

Every encoded file should have a <revisionDesc>. This element should carry an @status attribute that documents the current stage of the fileʼs encoding process.
LEMDO’s predefined document status values are listed in the table below.
Value of @status | Description
prgGenerated | The document has been programmatically converted from IML to LEMDO TEI P5 via a series of transformations. The file is a .xml file.
IML-TEI | There are stray IML tags in these texts that we retain until we have proofed the TEI. If the file is a semi-diplomatic transcription, the IML may have been checked by an ISE editor but LEMDO has not yet checked it.
IML-TEI_INP | The programmatic conversion is in the process of being carefully checked by a LEMDO research assistant. The file is a .xml file.
IML-TEI_proofed | The programmatic conversion has been carefully checked by a LEMDO research assistant. The file is a .xml file. The transcription has been carefully checked and corrected by a LEMDO RA against an open-access digital surrogate.
TCP-TEI | The text has been programmatically converted from TCP TEI P4 to LEMDO TEI (P5) via a series of transformations. The file is a .xml file. The transcription is only as correct as the underlying TCP transcription (which contains gaps, errors, and normalized long s characters). The TCP metadata is retained.
TCP-TEI_INP | The text has been programmatically converted from TCP TEI P4 to LEMDO TEI (P5) via a series of transformations. The file is in the process of being carefully corrected and proofed by a LEMDO Research Assistant. The file is a .xml file.
TCP-TEI_proofed | The text has been programmatically converted from TCP TEI P4 to LEMDO TEI (P5) via a series of transformations and carefully corrected and proofed by a LEMDO Research Assistant. The file is a .xml file. The transcription has been corrected; gaps have been supplied; the long s has been restored. The TEI tagging has been checked and corrected by a LEMDO RA.
TEI_INP | The text is being encoded in TEI.
TEI_proofed | The text is finished in TEI and proofed.
published | Files that have been published.
converted | Files that have been converted.
draft | Files that are being drafted.
empty | Files that are empty.
deprecated | This document is no longer relevant, but is being preserved for archival purposes.

Note

Note: Files in /main/ should not have the status “empty” or “draft”. They all contain converted text. Eventually, these files might have the status published or deprecated.

Track Your Work

Rationale

We use the <change> element to track our work because it gives credit where credit is due and allows future encoders to see what has been done to the file before they continue work on it.

Practice: Start Your Work

When you begin encoding a file, add a <change> element as a child of the <revisionDesc>. Put the @who attribute, the @when attribute, and the @status attribute on the <change> element. These indicate who you are, the date you started remediating, and the status of the file. In the text node, add a statement describing what you are doing in the file (e.g., began remediating file). For example:
<change who="pers:GALL2" when="2022-09-15" status="IML-TEI_INP">Began remediation toward publication.</change>

Practice: Complete Your Work

When you have completed working on your file, add a <change> element. Put important and relevant information about your file in the text node of this <change> element, such as what you have completed and what must still be done in it. This protocol ensures that another encoder will be able to efficiently continue work on your file if there is any further work that must be done on it. For example:
<change who="pers:GALL2" when="2022-12-13">Completed the feedback from JENS1 and finished up the file. There are no facsimile links because it is only on EEBO. EEBO has been acknowledged in the source desc.</change>
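Taken together, the start and completion entries sit inside a single <revisionDesc>. The sketch below reuses the two example entries from this section. Two details are assumptions rather than documented LEMDO practice: the order of the <change> elements (most recent first, as the TEI Guidelines recommend) and the particular @status value shown on the <revisionDesc>, which should be whatever stage the file has actually reached.

```xml
<!-- The @status value here is illustrative; use the file's current stage -->
<revisionDesc status="IML-TEI_INP">
  <!-- Assumed ordering: most recent change first -->
  <change who="pers:GALL2" when="2022-12-13">Completed the feedback from JENS1 and finished up the file. There are no facsimile links because it is only on EEBO. EEBO has been acknowledged in the source desc.</change>
  <change who="pers:GALL2" when="2022-09-15" status="IML-TEI_INP">Began remediation toward publication.</change>
</revisionDesc>
```

Reading the entries in sequence lets the next encoder see who touched the file, when, and what remains to be done.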

Special Case: Track Significant Writing or Encoding Tasks

You may add other <change> elements during the encoding process to indicate when you complete significant writing or encoding tasks. For example, you may want to make note of when you finished numbering <lb> elements in semi-diplomatic transcriptions. While it is optional to add <change> elements for completing significant writing or encoding tasks, it is good encoding practice to do so. You should always add a <change> element when you begin work on a file and when you finish work on it.
For example:
<change who="pers:GALL2" when="2022-06-06">Fixed the wlns following Hinmans rationale. This file should be ready for JENS1 to review.</change>

Encode Titles

Rationale

TEI allows the <title> element to appear in both the <teiHeader> (the metadata) and the <text>. This might seem redundant, but it allows you to make a helpful distinction between the title of your XML file and the title that you want to give to the text captured in the XML file.
For primary texts, LEMDO aims to standardize filenaming practice while still allowing some editorial flexibility in play titles.
For born-digital texts, the title in the <titleStmt> is the title that will show up on the webpage.
Note that the title of the XML file, captured in the <titleStmt> section of the metadata, is distinct from the filename, which is generally truncated and must exactly match the xml:id of the file.

Practice

LEMDO uses the <title> element in several ways. Many types of files have titles:
born-digital documents captured in XML files
transcriptions of primary texts captured in XML files, which themselves have title pages (in the <front> element) and embedded titles:
titles on titlepages of primary texts transcribed and captured in XML files
titles on the first page of primary texts transcribed and captured in XML files
modernized primary texts, captured in XML files
edition pages, which aim to give a title to the entire edition (think of edition pages as a hybrid of the title page of a printed book and the table of contents)
anthologies, which aim to give a title to the entire anthology
You will find more information about titles for the various components of your edition in the relevant chapter for each component.

Overview

Title Type | Title Location | Parent Element | Element | Attribute | Value | Example
Born Digital | TEI Header | <titleStmt> | <title> | @type | main | Textual Introduction
Titles of Transcriptions | TEI Header | <titleStmt> | <title> | @type | main | Northward Ho, Quarto 1
Titles on Title Pages (semi-diplomatic transcription) | <front> (inside the child <titlePage>) | <docTitle> | <titlePart> | @type | main | NORTH-VVARD HOE
Titles on Title Pages (M) (see note 1) | <front> (inside <titlePage>) | <docTitle> | <titlePart> | @type | main | Northward Ho
Titles on First Page of Play (semi-diplomatic transcription) | <body> | n/a | <label> | @type | heading | North-ward Hoe.
Titles of Files containing Modernized Primary Texts | TEI Header | <titleStmt> | <title> | @type | main | Northward Ho, Q1 Modern (or Northward Ho, Modern)
Titles of Modernized Primary Texts | <front> | <titleStmt> | <title> | @type | main | Northward Ho!
Titles of Editions | TEI Header of edition page | <titleStmt> | <title> | @type | main | Northward Ho! A Digital Critical Edition
Titles of Anthologies (anth.xml page) (see note 2) | Both TEI Headers (see note 3) | <titleStmt> | <title> | n/a | n/a | Digital Renaissance Editions
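To make the overview concrete, here is a minimal sketch of where three of these titles might sit in a semi-diplomatic transcription of Northward Ho, Quarto 1. The element names, attribute values, and title strings come from the table; the surrounding skeleton (including the <div> that holds the <label>) is an assumption, and required metadata is omitted.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- Title of the XML file (row: Titles of Transcriptions) -->
        <title type="main">Northward Ho, Quarto 1</title>
      </titleStmt>
      <!-- remaining metadata omitted -->
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <titlePage>
        <docTitle>
          <!-- Title as transcribed from the title page -->
          <titlePart type="main">NORTH-VVARD HOE</titlePart>
        </docTitle>
      </titlePage>
    </front>
    <body>
      <!-- Title on the first page of the play; its exact position in the body is assumed -->
      <div>
        <label type="heading">North-ward Hoe.</label>
        <!-- ... -->
      </div>
    </body>
  </text>
</TEI>
```

The same work therefore carries three different title strings, each recorded in the place that matches its function.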

Encode Split Elements

Rationale

Because of the hierarchical organization of elements, it is sometimes impossible to capture fragmented parts of a text within a single element. Thus, we need a way to connect these parts even when they are wrapped in separate elements. The linking attributes @next and @prev are one way to point to discontinuous segments. Combined with the @xml:id attribute, they can be used to connect parts of the text that are separated because of the need to maintain the hierarchy of TEI elements.

Practice: Encode Split Elements

You will most commonly use the @next and @prev attributes to connect quotations that span multiple lines and letters that are interrupted by dialogue, but they are not limited to these usages.
When adding the @next and @prev attributes to a series of elements, you must give each element a value on the @xml:id attribute that is unique to the file it is in. The @xml:id value for each element includes the prefix emd, which stands for “early modern drama”; the abbreviation of the play; the name of the element that the @xml:id attribute is on; and a number unique to the file, all connected by underscores. By making each element in a series unique, we can assign them a place in the sequence in relation to the other unique elements.
After assigning @xml:id values to each element in the series of fragments you want to connect, add the @next and @prev attributes to the elements. The value of each attribute is a pointer: a hash (#) followed by the @xml:id value of the target element. The @next attribute points to the fragment that follows the element that the @next attribute is on. Similarly, the @prev attribute points to the fragment that comes before the element that the @prev attribute is on.
Note that the first fragment in the sequence that you assign a @next attribute to will not have a @prev attribute because there is no previous fragment in the series to point to. Likewise, the last fragment in the series will have no @next attribute. All fragments between the first and last must have both the @next and @prev attributes.
See also Chapter 16: Linking, Segmentation, and Alignment in the TEI Guidelines.

Common Split Elements

This is a list of elements which most commonly need the @next and @prev attributes, with some scenarios in which they will be required:
<lg> : In modern texts, a character may interrupt a song or verse letter with prose. We want to indicate that the separate lines of the song are part of a larger whole of the song, so we must link the <lg> elements with @next and @prev.
<p> : Similarly, a character in a modern text may interrupt their reading of a prose letter, and we want to indicate that the lines of the letter are part of a larger whole.
<q> and <quote> : In modern texts, charactersʼ quotations may span multiple lines, although quotation elements cannot, so we need to link the separate quotation elements together.
See also Encode Letters and Songs in Modern Texts to learn how to encode fragmented sequences in letters and songs.
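As an illustration of the <lg> case, the sketch below links three fragments of a song that is interrupted twice by prose. Everything in it is hypothetical: the play abbreviation XX, the character ids, and the wording of the lines. The ids follow the pattern described above (file name, element name, and a number unique to the file).

```xml
<div>
  <sp who="#emdXX_M_Singer">
    <speaker>Singer</speaker>
    <!-- First fragment: @next only -->
    <lg xml:id="emdXX_M_lg_1" next="#emdXX_M_lg_2">
      <l>First line of the song,</l>
      <l>Second line of the song.</l>
    </lg>
    <p>A prose aside that interrupts the song.</p>
    <!-- Middle fragment: both @prev and @next -->
    <lg xml:id="emdXX_M_lg_2" prev="#emdXX_M_lg_1" next="#emdXX_M_lg_3">
      <l>Third line of the song,</l>
    </lg>
    <p>A second prose aside.</p>
    <!-- Last fragment: @prev only -->
    <lg xml:id="emdXX_M_lg_3" prev="#emdXX_M_lg_2">
      <l>Final line of the song.</l>
    </lg>
  </sp>
</div>
```

Following the pointers from the first fragment to the last reassembles the song in order, even though the prose sits between the fragments in the document hierarchy.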

Special Case: Split Lines

See also Shared Verse Lines.
Although split lines are also cases of fragmented pieces of the text being linked together through encoding, we do not use @next and @prev to link split lines. Since split lines are so common, we developed a unique encoding practice for them.
For split lines, we add the @part attribute and one of three values, I, M, or F, to the <l> element wrapping the split line. The I, M, and F values stand for initial, medial, and final respectively and indicate the lineʼs position in the sequence of split lines:
<div>
<!-- ... -->

  <sp who="#emdAYL_M_DukeFrederick"><!-- ... -->
    <l part="I">And get you from our court.</l>
  </sp>
  <sp who="#emdAYL_M_Rosalind">
    <speaker>Rosalind</speaker>
    <l part="M">Me, uncle?</l>
  </sp>
  <sp who="#emdAYL_M_DukeFrederick">
    <speaker>Duke Frederick</speaker>
    <l part="F">You, cousin.</l>
  </sp>
  <!-- ... -->
</div>

Quotations Spanning Multiple Lines

Sometimes a characterʼs quotation spans more than one line of verse. However, the hierarchical structure of XML means that a quotation element cannot span multiple <l> elements. In this case, you must use the @next and @prev attributes to connect the fragments of the quotation:
<lg>
<!-- ... -->

  <l>
    <q xml:id="emd2H4_M_q_16" next="#emd2H4_M_q_17">Happy am I that have a man so bold</q>
  </l>
  <l>
    <q xml:id="emd2H4_M_q_17" prev="#emd2H4_M_q_16" next="#emd2H4_M_q_18">That dares do justice on my proper son</q>
  </l>
  <l>
    <q xml:id="emd2H4_M_q_18" prev="#emd2H4_M_q_17" next="#emd2H4_M_q_19">And no less happy having such a son</q>
  </l>
  <l>
    <q xml:id="emd2H4_M_q_19" prev="#emd2H4_M_q_18" next="#emd2H4_M_q_20">That would deliver up his greatness so</q>
  </l>
  <l>
    <q xml:id="emd2H4_M_q_20" prev="#emd2H4_M_q_19">Into the hands of justice.</q> You did commit me,</l>
  <!-- ... -->
</lg>
The same approach applies to <quote> elements:
<lg>
<!-- ... -->

  <l>
    <quote>Poor deer</quote>, quoth he, <quote xml:id="emdAYL_M_quote_3" next="#emdAYL_M_quote_4">thou makʼst a testament</quote>
  </l>
  <l>
    <quote xml:id="emdAYL_M_quote_4" prev="#emdAYL_M_quote_3" next="#emdAYL_M_quote_5">As worldlings do, giving thy sum of more</quote>
  </l>
  <l>
    <quote xml:id="emdAYL_M_quote_5" prev="#emdAYL_M_quote_4">To that which had too much</quote>. Then, being there alone,</l>
  <!-- ... -->
</lg>

Notes

1. Most anthologies do not require modernized title pages. Check with your anthology lead.
2. E.g., dre.xml, nise.xml, moms.xml.
3. These files are rooted on <teiCorpus> and have two <teiHeader> elements.

Prosopography

Isabella Seales

Isabella Seales is a fourth year undergraduate completing her Bachelor of Arts in English at the University of Victoria. She has a special interest in Renaissance and Metaphysical Literature. She is assisting Dr. Jenstad with the MoEML Mayoral Shows anthology as part of the Undergraduate Student Research Award program.

Janelle Jenstad

Janelle Jenstad is a Professor of English at the University of Victoria, Director of The Map of Early Modern London, and Director of Linked Early Modern Drama Online. With Jennifer Roberts-Smith and Mark Kaethler, she co-edited Shakespeare’s Language in Digital Media: Old Words, New Tools (Routledge). She has edited John Stow’s A Survey of London (1598 text) for MoEML and is currently editing The Merchant of Venice (with Stephen Wittek) and Heywood’s 2 If You Know Not Me You Know Nobody for DRE. Her articles have appeared in Digital Humanities Quarterly, Elizabethan Theatre, Early Modern Literary Studies, Shakespeare Bulletin, Renaissance and Reformation, and The Journal of Medieval and Early Modern Studies. She contributed chapters to Approaches to Teaching Othello (MLA); Teaching Early Modern Literature from the Archives (MLA); Institutional Culture in Early Modern England (Brill); Shakespeare, Language, and the Stage (Arden); Performing Maternity in Early Modern England (Ashgate); New Directions in the Geohumanities (Routledge); Early Modern Studies and the Digital Turn (Iter); Placing Names: Enriching and Integrating Gazetteers (Indiana); Making Things and Drawing Boundaries (Minnesota); Rethinking Shakespeare Source Study: Audiences, Authors, and Digital Technologies (Routledge); and Civic Performance: Pageantry and Entertainments in Early Modern London (Routledge). For more details, see janellejenstad.com.

Joey Takeda

Joey Takeda is LEMDO’s Consulting Programmer and Designer, a role he assumed in 2020 after three years as the Lead Developer on LEMDO.

Mahayla Galliford

Research assistant, remediator, encoder, 2021–present. Mahayla Galliford is a fourth-year student in the English Honours and Humanities Scholars programs at the University of Victoria. She researches early modern drama and her Jamie Cassels Undergraduate Research Award project focused on approaches to encoding early modern stage directions.

Martin Holmes

Martin Holmes has worked as a developer in UVicʼs Humanities Computing and Media Centre for over two decades, and has been involved with dozens of Digital Humanities projects. He has served on the TEI Technical Council and as Managing Editor of the Journal of the TEI. He took over from Joey Takeda as lead developer on LEMDO in 2020. He is a collaborator on the SSHRC Partnership Grant led by Janelle Jenstad.

Navarra Houldin

Project manager 2022–present. Textual remediator 2021–present. Navarra Houldin (they/them) completed their BA in History and Spanish at the University of Victoria in 2022. During their degree, they worked as a teaching assistant with the University of Victoriaʼs Department of Hispanic and Italian Studies. Their primary research was on gender and sexuality in early modern Europe and Latin America.

Nicole Vatcher

Technical Documentation Writer, 2020–2022. Nicole Vatcher completed her BA (Hons.) in English at the University of Victoria in 2021. Her primary research focus was womenʼs writing in the modernist period.

Tracey El Hajj

Junior Programmer 2019–2020. Research Associate 2020–2021. Tracey received her PhD from the Department of English at the University of Victoria in the field of Science and Technology Studies. Her research focuses on the algorhythmics of networked communications. She was a 2019–2020 President’s Fellow in Research-Enriched Teaching at UVic, where she taught an advanced course on Artificial Intelligence and Everyday Life. Tracey was also a member of the Map of Early Modern London team, between 2018 and 2021. Between 2020 and 2021, she was a fellow in residence at the Praxis Studio for Comparative Media Studies, where she investigated the relationships between artificial intelligence, creativity, health, and justice. As of July 2021, Tracey has moved into the alt-ac world for a term position, while also teaching in the English Department at the University of Victoria.
