This chapter of our documentation is still in beta. We welcome feedback, corrections,
and questions while we finalize the page in our 2024–2025 work cycle.
Introduction to General Encoding Guidelines
Rationale
Many encoding practices are common to all types of LEMDO XML files. Subsequent chapters
deal with particular types of files (modernized texts, annotations, etc.), but you
will want to come back to this chapter on a regular basis to remind yourself of general
encoding principles and practices.
Word files from any era, and especially from the early days of word processing, contain
hidden characters that will break the LEMDO build. Do not copy and paste directly
from a Word file (a file with a .doc or a .docx extension) into an XML file. If you must copy content, paste it into a text editor first.
All PCs have a text editor already installed, and many others are available.
Common ones are Notepad (Windows OS), Text Editor (Linux Ubuntu OS), Atom, BBEdit,
and SimpleText (Mac OS). Check the extension of the file from which you are copying.
It must be .txt.
Always Open Oxygen Project File First
It is imperative that you begin every work session by opening the lemdo-all.xpr project file. This file contains important scripts that control the behaviour of
Oxygen and give you access to tools that we have built especially for LEMDO users.
The XPR file also ensures that you are validating your XML files against the schema
and Schematron that determine how you are supposed to encode your play; the schema
and Schematron catch your mistakes and prompt you to correct them. See LEMDO Oxygen Project.
Validate Your Files Before Committing
Any file that you commit to the repository must be valid. It must follow the rules
that are embodied in the schema and Schematron. An invalid file can break the build process. If you open the lemdo-all.xpr file before
you open any other file, you can be sure that you are validating against the LEMDO
schema and Schematron. See Validate Files.
Name Files: Naming Conventions
This document explains LEMDO’s naming conventions. Note that the principles set out
in this document apply only to TEI/XML documents and related source materials; other
naming conventions are covered in LEMDO Programming Principles.
When you create a new file for the project, even if it is just a word-processor file
with notes in it, remember to give it a descriptive file name, and do not use punctuation
or spaces in the file name. SVN can handle such characters in file and folder names,
but files with these types of names can cause problems. For example, instead of Fred’s notes on documentation.odt, use Fred_notes_on_documentation.odt.
Infrastructure IDs refer to IDs belonging in the following files, including the files
themselves: PERS1, BIBL1, ORGS1, TAXO1, and LOCA1. The convention that we follow requires
you to use unique IDs. These IDs must be structured as follows: XXXX#, where X is
a letter and # is a number consisting of one or more digits. The letters and numbers
are not arbitrary: use the first four meaningful letters of the title of the file;
if the file you are naming uses a particular 4-letter combination for the first time
in our database, the number will be 1. Otherwise, check the A–Z Index for the next
available number for this particular 4-letter combination. For example, when you create
a personography file, the first four letters are PERS. Given that this file was the
first instance of this letter combination, the number was 1: PERS1.xml. Always check
your ID against the A–Z Index before finalizing it and committing. The A–Z Index
text file is available from the Resources menu on the lemdo-dev site.
Texts and Anthologies
When you add a new text to the repository, consult DRE’s list of play IDs, and look for the title of the play you are adding. You will notice that the convention
for plays is different from that for infrastructure files. The difference is due to
scholarly conventions for play naming. IDs for Shakespeare plays have long been standardized;
they are even listed in the Chicago Manual of Style and the MLA Handbook. DRE extends
the implied principles for naming Shakespeare plays to all other plays. If you cannot
find the title you are looking for, consult with the LEMDO Director, who will in turn
consult with the DRE Coordinating Editors.
Facsimiles and Performance
The conventions for facsimiles and performances include the ID of the text to which
they are connected. When you name a facsimile, follow this pattern: facs, followed by an underscore (_), followed by the document ID (xml:id), followed by
an underscore (_), followed by the library acronym from which the facsimile is taken.
If you are not sure about the latter, consult with the LEMDO Director. For example,
the facsimile corresponding to The Honest Whore, Part 1, Quarto 1 is named facs_1HWQ_Q1_Folger.
When you name a performance file, follow this pattern: perf, followed by an underscore (_), followed by the document ID (xml:id) of the text to
which the performance belongs, followed by an underscore (_), followed by the ID of
the anthology to which the performance belongs. For example, the files pertaining
to the QME production of Friar Bacon and Friar Bungay all begin with perf_FBFB_QME.
File Extensions
Never capitalize file extensions (e.g., use .jpg and .png, not .JPG and .PNG).
Create a Unique Value for an xml:id Attribute
LEMDO uses @xml:id attributes across the project to give unique identifiers to various entities: people, productions, sources, projects, organizations, characters, divisions in documents, paragraphs in documents, and anchors in documents. We have thousands of @xml:id attributes in the LEMDO project, and the value (id) of each one of them must be unique across the entire project. Duplicate ids will break the build.
For edition files and identifiers within edition files, we meet the need for uniqueness by insisting that every @xml:id value begin with emd followed by the abbreviation of the play. For example, every xml:id created for the edition of Famous Victories begins with emdFV.
For anthology pages, we meet the need for uniqueness by insisting that every @xml:id value begin with the abbreviation for the anthology. For example, the About page for QME has the @xml:id value of qme_about, which is also the name of the file (qme_about.xml). The About page for DRE has the @xml:id value of dre_about. (Note that when we build your final anthology, we remove the qme_ and dre_ portion of the file names to keep your final, public-facing URLs lightweight.)
For anchors, we meet the need for uniqueness by insisting that every @xml:id value begin with the full name of the file. Every anchor in the modernized text of Famous Victories (emdFV_M) begins with emdFV_M_.
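As a rough illustration of these conventions, the fragment below shows where the prefixes might appear in an edition file. The values emdFV_M and emdFV_M_ come from the examples in this section; the anchor suffix anchor_1 is an assumption modelled on the element-name-plus-number pattern described elsewhere in this chapter, not a confirmed LEMDO value.

```xml
<!-- Edition file emdFV_M.xml: the root xml:id begins with emd + the play abbreviation -->
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="emdFV_M">
  <!-- teiHeader omitted -->
  <text>
    <body>
      <!-- The anchor id begins with the full file name; the suffix "anchor_1" is hypothetical -->
      <p>Sample text<anchor xml:id="emdFV_M_anchor_1"/> continues here.</p>
    </body>
  </text>
</TEI>
```

An anthology page would follow the same shape, with a value such as qme_about on the root element.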
Practice: Create Unique IDs for Entities in Sitewide Databases
This section pertains mainly to LEMDO Team members at UVic: When you are creating an @xml:id value for an entity in one of the sitewide database files (PERS1, PROS1, GLOSS1, HAND1, BIBL1, PROD1, ORGS1, or TAXO1), you must check the complete list of LEMDO @xml:id values to ensure that you are creating a new value. These ids are generally four uppercase letters followed by a number (JENS1, SHAK1, ADAM1, LEMD3, and so on).
We have two ways of viewing all the @xml:id values in the project (except for the anchor, speech, and paragraph ids). Note that you must be on the Jenkins site looking at the most recent build. When you begin working for LEMDO, we will give you the links to these pages.
Generated .txt list: This page loads in your browser and lists only the ids. It is
a lightweight page that loads quickly and completely. Because it does not give any
details about the entity, the page is useful mainly for determining whether or not
an ID has been used already. Search the page with a simple Ctrl + F.
Generated HTML page: This page loads in your browser as a table. The three columns
of the table list:
The ID with a hyperlink to the item
Item type
Title (if applicable—not all entities have titles)
This page is huge and slow to load. Some browsers will not completely load a table
as long as this one. Before you search the page, ensure that it has completely loaded.
The table loads alphabetically by default, so the final entries will be ids beginning
with Z.
Assign the next available id. If you want to create an ID beginning with ADAM, search the page for the string ADAM. If your search shows that ADAM1, ADAM2, and ADAM3 have been used, then the next available ID is ADAM4.
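A minimal sketch of the outcome, assuming the entity is a person recorded in PERS1 and that ADAM4 is the next free id. The <person> and <persName> elements are standard TEI, but the fields that LEMDO actually requires for a personography entry are not shown here, and the name is a placeholder.

```xml
<!-- Hypothetical entry: ADAM1, ADAM2, and ADAM3 are already taken, so the new id is ADAM4 -->
<person xml:id="ADAM4">
  <persName>Placeholder Name</persName>
</person>
```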
Encode Document Status
Rationale
The status of a document determines a number of things:
the schema and Schematron requirements that the document must meet to be valid
whether we can publish the document
what date we record as the date of publication
whether LEMDO’s processing should add a label indicating that the document is peer reviewed
As a document moves through the stages of remediation, encoding, peer review, proofing, and publishing, we track its movement via dated <change> elements in the <revisionDesc>. This information is also useful to the LEMDO team for tracking our progress.
Practice
Every encoded file should have a <revisionDesc>. This element should contain an @status attribute that documents the current stage of the file’s encoding process.
LEMDO’s predefined document status values are listed in the table below.
Value of @status | Description
prgGenerated | The document has been programmatically converted from IML to LEMDO TEI P5 via a series of transformations. The file is a .xml file.
IML-TEI | There are stray IML tags in these texts that we retain until we have proofed the TEI. If the file is a semi-diplomatic transcription, the IML may have been checked by an ISE editor but LEMDO has not yet checked it.
IML-TEI_INP | The programmatic conversion is in the process of being carefully checked by a LEMDO research assistant. The file is a .xml file.
IML-TEI_proofed | The programmatic conversion has been carefully checked by a LEMDO research assistant. The file is a .xml file. The transcription has been carefully checked and corrected by a LEMDO RA against an open-access digital surrogate.
TCP-TEI | The text has been programmatically converted from TCP TEI P4 to LEMDO TEI (P5) via a series of transformations. The file is a .xml file. The transcription is only as correct as the underlying TCP transcription (which contains gaps, errors, and normalized long s characters). The TCP metadata is retained.
TCP-TEI_INP | The text has been programmatically converted from TCP TEI P4 to LEMDO TEI (P5) via a series of transformations. The file is in the process of being carefully corrected and proofed by a LEMDO Research Assistant. The file is a .xml file.
TCP-TEI_proofed | The text has been programmatically converted from TCP TEI P4 to LEMDO TEI (P5) via a series of transformations and carefully corrected and proofed by a LEMDO Research Assistant. The file is a .xml file. The transcription has been corrected; gaps have been supplied; the long s has been restored. The TEI tagging has been checked and corrected by a LEMDO RA.
TEI_INP | The text is being encoded in TEI.
TEI_proofed | The text is finished in TEI and proofed.
published | Files that have been published.
converted | Files that have been converted.
draft | Files that are being drafted.
empty | Files that are empty.
deprecated | This document is no longer relevant, but is being preserved for archival purposes.
Note: Files in /main/ should not have the status “empty” or “draft”. They all contain
converted text. Eventually, these files might have the status published or deprecated.
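A minimal sketch of how a status value might be recorded, assuming a TCP-derived file that has just been proofed. The contributor id ABCD1 and the dates are placeholders. The <change> elements are listed most recent first, which is the order the TEI Guidelines recommend; whether LEMDO files follow that order is an assumption here.

```xml
<teiHeader>
  <!-- fileDesc and other header elements omitted -->
  <revisionDesc status="TCP-TEI_proofed">
    <!-- Placeholder contributor id and dates -->
    <change who="pers:ABCD1" when="2024-03-01" status="TCP-TEI_proofed">Finished correcting and proofing the transcription.</change>
    <change who="pers:ABCD1" when="2024-01-15" status="TCP-TEI_INP">Began correcting the converted TCP text.</change>
  </revisionDesc>
</teiHeader>
```

The @status on <revisionDesc> gives the current state of the document, while the @status on each <change> records the state at the time of that change.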
Track Your Work
Rationale
We use the <change> element to track our work because it gives credit where credit is due and allows future encoders to see what has been done to the file before they continue work on it.
Practice: Start Your Work
When you begin encoding a file, add a <change> element as a child of the <revisionDesc>. Put the @who attribute, the @when attribute, and the @status attribute on the <change> element. These indicate who you are, the date you started remediating, and the status of the file. In the text node, add a statement describing what you are doing in the file (e.g., began remediating file).
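A minimal sketch of such an entry, assuming a file whose programmatic conversion from IML is about to be checked. The contributor id, date, and status value are placeholders chosen for illustration.

```xml
<!-- Add inside the file's <revisionDesc>; the id, date, and status shown are hypothetical -->
<change who="pers:ABCD1" when="2024-01-15" status="IML-TEI_INP">Began remediating file.</change>
```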
When you have completed working on your file, add a <change> element. Put important and relevant information about your file in the text node of this <change> element, such as what you have completed and what must still be done in it. This protocol ensures that another encoder will be able to efficiently continue work on your file if there is any further work that must be done on it. For example:
<change who="pers:GALL2" when="2022-12-13">Completed the feedback from JENS1 and finished up the file. There are no facsimile
links because it is only on EEBO. EEBO has been acknowledged in the source desc.</change>
Special Case: Track Significant Writing or Encoding Tasks
You may add other <change> elements during the encoding process to indicate when you complete significant writing or encoding tasks. For example, you may want to make note of when you finished numbering <lb> elements in semi-diplomatic transcriptions. While it is optional to add <change> elements for completing significant writing or encoding tasks, it is good encoding practice to do so. You should always add a <change> element when you begin work on a file and when you finish work on it.
For example:
<change who="pers:GALL2" when="2022-06-06">Fixed the wlns following Hinmans rationale. This file should be ready for JENS1 to
review.</change>
TEI allows the <title> element to appear in both the <teiHeader> (the metadata) and the <text>, which might seem redundant but actually allows you to make a helpful distinction between the title of your XML file and the title that you want to give to the text captured in that file.
For primary texts, LEMDO aims to standardize filenaming practice while still allowing
some editorial flexibility in play titles.
For born-digital texts, the title in the <titleStmt> is the title that will show up on the webpage.
Note that the title of the XML file, captured in the <titleStmt> section of the metadata, is distinct from the filename, which is generally truncated and must exactly match the xml:id of the file.
Practice
LEMDO uses the <title> element in several ways. Many types of files have titles:
born-digital documents captured in XML files
transcriptions of primary texts captured in XML files, which themselves have title pages (in the <front> element) and embedded titles:
titles on title pages of primary texts transcribed and captured in XML files
titles on the first page of primary texts transcribed and captured in XML files
modernized primary texts, captured in XML files
edition pages, which aim to give a title to the entire edition (think of edition pages
as a hybrid of the title page of a printed book and the table of contents)
anthologies, which aim to give a title to the entire anthology
You will find more information about titles for the various components of your edition
in the relevant chapter for each component.
Overview
Title Type | Title Location | Parent Element | Element | Attribute | Values | Example | Documentation
Born Digital | TEI Header | <titleStmt> | <title> | @type | main | Textual Introduction |
Titles of Transcriptions | TEI Header | <titleStmt> | <title> | @type | main | Northward Ho, Quarto 1 |
Titles on Title Pages (semi-diplomatic transcription)
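The first overview row (a born-digital document) might look like this in the header. This is a sketch only: the element, attribute, value, and example title are taken from the overview, while the surrounding <fileDesc> structure is standard TEI and the omitted children are not specified here.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <!-- The main title of a born-digital document; this is what appears on the webpage -->
      <title type="main">Textual Introduction</title>
    </titleStmt>
    <!-- other fileDesc children omitted -->
  </fileDesc>
</teiHeader>
```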
Because of the hierarchical organization of elements, it is sometimes impossible to capture fragmented parts of a text within a single element. Thus, we need a way to connect these parts even when they are wrapped in separate elements. The linking attributes @next and @prev are one way to point to discontinuous segments. Combined with the @xml:id attribute, they can be used to connect parts of the text that are separated because of the need to maintain the hierarchy of TEI elements.
Practice: Encode Split Elements
You will most commonly use the @next and @prev attributes to connect quotations that span multiple lines and letters that are interrupted by dialogue, but they are not limited to these usages.
When adding the @next and @prev attributes to a series of elements, you must give each element a value on the @xml:id attribute that is unique to the file it is in. The @xml:id value for each element includes the prefix emd, which stands for “early modern drama”; the abbreviation of the play; the name of the element that the @xml:id attribute is on; and a number unique to the file, all connected by underscores. By making each element in a series unique, we can assign them a place in the sequence in relation to the other unique elements.
After assigning @xml:id values to each element in the series of fragments you want to connect, add the @next and @prev attributes to the elements. The value of the @next attribute is a pointer (a # followed by an @xml:id value) to the element that follows the one that the @next attribute is on. Similarly, the value of the @prev attribute points to the @xml:id of the element that comes before the one that the @prev attribute is on.
Note that the first fragment in the sequence that you assign a @next attribute to will not have a @prev attribute because there is no previous fragment in the series to point to. Likewise, the last fragment in the series will have no @next attribute. All fragments between the first and last must have both the @next and @prev attributes.
This is a list of elements that most commonly need the @next and @prev attributes, with some scenarios in which they will be required:
<lg>: In modernized texts, a character may interrupt a song or verse letter with prose. We want to indicate that the separate lines of the song are part of a larger whole, so we must link the <lg> elements with @next and @prev.
<p>: Similarly, a character in a modernized text may interrupt their reading of a prose letter, and we want to indicate that the lines of the letter are part of a larger whole.
<q> and <quote>: In modernized texts, characters’ quotations may span multiple lines, although quotation elements cannot, so we need to link the separate quotation elements together.
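The <lg> scenario can be sketched as follows. The play abbreviation XXX, the speakers, and the lines are all invented for illustration, and the id numbers are arbitrary; only the pattern of @xml:id, @next, and @prev values follows the practice described here.

```xml
<sp>
  <speaker>Singer</speaker>
  <!-- First fragment of the song: @next only -->
  <lg xml:id="emdXXX_M_lg_1" next="#emdXXX_M_lg_2">
    <l>First line of the song</l>
    <l>Second line of the song</l>
  </lg>
</sp>
<sp>
  <speaker>Listener</speaker>
  <p>A prose interruption.</p>
</sp>
<sp>
  <speaker>Singer</speaker>
  <!-- Last fragment of the song: @prev only -->
  <lg xml:id="emdXXX_M_lg_2" prev="#emdXXX_M_lg_1">
    <l>Third line of the song</l>
  </lg>
</sp>
```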
Although split lines are also cases of fragmented pieces of the text being linked together through encoding, we do not use @next and @prev to link split lines. Since split lines are so common, we developed a separate encoding practice for them.
For split lines, we add the @part attribute and one of three values, I, M, or F, to the <l> element wrapping the split line. The I, M, and F values stand for initial, medial, and final respectively and indicate the line’s position in the sequence of split lines.
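A minimal sketch of one verse line shared among three speeches. The speakers and words are invented for illustration; only the @part values come from the practice described here.

```xml
<sp>
  <speaker>First Speaker</speaker>
  <l part="I">Who goes there?</l>
</sp>
<sp>
  <speaker>Second Speaker</speaker>
  <l part="M">A friend.</l>
</sp>
<sp>
  <speaker>First Speaker</speaker>
  <l part="F">Then pass in peace.</l>
</sp>
```

A line split between only two speakers would use I and F with no M.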
Sometimes a character’s quotation spans more than one line of verse; however, the hierarchical structure of XML means that quotation elements cannot span multiple <l> elements. In this case, you must use the @next and @prev attributes to connect the fragments of the quotation:
<lg>
  <!-- ... -->
  <l>
    <q xml:id="emd2H4_M_q_16" next="#emd2H4_M_q_17">Happy am I that have a man so bold</q>
  </l>
  <l>
    <q xml:id="emd2H4_M_q_17" prev="#emd2H4_M_q_16" next="#emd2H4_M_q_18">That dares do justice on my proper son</q>
  </l>
  <l>
    <q xml:id="emd2H4_M_q_18" prev="#emd2H4_M_q_17" next="#emd2H4_M_q_19">And no less happy having such a son</q>
  </l>
  <l>
    <q xml:id="emd2H4_M_q_19" prev="#emd2H4_M_q_18" next="#emd2H4_M_q_20">That would deliver up his greatness so</q>
  </l>
  <l>
    <q xml:id="emd2H4_M_q_20" prev="#emd2H4_M_q_19">Into the hands of justice.</q> You did commit me,</l>
  <!-- ... -->
</lg>
<lg>
  <!-- ... -->
  <l>
    <quote>Poor deer</quote>, quoth he, <quote xml:id="emdAYL_M_quote_3" next="#emdAYL_M_quote_4">thou mak’st a testament</quote>
  </l>
  <l>
    <quote xml:id="emdAYL_M_quote_4" prev="#emdAYL_M_quote_3" next="#emdAYL_M_quote_5">As worldlings do, giving thy sum of more</quote>
  </l>
  <l>
    <quote xml:id="emdAYL_M_quote_5" prev="#emdAYL_M_quote_4">To that which had too much</quote>. Then, being there alone,</l>
  <!-- ... -->
</lg>
Editor Tools
LEMDO has created a few tools to make your encoding work easier. This documentation
will guide you through using our file templates and transformations. Another useful
tool (keyboard shortcuts) is documented in Keyboard Shortcuts and Special Characters.
You can use LEMDO’s file templates when creating new files for your edition. These files are created and maintained by the LEMDO Team and provide you with metadata, basic file structure, necessary elements, and helpful information and documentation links for the type of file that you are creating. For example, our critical paratext template gives the metadata required for critical paratexts, sample <div> and <p> elements, and sample block quotes (using the <cit> and <quote> elements).
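To give a sense of the body elements mentioned above, here is a hedged sketch. It is not the actual LEMDO template: the text content is invented, and the real template also supplies metadata, comments, and documentation links that are omitted here.

```xml
<div>
  <p>A sample paragraph of critical prose.</p>
  <!-- A block quotation wrapped in <cit> and <quote> -->
  <cit>
    <quote>A sample block quotation.</quote>
  </cit>
</div>
```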
To create a file using a template, follow these steps:
At the top of your Oxygen window, click File and then select New from the drop-down menu.
In the window that pops up, scroll down to the Framework templates folder. Click on the LEMDO subfolder. This will show you a list of the templates that we have created.
Select the template that you wish to use.
At the bottom of the New file window, select Save as. If you know the pathway down which you wish to save your file, you can type it into
the available field (i.e., lemdo/data/texts/{your edition abbreviation}/{the appropriate folder}). Otherwise, click on the folder to the right of the text field and browse for the
correct directory. Name your file according to LEMDO’s naming conventions.
Click Create.
Follow the instructions outlined in your newly created file. We use XML comments liberally
in template files to provide you with instructions and helpful tips. You may delete
comments as you complete the tasks therein.
Transformations
In addition to making templates to create new files, LEMDO has written XSLTs (eXtensible Stylesheet Language Transformations) to help you complete encoding tasks. Some are designed to create a new file from an existing one (e.g., our transformation to create a baseline modernized text from semi-diplomatic transcriptions), while some are simply meant to complete repetitive tasks (e.g., our transformation to number <lb> elements with @type="wln" in semi-diplomatic transcriptions). Regardless, these transformations are meant to save you time and effort so that you can focus on other editorial tasks.
Step-By-Step: Run Transformations on Your Files
Running transformations is generally fairly straightforward. Follow these steps:
In Oxygen’s project view, find the file that you wish to run a transformation on.
Right click that file.
Hover your mouse over Transform.
Select Transform with…
Scroll down the list to find the transformation that you are interested in. Select
that transformation.
Click Apply selected scenarios (1). If there is a number greater than 1 in the parentheses on that button, your file
likely has other associated transformations. Generally, we do not want this. Unselect
any transformations that you do not want to run before clicking to apply the selected
scenarios.
Open the file that you have run a transformation on. Check that the transformation
has worked.
Validate your file.
Commit your file.
Example: Number Lines Using a Transformation
This example will show the process for running a transformation. It will number <lb> elements with a @type value of wln in the file emd1H4_F1.xml.
The first step is to right click on the file in Oxygen’s project view:
Here, we want to transform emd1H4_F1.xml, which lives in data/texts/1H4/main.
Next, hover over Transform and select Transform with…:
Note that we generally do not need to configure transformation scenarios for specific files. Doing so permanently associates a specific transformation with the file that you are working on. Most of the time, we only need to use a transformation once on a file, and we do not want it to be associated with the file long-term because we do not want to repeatedly apply the same transformation.
When you click Transform with…, a window will open allowing you to select the appropriate transformation:
In this case, we want to number line beginnings in a semi-diplomatic transcription,
so we will select lemdo_number_wlns_lb_in_semi-dip. If you are uncertain which transformation to use, or you want us to add a new transformation
to our list, please email lemdo@uvic.ca.
After clicking the Apply selected scenarios button, we open the file to check that the transformation has worked as expected:
The <lb> elements with @type="wln" now have consecutively numbered @n attributes. The transformation has worked as expected.
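In outline, the change looks something like the sketch below. The transcribed text is a placeholder, and the assumption that numbering starts at 1 with plain integer values is ours; check the output of the transformation in your own file.

```xml
<!-- Before the transformation -->
<lb type="wln"/>First line of the transcription
<lb type="wln"/>Second line of the transcription

<!-- After the transformation (assumed numbering from 1) -->
<lb type="wln" n="1"/>First line of the transcription
<lb type="wln" n="2"/>Second line of the transcription
```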
As always, the last step in Oxygen is to validate the file.
Notes
1. Most anthologies do not require modernized title pages. Check with your anthology lead.
3. These files are rooted on <teiCorpus> and have two <teiHeader> elements.
Prosopography
Isabella Seales
Isabella Seales is a fourth year undergraduate completing her Bachelor of Arts in
English at the University of Victoria. She has a special interest in Renaissance and
Metaphysical Literature. She is assisting Dr. Jenstad with the MoEML Mayoral Shows
anthology as part of the Undergraduate Student Research Award program.
Janelle Jenstad
Janelle Jenstad is a Professor of English at the University of Victoria, Director
of The Map of Early Modern London, and Director of Linked Early Modern Drama Online. With Jennifer Roberts-Smith and Mark Kaethler, she co-edited Shakespeare’s Language in Digital Media: Old Words, New Tools (Routledge). She has edited John Stow’s A Survey of London (1598 text) for MoEML and is currently editing The Merchant of Venice (with Stephen Wittek) and Heywood’s 2 If You Know Not Me You Know Nobody for DRE. Her articles have appeared in Digital Humanities Quarterly, Elizabethan Theatre, Early Modern Literary Studies, Shakespeare Bulletin, Renaissance and Reformation, and The Journal of Medieval and Early Modern Studies. She contributed chapters to Approaches to Teaching Othello (MLA); Teaching Early Modern Literature from the Archives (MLA); Institutional Culture in Early Modern England (Brill); Shakespeare, Language, and the Stage (Arden); Performing Maternity in Early Modern England (Ashgate); New Directions in the Geohumanities (Routledge); Early Modern Studies and the Digital Turn (Iter); Placing Names: Enriching and Integrating Gazetteers (Indiana); Making Things and Drawing Boundaries (Minnesota); Rethinking Shakespeare Source Study: Audiences, Authors, and Digital Technologies (Routledge); and Civic Performance: Pageantry and Entertainments in Early Modern London (Routledge). For more details, see janellejenstad.com.
Joey Takeda
Joey Takeda is LEMDO’s Consulting Programmer and Designer, a role he assumed in 2020
after three years as the Lead Developer on LEMDO.
Mahayla Galliford
Project manager, 2025–present; research assistant, 2021–present. Mahayla Galliford
(she/her) graduated with a BA (Hons with distinction) from the University of Victoria
in 2024. Mahayla’s undergraduate research explored early modern stage directions and
civic water pageantry. Mahayla continues her studies through UVic’s English MA program
and her SSHRC-funded thesis project focuses on editing and encoding girls’ manuscripts,
specifically Lady Rachel Fane’s dramatic entertainments, in collaboration with LEMDO.
Martin Holmes
Martin Holmes has worked as a developer in UVic’s Humanities Computing and Media
Centre for over two decades, and has been involved with dozens of Digital Humanities
projects. He has served on the TEI Technical Council and as Managing Editor of the
Journal of the TEI. He took over from Joey Takeda as lead developer on LEMDO in 2020.
He is a collaborator on the SSHRC Partnership Grant led by Janelle Jenstad.
Navarra Houldin
Training and Documentation Lead 2025–present. LEMDO project manager 2022–2025. Textual
remediator 2021–present. Navarra Houldin (they/them) completed their BA with a major
in history and minor in Spanish at the University of Victoria in 2022. Their primary
research was on gender and sexuality in early modern Europe and Latin America. They
are continuing their education through an MA program in Gender and Social Justice
Studies at the University of Alberta where they will specialize in Digital Humanities.
Nicole Vatcher
Technical Documentation Writer, 2020–2022. Nicole Vatcher completed her BA (Hons.)
in English at the University of Victoria in 2021. Her primary research focus was women’s
writing in the modernist period.
Tracey El Hajj
Junior Programmer 2019–2020. Research Associate 2020–2021. Tracey received her PhD
from the Department of English at the University of Victoria in the field of Science
and Technology Studies. Her research focuses on the algorhythmics of networked communications. She was a 2019–2020 President’s Fellow in Research-Enriched
Teaching at UVic, where she taught an advanced course on Artificial Intelligence and Everyday Life. Tracey was also a member of the Map of Early Modern London team, between 2018 and 2021. Between 2020 and 2021, she was a fellow in residence
at the Praxis Studio for Comparative Media Studies, where she investigated the relationships
between artificial intelligence, creativity, health, and justice. As of July 2021,
Tracey has moved into the alt-ac world for a term position, while also teaching in
the English Department at the University of Victoria.