This document offers instructions on how to convert an old IML file into TEI.
It does not explain in full how these processes work; instead, it summarizes how a file can be converted
into TEI. Because the encoding of the source files can differ, the conversion
is, in a sense, non-deterministic: each file, edition, or set of files may require
editing, either in the source file itself or in the transformation files, in order to
get the desired results. Note as well that the conversion is not meant to produce perfect
TEI output; instead, it is meant to give a workable copy of the file that is valid
against the full TEI schema and that must then be evaluated by a (human) XML editor to
check for any inconsistencies or problems.
Types of conversion
There are four different types of documents that are converted from their respective
formats into TEI:
The playtexts (IML)
The apparatus, including the collations and the annotations (IML-XML)
The critical paratexts (XWiki)
The supplementary paratexts (XWiki with some in IML)
However, it is often necessary to convert more than one of these documents at a time
(e.g. the IML file and its associated critical paratexts). The instructions below
detail how to convert an individual file, an edition (a set of files associated with a work), and a set of files or a collection.
Requirements
The LEMDO repository
If converting more than one document, a local checkout of the old ISE SVN repository.
The IML to TEI conversion is set up in modules, which call each other. It can be represented
like so:
[Figure: Flowchart representing the SGML conversion.]
What this means is that you can convert the following to TEI:
A single IML playtext (buildSingleFile.xml)
A collection of IML playtexts (buildSgml.xml)
One or more apparatus files (buildApparatus.xml)
A collection of critical paratexts (buildXWiki.xml)
An entire edition, which includes all playtexts, apparatus, and critical paratext documents (buildEverything.xml)
Each step of this process is described in detail below.
Converting a Single File
Oxygen
Open code/conversion/sgml/buildSingleFile.xml
Press the Apply Transformation Scenario button.
Oxygen will ask you to provide two properties:
The full path to the input IML file on your system (for example, /home/mholmes/lemdo/folder/doc_AYL_M.txt)
The work identifier (e.g. AYL, Leir, 1H4).
Terminal
Change directories into the project directory: cd path/to/lemdo
Call the ant transformation using the ant command and supply the lib, thisWork, and sgml.file properties: ant -f code/conversion/sgml/buildSingleFile.xml -lib lib -DthisWork=AYL -Dsgml.file=/the/path/to/eg/doc_AYL_M.txt
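The terminal steps above can also be scripted. The following is a minimal sketch, not part of the LEMDO codebase: the function names single_file_command and convert_single_file are hypothetical, and the sketch assumes that ant is on your PATH and that lemdo_dir is the root of your LEMDO checkout. The build file path and the thisWork and sgml.file properties are taken from the command shown above.

```python
import subprocess


def single_file_command(work_id, iml_file):
    """Return the Ant command line for converting one IML playtext."""
    if not work_id or not iml_file:
        raise ValueError("Both a work identifier and an IML file path are required")
    return [
        "ant",
        "-f", "code/conversion/sgml/buildSingleFile.xml",
        "-lib", "lib",
        f"-DthisWork={work_id}",
        f"-Dsgml.file={iml_file}",
    ]


def convert_single_file(lemdo_dir, work_id, iml_file):
    """Run the conversion from the LEMDO project directory (hypothetical helper)."""
    # check=True raises CalledProcessError if Ant exits with a failure status.
    return subprocess.run(
        single_file_command(work_id, iml_file), cwd=lemdo_dir, check=True
    )
```

Calling convert_single_file("path/to/lemdo", "AYL", "/the/path/to/eg/doc_AYL_M.txt") would run the same command as the terminal step, with the working directory set to the project directory.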
Apparatus
There is currently no way to create a single apparatus file.
XWiki
There is currently no way to create a single XWiki file.
Converting a Work
In the IML, documents were organized by work, which referred to the abstract idea of a text. For example, the work Hamlet refers to the material books of Hamlet, the scanned facsimiles, the edited texts, a production of the work, source materials,
and adaptations. For the purposes of the conversion, the concept of a work helps to provide a set of files for conversion. For example, converting the IML for
the work of As You Like It means to convert all of the IML files related to As You Like It (e.g. the folio, quarto, and modern files).
Note that converting a work requires the files to exist in the pre-existing ISE Subversion
repository, and for that repository to be checked out on your computer. If the files
for that work do not exist in the repository (i.e. it is a work whose files were not
added to the ISE repository before the move to TEI), then you will need to convert
each file individually (see note 2).
Oxygen
Before you start, make sure you know the full path to the ISE repository checkout
on your computer. For example, it may be something like /home/you/ise2. Inside that directory, you’ll need to find the content folder, which will be somewhere
like /home/you/ise2/trunk/eXist/db/apps/iseapp/content. You will need to supply this path to the transformation.
Open lemdo/code/conversion/sgml/buildSgml.xml
Press the Apply Transformation Scenario button.
When Oxygen asks, supply the identifier for the work you want to convert, and the
path to the content folder.
Terminal
Change directories into the project directory: cd path/to/lemdo
Call the ant transformation using the ant command and supply the lib, thisWork, and content.path properties: ant -f code/conversion/sgml/buildSgml.xml -lib lib -DthisWork=AYL -Dcontent.path=/path/to/ise/content
Converting a Collection
Oxygen
Before you start, make sure you know the full path to the ISE repository checkout
on your computer. For example, it may be something like /home/you/ise2. Inside that directory, you’ll need to find the content folder, which will be somewhere
like /home/you/ise2/trunk/eXist/db/apps/iseapp/content. You will need to supply this path to the transformation.
Open lemdo/code/conversion/buildEverything.xml
Identify the set of work identifiers you would like to convert. You will need to supply
these in comma-separated form (for example, AYL,1H4).
Press the Apply Transformation Scenario button.
Supply the parameter values when Oxygen requests them.
Terminal
Change directories into the project directory: cd path/to/lemdo
Call the ant transformation using the ant command and supply the lib, worksToBuild, and content.path properties: ant -f code/conversion/buildEverything.xml -lib lib -DworksToBuild=AYL,MV,H5 -Dcontent.path=/path/to/ise/content
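Because the work identifiers must be supplied in comma-separated form, it can help to assemble and sanity-check that value before running the build. The sketch below is illustrative only: works_argument is a hypothetical helper, and the pattern it uses assumes that work identifiers are plain alphanumeric strings like the examples given in this document (AYL, Leir, 1H4).

```python
import re

# Assumption: work identifiers contain only letters and digits (e.g. AYL, Leir, 1H4).
WORK_ID = re.compile(r"[A-Za-z0-9]+")


def works_argument(work_ids):
    """Join work identifiers into the comma-separated form the build expects."""
    cleaned = [w.strip() for w in work_ids]
    if not cleaned:
        raise ValueError("At least one work identifier is required")
    bad = [w for w in cleaned if not WORK_ID.fullmatch(w)]
    if bad:
        # Catches stray spaces or commas inside a single identifier.
        raise ValueError(f"Unexpected work identifier(s): {bad}")
    return ",".join(cleaned)
```

The returned string is the value to give for the list of works, whether you type it into the Oxygen prompt or pass it on the command line.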
Post-Conversion
If the conversion succeeds, the result files should be placed in the location code/out/{$thisWork}/main/emd{$thisWork}_M.xml. The files should be valid TEI, but they are not necessarily valid LEMDO TEI. Because
the IML is structured differently from standard TEI, there are often errors in the
document that need to be resolved by hand before it can go into the LEMDO repository.
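As a quick check after a build, the output location described above can be computed from the work identifier. This is a minimal sketch with hypothetical helper names (expected_output, missing_outputs); it assumes the path pattern given above and only covers the file it names, so other result files would need to be checked separately.

```python
from pathlib import Path


def expected_output(work_id, lemdo_dir="."):
    """Return the path where the converted file for a work should appear."""
    return Path(lemdo_dir) / "code" / "out" / work_id / "main" / f"emd{work_id}_M.xml"


def missing_outputs(work_ids, lemdo_dir="."):
    """List the work identifiers whose expected output file does not exist."""
    return [w for w in work_ids if not expected_output(w, lemdo_dir).is_file()]
```

An empty list from missing_outputs(["AYL", "MV"], "path/to/lemdo") suggests that the build wrote a file for each work; it says nothing about whether those files are valid.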
Open the file in Oxygen and check the file’s validity. If the file is valid, then
you can move the file into its proper place within data/texts/ (remember to svn add it to the LEMDO repository). If it is invalid, then you will need to resolve the
invalidities. In all cases, if you are unsure as to how best to fix the file, consult
with the Coordinating Editor. Mostly, the invalidities are as follows:
Schematron Error: Modern verse lines should be capitalized
Cause: This is an issue with how the flat structure of lineation in the IML is converted to the TEI. Occasionally, editorial line beginnings (the IML’s <ln> element) were accidentally added to the middle of lines.
Possible Solutions: This usually involves moving words into the preceding or following line, so it must be done with caution.
Schematron Error: Do not use square brackets for editorially supplied segments
Cause: The practice in the IML files was to place editorial additions in square brackets; this error should be raised only in modernized texts, almost exclusively within act or scene headings and stage directions.
Possible Solutions: Any text that is contained within square brackets should be tagged with a TEI <supplied> element in place of the brackets.
Schematron Error: Do not tag stage directions as verse lines
Cause: This occurs when stage directions are tagged as an <l> element, usually with no surrounding context.
Possible Solutions: In most cases, the wrapping <l> can be removed.
Schematron Error: Don’t use explicit angle brackets in text. If you want to reference an element, use the gi element or the code element.
Cause: Angle brackets (i.e. < and >) occur primarily where an IML file had incorrectly added an additional angle bracket to a tag (something like <</L>).
Possible Solutions: These can almost certainly be removed in the texts; however, a stray bracket might also indicate that an element was incorrectly typed in the source and has thus been dropped from the TEI encoding. Instances where these symbols occur must be checked against the original IML file to ensure that nothing has been lost in the conversion.
Schematron Error: Use the em-dash character (—), not double hyphens or en-dashes.
Cause: The IML Editorial Guidelines allowed for the use of double hyphens or en-dashes in modernized texts.
Possible Solutions: As per the DRE Editorial Guidelines, all double hyphens should be converted to em-dashes in modernized texts.
Schematron Error: This <l> has part value I but no following M or F (see note 3).
Cause: This happens when a partial verse line has been tagged as an initial line but has no following medial or final line. Usually, this is a case where the marking of the medial or final line (i.e. @part="F") has been erroneously omitted in the encoding.
Possible Solutions: Usually the next line is a medial or final line, and thus you can add @part="F" to the following line, but you may need to consult with the Coordinating Editor to determine the proper solution.
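Some of the fixes listed above are mechanical enough to be drafted with a script before a human editor reviews them. The sketch below is illustrative only and is not part of the LEMDO conversion code: the three function names are hypothetical, the text helpers are meant to be applied to text content rather than to a whole XML file (where a double hyphen also appears in comment delimiters), and the part check assumes that "following" means the very next <l> element in document order.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def double_hyphens_to_em_dash(text):
    """Replace each run of exactly two hyphens with an em-dash (U+2014)."""
    return re.sub(r"(?<!-)--(?!-)", "\u2014", text)


def brackets_to_supplied(text):
    """Wrap the contents of each [bracketed] segment in <supplied>."""
    # Only handles simple, non-nested brackets; review every change by hand.
    return re.sub(r"\[([^\[\]]+)\]", r"<supplied>\1</supplied>", text)


def unmatched_initial_parts(tei_xml):
    """Return the text of each <l part="I"> not followed by a part M or F line."""
    root = ET.fromstring(tei_xml)
    lines = list(root.iter(f"{{{TEI_NS}}}l"))
    problems = []
    for index, line in enumerate(lines):
        if line.get("part") != "I":
            continue
        # Assumption: only the immediately following <l> is inspected.
        following = lines[index + 1] if index + 1 < len(lines) else None
        if following is None or following.get("part") not in ("M", "F"):
            problems.append("".join(line.itertext()).strip())
    return problems
```

The first two helpers return a changed string, and the third only reports candidates; in every case the result still needs to be checked against the original IML file and, where the right fix is unclear, discussed with the Coordinating Editor.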
Troubleshooting
While the IML to TEI conversion is fairly robust, IML files can differ in subtle ways
that may cause the build to fail. Below are some common errors that might occur and
the steps that a programmer or encoder can take to resolve any issues.
Problem: The build broke because the source file was invalid
Cause: Usually this means that there was something wrong in the source file itself. Sometimes the IML files are missing a closing </L> tag or have an incorrectly nested <SP>.
Possible Solution: Investigate the source file by cross-referencing the TLN where the invalidity occurs with the source IML file. If it is clear that it is a simple wrapping error, then resolve the problematic tagging; otherwise, consult with the editor.
Problem: The build says that /db/apps/iseapp/content/documents/iml does not exist
Cause: The build cannot find the ISE Subversion repository.
Possible Solution: First check that you have a local copy of the ISE2 repository. If you do, then check that the path to the SVN repository as declared in the Ant property content.path correctly points to your copy. If it does, then check whether your local file structure is different (i.e. you have checked out only /documents/iml rather than the entire repository).
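The checks for the missing documents/iml folder can be run before starting a build. This is a minimal sketch under stated assumptions: check_content_path is a hypothetical helper, and it assumes, based on the error message above, that the folder given as content.path should contain a documents/iml subfolder.

```python
from pathlib import Path


def check_content_path(content_path):
    """Return a list of problems found with the content.path value."""
    content = Path(content_path)
    if not content.is_dir():
        return [f"{content} does not exist or is not a directory"]
    problems = []
    # Assumption: the build looks for documents/iml inside the content folder.
    if not (content / "documents" / "iml").is_dir():
        problems.append(f"{content} has no documents/iml subfolder")
    return problems
```

An empty list means that both checks passed; any message returned points to which of the steps above to investigate first.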
Notes
1. Note that Oxygen, Ant, Ant-contrib, and OSX should be available on all HCMC machines.
The easiest way to install these on a Mac is to use Homebrew, which is an open-source command-line package manager. The packages for ant and ant-contrib
are named after the tools themselves (i.e. brew install ant and brew install ant-contrib); OSX, the SGML-to-XML converter, is part of open-sp (brew install open-sp).
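2. A simple bash script would probably do the trick: for s in /path/to/iml/*.txt; do ant -f code/conversion/sgml/buildSingleFile.xml -lib lib -DthisWork=work -Dsgml.file="$s"; done. Here /path/to/iml/*.txt is a placeholder for your IML files and work is a placeholder for the work identifier.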
3. Note that there are a few variations of this error (e.g. This <l> has a part M but no following F or preceding I), all of which require a similar approach.
Prosopography
Janelle Jenstad
Janelle Jenstad is a Professor of English at the University of Victoria, Director
of The Map of Early Modern London, and Director of Linked Early Modern Drama Online. With Jennifer Roberts-Smith and Mark Kaethler, she co-edited Shakespeare’s Language in Digital Media: Old Words, New Tools (Routledge). She has edited John Stow’s A Survey of London (1598 text) for MoEML and is currently editing The Merchant of Venice (with Stephen Wittek) and Heywood’s 2 If You Know Not Me You Know Nobody for DRE. Her articles have appeared in Digital Humanities Quarterly, Elizabethan Theatre, Early Modern Literary Studies, Shakespeare Bulletin, Renaissance and Reformation, and The Journal of Medieval and Early Modern Studies. She contributed chapters to Approaches to Teaching Othello (MLA); Teaching Early Modern Literature from the Archives (MLA); Institutional Culture in Early Modern England (Brill); Shakespeare, Language, and the Stage (Arden); Performing Maternity in Early Modern England (Ashgate); New Directions in the Geohumanities (Routledge); Early Modern Studies and the Digital Turn (Iter); Placing Names: Enriching and Integrating Gazetteers (Indiana); Making Things and Drawing Boundaries (Minnesota); Rethinking Shakespeare Source Study: Audiences, Authors, and Digital Technologies (Routledge); and Civic Performance: Pageantry and Entertainments in Early Modern London (Routledge). For more details, see janellejenstad.com.
Joey Takeda
Joey Takeda is LEMDO’s Consulting Programmer and Designer, a role he assumed in 2020
after three years as the Lead Developer on LEMDO.
Mahayla Galliford
Project manager, 2025-present; research assistant, 2021-present. Mahayla Galliford
(she/her) graduated with a BA (Hons with distinction) from the University of Victoria
in 2024. Mahayla’s undergraduate research explored early modern stage directions and
civic water pageantry. Mahayla continues her studies through UVic’s English MA program
and her SSHRC-funded thesis project focuses on editing and encoding girls’ manuscripts,
specifically Lady Rachel Fane’s dramatic entertainments, in collaboration with LEMDO.
Martin Holmes
Martin Holmes has worked as a developer in UVic’s Humanities Computing and Media
Centre for over two decades, and has been involved with dozens of Digital Humanities
projects. He has served on the TEI Technical Council and as Managing Editor of the
Journal of the TEI. He took over from Joey Takeda as lead developer on LEMDO in 2020.
He is a collaborator on the SSHRC Partnership Grant led by Janelle Jenstad.
Navarra Houldin
Training and Documentation Lead 2025–present. LEMDO project manager 2022–2025. Textual
remediator 2021–present. Navarra Houldin (they/them) completed their BA with a major
in history and minor in Spanish at the University of Victoria in 2022. Their primary
research was on gender and sexuality in early modern Europe and Latin America. They
are continuing their education through an MA program in Gender and Social Justice
Studies at the University of Alberta where they will specialize in Digital Humanities.
Tracey El Hajj
Junior Programmer 2019–2020. Research Associate 2020–2021. Tracey received her PhD
from the Department of English at the University of Victoria in the field of Science
and Technology Studies. Her research focuses on the algorhythmics of networked communications. She was a 2019–2020 President’s Fellow in Research-Enriched
Teaching at UVic, where she taught an advanced course on Artificial Intelligence and Everyday Life. Tracey was also a member of the Map of Early Modern London team, between 2018 and 2021. Between 2020 and 2021, she was a fellow in residence
at the Praxis Studio for Comparative Media Studies, where she investigated the relationships
between artificial intelligence, creativity, health, and justice. As of July 2021,
Tracey has moved into the alt-ac world for a term position, while also teaching in
the English Department at the University of Victoria.
Orgography
LEMDO Team (LEMD1)
The LEMDO Team is based at the University of Victoria and normally comprises the project
director, the lead developer, project manager, junior developer(s), remediators,
encoders, and remediating editors.
Metadata
Authority title
Convert IML to LEMDO TEI
Type of text
Documentation
Publisher
University of Victoria on the Linked Early Modern Drama Online Platform
Released with Linked Early Modern Drama Online 1.0
Encoding description
Encoded in TEI P5 according to the LEMDO Customization and Encoding Guidelines
Document status
prgGenerated
Funder(s)
Social Sciences and Humanities Research Council of Canada
License/availability
This file is licensed under a CC BY-NC-ND 4.0 license, which means that it is freely downloadable without permission under the following
conditions: (1) credit must be given to the author and LEMDO in any subsequent use
of the files and/or data; (2) the content cannot be adapted or repurposed (except
in quotations for the purposes of academic review and citation); and (3) commercial
uses are not permitted without the knowledge and consent of the editor and LEMDO.
This license allows for pedagogical use of the documentation in the classroom.