LEMDO: LEMDO Diagnostics

LEMDO Diagnostics

Prior Reading

Schematron and Validation Errors

Rationale

Although many errors can be caught by the Schema and by Schematron and flagged for you in Oxygen, some errors cannot. Even when we can check an error via the Schema or via Schematron, we sometimes choose not to break the build over the error, typically because there are too many instances of the error to fix, such as old TLN links. Our normal workflow is to write a Diagnostic to catch and list these errors. Once we have cleared the Diagnostic report, we will write Schematron to prevent such errors occurring in the future and then retire the Diagnostic. In other cases, we use Diagnostics as a mechanical copyeditor to catch and enforce decisions that cannot be enforced by Schematron.

Practice: Check LEMDO Diagnostics

To navigate to LEMDO diagnostics from the LEMDO-dev site, click on the Resources tab in the top navigation bar and select Diagnostics. Your browser will open the LEMDO Diagnostics Web page.

Diagnostics are under the Consistency Checks tab of the LEMDO Diagnostics page; by default, the page opens with this tab expanded. Each diagnostic has its own collapsible tab. Each diagnostic check that does not currently find any errors across the LEMDO repository is coloured green and has the number zero in brackets beside the diagnostic name. Each diagnostic check that does find errors is coloured red and has the number of errors found by the diagnostic check in brackets beside the diagnostic name.

You can filter the diagnostics to show errors from just one edition by typing emd followed by your edition abbreviation in the filter text box and clicking Filter. For example, if you were working on the Henry V edition, you would type emdH5 into the filter box. You can also search for diagnostics in a specific file by typing the full file name into the filter box. For example, you can type emdHam_EM_collation to find errors in just that file.

For instructions for fixing errors flagged by the diagnostic checks, see the relevant section below on the type of error that you wish to fix.

In addition to the consistency checks, there is a statistics tab on the diagnostics Web page. This tab is closed by default, but you can click on the right-facing arrow to expand the statistics tab. The statistics include counts of files in the LEMDO repository, of total xml:ids across the repo, and of the number of facsimile files that we have stored on our facsimile server. If other statistics would be useful to you, please let us know. We will add them if we can.

Excluding a Portfolio or File from Diagnostics

Sometimes one portfolio or file can generate many errors in the diagnostics without any prospect of their being fixed in the near future, and all of those results may swamp and obscure important errors that need to be addressed quickly. In such cases, a programmer or project administrator can add a file id or a partial file id to this text file in the SVN repository: jenkins/diagnostics_exclusions.txt. Our diagnostics build process ignores files listed in this file and does not report errors. The LEMDO team will review the diagnostics_exclusions.txt file periodically to check that it contains only inactive files.

Files Containing Bad Facsimile Links Diagnostic

LEMDO stores facsimile images on an HCMC server. They are too big and too numerous to include in LEMDO’s Subversion repository. We create XML files in the facs folder of the LEMDO repo in order to encode the metadata for the images and to give each image an xml:id. LEMDO editors and encoders can then point to facsimile images from their semi-diplomatic transcription files. Because this linking process is relatively complex, there are sometimes errors in linking from files in the LEMDO SVN repo to the server containing the facsimile images. This diagnostic catches these errors by finding links to images that do not exist.

If you are working on facsimile files in the facs folder, you should regularly check this diagnostic to ensure you do not introduce any errors.

If there is an error in this diagnostic (i.e., a file has tried to link to a non-existent facsimile file), you must correct the values for the `@url` attributes on the `<graphic>` elements of your facsimile file. Follow the instructions in Encode Images in Facsimile Files.
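As a rough illustration of where the `@url` value lives, a facsimile entry might look like the sketch below. The xml:id and the URL are hypothetical placeholders rather than real LEMDO values, and the surrounding structure is assumed from general TEI practice; follow Encode Images in Facsimile Files for the form LEMDO actually requires.

```xml
<!-- Hypothetical facsimile entry: the xml:id and url values are placeholders. -->
<surface xml:id="facs_EXAMPLE_0001">
  <!-- The diagnostic flags this element if the @url does not point to an
       image that exists on the facsimile server. -->
  <graphic url="https://example.org/facsimiles/EXAMPLE/0001.jpg"/>
</surface>
```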

If you cannot find the error, check the value of the `@url` attribute against the URI of the facsimile images on the facsimile server. To navigate to the facsimile server, click the Resources tab on the top navigation bar of the LEMDO-dev website and select Facsimiles. Click on the link for the copy that you are working with to be brought to a list of the relevant URIs.

Bad Internal Links Diagnostics

LEMDO has two diagnostics for bad internal links: an urgent diagnostic and a non-urgent diagnostic. Both find broken internal links (i.e., links from a LEMDO file to an entity within the LEMDO repo). The urgent diagnostic highlights errors in files with status values indicating they are close to publication. The non-urgent diagnostic finds internal link errors in all other LEMDO files.

Under the bad internal link diagnostic, each file with one or more bad links is listed along with the link that is broken. To fix the bad link, go to the file that contains it and search for the bad link. Fix the link following the instructions for encoding links as given in Chapter 5. Making Links.

Pointers to External Anchors Diagnostic

Using a pointer to link to an anchor in another edition is not forbidden (yet), but it is not as stable as linking to structural entities.¹ When you make a link from your edition to another edition, you should use `<ref>` elements to link to structural elements with xml:ids (such as acts, scenes, speeches, or paragraphs).

To resolve this diagnostic, search in your file for the anchor ID that is linked to and replace it with a link to a structural entity following the directions in Encode Reference Links.
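The contrast can be sketched as follows. All ids are hypothetical, and the "doc:" pointer form is an assumption made for illustration; Encode Reference Links gives the link syntax that LEMDO actually requires.

```xml
<!-- Flagged: a pointer to an anchor in another edition. The anchor may be
     deleted from that edition, which would break this link. -->
<ptr target="doc:emdOTHER_M#emdOTHER_M_anc_12"/>

<!-- Preferred: a reference to a structural element (here, a speech)
     that has its own xml:id. -->
<ref target="doc:emdOTHER_M#emdOTHER_M_sp45">this speech</ref>
```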

Missing Speaker Elements Diagnostic

All speeches in modernized texts should have a speech prefix encoded with the `<speaker>` element. This diagnostic finds speeches in modernized texts that do not have speech prefixes. (Note that some speeches in semi-diplomatic transcriptions do not have speech prefixes.)

To resolve this diagnostic, add speech prefixes to your modernized text where they are missing. See Encode Speakers in Modernized Texts. If you have a compelling reason not to add a speech prefix, discuss the matter with your anthology lead, who will in turn take up the matter with the LEMDO team.
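A minimal sketch of the difference, using placeholder text and omitting any attributes that LEMDO may require on the speech element:

```xml
<!-- Flagged: a speech in a modernized text with no speech prefix. -->
<sp>
  <l>First line of the speech.</l>
</sp>

<!-- Resolved: the speech prefix is encoded with <speaker>. -->
<sp>
  <speaker>Speaker Name</speaker>
  <l>First line of the speech.</l>
</sp>
```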

Texts Lacking Authors Diagnostic

All semi-diplomatic and modernized texts should have an author identified in their metadata. This diagnostic identifies plays, shows, and poems that do not have a `<respStmt>` for an author in their `<titleStmt>` elements.

To add an author to the metadata for your file, follow the instructions in Encode Responsibility Statements. In cases where the work’s author is unknown, add a `<respStmt>` for the author and link to the Anonymous entry in PROS1 as follows:

<respStmt>
<resp ref="resp:aut">Author</resp>
<persName ref="pros:ANON1">Anonymous</persName>
</respStmt>

Old TLN Links Diagnostic

Files that began as IML files that have not been completely remediated have links to targets beginning with "tln:". These correspond to the old through line numbers used by the Internet Shakespeare Editions. Old TLN links will be removed during the remediation process. Remediators should delete links to TLNs once they have replaced them with functioning links to the LEMDO edition.

Bad Documentation Resp Pointers Diagnostic

We give credit to the people who have worked on documentation using the `@resp` attribute on the root `<div>` element of documentation files. All `@resp` values in documentation must link to a `<respStmt>` element in the ODD file (lemdo.odd) and must be prefixed by "or:" (e.g., "or:odd_JENS1_wtm").

To fix this error, check that all `@resp` values match a `<respStmt>` in the ODD file. If there is no `<respStmt>` for the person who needs credit for a particular role, ask a member of the LEMDO team with read/write permissions over the ODD file to add the `<respStmt>`. For more information, see Give Credit for Documentation Files.

Unlinked Documentation Files Diagnostic

LEMDO’s many documentation files are included in our Encoding Guidelines only if they are listed in the ODD file (lemdo.odd). We do not want to have documentation files in the repo that are not listed in the ODD file. If there is a documentation file in the repo that is not listed in the ODD file, this diagnostic will flag it.

To clear this diagnostic, either list the documentation file in the ODD file or move deprecated documentation files to the data/obsolete/oldDocumentation folder. To avoid this diagnostic error, we recommend writing content for new documentation files as soon as you create them so that they are ready to add to the ODD file as soon as they are added to the repo. Do not make XML files as placeholders for text you have not yet written.

Links Using the Role: Prefix to Empty Roles Diagnostic

LEMDO allows you to make links from your collation, annotations, and critical paratexts to characters in your character list(s). To make such links, encoders use a `<ref>` element with a `@target` attribute. The value of the `@target` attribute must be prefaced by "role:" and must link to a `<person>` element in a character list.

The intention of these links to characters in character lists is to provide additional information or context about the character by linking to the location of their character note. There is no point in linking to a `<person>` element that does not have a child `<note>` element because there will be no information about the character. In cases where a `<person>` element does not have a child `<note>` element, the diagnostic will flag the redundant link.
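The relationship between the link and the character list can be sketched as follows. The xml:id, the character, and the note text are hypothetical, and the exact form of the value after "role:" is an assumption.

```xml
<!-- In the character list: this <person> has a child <note>, so a link
     to it leads to useful information about the character. -->
<person xml:id="emdEXAMPLE_M_Duke">
  <persName>Duke</persName>
  <note>Ruler of the city in which the play is set.</note>
</person>

<!-- In an annotation, collation, or critical paratext: the @target value
     is prefaced by "role:". Without the <note> above, this link would be
     flagged as redundant. -->
<ref target="role:emdEXAMPLE_M_Duke">the Duke</ref>
```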

To clear this diagnostic, simply remove the link to the cast list. Alternatively, add a note to the character list.

Broken Link Chains Diagnostic

When we have split elements (e.g., a quote that spans multiple verse lines, so must be split into multiple `<quote>` elements), we use the `@next` and `@prev` attributes to link to the other parts of the element. If the links do not correctly go to an xml:id either before or after the element, the diagnostic will flag it. For more information, see Encode Split Elements.

To resolve this diagnostic, check that the numbering is correct for each part of the split element. Then, check that the value of each `@prev` and `@next` attribute links to an existing xml:id.
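A correctly chained pair might look like the sketch below. The xml:ids and the text are hypothetical, and the "#id" pointer form is an assumption; Encode Split Elements gives the naming and pointer conventions that LEMDO requires.

```xml
<!-- Part 1 points forward to part 2 with @next. -->
<l>He answered, <quote xml:id="example_q1_1" next="#example_q1_2">first part of the quotation,</quote></l>

<!-- Part 2 points back to part 1 with @prev. Each pointer must match an
     xml:id that actually exists, or the chain is flagged as broken. -->
<l><quote xml:id="example_q1_2" prev="#example_q1_1">second part of the quotation</quote>, and fell silent.</l>
```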

Tags with Bad `@xml:lang` Values

LEMDO tags foreign languages using the `@xml:lang` attribute and a set of allowed values listed in our ODD file. Each value corresponds to an IANA value for a specific language. For more information on encoding foreign languages, see Encode Foreign Languages.

To resolve this diagnostic, ensure that the value you give the `@xml:lang` attribute is one of the ones listed in the dropdown when you add the `@xml:lang` attribute in Oxygen. You can also view the language values in tabular form in IANA Values for Specific Languages. If you are encoding text in a language that is not included in our allowed languages list, contact the LEMDO team to have the language added.
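For illustration only, a tagged Latin phrase might look like this. The value "la" is the IANA subtag for Latin; that it appears in LEMDO's allowed list is an assumption, as is the choice of the `<foreign>` element here.

```xml
<!-- "la" is the IANA language subtag for Latin. -->
<p>The motto reads <foreign xml:lang="la">veritas temporis filia</foreign>.</p>
```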

Duplicate Bibls Diagnostic

The LEMDO bibliography serves all the anthologies and editions therein. As a consequence, there are thousands of entries spread across the 26 BIBL1 files (BIBL1_A, BIBL1_B, and so on), which are regularly updated and expanded by the LEMDO team. We have occasionally created a duplicate entry. This diagnostic uses a similarity metric to check the BIBL1 files and flag `<bibl>` entries that appear to be similar. If you are an RA checking the general diagnostics report, you will want to clear this diagnostic regularly so that you catch duplicates shortly after the second one has been created.

If two entries do refer to the same source, search the repository to see which xml:id has been cited the most often in `<ref>` elements. If you are checking Diagnostics regularly, you will normally find that the duplicate id has been used just once or twice. Standardize the `<ref>` elements so that they all point to the most-used xml:id. Delete the duplicate `<bibl>` entry.

If the diagnostic has flagged two similar entries that are not duplicates, we have a mechanism for telling the similarity metric to ignore pairs (or trios) of entries. Add a `@corresp` attribute to the `<bibl>` element of each one. The value of the `@corresp` attribute is "not:" followed by the xml:id of the entry that it does not duplicate: e.g., "not:CONN2". Note that the `@corresp` attribute can have multiple space-separated values.

In the examples below, the similarity metric has flagged three editions contributed by Francis X. Connor to the New Oxford Shakespeare. To each entry, we have added the `@corresp` attribute to indicate that the entry is not a duplicate of either of the other two.

<bibl xml:id="CONN3" corresp="not:CONN2 not:CONN10">
<editor>Connor, Francis X.</editor>, ed. <title level="m">Lucrece</title>. By <author>William Shakespeare</author>. <title level="m">The New Oxford Shakespeare</title>. Ed. <editor>Gary Taylor</editor>, <editor>John Jowett</editor>, <editor>Terri Bourus</editor>, and <editor>Gabriel Egan</editor>. <publisher>Oxford University Press</publisher>, <date>2016</date>. 673–721. WSB <idno type="WSB">aaag2304</idno>.</bibl>

<bibl xml:id="CONN10" corresp="not:CONN3 not:CONN2">
<editor>Connor, Francis X.</editor>, ed. <title level="m">The Tragedy of Coriolanus</title>. By <author>William Shakespeare</author>. <title level="m">The New Oxford Shakespeare</title>. Ed. <editor>Gary Taylor</editor>, <editor>John Jowett</editor>, <editor>Terri Bourus</editor>, and <editor>Gabriel Egan</editor>. <publisher>Oxford University Press</publisher>, <date>2016</date>. 2723–2813. WSB <idno type="WSB">aaag2304</idno>.</bibl>

<bibl xml:id="CONN2" corresp="not:CONN3 not:CONN10">
<editor>Connor, Francis X.</editor>, ed. <title level="m">Venus and Adonis</title>. By <author>William Shakespeare</author>. <title level="m">The New Oxford Shakespeare</title>. Ed. <editor>Gary Taylor</editor>, <editor>John Jowett</editor>, <editor>Terri Bourus</editor>, and <editor>Gabriel Egan</editor>. <publisher>Oxford University Press</publisher>, <date>2016</date>. 639–672. WSB <idno type="WSB">aaag2304</idno>.</bibl>

Unknown Old IML Characters Diagnostic

When files were converted from IML to TEI, some special characters were not transformed into TEI. These characters were not recognized in the transformation, and so we wrote a diagnostic to identify characters that need to be remediated manually.

To resolve this diagnostic, open the file containing the old IML character and search for it using Ctrl+F (Cmd+F on Mac). Check the transcription against the facsimile of the text and add the correct character. For information on encoding glyphs and ligatures in TEI, see Encode Glyphs and Ligatures in Semi-Diplomatic Transcriptions.

In 2026, we still have over 20,000 characters to remediate. A glorious day is coming when all IML files will be fully remediated, and then we will retire this diagnostic with fanfare and relief.

Files Containing TEI `<persName>` Elements without `@ref` Attributes Diagnostic

LEMDO uses the `<persName>` element to tag people’s names in metadata and primary texts. In order to identify the person, we put a `@ref` attribute on `<persName>` linking to either PERS1 (for contributors to LEMDO) or PROS1 (for historical people). Our processor cannot do anything with a `<persName>` element that has no `@ref` attribute. This diagnostic finds instances of `<persName>` elements with no `@ref` attribute.

To resolve this diagnostic, add `@ref` attributes to all `<persName>` elements. If the person is a LEMDO contributor, give `@ref` a value of "pers:" followed by the person’s xml:id as given in PERS1. If the person is historical, give `@ref` a value of "pros:" followed by the person’s xml:id as given in PROS1. If the person does not already have an xml:id in PERS1 or PROS1, contact the LEMDO team to add one.
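The three cases can be sketched as follows. The "pros:ANON1" value is taken from the Anonymous example earlier in this document; "EXAM1" is a hypothetical xml:id used only for illustration.

```xml
<!-- Flagged: no @ref, so the processor cannot identify the person. -->
<persName>Anonymous</persName>

<!-- Resolved for a historical person: "pros:" plus the xml:id in PROS1. -->
<persName ref="pros:ANON1">Anonymous</persName>

<!-- Resolved for a LEMDO contributor: "pers:" plus the xml:id in PERS1.
     EXAM1 is a hypothetical id. -->
<persName ref="pers:EXAM1">Example Contributor</persName>
```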

LocalCit Pointers to Divs without Heads Diagnostic

Within an edition, you may use the `<ptr>` element to link to `<div>` elements that have child `<head>` elements. The text node of the `<head>` element is used to create a linkable string of text. This diagnostic checks that the `<div>` has a child `<head>`.

To resolve this diagnostic, ensure that you are using the `<ptr>` only as allowed in the LEMDO project: use it only to link within your edition and only to link to acts, scenes, speeches, stage directions, or `<div>` elements that have a `<head>`. If you have linked to a `<div>` without a `<head>`, add a `<head>` element. Adding a `<head>` will not only clear the diagnostic, but will also add the `<div>` to the page’s table of contents.
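A sketch of a pointer that satisfies this check is given below. The xml:id, the heading, and the paragraph text are hypothetical, and the "#id" pointer form is an assumption; see Encode Pointer Links for the syntax LEMDO requires.

```xml
<!-- The target <div> has a child <head>, whose text supplies the
     linkable string for the pointer. -->
<div xml:id="emdEXAMPLE_intro_sources">
  <head>Sources</head>
  <p>Discussion of the play's sources.</p>
</div>

<!-- Elsewhere in the same edition: a pointer to that <div>. -->
<ptr target="#emdEXAMPLE_intro_sources"/>
```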

For information about when to use the `<ptr>` element, see Choose Linking Mechanisms. For information about making links with the `<ptr>` element, see Encode Pointer Links.

`<catRef>` Elements whose Target Does Not Match Their Scheme Diagnostic

LEMDO uses the `<catRef>` element to specify important information about the nature of a file so that our processor is able to treat the file according to its purpose, source, content, and editorial treatment. There are two attributes on the `<catRef>` element: `@scheme` and `@target`. Each has a list of allowed values. Each of the `@target` values belongs to one of the schemes. It is important for our processor that the value of `@target` belong to the scheme given as the value of the `@scheme` attribute. This diagnostic finds instances where the `@target` value does not belong to the scheme indicated in the value of the `@scheme` attribute.

To resolve this diagnostic, ensure that the values of `@scheme` and `@target` match. The values for the `@target` attribute each begin with the initialism for the associated scheme. For example, the targets for the scheme "emdBookFormats" all begin with "lbf" (i.e., “LEMDO Book Formats”).
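The matching rule can be sketched as follows. The scheme name "emdBookFormats" and the "lbf" initialism come from the example above; the target values "lbfExample" and "xyzExample" are hypothetical, and any prefix that LEMDO requires on these attribute values is omitted here.

```xml
<!-- Consistent: the target begins with "lbf", the initialism for the
     emdBookFormats scheme. -->
<catRef scheme="emdBookFormats" target="lbfExample"/>

<!-- Flagged: the target begins with a different initialism, so it does
     not belong to the scheme named in @scheme. -->
<catRef scheme="emdBookFormats" target="xyzExample"/>
```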

Backwards Annotation and Collation Spans Diagnostic

When linking annotations and collation to modernized texts, we link to anchors on either side of the span that we are annotating or collating. It is important that we link first to the anchor before the lemma and then to the anchor after the lemma. This diagnostic finds instances where anchors are linked to in the wrong order.

To resolve this diagnostic, open the file for your modernized text. Run edition diagnostics by clicking on the red play button at the top of the Oxygen window. Doing so will open a diagnostics page in your Web browser that includes the diagnostic Annotations and collations whose Pointers are in the wrong order. If any are found, go into your annotations or collation file to find and correct the order of links. LEMDO recommends running edition diagnostics regularly while you work on annotations and collation to avoid this issue.

For detailed instructions about running edition diagnostics, see Edition Diagnostics.

Possible Missing Spaces Diagnostic

In most cases, `<anchor>` elements should have a space on one side when they are between two words. If there is not a space on either side of an `<anchor>` between two words, the rendered page will run words together. This diagnostic flags instances where there is no space on either side of an `<anchor>` element so that there are no missing spaces in your rendered edition.

To resolve this diagnostic, find the `<anchor>` element and add a space if needed.
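The problem and its fix can be sketched with two hypothetical verse lines (the xml:ids are placeholders):

```xml
<!-- Flagged: no space on either side of the anchor, so the two words
     run together as "goodmorrow" in the rendered text. -->
<l>good<anchor xml:id="example_anc_1"/>morrow to you</l>

<!-- Resolved: a space on one side of the anchor. -->
<l>good <anchor xml:id="example_anc_2"/>morrow to you</l>
```

The diagnostic reports possible missing spaces only, so check each flagged anchor before editing; an anchor that sits inside a single word should be left without a space.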

Other Resources

Martin Holmes and Joseph Takeda, “Beyond Validation: Using Programmed Diagnostics to Learn About, Monitor, and Successfully Complete your DH Project,” Digital Scholarship in the Humanities 34 (2019); DOI: 10.1093/llc/fqz011.

LEMDO YouTube video: Putting It All Together (Editorial)

LEMDO YouTube video: Releasing Your Anthology (Editorial)

LEMDO YouTube video: Releasing Your Anthology (Technical)

Notes

1. We have a tool that sweeps an edition for unused anchors before publication. If the tool does not find any pointers within the edition to a particular anchor, we will delete the anchor. A link to an anchor in another edition is therefore fragile.

Prosopography

Anonymous

Illya

Illya has a BA in English and Sociocultural Anthropology and an MA in English. Prior to joining the HCMC, he was a PhD candidate in English and Book History at the University of Toronto and worked on Records of Early English Drama and on the Modernist Archives Publishing Project. His work at the HCMC focuses on creating web-based applications for research projects led by members of the faculty of Humanities at the University of Victoria. This involves creating schemas for new and existing datasets, writing XSLT and build files to transform datasets into structured TEI and HTML formats, implementing staticSearch, and ensuring that new projects are Endings Principles compliant.

Janelle Jenstad

Janelle Jenstad is a Professor of English at the University of Victoria, Director of The Map of Early Modern London, and Director of Linked Early Modern Drama Online. With Jennifer Roberts-Smith and Mark Beatrice Kaethler, she co-edited Shakespeare’s Language in Digital Media: Old Words, New Tools (Routledge). She has edited John Stow’s A Survey of London (1598 text) for MoEML and is currently editing The Merchant of Venice (with Stephen Wittek) and Heywood’s 2 If You Know Not Me You Know Nobody for DRE. Her articles have appeared in Digital Humanities Quarterly, Elizabethan Theatre, Early Modern Literary Studies, Shakespeare Bulletin, Renaissance and Reformation, and The Journal of Medieval and Early Modern Studies. She contributed chapters to Approaches to Teaching Othello (MLA); Teaching Early Modern Literature from the Archives (MLA); Institutional Culture in Early Modern England (Brill); Shakespeare, Language, and the Stage (Arden); Performing Maternity in Early Modern England (Ashgate); New Directions in the Geohumanities (Routledge); Early Modern Studies and the Digital Turn (Iter); Placing Names: Enriching and Integrating Gazetteers (Indiana); Making Things and Drawing Boundaries (Minnesota); Rethinking Shakespeare Source Study: Audiences, Authors, and Digital Technologies (Routledge); and Civic Performance: Pageantry and Entertainments in Early Modern London (Routledge). For more details, see janellejenstad.com.

Joey Takeda

Joey Takeda is LEMDO’s Consulting Programmer and Designer, a role he assumed in 2020 after three years as the Lead Developer on LEMDO.

Mahayla Galliford

Project Manager, 2025-present; Assistant Project Manager, 2024-2025; Research Assistant, 2021-present. Mahayla Galliford (she/her) graduated from the University of Victoria with a BA (honours with distinction) in 2024, and an MA English in 2026. Mahayla’s undergraduate research explored early modern stage directions and civic water pageantry. Her SSHRC-funded MA thesis project focuses on transcribing, editing, and encoding early modern girls’ manuscripts, specifically Lady Rachel Fane’s May Masque in collaboration with LEMDO.

Martin Holmes

Martin Holmes has worked as a developer in the UVic’s Humanities Computing and Media Centre for over two decades, and has been involved with dozens of Digital Humanities projects. He has served on the TEI Technical Council and as Managing Editor of the Journal of the TEI. He took over from Joey Takeda as lead developer on LEMDO in 2020. He is a collaborator on the SSHRC Partnership Grant led by Janelle Jenstad.

Navarra Houldin

Training and Documentation Lead 2025–present. LEMDO project manager 2022–2025. Textual remediator 2021–present. Navarra Houldin (they/them) completed their BA with a major in history and minor in Spanish at the University of Victoria in 2022. Their primary research was on gender and sexuality in early modern Europe and Latin America. They are continuing their education through an MA program in Gender and Social Justice Studies at the University of Alberta where they will specialize in Digital Humanities.

Samuel Seaberg

Samuel Seaberg, a University of Victoria English undergrad, enjoys riding his bike. During the summer of 2025, he began working with LEMDO as a recipient of the Valerie Kuehne Undergraduate Research Award (VKURA). Unfortunately, due to his summer being spent primarily in working to establish an edition of Thomas Heywood’s If You Know Not Me, You Know Nobody, Part 2 and consequently working out how to represent multi-text works in a digital space, his bike has suffered severely of sheltered seclusion from the sun. Note: Samuel now works for LEMDO as the Assistant Project Manager, much to his bike’s chagrin.

Tracey El Hajj

Junior Programmer 2019–2020. Research Associate 2020–2021. Tracey received her PhD from the Department of English at the University of Victoria in the field of Science and Technology Studies. Her research focuses on the algorhythmics of networked communications. She was a 2019–2020 President’s Fellow in Research-Enriched Teaching at UVic, where she taught an advanced course on Artificial Intelligence and Everyday Life. Tracey was also a member of the Map of Early Modern London team, between 2018 and 2021. Between 2020 and 2021, she was a fellow in residence at the Praxis Studio for Comparative Media Studies, where she investigated the relationships between artificial intelligence, creativity, health, and justice. As of July 2021, Tracey has moved into the alt-ac world for a term position, while also teaching in the English Department at the University of Victoria.

Orgography

LEMDO Team (LEMD1)

The LEMDO Team is based at the University of Victoria and normally comprises the project director, the lead developer, project manager, junior developer(s), remediators, encoders, and remediating editors.

Metadata

Authority title	LEMDO Diagnostics
Type of text	Documentation
Publisher	University of Victoria on the Linked Early Modern Drama Online Platform
Series	Linked Early Modern Drama Online
Source	TEI Customization created by Martin Holmes, Joey Takeda, and Janelle Jenstad; documentation written by members of the LEMDO Team
Editorial declaration	n/a
Edition	Released with Linked Early Modern Drama Online 1.0
Encoding description	Encoded in TEI P5 according to the LEMDO Customization and Encoding Guidelines
Document status	prgGenerated
Funder(s)	Social Sciences and Humanities Research Council of Canada
License/availability	This file is licensed under a CC BY-NC-ND 4.0 license, which means that it is freely downloadable without permission under the following conditions: (1) credit must be given to the author and LEMDO in any subsequent use of the files and/or data; (2) the content cannot be adapted or repurposed (except in quotations for the purposes of academic review and citation); and (3) commercial uses are not permitted without the knowledge and consent of the editor and LEMDO. This license allows for pedagogical use of the documentation in the classroom.