Chapter 8. Validation and Diagnostics
This chapter of our documentation is still in beta. We welcome feedback, corrections,
and questions while we finalize the page in our 2024–2025 work cycle.
Introduction to Validation and Diagnostics
The documentation in this chapter is for all repo users. It is relevant for editors,
encoders, remediators, and anthology leads, containing foundational information about
ensuring your files are error-free and have functional links.
Rationale
The LEMDO TEI schema is supported by a Schematron file that enforces project-specific
rules that are not governed by TEI. For example, LEMDO requires curly apostrophes.
If you type a straight apostrophe, you will get a Schematron error as you encode to
remind you to use the curly apostrophe.
We also have robust diagnostics that can generate reports and flag potential problems.
Between LEMDO’s constrained schema, the Schematron, and our Diagnostics, you have
plenty of checks to help you get the encoding right and finish all the parts of your
file.
The pages in this chapter describe the permanent diagnostics. We often add temporary
diagnostics to help us clear up a pattern of errors before we write Schematron to
prevent those errors in the future.
Learning Outcomes
This chapter is designed to support you throughout the process of encoding files and
preparing them for publication. By the time you have worked through this chapter,
you will:
Know how to fix Schematron errors.
Know where to find and how to use our general diagnostics.
Know how to check diagnostics for your specific edition or anthology.
Know how to check links.
Contents
| Section | Description |
| Schematron and Validation Errors | Learn how to parse and fix validation errors in Oxygen |
| LEMDO Diagnostics | Learn about LEMDO’s general diagnostics and how to identify and fix diagnostic errors in your files |
Schematron and Validation Errors
Introduction
Schematron is the language that LEMDO uses to write rules specific to LEMDO’s encoding. Schematron, alongside our schema, ensures that encoding is consistent and correct throughout the LEMDO project. If
your encoding does not follow one of the Schematron rules that we have written, then
you will get a validation error. This will prompt you to go back and correct your encoding. It is important that
you fix validation errors as soon as you get them.
If you commit an invalid file, it will break the build. This means that our Jenkins Continuous Integration Server is unable to finish
serving up a new version of the LEMDO-dev website. When the build is broken, nobody can see their recently committed work rendered in HTML. If you inadvertently break the build, a member of the LEMDO team will contact you so that you can fix the error that caused the break.
If an error occurs frequently and is not currently prevented
by Schematron, we will write a new Schematron rule in the ODD file (lemdo.odd). You must
svn up regularly to ensure that you get any new Schematron rules that we add.
Step-by-Step: Check Validity
Click the validation button at the top of your Oxygen window (it resembles a piece
of paper with a checkmark on it).
Check for the validation message at the bottom of your Oxygen window. It will say
either “Validation successful” or “Document contains errors.”
If your validation is successful, you can either continue working or save and commit
your file.
If your validation is not successful, you must fix the error. Never commit an invalid
file.
For more detailed instructions for validating a file, see
Validate Files.
Practice: Fix Validation Errors
To fix a validation error, look at the error message at the bottom of your Oxygen
window. In most cases, we have written instructions for how to fix Schematron errors.
For example, if you have a straight apostrophe in your file, you will get an error
message that says:
ERROR: Straight apostrophes are not allowed in text. Use curly apostrophes instead. The shortcut to add a curly apostrophe is ctrl+shift+’ (on PC or Unix) and command+shift+’ (on iOS).
If you are unable to see the entire message because it is cut off, you can pull up
a window with the full message by double clicking on the message text.
If you are unable to fix the error yourself, contact the LEMDO team for help. Do not commit your file while it is invalid.
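To give a sense of what sits behind an error message like the one above, here is a minimal sketch of a Schematron rule that reports straight apostrophes. It is not LEMDO’s actual rule: the rule context, the test, and the shortened message are assumptions made for illustration. LEMDO’s real rules are written in the ODD file (lemdo.odd).

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Bind the tei: prefix to the TEI namespace so that XPath expressions can use it -->
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Hypothetical context: every element inside the text of the document -->
    <sch:rule context="tei:text//*">
      <!-- A report fires when its test is true: here, when a child text node contains ' -->
      <sch:report test="text()[contains(., &quot;'&quot;)]">
        Straight apostrophes are not allowed in text. Use curly apostrophes instead.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The text inside the report element is what the encoder sees at the bottom of the Oxygen window, which is why the messages can include instructions for fixing the problem.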
LEMDO Diagnostics
Rationale
Although many errors can be caught by Schematron in Oxygen, some errors are not. In
some cases, this is because Oxygen’s validator is incapable of checking for a specific
issue, as in the case of duplicate xml:ids occurring in different files. In other
cases, it is because we do not want the build to break over an error, typically because
there are too many instances of the error to easily fix, as in the case of
old TLN links.
No matter the reason that they are not picked up as errors by Schematron, we can catch these issues using our general LEMDO diagnostics.
Practice: Check LEMDO Diagnostics
Navigate to LEMDO diagnostics from the LEMDO-dev site by clicking on the
Resources tab in the top navigation bar and selecting
Diagnostics. This will bring you to the LEMDO Diagnostics page.
Diagnostics are listed under the
Consistency Checks tab (open by default) of the LEMDO Diagnostics Web page. Each diagnostic has its own collapsible tab. Those that do not currently find any errors across the LEMDO repository are coloured green and have the number zero in brackets beside the diagnostic name. Those that do find errors are coloured red and have the number of errors found in brackets beside the diagnostic name.
You can filter the diagnostics to only show errors from your edition by typing
emd followed by your edition abbreviation in the filter text box and clicking
Filter. For example, if you were working on the H5 edition, you would type
emdH5 into the filter text box. You can also search for diagnostics in a specific file by typing the full file name into the filter text box.
For instructions for fixing diagnostic errors, see the relevant section below on the
type of error that you wish to fix.
In addition to the consistency checks, there is a statistics section of the diagnostics
Web page. The statistics include counts of files in the LEMDO repository, of total
xml:ids across the repo, and of the number of facsimile files that we have stored
on our facsimile server.
Files Containing Bad Facsimile Links Diagnostic
LEMDO stores facsimile images on an HCMC server. We create XML files in the facs folder of the LEMDO repo in order to encode the metadata for the images and to give
each image an xml:id. LEMDO editors and encoders can then point to facsimile images
from their semi-diplomatic transcription files. Because this linking process is relatively
complex, there are sometimes errors in linking from the LEMDO repo to the server containing
the facsimile images. This diagnostic catches these errors by finding links to images
that do not exist.
If you are working on facsimile files in the facs folder, you should regularly check this diagnostic to ensure you do not introduce
any errors.
If there is an error in this diagnostic (i.e., a nonexistent facsimile file has been
linked to), you must correct the values for the
@url attributes on the
<graphic>
elements of your facsimile file. Follow the instructions in Encode Images in Facsimile Files.
If you cannot find the error, check the value of the
@url attribute against the URI of the facsimile images on the facsimile server. To navigate
to the facsimile server, click the Resources tab on the top navigation bar of the LEMDO-dev website and select
Facsimiles. Click on the link for the copy that you are working with to be brought to a list of the relevant URIs.
Bad Internal Links Diagnostics
LEMDO has two diagnostics for bad internal links: an urgent diagnostic and a non-urgent
diagnostic. Both find broken internal links (i.e., links from a LEMDO file to an entity
within the LEMDO repo). The urgent diagnostic highlights errors in files with statuses
indicating they are close to publication. The non-urgent diagnostic finds internal
link errors in all other LEMDO files.
Under the bad internal link diagnostic, each file with one or more bad links is listed
along with the link that is broken. To fix the bad link, go to the file that contains
it and search for the bad link. Fix the link following the instructions for encoding
links as given in Chapter 5. Making Links.
Pointers to External Anchors Diagnostic
While using a pointer to link to an anchor in another edition is not forbidden, it
is not as stable as linking to structural entities. In most cases when linking from
one edition to another, you should link to structural elements with xml:ids (such
as acts, scenes, speeches, or paragraphs) rather than
<anchor>
elements, which may be removed.
To resolve this diagnostic, search in your file for the anchor ID that is linked to
and replace it with a link to a structural entity following the directions in
Encode Reference Links.
Missing Speaker Elements Diagnostic
All speeches in modernized texts should have a speech prefix encoded using the
<speaker>
element. This diagnostic finds speeches in modernized texts that do not have speech
prefixes. (Note that some speeches in semi-diplomatic transcriptions do not have speech
prefixes.)
To resolve this diagnostic, add speech prefixes to your modernized text where they
are missing. See
Encode Speakers in Modernized Texts.
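As a rough illustration of what this diagnostic looks for, the sketch below shows a speech without a speech prefix followed by the same speech with one. The character name and the verse line are placeholders, and any attributes that LEMDO expects on the speech element are left out.

```xml
<!-- Would be flagged in a modernized text: the speech has no speaker element -->
<sp>
  <l>A placeholder verse line.</l>
</sp>

<!-- Would clear the diagnostic: the speech prefix is encoded with speaker -->
<sp>
  <speaker>Portia</speaker>
  <l>A placeholder verse line.</l>
</sp>
```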
Texts Lacking Authors Diagnostic
All semi-diplomatic and modernized texts should have an author identified in their
metadata. This diagnostic identifies plays, shows, and poems that do not have a
<respStmt>
for an author in their
<titleStmt>
elements.
To add an author to the metadata for your file, follow the instructions in
Encode Responsibility Statements. In cases where the work’s author is unknown, add a
<respStmt>
for the author and link to the Anonymous entry in PROS1 as follows:
<respStmt>
<resp ref="resp:aut">Author</resp>
<persName ref="pros:ANON1">Anonymous</persName>
</respStmt>
Silly Div Types Diagnostic
The
<div>
element has some
@type values that are expected by our processor. All other values are caught by this diagnostic.
In cases where the
@type value is not used by our processor or useful to your text, remove the
@type value.
Old TLN Links Diagnostic
Files that began as IML files that have not been completely remediated have links
to targets beginning with tln:. These correspond to the old
“through line numbers” used by the Internet Shakespeare Editions. Old TLN links will be removed during the remediation process. Remediators should delete links to TLNs once they have replaced them with functioning links to the LEMDO edition.
Bad Documentation Resp Pointers Diagnostic
We give credit to the people who have worked on documentation using the
@resp attribute on the root
<div>
element of documentation files. All
@resp values in documentation must link to a
<respStmt>
element in the ODD file (lemdo.odd) and must be prefixed by or: (e.g., or:odd_JENS1_wtm).
To fix this error, check that all
@resp values match a
<respStmt>
in the ODD file. If there is not a
<respStmt>
for the person that you are giving credit to, ask a member of the LEMDO team with
read/write permissions over the ODD file to add the
<respStmt>
. For more information, see Give Credit for Documentation Files.
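A minimal sketch of a correctly credited documentation file might look like the following. The @resp value is the example given above; the xml:id, heading, and paragraph are placeholders, and the sketch assumes that a matching respStmt already exists in lemdo.odd.

```xml
<!-- Root div of a documentation file; xml:id and content are hypothetical -->
<div xml:id="exampleDocFile" resp="or:odd_JENS1_wtm">
  <head>Example Documentation Page</head>
  <p>Documentation content goes here.</p>
</div>
```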
Unlinked Documentation Files Diagnostic
LEMDO has a large collection of documentation files, which are only included in our
Encoding Guidelines if they are linked to from the ODD file (lemdo.odd). We do not want to have lots of documentation files in the repo that are not linked to from the ODD file. If there is a documentation file in the repo that is not linked to from the ODD file, this diagnostic will flag it.
To clear this diagnostic, either link to documentation files from the ODD file or
move deprecated documentation files to the data/obsolete/oldDocumentation folder. To avoid this diagnostic error, we recommend writing content for new documentation
files as soon as you create them so that they are ready to add to the ODD file as
soon as they are added to the repo.
Links Using the Role: Prefix to Empty Roles Diagnostic
LEMDO allows links from apparatus texts and critical paratexts to characters in character
lists. To make these links, encoders use a
<ref>
element with a
@target attribute. The value of the
@target attribute must be prefaced by role: and must link to a
<person>
element in a character list.
The intention of these links to characters in character lists is to provide additional
information or context about the character by linking to the place where their character
note is. There is no point in linking to a
<person>
element that does not have a child
<note>
element with information about the character. In those cases, this diagnostic flags
the redundant link.
To clear this diagnostic, remove the link to the character list.
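The difference between a useful link and a flagged one can be sketched as follows. The xml:ids are hypothetical, other children of the person elements are omitted, and the sketch assumes that the part of the @target value after role: is the xml:id of the person element.

```xml
<!-- In the character list: this character has a note, so a link to it is useful -->
<person xml:id="exampleRole1">
  <note>A brief note about the character.</note>
</person>
<!-- This character has no child note, so a role: link to it would be flagged -->
<person xml:id="exampleRole2"/>

<!-- In an apparatus text or critical paratext: a link that would not be flagged -->
<ref target="role:exampleRole1">First Character</ref>
```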
Broken Link Chains Diagnostic
When we have split elements (e.g., a quote that spans multiple verse lines, so must
be split into multiple
<quote>
elements), we use the
@next and
@prev attributes to link to the other parts of the element. If the links do not correctly
go to an xml:id either before or after the element, this diagnostic will flag it.
For more information, see Encode Split Elements.
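As a sketch of a correctly linked chain, the example below splits one quotation across two verse lines. The xml:ids and the text are placeholders, and the #id pointer form is the generic TEI convention, used here as an assumption; follow Encode Split Elements for the form that LEMDO requires.

```xml
<l>A first verse line that ends with
  <quote xml:id="exampleQuote_1" next="#exampleQuote_2">the start of the quotation</quote></l>
<l><quote xml:id="exampleQuote_2" prev="#exampleQuote_1">and the rest of the quotation</quote>
  on the following line.</l>
```

Each @next value matches the xml:id of the part that follows, and each @prev value matches the xml:id of the part that precedes. A chain breaks when either value points to an xml:id that does not exist.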
To resolve this diagnostic, check that the numbering is correct for each part of the
split element. Then, check that the value of each
@prev and
@next attribute links to an existing xml:id.
Tags with Bad @xml:lang Values
LEMDO tags foreign languages using the
@xml:lang attribute. There is a set of allowed values for the
@xml:lang attribute which the LEMDO team curates in our ODD file. Each value corresponds with
an IANA value for a specific language. For more information on encoding foreign languages, see
Encode Foreign Languages.
To resolve this diagnostic, ensure that the value you give the
@xml:lang attribute is one of the ones listed in the dropdown when you add the
@xml:lang attribute in Oxygen. You can also find this list in a table in IANA Values for Specific Languages.
If you are encoding text in a language that is not included in our allowed languages list, contact the LEMDO team to have the language added.
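For illustration, a phrase in Latin might be tagged as in the sketch below. The value la is the IANA subtag for Latin; that it appears in LEMDO’s allowed list, and that the phrase is tagged with the foreign element, are assumptions here, so check the dropdown in Oxygen before relying on either.

```xml
<!-- la is the IANA language subtag for Latin -->
<foreign xml:lang="la">Veni, vidi, vici</foreign>
```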
Duplicate Bibls Diagnostic
The LEMDO bibliography serves all the anthologies and editions therein. As a consequence,
there are thousands of entries spread across the 26 BIBL1 files (BIBL1_A, BIBL1_B, and so on), which are regularly updated and expanded by the LEMDO RAs. We have occasionally
created a duplicate entry. This diagnostic uses a similarity metric to check the BIBL1 files and flag
<bibl>
entries that appear to be similar. If you are an RA checking the general diagnostics
report, you’ll want to clear this diagnostic regularly so that you catch duplicates
shortly after the second one has been created.
If two entries do refer to the same source, search the repository to see which xml:id
has been cited the most often in
<ref>
elements. If you are checking Diagnostics regularly, you’ll normally find that the duplicate
id has been used just once or twice. Standardize the
<ref>
s so that they all point to the most-used xml:id. Delete the duplicate
<bibl>
.
If the diagnostic has flagged two similar entries that are not duplicates, we have a mechanism for telling the similarity metric to ignore pairs
(or trios) of entries. Add a
@corresp attribute to the
<bibl>
element of each one. The value of the
@corresp attribute is not: followed by the xml:id of the other entry: e.g., not:CONN2. Note that the
@corresp attribute can have multiple space-separated values.
In the examples below, the similarity metric has flagged three editions contributed
by Francis X. Connor to the New Oxford Shakespeare. To each entry, we have added the
@corresp attribute to indicate that the entry is not a duplicate of either of the other two.
<bibl xml:id="CONN3" corresp="not:CONN2 not:CONN10">
<editor>Connor, Francis X.</editor>, ed. <title level="m">Lucrece</title>. By <author>William Shakespeare</author>. <title level="m">The New Oxford Shakespeare</title>. Ed. <editor>Gary Taylor</editor>, <editor>John Jowett</editor>, <editor>Terri Bourus</editor>, and <editor>Gabriel Egan</editor>. <pubPlace>Oxford</pubPlace>: <publisher>Oxford University Press</publisher>, <date>2016</date>. 673–721. WSB <idno type="WSB">aaag2304</idno>.</bibl>
<bibl xml:id="CONN10" corresp="not:CONN3 not:CONN2">
<editor>Connor, Francis X.</editor>, ed. <title level="m">The Tragedy of Coriolanus</title>. By <author>William Shakespeare</author>. <title level="m">The New Oxford Shakespeare</title>. Ed. <editor>Gary Taylor</editor>, <editor>John Jowett</editor>, <editor>Terri Bourus</editor>, and <editor>Gabriel Egan</editor>. <pubPlace>Oxford</pubPlace>: <publisher>Oxford University Press</publisher>, <date>2016</date>. 2723–2813. WSB <idno type="WSB">aaag2304</idno>.</bibl>
<bibl xml:id="CONN2" corresp="not:CONN3 not:CONN10">
<editor>Connor, Francis X.</editor>, ed. <title level="m">Venus and Adonis</title>. By <author>William Shakespeare</author>. <title level="m">The New Oxford Shakespeare</title>. Ed. <editor>Gary Taylor</editor>, <editor>John Jowett</editor>, <editor>Terri Bourus</editor>, and <editor>Gabriel Egan</editor>. <pubPlace>Oxford</pubPlace>: <publisher>Oxford University Press</publisher>, <date>2016</date>. 639–672. WSB <idno type="WSB">aaag2304</idno>.</bibl>
Unknown Old IML Characters Diagnostic
When files were converted from IML to TEI, some special characters were not transformed
into TEI. These characters were not recognized in the transformation, and so flag
a diagnostic for us to resolve as part of remediation.
To resolve this diagnostic, open the file that the old IML character is in and search
for it using Ctrl+F (Cmd+F on Mac). Check the transcription against the facsimile of
the text and add the correct character. For information on encoding glyphs and ligatures
in TEI, see
Encode Glyphs and Ligatures in Semi-Diplomatic Transcriptions.
Files Containing TEI
<persName>
Elements without
@ref Attributes Diagnostic
LEMDO uses the
<persName>
element to identify people. In order to identify them, we put a
@ref attribute on
<persName>
linking to either PERS1 (for contributors to LEMDO) or PROS1 (for historical people).
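The two kinds of links can be sketched as follows. The pros:ANON1 value appears elsewhere in this chapter; pers:JENS1 is an assumed value used only for illustration, so look up the person’s actual xml:id in PERS1 or PROS1.

```xml
<!-- Historical person: pros: followed by the xml:id given in PROS1 -->
<persName ref="pros:ANON1">Anonymous</persName>

<!-- LEMDO contributor: pers: followed by the xml:id given in PERS1 (JENS1 is assumed) -->
<persName ref="pers:JENS1">Janelle Jenstad</persName>
```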
Our processor cannot do anything with a
<persName>
element that has no
@ref attribute. This diagnostic finds instances of
<persName>
elements with no
@ref attribute.
To resolve this diagnostic, add
@ref attributes to all
<persName>
elements. If the person is a LEMDO contributor, give
@ref a value of pers: followed by the person’s xml:id as given in PERS1. If the person is historical, give
@ref a value of pros: followed by the person’s xml:id as given in PROS1. If the person does not already
have an xml:id in PERS1 or PROS1, contact the LEMDO team to add one.
LocalCit Pointers to Divs without Heads Diagnostic
LEMDO only uses the
<ptr>
element to link to acts, scenes, speeches, and stage directions (A.S.Sp. and A.S.SD.)
in modernized texts and semi-diplomatic transcriptions or to
<div>
elements that have child
<head>
elements. In cases where a
<ptr>
link points to a
<div>
element, this diagnostic checks that the
<div>
has a child
<head>
. If it does not, it will flag a diagnostic error.
To resolve this diagnostic, ensure that you are only using the
<ptr>
as allowed in the LEMDO project: use it only to link within your edition and only
use it to link to acts, scenes, speeches, stage directions, or
<div>
elements that have a
<head>
. If you have linked to a
<div>
without a
<head>
, add a
<head>
element. This will not only clear the diagnostic, but will also make the rendered
page more easily navigable and will add the
<div>
to the page’s table of contents.
For information about when to use the
<ptr>
element, see Choose Linking Mechanisms.
For information about making links with the
<ptr>
element, see Encode Pointer Links.
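As a sketch of the fix described above, the example below shows a div that would be flagged if a pointer targeted it, followed by the same div after a head has been added. The xml:id, heading, and paragraph are placeholders, and the pointer itself is left out because its target syntax is covered in Encode Pointer Links.

```xml
<!-- Would be flagged if a ptr pointed here: the div has no child head -->
<div xml:id="exampleSection">
  <p>Section content goes here.</p>
</div>

<!-- Clears the diagnostic: the div now has a head -->
<div xml:id="exampleSection">
  <head>Example Section Heading</head>
  <p>Section content goes here.</p>
</div>
```

The two versions are alternatives rather than siblings in one file, since an xml:id must be unique.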
Prosopography
Anonymous
Janelle Jenstad
Janelle Jenstad is a Professor of English at the University of Victoria, Director
of The Map of Early Modern London, and Director of Linked Early Modern Drama Online. With Jennifer Roberts-Smith and Mark Beatrice Kaethler, she co-edited Shakespeare’s Language in Digital Media: Old Words, New Tools (Routledge). She has edited John Stow’s A Survey of London (1598 text) for MoEML and is currently editing The Merchant of Venice (with Stephen Wittek) and Heywood’s 2 If You Know Not Me You Know Nobody for DRE. Her articles have appeared in Digital Humanities Quarterly, Elizabethan Theatre, Early Modern Literary Studies, Shakespeare Bulletin, Renaissance and Reformation, and The Journal of Medieval and Early Modern Studies. She contributed chapters to Approaches to Teaching Othello (MLA); Teaching Early Modern Literature from the Archives (MLA); Institutional Culture in Early Modern England (Brill); Shakespeare, Language, and the Stage (Arden); Performing Maternity in Early Modern England (Ashgate); New Directions in the Geohumanities (Routledge); Early Modern Studies and the Digital Turn (Iter); Placing Names: Enriching and Integrating Gazetteers (Indiana); Making Things and Drawing Boundaries (Minnesota); Rethinking Shakespeare Source Study: Audiences, Authors, and Digital Technologies (Routledge); and Civic Performance: Pageantry and Entertainments in Early Modern London (Routledge). For more details, see janellejenstad.com.
Joey Takeda
Joey Takeda is LEMDO’s Consulting Programmer and Designer, a role he assumed in 2020
after three years as the Lead Developer on LEMDO.
Mahayla Galliford
Project manager, 2025-present; research assistant, 2021-present. Mahayla Galliford
(she/her) graduated with a BA (Hons with distinction) from the University of Victoria
in 2024. Mahayla’s undergraduate research explored early modern stage directions and
civic water pageantry. Mahayla continues her studies through UVic’s English MA program
and her SSHRC-funded thesis project focuses on editing and encoding girls’ manuscripts,
specifically Lady Rachel Fane’s dramatic entertainments, in collaboration with LEMDO.
Martin Holmes
Martin Holmes has worked as a developer in the UVic’s Humanities Computing and Media
Centre for over two decades, and has been involved with dozens of Digital Humanities
projects. He has served on the TEI Technical Council and as Managing Editor of the
Journal of the TEI. He took over from Joey Takeda as lead developer on LEMDO in 2020.
He is a collaborator on the SSHRC Partnership Grant led by Janelle Jenstad.
Navarra Houldin
Training and Documentation Lead 2025–present. LEMDO project manager 2022–2025. Textual
remediator 2021–present. Navarra Houldin (they/them) completed their BA with a major
in history and minor in Spanish at the University of Victoria in 2022. Their primary
research was on gender and sexuality in early modern Europe and Latin America. They
are continuing their education through an MA program in Gender and Social Justice
Studies at the University of Alberta where they will specialize in Digital Humanities.
Tracey El Hajj
Junior Programmer 2019–2020. Research Associate 2020–2021. Tracey received her PhD
from the Department of English at the University of Victoria in the field of Science
and Technology Studies. Her research focuses on the algorhythmics of networked communications. She was a 2019–2020 President’s Fellow in Research-Enriched
Teaching at UVic, where she taught an advanced course on
“Artificial Intelligence and Everyday Life.” Tracey was also a member of the Map of Early Modern London team between 2018 and 2021. Between 2020 and 2021, she was a fellow in residence at the Praxis Studio for Comparative Media Studies, where she investigated the relationships between artificial intelligence, creativity, health, and justice. As of July 2021, Tracey has moved into the alt-ac world for a term position, while also teaching in the English Department at the University of Victoria.
Glossary
schema
“A schema is a set of rules governing the use of TEI elements in a particular project.
XML languages are all governed by a small set of shared principles; any document that
follows these principles, even if it makes up its own elements, is well-formed XML.
TEI is a formal language that is designed to comply with the principles of XML. TEI
offers many elements and attributes in its XML-compliant language. But most projects
still need to customize the TEI for their own purposes, by prescribing how and where TEI elements and attributes
are to be used, precluding some elements and attributes, making other elements and
attributes optional, making child elements required or optional, and defining allowed
and required values for attributes. The schema captures the project’s requirements,
prohibitions, and standards. We use a RELAX NG schema at LEMDO. The main schema for
LEMDO is lemdo-all.rng (where the .rng file extension indicates the schema type).
The schema is responsible for generating the error messages in Oxygen when encoders
break one or more of the rules associated with it. (Read more about schemas in the
TEI Guidelines.)”
Schematron
“Schematron is an open-source language for ensuring that certain patterns are present
in XML documents. For example, it can insist upon certain spellings, enforce curly
apostrophes, and limit the use of elements to specific contexts. It is the ‘feather duster’ of an XML project. See An Overview of Schematron.”
Metadata
| Authority title | Chapter 8. Validation and Diagnostics |
| Type of text | Documentation |
| Publisher | Linked Early Modern Drama |
| Series | |
| Source | |
| Editorial declaration | |
| Edition | |
| Encoding description | |
| Document status | prgGenerated |
| License/availability | |