Schedule for Saturday
Session details
Tutorial: Invisible XML Case Studies
Invisible XML is a language and process for automatically adding markup to textual documents. It has had a stable specification since 2022, and there are now a half dozen implementations.
As with any language, using it means learning its techniques and idioms, and how to approach particular problems. This tutorial presents a number of case studies of ixml grammars, and will illustrate such topics as ambiguity, non-context-free languages, dealing with spacing, permissive and strict grammars, designing for existing markup formats, and the design of the ixml grammar for ixml itself.
The materials will be available after the tutorial for self-study.
XQuery 4 Tutorial
Are you curious what the next version of XQuery has to offer? This tutorial will cover the new concepts, data structures, language features and functions that were added in version 4.0 of the specification. It will also briefly explain how to find and navigate the new specification, its history and the status quo of both specification efforts and adoption.
There will be a number of exercises that can be solved by participants in interactive notebooks for all features that are already implemented in the XQuery processor in use.
Building an open, accessible digital reading ecosystem: inside EDRLab’s work
Gautier Chomel
The European Digital Reading Lab (EDRLab) is a non‑profit membership association dedicated to an open, interoperable and accessible digital publishing ecosystem for text, audio and image. Over the last decade, EDRLab has become a key technical actor behind the evolution of the EPUB standard, the development of Readium reading toolkits, the Thorium Reader reading application, and the Readium LCP interoperable DRM used by public libraries and trade publishers worldwide.
This talk will give a guided tour of EDRLab’s activities, focusing on how XML technologies are core to producing born‑accessible ebooks. It will cover:
- The importance of XML syntax in production and validation pipelines for the publishing industry.
- EDRLab’s role in the evolution and implementation of EPUB 3 within W3C and industry groups.
- Current R&D efforts around accessibility, audio, and integration with library and education platforms.
The session targets developers, solution architects, and production experts who work with XML and want to understand how these technologies empower the digital publishing industry today and tomorrow.
Gautier Chomel works at EDRLab on open-source reading technologies and EPUB accessibility, collaborating with W3C publishing and accessibility groups. With a strong background in publishing workflows, Gautier regularly speaks about practical, standards-based solutions in the publishing industry.
XProc-Baseline: Designing a Portable Regression Testing Library for XProc Pipelines
Tomos Hillman
Real-world XProc pipelines do far more than transform data. They orchestrate file operations, create deliverables, manage archives, and produce outputs with non-deterministic content. This paper identifies the need for an automated regression testing framework for XProc pipelines and file-based workflows, in addition to complementary testing libraries such as XSpec.
We discuss the core challenges of testing file and archive management, configurable canonicalization of non-deterministic content, and integration with modern CI/CD platforms. We present our approach to building XProc-Baseline: a portable, reusable library that addresses the gaps in current XML testing tools by providing configurable canonicalization, manifest-based comparison, and seamless CI/CD integration.
Multiplatform Publishing With The Atom Syndication Format
Schimon Jehudah
TBD
Enabling AI Across the XML Technologies via XPath Functions
George Bina
XPath is the shared expression language that underlies XSLT, XProc, XQuery, Schematron, and virtually every other XML technology. XPath functions are a natural integration point for AI in the XML world. Add a few XPath functions, and large language models (LLMs) become available across all XML technologies with no per-technology integration work required. We present the idea, explore its implications for real-world XML workflows, and demonstrate it with working examples: hybrid XProc pipelines that orchestrate AI and conventional steps together, Schematron rules paired with AI-powered quick-fixes, cost-aware XSLT and XQuery refactoring scripts that invoke AI only where it is needed, and XSpec test suites that treat AI prompts as testable units.
Comparative study of “PDF to AI format” converters
Elena Montero Maousidou
This talk presents a systematic benchmark study evaluating current approaches for converting PDF documents into structured formats such as XML and Markdown. A specially designed benchmark document simulates real-world publishing scenarios, including complex structures like chapters, footnotes, bibliographies, tables, and mathematical content. The study compares different tools and pipelines with respect to structural accuracy, completeness, and semantic quality.
Jewels in Plain Sight
Eamonn Neylon
The experience of using a large language model to help build a conformant web-based validator for the Character Repertoire Description Language, an accompanying library of over 80 character repertoire schemas, and a Schematron-based quality tool is described and reflected on. Consideration is given to the value of precise formal specifications for code generation. The potential for AI-assisted tooling to surface defects in standards under development and the changed economics that make niche technical work newly viable are also discussed.
Excel to XML for Financial Report
Tony Graham
This short talk covers some of the more interesting XML and XSLT aspects of a recent project for a commercial bank to convert Excel worksheets into XML for uploading to a Central Bank portal to comply with anti-money laundering regulations.
Fento, an adjusted approach for xml/java object binding
Jorge Sanchez Rodriguez
TBD
The story of Gerald Cinamon’s germandesigners.net: Transforming a complex MS Word manuscript into an interactive web experience
Matthew Patterson
The renowned graphic designer and author Gerald Cinamon wrote a comprehensive biographical dictionary about graphic designers working in Germany during the Nazi regime, ‘German graphic designers during the Hitler period’. The book was written as a manuscript in Microsoft Word, containing over 900 entries across several separate files.
When turning the manuscript into a viable print-published book seemed unlikely, I was approached about whether it would be possible to publish it as a website.
This is the story of how we were able to combine the author’s Word files, metadata annotations in Word, and XSLT to create well-structured, richly annotated markup that could, in turn, be transformed into a work of hypertext.
We’ll cover:
- How the source material was prepared in Word itself, and what techniques we used to annotate and enrich the text.
- How the Word files were processed into usable, structured XML.
- How that XML was turned into a website.
- What’s being done to combine modern browser features with modern XSLT to provide a richer, more useful experience for users of the site.



