Day 3

Schedule for Saturday

9:00	Conference site opens
9:30	Morning tutorials: Invisible XML Case Studies (room Vencovského aula); XQuery 4 Tutorial (room D, next to Vencovského aula)
11:00	Coffee break
11:20	Opening and sponsor presentations
11:30	Building an open, accessible digital reading ecosystem: inside EDRLab’s work (Gautier Chomel)
12:00	XProc-Baseline: Designing a Portable Regression Testing Library for XProc Pipelines (Tomos Hillman)
12:30	The story of Gerald Cinamon’s germandesigners.net: Transforming a complex MS Word manuscript into an interactive web experience (Matthew Patterson)
13:00	Lunch
14:30	Sponsor presentations
14:40	Enabling AI Across the XML Technologies via XPath Functions (George Bina)
15:10	Fento, an adjusted approach for xml/java object binding (Jorge Sanchez Rodriguez)
15:40	Ignite talks/late breaking news
16:00	Coffee break
16:30	Ignite talks/late breaking news
16:45	XML Differencing Engine: an open architecture for comparing changes in XML documents (Liam Quin, Hauke Brandes and Nico Kutscherauer)
17:15	Closing of the conference

Session details

Tutorial: Invisible XML Case Studies

Steven Pemberton

Invisible XML is a language and process for automatically adding markup to textual documents. It has had a stable specification since 2022, and there are now a half dozen implementations.

Like any language, using it means learning techniques and idioms in the language, and how to approach particular problems. This tutorial presents a number of case studies of ixml grammars, and will illustrate such topics as ambiguity, non-context-free languages, dealing with spacing, permissive and strict grammars, designing for existing markup formats, and the design of the ixml grammar for ixml itself.

The materials will be available after the tutorial for self-study.

XQuery 4 Tutorial

Juri Leino

Are you curious what the next version of XQuery has to offer? This tutorial will cover the new concepts, data structures, language features and functions that were added in version 4.0 of the specification. It will also briefly explain how to find and navigate the new specification, its history and the status quo of both specification efforts and adoption.

There will be a number of exercises that can be solved by participants in interactive notebooks for all features that are already implemented in the XQuery processor in use.

Building an open, accessible digital reading ecosystem: inside EDRLab’s work

Gautier Chomel

The European Digital Reading Lab (EDRLab) is a non‑profit membership association dedicated to an open, interoperable and accessible digital publishing ecosystem for text, audio and image. Over the last decade, EDRLab has become a key technical actor behind the evolution of the EPUB standard, the development of Readium reading toolkits, the Thorium Reader reading application, and the Readium LCP interoperable DRM used by public libraries and trade publishers worldwide.

This talk will give a guided tour of EDRLab’s activities, focusing on how XML technologies are core to producing born‑accessible ebooks. It will cover:

The importance of XML syntax in production and validation pipelines for the publishing industry.
EDRLab’s role in the evolution and implementation of EPUB 3 within W3C and industry groups.
Current R&D efforts around accessibility, audio, and integration with library and education platforms.

The session targets developers, solution architects, and production experts who work with XML and want to understand how these technologies empower the digital publishing industry today and tomorrow.

Gautier Chomel works at EDRLab on open-source reading technologies and EPUB accessibility, collaborating with W3C publishing and accessibility groups. With a strong background in publishing workflows, Gautier regularly speaks about practical, standards-based solutions in the publishing industry.

XProc-Baseline: Designing a Portable Regression Testing Library for XProc Pipelines

Tomos Hillman

Real-world XProc pipelines do far more than transform data. They orchestrate file operations, create deliverables, manage archives, and produce outputs with non-deterministic content. This paper identifies the need for an automated regression testing framework for XProc pipelines and file-based workflows, alongside complementary testing libraries such as XSpec.

We discuss the core challenges of testing file and archive management, configurable canonicalization of non-deterministic content, and integration with modern CI/CD platforms. We present our approach to building XProc-Baseline: a portable, reusable library that addresses the gaps in current XML testing tools by providing configurable canonicalization, manifest-based comparison, and seamless CI/CD integration.

Enabling AI Across the XML Technologies via XPath Functions

George Bina

XPath is the shared expression language that underlies XSLT, XProc, XQuery, Schematron, and virtually every other XML technology. XPath functions are a natural integration point for AI in the XML world. With the addition of a few XPath functions, large language models (LLMs) become available across all XML technologies, with no per-technology integration work required. We present the idea, explore its implications for real-world XML workflows, and demonstrate it with working examples: hybrid XProc pipelines that orchestrate AI and conventional steps together, Schematron rules paired with AI-powered quick-fixes, cost-aware XSLT and XQuery refactoring scripts that invoke AI only where it is needed, and XSpec test suites that treat AI prompts as testable units.

Fento, an adjusted approach for xml/java object binding

Jorge Sanchez Rodriguez

Java XML binding tools tend to expect the same data structures and approach on both the object side and the document side. They also tend to treat XML as a mere tool for object serialization.

We will share a proposal to enhance the document side of Java/XML binding. The proposal relies on a more suitable tool for referencing document elements (modern XPath) and on a non-volatile approach to document contents, focusing on the meaningful parts of the information while avoiding unnecessary mappings on the object side. We will walk through examples to show the approach’s use cases, virtues, and limitations.

XML Differencing Engine: an open architecture for comparing changes in XML documents

Liam Quin, Hauke Brandes and Nico Kutscherauer

Comparing two XML documents and reporting differences between them is a difficult problem. The best solution depends on the context or environment and sometimes, within a given context, on the particular situation.

The authors of this paper evaluated existing research, and, not finding an existing solution whose architecture met the identified needs, decided to write yet another XML diff program, heavily influenced by the paper “Change Detection in Hierarchically Structured Information.”

The story of Gerald Cinamon’s germandesigners.net: Transforming a complex MS Word manuscript into an interactive web experience

Matthew Patterson

The renowned graphic designer and author Gerald Cinamon wrote a comprehensive biographical dictionary about graphic designers working in Germany during the Nazi regime, ‘German graphic designers during the Hitler period’. The book was written as a manuscript in Microsoft Word, containing over 900 entries across several separate files.

When turning the manuscript into a viable print-published book seemed unlikely, I was approached about whether it would be possible to publish it as a website.

This is the story of how we were able to combine the author’s Word files, metadata annotations in Word, and XSLT to create well-structured, richly annotated markup that could, in turn, be transformed into a work of hypertext.

We’ll cover:

How the source material was prepared in Word itself, and what techniques we used to annotate and enrich the text.
How the Word files were processed into usable, structured XML.
How that XML was turned into a website.
What’s being done to combine modern browser features with modern XSLT to provide a richer, more useful experience for users of the site.