[1] Flexible structural constraints in XQuery Full-Text
Emanuele Panzeri, Gabriella Pasi (University of Milano-Bicocca)
The XQuery and XPath languages have been extended by the W3C XML Query working
group with Full-Text capabilities. Information Retrieval (IR) scoring techniques
have been integrated to retrieve scored nodes based on their textual content.
However, while approximate matching is applied to textual node contents via
IR-style, keyword-based constraints, how can structure-related user uncertainty
be addressed via user-guided approximate structural matching?
The increasing number of XML document collections, with their heterogeneous (and complex)
structures, requires users to know the exact document structure in order to formulate a
query in XQuery or XPath. However, such knowledge cannot be taken for granted.
In [1] a novel approach to expressing user vagueness in structural constraints was proposed
as an extension of XQuery Full-Text. It allows users to specify flexible structural
constraints that require an approximate matching of XML nodes, in combination with
standard XPath/XQuery Full-Text predicates.
The proposed extension makes it possible to obtain: (1) a ranking based on the evaluation of content predicates only (as in the original XQuery Full-Text), (2) a ranking based on the evaluation of the flexible structural constraints (based on our proposal), or (3) a ranking based on a linear combination of the two above scores, which the user may also specify, for example, via the order-by clause.
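A minimal Python sketch of option (3), the linear score combination (the node identifiers, score values, and weight are made up for illustration; this is not the authors' implementation):

```python
# Illustrative sketch: combining a content-based score and a
# structural score into one ranking, as in option (3) above.
# Node names and score values are hypothetical examples.

def combined_rank(results, alpha=0.6):
    """Rank nodes by alpha*content_score + (1-alpha)*structure_score."""
    scored = [
        (alpha * content + (1 - alpha) * structure, node)
        for node, content, structure in results
    ]
    return [node for score, node in sorted(scored, reverse=True)]

# (node id, content score, structural score) -- made-up values
hits = [("n1", 0.9, 0.1), ("n2", 0.5, 0.8), ("n3", 0.7, 0.7)]
print(combined_rank(hits))  # → ['n3', 'n2', 'n1']
```

A node with balanced content and structural evidence ("n3") can outrank one that matches only on content ("n1"), which is the point of the combined score.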
This poster will first introduce the proposed flexible approach and the Near and
Below constraints, with their semantics and syntax. Second, it will discuss the
implementation of these constraints on top of the BaseX query engine, together with a
preliminary evaluation.
Further details about the flexible extension of XQuery Full-Text can be found in:
[1] E. Panzeri, G. Pasi. An approach to define flexible structural constraints in XQuery. AMT, 2012.
[2] Automated Visual Regression Testing of PDF Output from XML or other Input
Michael Miller (Antenna House)
Groups producing PDF documents from XML have a universal requirement to perform regression testing of the visual output of their systems. This is necessary every time the software is upgraded or the system is changed in some way, to ensure that the changed system continues to produce the intended results and that no new problems, such as incorrect formatting, corrupted graphics, or missing content, have been introduced by the changes. For want of an automated method, and lacking any available solution, most organizations have implemented some sort of manual regression testing that involves visually comparing the new output of a small subset of documents against a known good previous output. Because it is a manual visual check, subtle problems can easily go undetected. It is also a very tedious and time-consuming task, but still a very necessary one.
For over two years now, Antenna House has been developing an automated regression testing suite that we can use internally to test the output from new releases of our software. This suite has now been in use internally at Antenna House for almost ten months. The benefits we have realized include: much more accurate regression testing; the ability to perform regression testing on a much more extensive collection of documents rather than a subset; and a reduction of more than 80% in the human effort needed to conduct regression testing.
The poster will focus on the approach Antenna House has taken to automating this previously manual task. The result is a very usable solution that automates regression testing of visual output. The presentation covers what regression testing is and is not, the need for it, the approach taken, the benefits, and future plans.
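To make the core idea concrete, here is a minimal pure-Python sketch of the comparison step only, assuming each page has already been rendered elsewhere to same-sized grayscale pixel matrices (this is an illustration of the general technique, not Antenna House's actual tool):

```python
# Sketch of visual regression comparison: given two rendered pages as
# same-sized grayscale pixel matrices (lists of rows of 0-255 values),
# report the fraction of pixels that differ beyond a tolerance.
# Rendering PDF pages to pixel matrices is assumed to happen elsewhere.

def page_diff_ratio(baseline, candidate, tolerance=8):
    assert len(baseline) == len(candidate), "page sizes must match"
    total = differing = 0
    for row_a, row_b in zip(baseline, candidate):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > tolerance:
                differing += 1
    return differing / total

# Two tiny 2x3 "pages": one pixel changed well beyond the tolerance.
old_page = [[255, 255, 0], [255, 128, 0]]
new_page = [[255, 255, 0], [255, 30, 0]]
print(f"{page_diff_ratio(old_page, new_page):.3f}")  # → 0.167
```

A threshold on this ratio per page lets a harness flag only pages whose rendering drifted, instead of a human eyeballing every document.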
[3] XML vocabularies: interconnection of S1000D vocabularies with DITA ones
Jean-Jacques Thomasson
Proposed content for the poster: theory and a concrete proof of concept: a demo with mixed DITA topics and S1000D modules, inclusion of S1000D modules in DITA maps, and inclusion of DITA topics in S1000D publication modules; plus a presentation of how we have produced S1000D-compliant XML Schemas from the DITA Relax NG and XSD models.
Interest of the subject: As industrial equipment embeds more and more interactivity and software, it becomes crucial for technical documentation and manuals (maintenance procedures, equipment descriptions, data sheets, configuration and specifications, technical data, etc.) to be both hardware- and software-oriented. Extending DITA to the aerospace and mechanical industry XML models, and vice versa, can quickly and easily empower these models. On the software editor side, the interest is in providing one unique set of tools for both standards.
What will be presented and discussed: (1) the genealogy of this R&D work; (2) the organisation of the group working on the subject; (3) the current status of the work; (4) a presentation and demonstration of the real working schemas and tools, with a demo of DITA maps embedding topics and modules, and publication modules embedding modules and topics; and (5) an open discussion.
[4] In-Memory Representations of XML Documents with Low Memory Footprint
Stelios Joannou, Andreas Poyias, Rajeev Raman (University of Leicester)
The SiXML initiative at the University of Leicester
aims at using succinct, or space-efficient, data structures
to reduce the memory footprint needed to hold and process
XML documents in memory. Unlike normal data compression, SiXML’s
in-memory representation can often be manipulated in a
similar manner to a standard XML document representation
with very little slowdown.
We report on the latest release of SiXDOM (SiXDOM 1.2), which adds features over
the previous version (SDOM 1.0), presented at XML Prague 2010, including:
- Fast, memory-efficient parsing, using the Xerces-C SAX parser.
- Cross-platform support for languages such as Java and Python via interface
bindings, with experimental evaluation.
- An upgrade to 64-bit: documents 4x larger than the largest document parsed by
SDOM 1.0 have already been parsed successfully.
We will show some experimental evaluation of our work, both for our native C++ library
and for the Java and Python bindings we provide, and explain how this low memory
footprint is achieved.
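As a toy illustration of the general idea behind succinct tree representations (not SiXDOM's actual data structure), an XML element tree can be encoded as a balanced-parentheses bit sequence, two bits per element, and still be navigated:

```python
# Toy sketch of a succinct tree encoding (not SiXDOM itself):
# a DFS over the element tree emits 1 on entering an element and
# 0 on leaving it, giving 2 bits per element instead of a
# pointer-heavy DOM node. Navigation works on the bit sequence.
import xml.etree.ElementTree as ET

def to_parens(elem, bits=None):
    if bits is None:
        bits = []
    bits.append(1)                 # open: enter element
    for child in elem:
        to_parens(child, bits)
    bits.append(0)                 # close: leave element
    return bits

def find_close(bits, i):
    """Index of the 0 matching the 1 at position i."""
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] == 1 else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced sequence")

def first_child(bits, i):
    return i + 1 if bits[i + 1] == 1 else None   # next bit opens a child

def next_sibling(bits, i):
    j = find_close(bits, i) + 1
    return j if j < len(bits) and bits[j] == 1 else None

doc = ET.fromstring("<a><b/><c><d/></c></a>")
bits = to_parens(doc)
print(bits)                   # → [1, 1, 0, 1, 1, 0, 0, 0]
print(first_child(bits, 0))   # → 1  (element b)
print(next_sibling(bits, 1))  # → 3  (element c)
```

Real succinct structures add rank/select indexes so these operations run in constant time rather than by scanning, but the space intuition is the same: tree shape in 2 bits per node.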
[5] Animo – Semantic of Action
Dmitriy Shabanov, Evgeny Gadovsky, Vasilii Startcev
Animo is a reference programming language. It is a tool for representing energy transformations and flows with respect to objects represented as data. Think of Animo in these terms while programming, and that is how Animo will help you.
In the Animo framework, information and its manipulation (through operators, or manipulators) are seen as definitions (spatial concentrations, or clusters). Information on its own cannot create new relationships (it cannot change its space or location); that is done by manipulating it. Manipulations create relationships between two definitions, which results in dependencies. This process has an interesting effect: what is affected when information changes can be traced through the dependencies created by the manipulators. There are many more benefits; one is global caching: "calculate once, share & use".
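The dependency tracing and "calculate once, share & use" caching described above can be sketched as follows (a hypothetical Python illustration of the idea; Animo's own machinery is not shown):

```python
# Hypothetical sketch of the ideas described above: definitions hold
# information, manipulators derive new definitions and record the
# dependency, so a change can be traced to everything it affects,
# and derived values are computed once and cached until invalidated.

class Definition:
    def __init__(self, value=None, compute=None, inputs=()):
        self.compute = compute          # manipulator producing the value
        self.inputs = inputs
        self.dependents = []            # definitions derived from this one
        self._cache = value
        for d in inputs:                # manipulation creates dependencies
            d.dependents.append(self)

    def value(self):
        if self._cache is None:         # "calculate once, share & use"
            self._cache = self.compute(*(d.value() for d in self.inputs))
        return self._cache

    def set(self, value):
        self._cache = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        for d in self.dependents:       # trace what a change affects
            d._cache = None
            d._invalidate_dependents()

a = Definition(2)
b = Definition(3)
total = Definition(compute=lambda x, y: x + y, inputs=(a, b))
print(total.value())   # → 5  (computed once, then cached)
a.set(10)              # the change propagates along the dependencies
print(total.value())   # → 13 (recomputed only because an input changed)
```

The point of the sketch is the two effects the abstract names: the dependency edges let you trace what a change affects, and unchanged derived values are shared from the cache instead of being recomputed.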