Xineo XIL Tutorial

Introduction

Xineo XIL (XML Import Language) defines an XML language for transforming various record-based data sources into XML documents, and provides a fully functional XIL processing implementation. This implementation has built-in support for relational sources (via JDBC) and structured text sources (such as CSV), and its public API makes it extensible, allowing new data source implementations to be integrated dynamically. It also abstracts the output format: the Xineo implementation can write output documents to a stream or build them as DOM documents.

Concepts and examples

The XML Import Language (XIL) defines a way to express transformations from various record-based data sources to XML. A transformation is expressed as a well-formed XML document, which may include both XIL defined elements and any other elements. XIL defined elements belong to the XIL namespace (http://www.xineo.net/XIL-1.0), which will be referred to using the "xil" prefix in the rest of this document.

The XIL language is not restricted to a specific data source, and gives access to all source types that are available in the implementation used to process the XIL document. The Xineo XIL implementation has built-in support for some source types, and is easily extensible via the Xineo XIL Java API. Its built-in data sources include relational databases accessed via JDBC (the sql type) and structured text matched with regular expressions (the regex and localRegex types), all of which are covered later in this tutorial.

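As mentioned above, a valid XIL document contains two kinds of elements: XIL elements, which belong to the XIL namespace and are interpreted by the XIL engine, and all other elements, which serve as templates for the output document.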

Defining data sources

The first step in the construction of an import sheet is to define the data sources that are needed to build the output document. Data sources are defined using the xil:source element. Each data source has a type (which defines the kind of data source) and is assigned a user-defined name (which is used to refer to it later). Each data source can also be given a number of properties, depending on the source type. A single import sheet can define as many data sources as needed.

Here is an example that defines a JDBC data source named "myDataSource". Properties are used to define the database connection parameters:

<?xml version="1.0" ?>

<xil:xil xmlns:xil="http://www.xineo.net/XIL-1.0">

    <xil:source type="sql" name="myDataSource">
        <xil:property name="url"> jdbc:mysql://myServer/myBase </xil:property>
        <xil:property name="userId"> myUserId </xil:property>		
    </xil:source>
    
    [...]

</xil:xil>

The complete reference of built-in data sources and corresponding properties is detailed in the input source reference.

Creating static output

Once the data sources are defined, output templates must be defined. These templates are ordinary XML nodes that will be instantiated by the XIL engine to produce the output document. Some of them are static and will be output exactly as they appear in the sheet; others are dynamic and depend on the data read from the data source.

Output templates are always defined inside an xil:output element. The following example shows how to create some static elements (in this case, the addressBook element will be the root of the produced document):

<?xml version="1.0" ?>

<xil:xil xmlns:xil="http://www.xineo.net/XIL-1.0">

    <xil:source type="sql" name="myDataSource"> [...] </xil:source> 
        
    <xil:output>
    
        <addressBook>
        
            <title> My address book </title>
            
            [...]
        
        </addressBook>
    
    </xil:output>

</xil:xil>

Creating dynamic output

Suppose that we want to populate our address book using data available as a view of our relational database. We previously saw how to declare a JDBC input source that will allow us to query the database and construct XML elements from the obtained data.

To create dynamic nodes in the import sheet, we have to use the xil:node element, which itself contains two sub-elements:

The xil:input element must specify which input source to use (by its name) as well as the query to access the data (using SQL for JDBC data sources). Each record returned by this query will be accessible to produce the XML output.

The xil:output element specifies an output template that will be instantiated for each record returned by the given query. The data of each record is available via a set of variables that will be properly substituted by the XIL engine when instantiating the templates.

When using a JDBC data source, each returned column creates a variable of the same name. For example, the SELECT Id,Name FROM ... query would create two variables named Id and Name (case-sensitive). To substitute a variable, the following form must be used: ${variableName}. Variable references are substituted in any attribute of the output template, and in element content through the xil:subst element.

The following example shows how to query the Person table of our database, and construct suitable XML elements to populate our address book document.

<xil:output>

    <addressBook>
    
        <title> My address book </title>
        
        <xil:node>		
        
            <xil:input source="myDataSource">
                <xil:query> SELECT Id,Name,Address FROM Person </xil:query>
            </xil:input>
            
            <xil:output>
                <person id="${Id}">					
                    <name> <xil:subst value="${Name}"/> </name>
                    <address> <xil:subst value="${Address}"/> </address>
                </person>
            </xil:output>
            
        </xil:node>
    
    </addressBook>

</xil:output>

The following example shows a possible output of this import sheet:

<?xml version="1.0" ?>

<addressBook>

    <title> My address book </title>
    
    <person id="1">					
        <name> Miles Davis </name>
        <address> 42 Horn Street, New-York </address>
    </person>
    
    <person id="2">					
        <name> Tony Williams </name>
        <address> 21 Drum Street, London </address>
    </person>
            
    [...]

</addressBook>

More on variables

Variable scope. As in most programming languages, variables in XIL have a scope which depends on where they are defined. The scope of variables created by an xil:query statement is the corresponding xil:output element. Variables defined at the sheet level (see below) are available globally. When several variables of the same name exist at the same time, only the most local one is available, and more global ones become available again when exiting the local variable scope.
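
As a minimal sketch of these scoping rules, the fragment below nests two queries that both return a column called Name. The Department and Person tables and their columns are hypothetical, chosen only for illustration; the elements are the ones introduced earlier in this tutorial, and the inner query relies on variables being substituted in queries, as described further down.

```xml
<xil:node>
    <xil:input source="myDataSource">
        <xil:query> SELECT Id,Name FROM Department </xil:query>
    </xil:input>

    <xil:output>
        <!-- Here ${Name} is the Name column of the Department record -->
        <department name="${Name}">

            <xil:node>
                <xil:input source="myDataSource">
                    <xil:query> SELECT Name FROM Person WHERE DepartmentId = ${Id} </xil:query>
                </xil:input>

                <xil:output>
                    <!-- Inside this template the inner Name hides the outer one -->
                    <member name="${Name}"/>
                </xil:output>
            </xil:node>

            <!-- Back in the outer scope: ${Name} is the department name again -->
            <label> <xil:subst value="${Name}"/> </label>

        </department>
    </xil:output>
</xil:node>
```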

Variables everywhere. As shown in previous examples, variables are primarily used in output templates. But variables can also be used in several other places.

First of all, variables are also substituted in queries (in xil:query elements). For example, when an xil:node element embeds another one, results from the outer query can be used to construct the inner one (on the same or on another data source).
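
A sketch of such a nested query across two sources, assuming a hypothetical text file ids.txt that holds one numeric identifier per line and the Person table used earlier. The source name idList, the file name and the WHERE clause are illustrative assumptions, not taken from the Xineo documentation.

```xml
<xil:source type="regex" name="idList">
    <xil:property name="inputSource"> ids.txt </xil:property>
</xil:source>

<xil:source type="sql" name="myDataSource"> [...] </xil:source>

<xil:output>
    <addressBook>
        <xil:node>
            <!-- Outer query: one record per matching line; ${1} holds the identifier -->
            <xil:input source="idList">
                <xil:query> ([0-9]+) </xil:query>
            </xil:input>

            <xil:output>
                <xil:node>
                    <!-- Inner query: ${1} comes from the outer record and is substituted into the SQL text -->
                    <xil:input source="myDataSource">
                        <xil:query> SELECT Name,Address FROM Person WHERE Id = ${1} </xil:query>
                    </xil:input>

                    <xil:output>
                        <person id="${1}">
                            <name> <xil:subst value="${Name}"/> </name>
                            <address> <xil:subst value="${Address}"/> </address>
                        </person>
                    </xil:output>
                </xil:node>
            </xil:output>
        </xil:node>
    </addressBook>
</xil:output>
```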

Variables are also substituted in data source property definitions. Global variables can thus be used to make an import sheet externally configurable (see "Using the Xineo XIL processor").
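
For instance, the connection user could be left open in the sheet and supplied at run time. The sketch below reuses the url and userId properties from the first example and assumes that a global variable named userId is bound when the processor is started, for example with a userId=bob argument on the command line.

```xml
<xil:source type="sql" name="myDataSource">
    <xil:property name="url"> jdbc:mysql://myServer/myBase </xil:property>
    <!-- ${userId} is replaced by the value of the global variable userId -->
    <xil:property name="userId"> ${userId} </xil:property>
</xil:source>
```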

Using the "regex" data source

Besides JDBC, the Xineo XIL processor provides a built-in data source for structured text files, based on regular expressions.

A regular expression based data source can be defined using the regex type. The file to be processed is specified by the inputSource property. The encoding of this input source may be specified by the inputEncoding property. Here is an example:

<xil:source type="regex" name="myDataSource">
	<xil:property name="inputSource"> inputfile.txt </xil:property>
	<xil:property name="inputEncoding"> ISO-8859-1 </xil:property>		
</xil:source>	

This data source can then be queried via a regular expression, specified in the xil:query element. Each line of the input source is matched against the given regular expression (non-matching lines are silently ignored), and each matching line yields a record with one variable per capturing group. Capturing groups are numbered by counting their opening parentheses from left to right. In the expression (A)(B(C)), for example, there are three such groups:

  1. (A)
  2. (B(C))
  3. (C)

Each group will create a variable named by its number, for example ${1}, ${2} and ${3}. Here is an example considering an input file where each line contains three tab-separated fields:

<xil:node>		
    <xil:input source="myDataSource">
        <xil:query> ([^\t]*)\t([^\t]*)\t([^\t]*) </xil:query>
    </xil:input>
    
    <xil:output>
        <person id="${1}">					
            <name> <xil:subst value="${2}"/> </name>
            <address> <xil:subst value="${3}"/> </address>
        </person>
    </xil:output>
</xil:node>	

The regex input source implementation relies on Java's built-in regular expression support. Please consult the corresponding Java documentation for more information on regular expression syntax.
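
To illustrate the numbering rule with nested groups, here is a sketch for a hypothetical input file whose lines consist of a date such as 2004-05-17, a space, and some free text. The file layout and the output element names are assumptions made for this example; the source is the regex source declared above.

```xml
<xil:node>
    <xil:input source="myDataSource">
        <!-- Opening parentheses, counted left to right:
             1 = whole date, 2 = year, 3 = month, 4 = day, 5 = rest of the line -->
        <xil:query> (([0-9]+)-([0-9]+)-([0-9]+)) (.*) </xil:query>
    </xil:input>

    <xil:output>
        <entry date="${1}" year="${2}" month="${3}" day="${4}">
            <text> <xil:subst value="${5}"/> </text>
        </entry>
    </xil:output>
</xil:node>
```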

Using the "localRegex" data source

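This data source type is a variant of the "regex" one: instead of reading lines from a file, it matches the regular expression against a string supplied through the inputString property.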

Of course, it wouldn't be very useful if the input string was static, and this kind of source will generally be used to match patterns in some result of an upper-level query, for example to tokenize some field value.

In the following example, we first execute a query to get some fields from a JDBC table, then use the "localRegex" data source to tokenize the Name field and produce the corresponding token XML elements. Note that this example also shows that data source properties can be set in the xil:input element instead of in the xil:source declaration (when the same property is defined in both locations, the one defined in the xil:input element takes priority).

<xil:source type="sql" name="myDataSource">
    [...] 
</xil:source>

<xil:source type="localRegex" name="myLocalRegex" />

<xil:output>
    <xil:node>		    
        <xil:input source="myDataSource">
            <xil:query> SELECT ID, Name FROM Person </xil:query>
        </xil:input>        
        
        <xil:output>
            <person id="${ID}">	
                <name>				                
                    <xil:node>	                    
                        <xil:input source="myLocalRegex">
                            <xil:query> ([^\s]+) </xil:query>
                            <xil:property name="inputString">${Name}</xil:property>
                        </xil:input>     
                        
                        <xil:output>
                            <token><xil:subst value="${1}"/></token>
                        </xil:output>                        
                    </xil:node>		                
                </name>
            </person>
        </xil:output>        
    </xil:node>
</xil:output>
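
Setting properties at the xil:input level should work the same way for other source types. As a sketch, the regex source below declares a shared encoding once and leaves the choice of file to each xil:input. The file names, the source name textFiles and the output elements are hypothetical, and the example assumes that inputSource may be given in xil:input like any other property.

```xml
<xil:source type="regex" name="textFiles">
    <!-- Shared by every query on this source unless overridden in xil:input -->
    <xil:property name="inputEncoding"> ISO-8859-1 </xil:property>
</xil:source>

<xil:output>
    <lists>
        <xil:node>
            <xil:input source="textFiles">
                <xil:query> ([^\t]*)\t([^\t]*) </xil:query>
                <!-- Property given at query level: selects the file for this query only -->
                <xil:property name="inputSource"> friends.txt </xil:property>
            </xil:input>
            <xil:output>
                <friend id="${1}" name="${2}"/>
            </xil:output>
        </xil:node>

        <xil:node>
            <xil:input source="textFiles">
                <xil:query> ([^\t]*)\t([^\t]*) </xil:query>
                <xil:property name="inputSource"> colleagues.txt </xil:property>
            </xil:input>
            <xil:output>
                <colleague id="${1}" name="${2}"/>
            </xil:output>
        </xil:node>
    </lists>
</xil:output>
```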

Declaring new data source types

The Xineo XIL engine provides a set of data source types by default, but this set can be extended: new data source types can be implemented against the API.

To use a third-party data source type, you have to tell the engine about it with the xil:sourceType element, which has just two attributes: the name attribute specifies the name of the newly registered data source type, and the class attribute specifies the fully qualified name of the data source Java class, which must be a valid API-conforming data source implementation. Here is an example:

<xil:xil xmlns:xil="http://www.xineo.net/XIL-1.0">

    <xil:sourceType name="mySourceType" class="com.foo.bar.MyDataSource"/>
    
    <xil:source type="mySourceType" name="mySource">
        <!-- Data source specific properties -->
    </xil:source>
    
    [...]
</xil:xil>

Note that data source types may also be registered programmatically using the SourceFactory class (see "The Xineo XIL API").

Using the Xineo XIL processor

Command-line usage

The simplest way to run Xineo XIL as a command-line tool is: "java -jar xineo-xml-X.X.X.jar <parameters>". You may also put the ".jar" file in your CLASSPATH and run the "net.xineo.xml.xil.Main" class.

The available parameters are: [inputFile [outputFile [outputEncoding [property=value ...]]]]. If "inputFile" or "outputFile" is omitted, or replaced by "-", the processor reads from standard input or writes to standard output, respectively. The default output encoding is UTF-8.

The "property=value" pairs may be used to bind a value to a variable which will be available globally in the import sheet. Here is a sample command line:

$ java -jar xineo-xil.jar myImportSheet.xil myOutputFile.xml userId=bob

The Xineo XIL processor can also be easily used from any Java program via its public API (see "The Xineo XIL API"). For example, it could be integrated in a J2EE application.

About CSV files

Comma-Separated Values (CSV) files and their variants are structured text files where each line contains a set of fields, usually separated by commas or by another character such as a tab or a colon. In many cases, those files can easily be handled with XIL using the regex data source type, as demonstrated in this tutorial.

But in more complex cases, regular expressions will not be powerful enough, for example if you want to perform queries on CSV entries. In this case, you may use the CSV-JDBC driver, which is a simple read-only JDBC driver that uses CSV files as database tables.
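
A rough sketch of what this could look like, assuming the driver meant here is CsvJdbc, whose JDBC URLs take the form jdbc:relique:csv: followed by a directory, and which exposes each file name.csv in that directory as a table called name, with columns named after the header line. The directory, file and column names are hypothetical, and any additional property that may be needed to load the driver class is not shown; the input source reference is the place to check for that.

```xml
<xil:source type="sql" name="csvSource">
    <!-- Assumed CsvJdbc URL: the directory that contains person.csv -->
    <xil:property name="url"> jdbc:relique:csv:/path/to/csv/directory </xil:property>
</xil:source>

<xil:output>
    <addressBook>
        <xil:node>
            <xil:input source="csvSource">
                <!-- person.csv is queried as table person; header fields become column names -->
                <xil:query> SELECT Id,Name FROM person </xil:query>
            </xil:input>
            <xil:output>
                <person id="${Id}"> <xil:subst value="${Name}"/> </person>
            </xil:output>
        </xil:node>
    </addressBook>
</xil:output>
```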

The Xineo XIL API

This section has not been written yet. Please consult the Xineo XIL API documentation.

Conclusion

To learn more about Xineo XIL and how to use it in your applications, please refer to: