Using XML, XSD and the XSLT Identity Transform to template and generate (Word) documents (part 1)

Welcome to my first blog post everyone! This post comes out of the preparation work I have done for a training about XML, XSD, XPath and XSLT. I thought it would be cool to share my findings here :)

Today I want to talk about an XSLT technique called the Identity Transform. According to Wikipedia, an identity transform is a data transformation that copies the source data into the destination data without change. But we can use it for so much more than that!

First let’s show what the identity transform looks like in XSLT 2.0:

[image: the identity transform stylesheet]
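A minimal sketch of such a stylesheet, assuming the `attribute() | node()` pattern discussed in this post (the `xsl` prefix is just the conventional choice):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: matches every attribute and every child node
       (elements, text, comments, processing instructions). -->
  <xsl:template match="attribute() | node()">
    <!-- Make a shallow copy of the current node... -->
    <xsl:copy>
      <!-- ...then recurse into its attributes and child nodes. -->
      <xsl:apply-templates select="attribute() | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 1.0 the same idea is usually written with `@*` instead of `attribute()`.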

This is a very simple stylesheet: all it does is copy the content of one document to another. Let me explain what happens. When you apply an XSLT transform, the XSLT processor falls back on a couple of built-in templates. One of these matches the root node of your document and applies the rest of your templates to its children. The root node is not the topmost element of your XML document; that one is called the document element. The root node (called the document node in XPath 2.0) is a more or less virtual node that sits above the document element as its parent, so you can use absolute paths to address nodes.
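For reference, the built-in rules can be spelled out explicitly. The sketch below is not something you normally write yourself; it only illustrates that, for the default mode, the processor behaves roughly as if these templates were present with the lowest possible precedence:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root (document) node and elements: just keep walking the children. -->
  <xsl:template match="* | /">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: output their string value. -->
  <xsl:template match="text() | @*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: output nothing. -->
  <xsl:template match="processing-instruction() | comment()"/>

</xsl:stylesheet>
```

Because the first rule only calls apply-templates without a select attribute, it hands the children of the root node to whatever templates your own stylesheet defines.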

The built-in template does an apply-templates instruction on all the children of the root node. Its first child will usually be the document element of your document. Do we have a template that matches the document element? Yes, we have! Look at the match attribute of our template: it specifies attribute(), a node test that matches all attribute nodes, and node(), a node test that matches all nodes on the child axis (text nodes, element nodes, etc.). The | (pipe) is the union operator in XPath.

Our template will match any such node, make a shallow copy of it (for an element: only the tags, no attributes or content yet), and as the content of the shallow copy apply templates again to all attributes and child nodes. These are recursively copied as well, effectively creating a copy of our whole document. You have to specify attribute() on the apply-templates instruction because, by default, apply-templates only processes the child axis of the current context node, and attributes are not on that axis. I put node() there to make sure all kinds of child nodes are copied; * only matches child element nodes, but not text nodes, for example.
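To see why node() and attribute() matter, consider a hypothetical variant that uses * instead. This is a sketch for contrast only, not a stylesheet from this post:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches element nodes only. -->
  <xsl:template match="*">
    <xsl:copy>
      <!-- Recurse into child ELEMENTS only: text nodes and
           attributes are never selected, so they are never copied. -->
      <xsl:apply-templates select="*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Given an invented input such as `<a x="1">hi<b>text</b></a>`, this variant keeps only the element skeleton, `<a><b/></a>`: the attribute and both text nodes are lost.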

Looking at this you will probably say to yourself ‘What’s the big deal?’ Let me come to that :) The real fun begins when we start to add more templates to our XSLT stylesheet, so that we still copy most nodes but change or replace others. That allows us to use XML documents as templates. Too often on the job I see a lot of C# code that fills in placeholders in XML documents, when XSLT really is intended for this kind of stuff. First let’s look at a sample XML file:

[images: the sample XML file (left) and the same file with placeholders added (right)]
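A hypothetical file of this kind, with the placeholders already added, might look like the following. Only the `employer` and `valueholder` element names appear in this post; every other element name and all values are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample: only employer and valueholder are named in the
     post; all other names and values are made up. -->
<employees>
  <employee>
    <name>Jane Doe</name>
    <address>Example Street 1, Example City</address>
    <businessunit>Example Unit</businessunit>
    <!-- Placeholder to be filled in by the stylesheet. -->
    <employer><valueholder/></employer>
  </employee>
  <employee>
    <name>John Doe</name>
    <address>Example Street 1, Example City</address>
    <businessunit>Another Unit</businessunit>
    <employer><valueholder/></employer>
  </employee>
</employees>
```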

On the left is an XML file representing a couple of employees from Info Support. The document holds their names, their work address and the business unit they work for. Let’s put some placeholders in there so we can fill in their employer. Doing this gives us the document on the right. Now how do we fill in values for the placeholders? Very simple, like this:

[image: the identity transform stylesheet with an extra template for the valueholder node]
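A sketch of what such a stylesheet could look like, assuming the placeholder is an element named `valueholder` and the value is the fixed text InfoSupport, as described in this post. Whether the original used `xsl:text` or plain literal text is an assumption:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Generic identity template: copies everything by default. -->
  <xsl:template match="attribute() | node()">
    <xsl:copy>
      <xsl:apply-templates select="attribute() | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- More specific template: instead of copying a valueholder
       element, emit a text node in its place. -->
  <xsl:template match="valueholder">
    <xsl:text>InfoSupport</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```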

Applying this transform will result in the following file:

[image: the resulting XML file with the placeholders filled in]
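As an illustration, here is what a stylesheet that combines the identity template with a `valueholder` template emitting the text InfoSupport would be expected to do to a small, invented fragment. Element names other than `employer` and `valueholder` are hypothetical:

```xml
<!-- Hypothetical input:
<employee>
  <name>Jane Doe</name>
  <employer><valueholder/></employer>
</employee>
-->
<!-- Expected result: everything is copied unchanged,
     only the placeholder is replaced by text. -->
<employee>
  <name>Jane Doe</name>
  <employer>InfoSupport</employer>
</employee>
```

Depending on the processor and its output settings, the result may also start with an XML declaration.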

How does this work? XSLT has a fairly complex priority system, but it’s safe to say that more specific templates take precedence over generic ones: a name pattern such as ‘valueholder’ has a default priority of 0, while ‘node()’ and ‘attribute()’ have -0.5. For every node that is processed, only one template will fire. For us this results in every node being copied, except when the apply-templates instruction is executed from the employer node. The XSLT processor will go through the children of the employer node and find the valueholder node. Both of our templates match this node, but since ‘valueholder’ is more specific than ‘attribute() | node()’, that template takes precedence and the valueholder node is not copied; instead we create a text node in its place with the value InfoSupport.

This still does not seem like a really big deal, but keep in mind you can use this kind of XML templating for every XML document you can think of! So far the value ‘InfoSupport’ was hard-coded in our stylesheet, but what if we could obtain the value as a variable from outside the stylesheet? Or get a different value for every placeholder? This way we could populate our document with all kinds of data.
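One standard way to get a value from outside the stylesheet is a global `xsl:param`. The sketch below is only an illustration and not necessarily the approach taken in part 2; the parameter name `employerName` is hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical global parameter. The caller can supply a value;
       if it does not, the default 'InfoSupport' is used. -->
  <xsl:param name="employerName" select="'InfoSupport'"/>

  <!-- Identity template: copies everything by default. -->
  <xsl:template match="attribute() | node()">
    <xsl:copy>
      <xsl:apply-templates select="attribute() | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace each placeholder with the parameter value. -->
  <xsl:template match="valueholder">
    <xsl:value-of select="$employerName"/>
  </xsl:template>

</xsl:stylesheet>
```

How the parameter value is passed in depends on the XSLT processor: most offer a way to set stylesheet parameters through their API or command line.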

Word 2010 documents are also XML under the hood (a .docx file is a zip package of XML files). In part 2 of this article I will show you how you can create XML Word document templates, specify with placeholders in those templates what kind of data you want, and from XSLT fetch this data from .NET objects and fill it in at the right places. The idea comes from a project I worked on; we had something similar, but as I dove into XSLT and its power I decided to build it from the ground up using Office 2010.

Stay tuned!

Greetings
Chris