Using Xml, Xsd and XSLT Identity Transform to template and generate (Word) documents (part 2)

In part one we talked about using xslt to create simple xml templates and fill them using an identity transform. We can use this approach for every xml document thinkable, but for now I will focus on Word documents, as those are fairly common in our line of work. The first thing to ask ourselves is: how do we get our own custom xml placeholders into Word? Maybe we want different kinds of placeholders, like ones where a value will be put or ones that denote a conditional block of Word content.

The answer is simple: Word allows us to create our own xsd and hook it into a Word document. This opens up all kinds of possibilities for custom functionality in our Word templates. So first, let's create an xsd. All the files shown will be attached in a .zip file for everyone to download, so if you want a better look, don't worry. Our xsd looks like this:

image

This is a fairly simple xsd. ALWAYS provide a target namespace for your schema so your own elements can be identified as your own. The same namespace is also declared in the xsd with the prefix tns:, so we can use it to refer to our global types; global types and elements belong to the targetNamespace by default.

Let's look at the elements and types. First I have an element that will be applied to the entire Word document. This will be our document node, and it is called WordTemplate. Its type is templateType, which is defined below it. This type specifies an xs:choice which contains a valueHolder element. Now why would we use xs:choice and not, for example, xs:sequence? Because I don't want to be forced to add the different kinds of placeholders to our Word document in a particular order. Right now we have only one kind, but later on I may want to provide additional placeholders. The way to specify an unordered set of different kinds of optional elements that can each appear any number of times in xsd is an xs:choice which can occur multiple times. The valueHolder element has one attribute named 'query'. In our Word template this will be filled with an xpath query which defines which property value from .Net will be placed here.
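To make the description above concrete, here is a rough sketch of what such a schema could look like. It is written from the description only, so treat the details as assumptions: the namespace URI is a placeholder, the name valueHolderType is made up, and the mixed="true" settings are my guess at what is needed to let the elements wrap Word content.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: the namespace URI below is a placeholder. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="urn:example:word-template"
           targetNamespace="urn:example:word-template"
           elementFormDefault="qualified">

  <!-- Document node: applied to the whole Word document. -->
  <xs:element name="WordTemplate" type="tns:templateType"/>

  <!-- A repeatable choice allows placeholders in any order, any number of times. -->
  <xs:complexType name="templateType" mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="valueHolder" type="tns:valueHolderType"/>
    </xs:choice>
  </xs:complexType>

  <!-- A value placeholder; 'query' holds the xpath into the .Net object. -->
  <xs:complexType name="valueHolderType" mixed="true">
    <xs:attribute name="query" type="xs:string" use="required"/>
  </xs:complexType>
</xs:schema>
```

Adding another kind of placeholder later would then just mean adding another xs:element inside the xs:choice.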

Now how do we couple this xsd to our own Word template? When you open Word 2010 you first need to enable the Developer tab: go to File –> Options –> Customize Ribbon and check the Developer tab. On the Developer tab you will see an option for Schema. With this option you can add your own schema. Afterwards you can click the Structure button and your Word document will look like this:

image

To the right you can see our own schema elements. The only option we have at first is to apply our document node, so let's do that. All you need to do is click on our element. Inside our document node we can apply our valueHolder elements. I will apply them in the places where I want values from .Net inserted. Afterwards our document will look like this:

image

You can see all our elements. When you press Ctrl+Shift+X, Word toggles between this view and the view without the tags. To the right you can see yellow signs in front of the valueHolder elements; this means those tags are invalid according to our schema. Word is right! Recall that those elements have a required attribute called query. So we right click each of those elements and fill in the attribute like this:

image

Here I right clicked the valueHolder element next to Name: and filled in an xpath query. This means the template needs to know the structure of the .Net object we are going to work with. Specifying this in the template allows us to later add extra properties of the object without modifying code, only by replacing the template. I will fill in the rest and save the document. You can save the document as a standard Word 2010 file; it will be a .docx, which is actually a zip file with other files in it. This approach will still work, but in .Net we would have to use the packaging API to get the main document part (word/document.xml) and then transform it using xslt. For simplicity I will save the file as a Word 2010 xml file so all the info is contained in one file, with no need for extracting.
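For orientation, this is roughly what a tagged table cell looks like inside the saved Word xml. It is an illustrative fragment, not copied from the actual template: the w:uri value and the query are placeholders, and I am assuming the valueHolder tags wrap table cells.

```xml
<!-- Illustrative fragment; the w:uri value and the query are placeholders. -->
<w:tr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:tc>
    <w:p><w:r><w:t>Name:</w:t></w:r></w:p>
  </w:tc>
  <!-- Our custom tag: the element name sits in w:element. -->
  <w:customXml w:uri="urn:example:word-template" w:element="valueHolder">
    <w:customXmlPr>
      <!-- The query attribute we filled in is stored as a w:attr name/val pair. -->
      <w:attr w:name="query" w:val="/PersonInfo/Name"/>
    </w:customXmlPr>
    <w:tc>
      <!-- Empty paragraph where the generated text should end up. -->
      <w:p/>
    </w:tc>
  </w:customXml>
</w:tr>
```

The thing to notice is that the tag name and its attributes are not real xml elements and attributes in the file; Word stores them as values on w:customXml and w:attr.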

Now where is the Identity transform I talked about? Well, right below:

   1: <?xml version="1.0" encoding="utf-8"?>
   2: <xsl:stylesheet version="1.0"
   3:     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   4:     xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
   5:     xmlns:wordgen="urn:Chris:demo:word"
   6:     xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
   7:
   8:   <xsl:output method="xml" indent="yes"/>
   9:
  10:   <xsl:template match="@* | node()">
  11:     <xsl:copy>
  12:       <xsl:apply-templates select="@* | node()"/>
  13:     </xsl:copy>
  14:   </xsl:template>
  15:
  16:   <xsl:template
  17:      match="w:customXml[@w:element='valueHolder']/w:tc/w:p" >
  18:     <xsl:variable name="query"
  19:     select="../../w:customXmlPr/w:attr[@w:name='query']/@w:val">
  20:     </xsl:variable>
  21:     <xsl:copy>
  22:       <xsl:apply-templates select="@* | node()"/>
  23:       <w:r>
  24:         <w:t>
  25:           <xsl:value-of
  26:           select="wordgen:GetMessageValueByXpath($query)"/>
  27:         </w:t>
  28:       </w:r>
  29:     </xsl:copy>
  30:   </xsl:template>
  31: </xsl:stylesheet>

Above is an xslt stylesheet created in VS2010. What stands out is that the identity transform I used in part one was defined as match='attribute() | node()'. Here it is not. This is because .Net 4 only supports xslt and xpath 1.0! attribute() is an xpath 2.0 node test that 1.0 does not support, so in 1.0 we write @* instead.

The first template is the copy action; the second template is more interesting. Recall that we put our own xml tags in our Word template. Office 2010 saves those in its own WordML xml as <customXml> tags, with an element attribute holding the tag name. The xpath query in the match translates to the following: only match w:p nodes which are children of w:tc nodes which are children of customXml nodes whose name is valueHolder. I match the w:p nodes because that is where Word wants the text. If you put text in our Word document within a custom xml tag, Word puts your text within the w:p node, which is a descendant of the custom xml node representing your tag. When we have a match on the w:p node we perform the following steps:

  • Extract the value of our query attribute. Word places it at a rather strange place: in a w:attr node below the customXml tag, as a name and val attribute pair.
  • We put this in a variable so we can use it later on in our template.
  • We recursively copy all the content of the w:p node to our output document so it's completely intact when we add our own text.
  • We add our own value, extracted from a .Net object, within a couple of Word markup tags. Line 26 is especially interesting. The function that gets called there gets our xpath expression as a parameter. Our xpath expression was defined as an attribute on the valueHolder in our Word template. So how does this xslt function use this expression on a .Net object?

Because actually, it's a .Net function being called. And this is super cool: using so-called xslt extensions we can call functions on .Net objects from xslt! How cool is this? Let's look at some code:

    using System.IO;
    using System.Xml.Serialization;
    using System.Xml.XPath;
    // Guard is the author's own helper class (not shown here)

    namespace Chris.Demo.WordGenerator
    {
        /// <summary>
        /// This class will be the heart of our custom Word functionality
        /// </summary>
        class XsltExtension<TMessage>
        {
            XPathNavigator messageDoc;

            public XsltExtension(TMessage message)
            {
                //pre
                Guard.ArgumentsNotNull(message);

                // Serialize the message to xml so it can be queried with xpath
                XmlSerializer xs = new XmlSerializer(typeof(TMessage));
                MemoryStream ms = new MemoryStream();
                xs.Serialize(ms, message);
                ms.Seek(0, SeekOrigin.Begin);
                messageDoc = new XPathDocument(ms).CreateNavigator();
            }

            /// <summary>
            /// This method will be called from XSLT
            /// </summary>
            /// <param name="xpath">xpath query taken from the template</param>
            /// <returns>value of the first matching node, or an empty string</returns>
            public string GetMessageValueByXpath(string xpath)
            {
                if (string.IsNullOrWhiteSpace(xpath))
                {
                    return string.Empty;
                }

                XPathNodeIterator ni = messageDoc.Select(xpath);
                // MoveNext returns false when the query matched nothing
                return ni.MoveNext() ? ni.Current.Value : string.Empty;
            }
        }
    }

This class functions as the base for all of our extra xslt functions. For now there will only be one: GetMessageValueByXpath. This function is called from xslt to extract a value from a .Net object. The .Net object is supplied in the constructor of this extension, where it is serialized to xml and loaded into an XPathDocument. Once we have the navigator of the XPathDocument we can run xpath 1.0 queries from our Word template against the object. Note that Select only accepts expressions that return a node set, so expressions that return a string or a number (concat(), round() and so on) would need XPathNavigator.Evaluate instead. Number and date formatting are not part of xpath 1.0 itself and would have to come from our own extension functions. For this example the object being passed into the xslt extension will be a PersonInfo object corresponding with the properties specified in our Word template.

Let's look at the code that creates this extension and passes it to the xslt transform:
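To picture what the queries run against, here is roughly the xml that XmlSerializer would produce for a PersonInfo object. This is a hypothetical example: only a Name property is hinted at by the template, and the City property and both values are made up for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical serializer output; City and the values are invented. -->
<PersonInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Name>Chris</Name>
  <City>Utrecht</City>
</PersonInfo>
```

By default the root element is named after the type and each public property becomes a child element, so a query attribute of /PersonInfo/Name in the template would select the Name element here.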

   1: using System;
   2: using System.Collections.Generic;
   3: using System.Linq;
   4: using System.Text;
   5: using Chris.Demo.WordGenerator.Contract;
   6: using System.Xml.Xsl;
   7: using System.Reflection;
   8: using System.Xml;
   9: using Chris.Demo.WordGenerator.Util;
  10: using System.IO;
  11:
  12: namespace Chris.Demo.WordGenerator
  13: {
  14:     /// <summary>
  15:     /// A transformer that transforms the objects using XSLT
  16:     /// </summary>
  17:     class XslTransformer : IObjectDocumentTransformer
  18:     {
  19:         private XslCompiledTransform _xsltransform;
  20:         public XslTransformer()
  21:         {
  22:             _xsltransform = new XslCompiledTransform(false);
  23:             Stream stylesheet = Assembly.GetExecutingAssembly().GetManifestResourceStream(Constants.XSLTRESOURCE);
  24:             _xsltransform.Load(XmlReader.Create(stylesheet));
  25:         }
  26:         public System.IO.MemoryStream TransForm<TMessage>(TMessage message, System.IO.Stream template)
  27:         {
  28:             XmlReader templatedoc = XmlReader.Create(template);
  29:             MemoryStream returnStream = new MemoryStream();
  30:             XsltArgumentList arguments = new XsltArgumentList();
  31:             arguments.AddExtensionObject(Constants.DEMONAMESPACE, new XsltExtension<TMessage>(message));
  32:             _xsltransform.Transform(templatedoc, arguments, returnStream);
  33:             returnStream.Seek(0, SeekOrigin.Begin);
  34:             return returnStream;
  35:         }
  36:     }
  37: }

Line 31 is the line that creates the XsltExtension, passes it the message object (PersonInfo in the demo app that will follow) and registers it as an extension object. The next line then transforms the passed template with our xslt stylesheet and our extension into a valid Word document. The xslt extension needs to be in a separate xml namespace. The DEMONAMESPACE constant I am using here maps to urn:Chris:demo:word. In our xslt stylesheet you can see I bound this same namespace to the prefix wordgen. That's why in the xslt stylesheet the function is called like wordgen:GetMessageValueByXpath($query).

This XslTransformer will be used by the WordGenerator class. The WordGenerator class is the class client applications will interact with. Its code is shown below:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Chris.Demo.WordGenerator.Contract;
    using System.Xml.Xsl;
    using System.Xml;
    using Chris.Demo.WordGenerator.Util;
    using System.IO;
    using System.Reflection;
    using Conditions = System.Diagnostics.Contracts;

    namespace Chris.Demo.WordGenerator
    {
        /// <summary>
        /// Our generator
        /// </summary>
        public class WordGenerator : BaseGenerator
        {
            public WordGenerator() : this(new XslTransformer())
            {
            }

            public WordGenerator(IObjectDocumentTransformer transformer) : base(transformer)
            {
            }

            /// <summary>
            /// Later there will be more functionality here, for now a very basic implementation
            /// </summary>
            /// <typeparam name="TMessageObject"></typeparam>
            /// <param name="message"></param>
            /// <param name="template"></param>
            /// <returns></returns>
            public override System.IO.MemoryStream GenerateLetter<TMessageObject>(TMessageObject message, Stream template)
            {
                //pre didnt use code contracts because of the extra req tooling
                Guard.ArgumentsNotNull(message, template);
                return Transformer.TransForm(message, template);
            }
        }
    }

It inherits functionality from its base class, such as the Transformer property and a GenerateAndSaveLetter method which uses the method overridden here. For now it only adds a guard and delegates to the transformer object. By default it will use our xslt transformer, but the constructor also accepts another transformer as long as it implements the IObjectDocumentTransformer interface. From this code it is obvious that client applications using this dll must supply two things: a .Net object that contains the data you want shown in the Word letter, and a Word template. Each client app, or even different parts of one application, can generate a lot of different Word letters using this approach. All they have to do is define a template, pass it and the data to the generator, and we have a Word letter!

I created a demo app which uses the Word template we defined earlier. It is a very basic Windows Form that asks for the info we want to show in the Word letter, and when we push the button it generates the letter. Let's have a look.

image

We fill in the data, we press generate and the screen below will show:

image

It asks us where to save the document. I choose my Desktop and presto! We have a Word document, generated according to our template with the data we specified in our Form.

image

Let's see how this works. Here is our Windows Form class, it's very simple:

   1: using System;
   2: using System.Collections.Generic;
   3: using System.ComponentModel;
   4: using System.Data;
   5: using System.Drawing;
   6: using System.Linq;
   7: using System.Text;
   8: using System.Windows.Forms;
   9: using Chris.Demo.WordGenerator;
  10: using System.IO;
  11: using FEWordGenDemo.Properties;
  12: using Chris.Demo.WordGenerator.Contract;
  13:
  14: namespace FEWordGenDemo
  15: {
  16:     public partial class Form1 : Form
  17:     {
  18:         private PersonInfo pi = new PersonInfo();
  19:         private ILetterGenerator _generator;
  20:         public Form1(ILetterGenerator generator)
  21:         {
  22:             InitializeComponent();
  23:             personInfoBindingSource.DataSource = pi;
  24:             _generator = generator;
  25:         }
  26:
  27:         private void btnGenerate_Click(object sender, EventArgs e)
  28:         {
  29:             if (svfWordDoc.ShowDialog(this) == DialogResult.OK)
  30:             {
  31:                 Stream templateStream = new MemoryStream(Resources.WordTemplate);
  32:                 _generator.GenerateAndSaveLetter(pi, templateStream, svfWordDoc.FileName, true);
  33:             }
  34:         }
  35:     }
  36: }

The most exciting stuff happens from line 27 and downward. Here you will see:

  • The SaveFile dialog.
  • Getting the Word template. In my app it is embedded as a resource, but imagine a scenario where all the templates are downloaded from a central location. Whenever the templates need updating, you place them on the server and every client will download them automatically. That last scenario is very similar to a project I have worked on.
  • Asking our Word generator to generate a Word document, save it and open it in the default application! You can see I use the constructor to inject the generator; this way I can easily switch implementations. I pass the generator our template as a stream, together with the PersonInfo object, which should contain all the data thanks to databinding.

Conclusion

Xslt, xpath and xsd's, in combination with the programming language of our choice, enable us to very easily template and generate all kinds of xml documents. Here we have looked at Word generation. Some of you might wonder: why this way, why not just grab the Open Xml sdk to generate Word 2010 documents? This is a valid point, but generating our Word documents this way allows us to easily add extra functionality. Think about conditional placeholders, xpath expressions evaluated straight from our templates, iterative placeholders that generate rows based on a .Net array, stuff like that!

Xslt extensions in .Net are very powerful. You can even pass nodes from xslt to .Net and vice versa, allowing us to do some very cool modifications on our message object before sending it back to xslt and injecting it into the Word document. This also allows for a very standard way of generating documents, as it can be applied to all kinds of xml documents.

All the files I used and created are in one Visual Studio solution. here is the download link:

Pfew! this was a big post with lots of info. I hope you liked reading it. Feel free to comment!

Greetings

Chris