Friday, November 23, 2007

Word 2003 docx format - mail merge


Hopefully I'm doing something wrong, but look at the screenshot to the left, which shows the document.xml of a docx in which a mail merge field has been defined. Notice that the field is delimited by separate begin and end markers instead of making use of the hierarchical nature of XML.

But it gets even weirder: if you define a document with mail merge fields in Word (I'm using 2003 with the docx plugin) and do a preview before saving it, the document.xml looks like this:


The XML hierarchy of the second one makes sense, but that of the first one definitely does not. Why the two are completely different, I have no clue...
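For readers who need to handle both shapes, here is a minimal sketch in Python. The two XML fragments are hand-written approximations of the flat begin/end form and the nested form, not copies of the screenshots, and the field name FirstName and the helper field_instructions are made up for illustration. It assumes fields are not nested inside each other.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Nested form: the field wraps its result run (hypothetical sample).
SIMPLE = f'''<w:p xmlns:w="{W}">
  <w:fldSimple w:instr=" MERGEFIELD FirstName ">
    <w:r><w:t>«FirstName»</w:t></w:r>
  </w:fldSimple>
</w:p>'''

# Flat form: begin / instruction / separate / result / end as sibling runs
# (hypothetical sample).
COMPLEX = f'''<w:p xmlns:w="{W}">
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText xml:space="preserve"> MERGEFIELD FirstName </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>«FirstName»</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>'''


def field_instructions(paragraph_xml):
    """Return the field instructions found in a paragraph, in document order.

    Handles both the nested w:fldSimple form and the flat w:fldChar form.
    Nested fields (a field inside another field's instruction) are not handled.
    """
    root = ET.fromstring(paragraph_xml)
    found = []
    current = None  # pieces of instruction text for a field that is still open
    for el in root.iter():
        if el.tag == f"{{{W}}}fldSimple":
            found.append(el.get(f"{{{W}}}instr", "").strip())
        elif el.tag == f"{{{W}}}fldChar":
            kind = el.get(f"{{{W}}}fldCharType")
            if kind == "begin":
                current = []
            elif kind in ("separate", "end") and current is not None:
                # The instruction is complete once "separate" (or "end") is seen.
                found.append("".join(current).strip())
                current = None
        elif el.tag == f"{{{W}}}instrText" and current is not None:
            current.append(el.text or "")
    return found


print(field_instructions(SIMPLE))
print(field_instructions(COMPLEX))
```

Both calls should yield the same instruction string, so code further down the line does not need to care which form Word happened to write.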

After doing some research I found out that others have run into a similar issue. See the hyperlink discussion on this blog.

3 comments:

Unknown said...

docx seems to be an XML representation of what Word displays on-screen, complete with spelling highlights and all the other garbage that does not form part of the document itself.

Leo said...

Sometimes a mergefield is stored as a fieldChar-begin, instr, fieldChar-end sequence and sometimes (as it should be) as a simpleField.
Any idea why?

Stephan Westen said...

I've given up on Word automation and am using OpenOffice as a service. This has been very successful despite the cumbersome UNO API.