Tuesday, November 27, 2007

OpenOffice API sample

A sample showing how to use the OpenOffice API is described on this blog. It gives the impression that OpenOffice is not running headless there (perhaps I misread it), although I know OpenOffice can run headless.

Monday, November 26, 2007

PDF Export in Java

This site and the Open Directory site list the available Java libraries for PDF export (commercial and open source).

Alfresco uses PdfBox, POI and OpenOffice for its document management. (I suspect OpenOffice handles the PDF conversion.)

Sunday, November 25, 2007

WebDav

I did not realize that you could mount a WebDav folder as a drive on your computer; WebDrive does this, for example. According to this article even XP SP3 can do it out of the box (SP2 has some issues, see here). This makes remote documents easy to understand for users. (In my pet applications the users are used to a shared drive for all their documents. Versioning? They use backup & restore ;-)

I need a simple WebDav implementation that redirects to a folder on the server. This way I can easily link to files from my web application. My hope is that users can also save the files they open this way (that should be the case). Perhaps I will use Tomcat's WebDav sample application, although I need to serve the content from a folder other than the web application's root, which is not that easy. According to this blog it is possible using a JNDI factory.

Instead of Tomcat I could use a library like Jackrabbit or an application like Alfresco, but they are too heavy. (By the way, in my experience you should be very careful when integrating another application into your own.) What Tomcat's implementation lacks is a search engine, which the two above do support, although I'm not sure whether Jackrabbit would index Word documents out of the box. (I do know that JSPWiki can search through Word documents using Lucene.)

Friday, November 23, 2007

Executable UML

I do not believe that UML will be executable one day. Perhaps it might work for a very small, well-defined domain, and then only for the first 20%; the rest (the tweaking, the small UI enhancements) will require manual coding. Hopefully in a language that is powerful enough and does not require tons of code. (Java is an example of how it should not be done, with its writers, output streams, etc. You always have to pay the price for flexibility, which is certainly not a good trade-off.)

Alternative:

From the requirements you distill the concepts. For example, a requirement could be: "as end-user I should be able to add a contact person". Contact person would be a concept.
The requirements would be translated into automated acceptance tests in terms of these concepts. The automated acceptance tests would talk to an API (calling addContactPerson and getAllContactPersons). The next step would be to translate these tests to the UI level - probably requiring human interaction.

The obvious benefit would be that you could change the implementation and still verify whether the requirements are met.
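As a minimal sketch of that idea (everything except the names addContactPerson and getAllContactPersons is made up for illustration: the ContactService interface, the in-memory implementation and the sample name), the acceptance test below only talks to the API, so the implementation behind it can be swapped:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ContactAcceptanceSketch {

    /** Hypothetical API derived from the "contact person" concept. */
    interface ContactService {
        void addContactPerson(String name);
        List<String> getAllContactPersons();
    }

    /** Throwaway in-memory implementation; a real one could sit on a database. */
    static class InMemoryContactService implements ContactService {
        private final List<String> contacts = new ArrayList<>();

        public void addContactPerson(String name) {
            contacts.add(name);
        }

        public List<String> getAllContactPersons() {
            return Collections.unmodifiableList(contacts);
        }
    }

    /** Acceptance test for: "as end-user I should be able to add a contact person". */
    static boolean endUserCanAddContactPerson(ContactService service) {
        int before = service.getAllContactPersons().size();
        service.addContactPerson("Jan Kruif");
        List<String> after = service.getAllContactPersons();
        return after.size() == before + 1 && after.contains("Jan Kruif");
    }

    public static void main(String[] args) {
        ContactService service = new InMemoryContactService();
        System.out.println("requirement met: " + endUserCanAddContactPerson(service));
    }
}
```

Because the test only depends on the interface, the same check could be run against any other implementation of it.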

Modern Times

Ain't it weird: we fear privacy intrusion, but everybody is writing blogs. We dislike traditions like female genital mutilation, but these days women have surgery on every part of their body.

Word 2003 docx format - mail merge


Hopefully I'm doing something wrong, but look at the screenshot to the left, which displays the document.xml of a docx when a mail merge field has been defined. Notice the use of begin and end markers instead of making use of the hierarchical nature of XML.

But it gets even weirder: if you define a document in Word (I'm using 2003 with the docx plugin) with the mail merge fields and you do a preview before saving the document, then the document.xml looks like this:


The XML hierarchy of the second one makes sense, but that of the first one definitely does not. Why the two are completely different, I have no clue...

After doing some research I found out that others have run into a similar issue. See the hyperlink discussion on this blog.
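To make the two shapes concrete, here is a rough sketch that reads the field instructions from both. The XML fragment in it is hand-written from my understanding of WordprocessingML (flat fldChar begin/separate/end runs versus a nested fldSimple element); it is not copied from the screenshots, and the class and method names are made up. Nested fields are not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MergeFieldScanner {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written fragment (an assumption, not the screenshot content): one field
    // in the flat begin/separate/end form, one in the nested fldSimple form.
    static final String SAMPLE =
        "<w:p xmlns:w='" + W + "'>"
        + "<w:r><w:fldChar w:fldCharType='begin'/></w:r>"
        + "<w:r><w:instrText> MERGEFIELD FirstName </w:instrText></w:r>"
        + "<w:r><w:fldChar w:fldCharType='separate'/></w:r>"
        + "<w:r><w:t>FirstName</w:t></w:r>"
        + "<w:r><w:fldChar w:fldCharType='end'/></w:r>"
        + "<w:fldSimple w:instr=' MERGEFIELD LastName '>"
        + "<w:r><w:t>LastName</w:t></w:r>"
        + "</w:fldSimple>"
        + "</w:p>";

    /** Collects the field instructions of both forms, in document order. */
    static List<String> fieldInstructions(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<String> instructions = new ArrayList<>();
        StringBuilder open = null; // non-null while between "begin" and "separate"/"end"
        NodeList all = doc.getElementsByTagNameNS(W, "*"); // all w: elements, document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            String name = e.getLocalName();
            if ("fldSimple".equals(name)) {
                // Nested form: the instruction is simply an attribute.
                instructions.add(e.getAttributeNS(W, "instr").trim());
            } else if ("fldChar".equals(name)) {
                // Flat form: the state has to be tracked by hand across sibling runs.
                String type = e.getAttributeNS(W, "fldCharType");
                if ("begin".equals(type)) {
                    open = new StringBuilder();
                } else if (open != null) { // "separate" or "end" closes the instruction part
                    instructions.add(open.toString().trim());
                    open = null;
                }
            } else if ("instrText".equals(name) && open != null) {
                open.append(e.getTextContent());
            }
        }
        return instructions;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fieldInstructions(SAMPLE));
    }
}
```

The nested form needs one attribute lookup, while the flat form forces the reader to keep state between runs, which is exactly what makes it awkward to process with XSLT.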

Tuesday, November 20, 2007

Partially copying XML using XSL

Subject: Sample xsl that copies the entire XML source tree, but skips an element as specified in the xsl.

Note: the indentation is messed up due to the xml characters used.

The source xml file includes a link to the xsl for display purposes (the xml and xsl are stored in the same folder):

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="display.xsl"?>

<data>
<employee id="100">
<lastname fooAttr="123">Basta</lastname>
<insertion>van</insertion>
<firstname>Marco</firstname>
</employee>

<employee id="102">
<lastname fooAttr="123">Kruif</lastname>
<insertion></insertion>
<firstname>Jan</firstname>
</employee>

</data>


The xsl file (called display.xsl); the employee entry with firstname 'Marco' is skipped:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="@*|node()">
<xsl:copy> <xsl:apply-templates select="@*|node()"/> </xsl:copy>
</xsl:template>

<xsl:template match="/data/employee[firstname='Marco']"/>

</xsl:stylesheet>
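Outside the browser the same transformation can be run from Java with the standard javax.xml.transform API. The sketch below is only an illustration: the class name is made up, and the stylesheet is inlined as a string in its identity-template form (matching @*|node()), so that attributes such as id and fooAttr are copied as well.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PartialCopy {

    // Identity template plus one empty template that drops the matching employee.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"/data/employee[firstname='Marco']\"/>"
        + "</xsl:stylesheet>";

    /** Applies the given stylesheet text to the given XML text. */
    static String transform(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<data>"
            + "<employee id='100'><lastname fooAttr='123'>Basta</lastname>"
            + "<insertion>van</insertion><firstname>Marco</firstname></employee>"
            + "<employee id='102'><lastname fooAttr='123'>Kruif</lastname>"
            + "<insertion></insertion><firstname>Jan</firstname></employee>"
            + "</data>";
        System.out.println(transform(xml, XSL));
    }
}
```

The empty template wins over the identity template because a path pattern with a predicate has a higher default priority than @* or node().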

Monday, November 19, 2007

Convert Word document to HTML

I've found two ways to convert a Word document in docx format to HTML, both using XSLT.
This would make some kind of mail merge preview possible, with XSLT both replacing the mail merge fields and rendering the output.
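Whichever stylesheet is used, the plumbing around it would look roughly the same. A minimal sketch, assuming the docx is treated as a zip archive whose main part is word/document.xml; the class and method names are made up, and the docx-to-HTML stylesheet itself (including any replacement of mail merge fields) is not shown and has to be supplied by the caller:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DocxToHtml {

    /** Extracts the raw bytes of word/document.xml from a docx (zip) stream. */
    static byte[] readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buffer.write(chunk, 0, n);
                    }
                    return buffer.toByteArray();
                }
            }
        }
        throw new IOException("word/document.xml not found in archive");
    }

    /** Applies a caller-supplied stylesheet to the document part and returns the result. */
    static String toHtml(byte[] documentXml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter html = new StringWriter();
        // Pass bytes, not a String, so the XML parser detects the encoding itself.
        transformer.transform(new StreamSource(new ByteArrayInputStream(documentXml)),
                              new StreamResult(html));
        return html.toString();
    }
}
```

A caller would open the docx file as an InputStream, pass it to readDocumentXml and hand the result plus the stylesheet text to toHtml.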


A similar topic, but a different (not open source) implementation comes from Softinterface. I am not sure how they get this working without installing any additional software. (Perhaps they use the OLE object's preview for images, but how about the entire Word document?)