Wednesday, December 19, 2007

Writing a test to verify ODT content

A sample test that checks the content of an OpenOffice document containing three lines ('1234', an empty line, 'description'):

import javax.xml.parsers.DocumentBuilderFactory;
...
import com.artofsolving.jodconverter.DefaultDocumentFormatRegistry;
import com.artofsolving.jodconverter.DocumentConverter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Fill the template; templateManager, inputStream and map are not shown here.
ByteArrayOutputStream output = templateManager.applyModelToTemplate(inputStream, map);

// An ODT file is a zip archive; the document text lives in the content.xml entry.
String content = getZipEntry(new ByteArrayInputStream(output.toByteArray()), "content.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(content.getBytes()));

// Check the root element of content.xml
Element docElement = doc.getDocumentElement();
assertEquals("office:document-content", docElement.getTagName());

// First paragraph (index 0)
Element bodyElement = (Element) docElement.getElementsByTagName("text:p").item(0);
assertEquals("1234", bodyElement.getTextContent());
// skip the empty line (index 1)
bodyElement = (Element) docElement.getElementsByTagName("text:p").item(2);
assertEquals("description", bodyElement.getTextContent());
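
The paragraph lookup above matches on the literal prefix "text:p". A namespace-aware variant is sketched below, assuming the standard OpenDocument text namespace URI. The class name OdtParagraphs and the method paragraphs are made up for this illustration, and the XML is assumed to be UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the test above.
final class OdtParagraphs {
    // Namespace URI that the "text:" prefix is bound to in OpenDocument files.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    private OdtParagraphs() {}

    /** Returns the text of every text:p element, in document order. */
    static List<String> paragraphs(String contentXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Needed for getElementsByTagNameNS to see namespaces.
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(contentXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagNameNS(TEXT_NS, "p");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // An empty paragraph yields an empty string.
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }
}
```

With a helper like this the test can compare the whole list of paragraphs at once, so the empty line is asserted explicitly instead of skipped by index.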

Thursday, December 13, 2007

Getting the entry from a ZipInputStream

Finally I found the code to get an entry from an InputStream that contains zipped data. For a ZipFileInputStream this is easy, but not for a regular input stream, because the size of the entry can be unknown (-1) and there is no alternative than to just read through the stream.

sorry for the messy mark-up...

disclaimer: code is not optimized!

// based on http://java.sun.com/developer/technicalArticles/Programming/compression/

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public static String unzipEntry(InputStream zippedInputStream, String entryName) throws Exception {
    String result = null;
    final int BUFFER = 2048;
    BufferedOutputStream dest = null;
    ZipInputStream zis = new ZipInputStream(new BufferedInputStream(zippedInputStream));
    ZipEntry entry;
    // The entry size can be unknown (-1), so walk the stream entry by entry
    while ((entry = zis.getNextEntry()) != null) {
        int count;
        byte data[] = new byte[BUFFER];
        StringOutputStream fos = new StringOutputStream();
        dest = new BufferedOutputStream(fos, BUFFER);
        // Read until the end of the current entry
        while ((count = zis.read(data, 0, BUFFER)) != -1) {
            dest.write(data, 0, count);
        }
        dest.flush();
        dest.close();
        // Every entry is read; only the requested one is kept
        if (entryName.equals(entry.getName())) {
            result = fos.toString();
        }
    }
    zis.close();
    return result;
}

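import java.io.OutputStream;
// countOccurrencesOf is assumed to come from Spring's StringUtils
import org.springframework.util.StringUtils;

public class StringOutputStream extends OutputStream {

    // This buffer will contain the stream
    protected StringBuffer buf = new StringBuffer();

    public StringOutputStream() {}

    public void close() {}

    public void flush() {}

    // Bytes are decoded with the platform default charset; a multi-byte
    // character split over two write calls will not decode correctly.
    public void write(byte[] b) {
        String str = new String(b);
        this.buf.append(str);
    }

    public void write(byte[] b, int off, int len) {
        String str = new String(b, off, len);
        this.buf.append(str);
    }

    // Append the byte as a character; Integer.toString(b) would append
    // its decimal value ("65" instead of "A").
    public void write(int b) {
        this.buf.append((char) (b & 0xFF));
    }

    public String toString() {
        return buf.toString();
    }

    public int contains(String string) {
        return StringUtils.countOccurrencesOf(buf.toString(), string);
    }
}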
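
As an alternative to the code above, the same walk through the stream can be sketched with a ByteArrayOutputStream, so the bytes are decoded to a String only once and only for the requested entry. This is a sketch under assumptions, not a drop-in replacement: the names ZipEntryReader and readEntry are made up, the entry is assumed to be UTF-8 text, and it needs Java 7 or later for try-with-resources and StandardCharsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; assumes the requested entry contains UTF-8 text.
final class ZipEntryReader {
    private ZipEntryReader() {}

    /** Returns the named entry as a String, or null when the entry is absent. */
    static String readEntry(InputStream zipped, String entryName) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entryName.equals(entry.getName())) {
                    // The next getNextEntry() call skips the rest of this entry.
                    continue;
                }
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                byte[] buffer = new byte[2048];
                int count;
                // The entry size may be unknown (-1), so read until end of entry.
                while ((count = zis.read(buffer, 0, buffer.length)) != -1) {
                    bytes.write(buffer, 0, count);
                }
                // Decode once, after all bytes are collected.
                return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        return null;
    }
}
```

Collecting raw bytes first means a multi-byte character that straddles two reads is still decoded correctly, and the method stops reading as soon as the requested entry has been found.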



Pairs

While driving to work I realized something funny:

If I have to work on something, for example produce an article, I will write it, put it away and pick it up a couple of days later to look at it with a fresh mind.

Within our team we also (sometimes) practice pair programming. The idea is that two know and see more than one (the power of interdependence ;-).

Effectively these two practices are one and the same; in the first case you use yourself as the second person. Time will make you a different person than you were. The two practices differ in their usage of the dimensions time and resource. Tradeoffzz..

Monday, December 10, 2007

Red/green/refactor & spikes

I've developed a new way of coding new functionality. In the past I distinguished between spikes and production code development. The spike was meant for prototype code, throwaway code.

Today I do it like this:
I start with a new test that contains some basic code that does the basics of the job I'm after. I can very easily run this test from my IDE (IntelliJ in my case). After a while the test(s) will succeed. This proves that I understand the basics of the code needed. The next step is to move some code to the main folder, to production level. I use the Extract Method refactoring to do this.

This way I move slowly but steadily from prototype to production without redoing work.
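
To make the two steps concrete, here is a small made-up illustration of the end state. The initials example, the class names and the plain AssertionError check (instead of a test framework) are all hypothetical; the point is only that the logic first lived inside the test method and was then moved to production code with Extract Method.

```java
// Production code (main folder): the result of Extract Method.
// Hypothetical example, chosen only to keep the sketch short.
final class InitialsFormatter {
    private InitialsFormatter() {}

    /** Turns "jan de vries" into "J.D.V."; blank input gives an empty string. */
    static String initials(String fullName) {
        StringBuilder sb = new StringBuilder();
        for (String part : fullName.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                sb.append(Character.toUpperCase(part.charAt(0))).append('.');
            }
        }
        return sb.toString();
    }
}

// Test code (test folder): during the spike the loop above was written
// inline in this method; after the refactoring only the call remains.
final class InitialsFormatterTest {
    private InitialsFormatterTest() {}

    static void testInitials() {
        String result = InitialsFormatter.initials("jan  de vries");
        if (!"J.D.V.".equals(result)) {
            throw new AssertionError("unexpected initials: " + result);
        }
    }
}
```

The test itself does not change when the code moves, so it keeps guarding the behaviour during the refactoring.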

Friday, December 7, 2007

The Inner-Platform Effect

The Inner-Platform Effect anti-pattern is a nice description of what I try to avoid with document generation. The task of making a document template becomes so complicated that only an expert can use it.

In a way it reminds me of XP (extreme programming): keep it as simple as possible. Wait until the last possible moment to add flexibility, until it is really needed.

Another analogy is the J2EE / EJB programming model; programmers should focus on writing business logic. However, tying the whole thing together on an application server like WAS was a nightmare and required more expertise than solving the original business problem.

Thursday, December 6, 2007

RTFTemplate

RTFTemplate is the Java library that does what I was looking for; it allows replacing MS-Word mail merge fields with actual data on the server side. So users can still define their document in the traditional way using mail merge and save it as RTF, and the server uses it as a template to generate new documents.

The issue, however, is how scalable this solution is in the long term; for example, how easy is it to add pieces of text to documents? Converting to PDF, which is necessary in my case, is also still not solved. Using OpenOffice as a templating engine would solve this. However, running OO as a (headless) service complicates things; it is an external process, and I'm not sure about memory leaks and/or how robust it is.

Another con of using OO is that, since Word will still remain the document editor for the user, the fields have to be typed in as regular text with some markers, for example ${lastname}. This makes it a little less robust: typos are easily made, markers can be deleted by mistake, and control characters can end up inside the field.
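
To illustrate the marker approach and its fragility, here is a minimal sketch of replacing ${name} markers in plain text. It is not how OpenOffice or RTFTemplate does it; the class MarkerTemplate, the method fill and the rule that a field name consists of letters, digits and underscores are assumptions made for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class MarkerTemplate {
    // Matches markers such as ${lastname}; group 1 is the field name.
    private static final Pattern MARKER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    private MarkerTemplate() {}

    /** Replaces every marker with its value; an unknown field name is an error. */
    static String fill(String text, Map<String, String> values) {
        Matcher m = MARKER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = values.get(name);
            if (value == null) {
                // Catches typos inside a well-formed marker, e.g. ${lastnme}.
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A misspelled field name can be detected this way, but a damaged marker, for example one with the closing brace deleted, no longer matches the pattern and stays in the output unnoticed, which is the robustness problem described above.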