Serving Up PDF – Another Way

A salient point was also made in that post about using the right tool for the job at hand. In this post, I thought I would offer another way. For the sake of argument, let’s assume that the challenge remains the same: dynamically generating formatted content. In this case, however, let’s also assume that the touch-point we are serving is delivered in the ubiquitous Portable Document Format (PDF).

Many applications today can create PDF right from the desktop. This approach presumes that the application will ‘set’ the content in a particular style or form and that the PDF will reflect the content in that very form. However, a number of emerging tools (nay, toolkits) can be used to encapsulate all of the styles that may be enshrined for the enterprise: its branding.

Essentially, these toolkits provide the means to create a library of page templates and paragraph styles that can be applied to content by a set of rules, yielding PDF without any human intervention. The initial investment might be significant (just like XML), but if your publishing requirements are demanding, a rules-based formatting engine might well offer an attractive return on that investment.
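
For readers who think in code, here is a minimal Python sketch of what such a library of templates, styles and rules might look like. It deliberately uses no real toolkit's API: `ParaStyle`, `PageLayout`, `STYLES`, `LAYOUTS`, `RULES` and `style_for` are hypothetical names, and the "rules" are reduced to a lookup from the kind of content block to a named style.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParaStyle:
    """A named paragraph style (hypothetical; not a real toolkit class)."""
    name: str
    font: str
    size: float     # font size in points
    leading: float  # line spacing in points

@dataclass(frozen=True)
class PageLayout:
    """A named page template: page size and uniform margin, in points (hypothetical)."""
    name: str
    width: float
    height: float
    margin: float

    @property
    def frame_height(self) -> float:
        # Vertical space left for content once top and bottom margins are removed.
        return self.height - 2 * self.margin

# A tiny style library; a real one would encode the enterprise's branding.
STYLES = {
    "heading": ParaStyle("heading", "Helvetica-Bold", 16, 20),
    "body": ParaStyle("body", "Helvetica", 10, 13),
    "caption": ParaStyle("caption", "Helvetica-Oblique", 8, 10),
}
LAYOUTS = {"letter": PageLayout("letter", 612, 792, 72)}

# Rules: which named style applies to each kind of content block.
RULES = {"title": "heading", "section": "heading", "text": "body", "figure_note": "caption"}

def style_for(block_kind: str) -> ParaStyle:
    """Resolve a block kind to its style, falling back to the body style."""
    return STYLES[RULES.get(block_kind, "body")]

print(style_for("title").font)         # Helvetica-Bold
print(LAYOUTS["letter"].frame_height)  # 648
```

A real toolkit would carry many more attributes per style (colour, indents, space before and after), but the shape stays the same: content declares what it is, and the rules decide how it looks.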

It seems as though all of today’s relevant development languages have their share of such PDF toolkits. On the open source front, Perl has its PDF-API2 while Python has ReportLab, whose page layout engine is PLATYPUS (Page Layout And TYPography Using Scripts). A commercial library is also available from PDFlib, with bindings for these languages and others, everything from C++ through to Java, PHP and Ruby.

Here’s how it might work for you. Whether you use Perl, PHP, Python or Ruby, the development environment is the enabler that does all of the work. From a Web interface, you can use it to establish the context for your targeted document. What is the person buying, reading or filling in? Does the person have a profile on the site that helps determine which of your market segments they belong to? Have they visited before? These questions all fall within the scope of collective intelligence, a topic for another day.
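
The context questions above might be reduced to something like the following sketch. The `Visitor` fields, the segment names and the matching rules are all hypothetical placeholders for whatever your site actually knows about its visitors.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Visitor:
    """What the web layer might know about a visitor (hypothetical fields)."""
    viewing: str                                   # product, article or form in view
    profile_interests: list = field(default_factory=list)
    occupation: Optional[str] = None
    visit_count: int = 0

def segment_for(visitor: Visitor) -> str:
    """Map a visitor's context to one market segment; the first matching rule wins."""
    interests = {i.lower() for i in visitor.profile_interests}
    if interests & {"art", "design", "photography"}:
        return "artistic"
    if visitor.occupation in {"manager", "engineer", "accountant"}:
        return "professional"
    if visitor.visit_count == 0:
        return "first_time"
    return "general"

def build_context(visitor: Visitor) -> dict:
    """Bundle what we know into a context dict for the document generator."""
    return {
        "subject": visitor.viewing,
        "segment": segment_for(visitor),
        "returning": visitor.visit_count > 0,
    }
```

For example, `build_context(Visitor("catalogue", ["Design"]))` yields a context whose segment is `"artistic"`. Keeping the rules in one ordered function makes it easy to see, and change, which signal takes precedence.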

Once you have gleaned as much context as possible, the next step is to gather all of the content required to reach out in a compelling way. These development environments are all adept at packaging information, whether it is boilerplate text, XML, database records or multimedia. Unlike the desktop approach to PDF creation, where the content needs to be assembled into a homogeneous view, these PDF toolkits are capable of rendering heterogeneous content: text blocks, tables and image objects can all be placed programmatically into a PDF file object. In many respects, it works just like a funnel. You pass one block of content after another into the page template and let the formatting engine apply the ‘keep’ rules that you have specified for pagination.
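
The funnel can be illustrated with a toy paginator. This is not how any particular toolkit implements pagination; it assumes each block's height has already been measured and supports only a single hypothetical "keep with next" rule, so that a heading is never stranded at the bottom of a page.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Block:
    """A content block already measured by the layout engine (height in points)."""
    label: str
    height: float
    keep_with_next: bool = False   # e.g. a heading that must stay with what follows

def paginate(blocks: List[Block], frame_height: float) -> List[List[str]]:
    """Pour blocks into pages in order, honouring keep-with-next rules.

    Consecutive blocks chained by keep_with_next form one unit; a unit that
    does not fit in the remaining space starts a new page. A unit taller than
    a whole page is placed anyway (a real engine would typically split it).
    """
    # Group blocks into keep-together units.
    units: List[List[Block]] = []
    current: List[Block] = []
    for block in blocks:
        current.append(block)
        if not block.keep_with_next:
            units.append(current)
            current = []
    if current:  # a trailing keep_with_next block with nothing after it
        units.append(current)

    pages: List[List[str]] = [[]]
    remaining = frame_height
    for unit in units:
        unit_height = sum(b.height for b in unit)
        if unit_height > remaining and pages[-1]:
            pages.append([])          # start a fresh page for this unit
            remaining = frame_height
        pages[-1].extend(b.label for b in unit)
        remaining -= unit_height
    return pages

blocks = [Block("intro", 60), Block("heading", 20, keep_with_next=True), Block("table", 30)]
print(paginate(blocks, 100))  # [['intro'], ['heading', 'table']]
```

With a 100-point frame, the heading alone would fit on the first page, but because it must stay with the 30-point table, both move to the second page.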

This gets really personal when you consider that you can call different styles based on the context you have discerned. If the audience is artistic, have the program apply your ‘artsy’ suite of styles with bold colours and funky borders. If the audience is professional or business-oriented, you might have the program stick with a default style suite based on Arial fonts. Any aspect of the formatting can be varied to better address the audience at hand. I humbly suggest that you enlist the services of a creative designer as you build your library of styles and templates, especially if you need to comply with a corporate brand.
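
Choosing a style suite from the discerned context can be as simple as a lookup with a sensible default. The suite names and attribute values below are illustrative assumptions, not a recommendation for your brand; only the Arial-based business default comes from the paragraph above.

```python
# Hypothetical style suites; names and values are illustrative only.
STYLE_SUITES = {
    "artistic": {
        "body_font": "Georgia",
        "heading_colour": "#D81B60",  # a bold colour
        "border": "dashed",           # a "funky" border
    },
    "professional": {
        "body_font": "Arial",
        "heading_colour": "#000000",
        "border": "none",
    },
}
DEFAULT_SUITE = "professional"

def suite_for(context: dict) -> dict:
    """Pick a style suite from the audience segment, defaulting to the business suite."""
    name = context.get("segment", DEFAULT_SUITE)
    return STYLE_SUITES.get(name, STYLE_SUITES[DEFAULT_SUITE])

def style_block(text: str, context: dict) -> dict:
    """Attach the chosen suite's formatting to one content block."""
    suite = suite_for(context)
    return {"text": text, "font": suite["body_font"], "border": suite["border"]}
```

Falling back to the business suite for unknown segments means a missing or unexpected context never produces an unstyled document.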

The toolkits that I have investigated all provide impressive performance, so you can scale them up to meet your enterprise content formatting requirements. And it doesn’t end there: these toolkits offer superior methods for placing complex content such as tables, images, auto-numbers, references and tables of contents. They typically also provide facilities for including whole PDF pages and other special PDF processing.

The only difficulty I encountered was working with content inside a paragraph block. But this might have been a function of my own learning curve. Once I got accustomed to the process of inserting content objects, I was able to use the other rich tools available in the development environment to process the blocks recursively and deal with in-line objects.
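
One possible way to handle in-line objects recursively, sketched here without reference to any particular toolkit's API, is to represent a paragraph as nested nodes and flatten it into runs of text that each carry the styles in effect. The `("b", [...])` tuple convention and the tag names are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# A paragraph is a list of in-line nodes: plain strings or (tag, children) tuples.
# The tag vocabulary ("b" for bold, "i" for italic, ...) is hypothetical.
Inline = Union[str, Tuple[str, list]]

def flatten_runs(nodes: List[Inline],
                 active: FrozenSet[str] = frozenset()) -> List[Tuple[str, FrozenSet[str]]]:
    """Recursively turn nested in-line markup into flat (text, styles) runs.

    Each run carries the set of styles in effect, so a layout step can emit it
    with a single font change instead of handling the nesting itself.
    """
    runs: List[Tuple[str, FrozenSet[str]]] = []
    for node in nodes:
        if isinstance(node, str):
            if node:  # skip empty strings
                runs.append((node, active))
        else:
            tag, children = node
            # Descend with this tag added to the styles already in effect.
            runs.extend(flatten_runs(children, active | {tag}))
    return runs

paragraph = ["Order ", ("b", ["now ", ("i", ["today"])]), "!"]
for text, styles in flatten_runs(paragraph):
    print(repr(text), sorted(styles))
```

Each run could then be handed to whatever text-placement call your toolkit provides, switching fonts only at run boundaries.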

My latest application generates personalized PDF documents, replacing a catalogue of 150 printed pages with typically 3 pages of dynamic content. It strikes me as a wonder that we made such incredible progress when we first separated our content from its form, and yet somehow we still want to have both on our desktop in case we need to make changes. With a little time invested in planning and designing styles and pagination rules, we should be able to have the system do the work for us and move on.

Philippe Robitaille is an Information Management consultant, project manager and founder of Best Document Practices, a small independent Canadian firm helping organizations structure content and innovate business processes. He is an XML pioneer, as demonstrated by his early contributions to the SGML community, and he continues to this day to work to break down the barriers to open and accessible information.
