Serving Up PDF – Another Way

Posted on September 26, 2008
Category: Best Practices, Tools

A salient point in that post was also made about using the right tool for the job at hand. In this post, I thought I would offer another way. For the sake of argument, let’s assume that the challenge remains the same – dynamically generating formatted content. But in this case, let’s also assume that the touch-point we are serving is delivered in the ubiquitous Portable Document Format.

Many applications today are capable of creating PDF right from the desktop. This convention presumes that the application will ‘set’ the content in a style or form and the PDF will reflect the content in that very form. Well, there are a number of emerging tools (nay, toolkits) that can be used to encapsulate all of the styles that may be enshrined for the enterprise – the branding.

Essentially, these toolkits provide the means of creating a library of page templates and paragraph styles so that they can be applied to content using a number of rules and yielding PDF without any human intervention. The initial investment might be significant (just like XML) but if your publishing requirements are demanding, a rules-based formatting engine might well offer an attractive return on that investment.
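To make the idea of a style library driven by rules a little more concrete, here is a minimal sketch in plain Python. It does not use any PDF toolkit; the `ParagraphStyle` record, the style names and the rules are all hypothetical and only illustrate the shape of the approach. A real toolkit (ReportLab, for instance, has its own `ParagraphStyle` class) would take the place of the stand-in objects here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParagraphStyle:
    """Hypothetical style record; real toolkits define their own style objects."""
    font: str
    size: int
    space_after: int


# A small style library, standing in for the enterprise branding.
STYLE_LIBRARY = {
    "heading": ParagraphStyle(font="Helvetica-Bold", size=16, space_after=12),
    "body": ParagraphStyle(font="Helvetica", size=10, space_after=6),
    "legal": ParagraphStyle(font="Helvetica", size=7, space_after=4),
}

# Rules are checked in order; the first predicate that matches wins.
RULES = [
    (lambda block: block["kind"] == "title", "heading"),
    (lambda block: "terms and conditions" in block["text"].lower(), "legal"),
    (lambda block: True, "body"),  # fallback rule
]


def style_for(block):
    """Return the (style name, style) pair chosen by the first matching rule."""
    for predicate, style_name in RULES:
        if predicate(block):
            return style_name, STYLE_LIBRARY[style_name]
    raise LookupError("no rule matched")  # not reached while the fallback rule exists


def apply_styles(blocks):
    """Pair every content block with its style as (text, style name, style)."""
    return [(block["text"], *style_for(block)) for block in blocks]
```

The point of keeping the rules in a list is that the designer's decisions live in data rather than in scattered `if` statements, so the library can grow without touching the code that applies it.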

It seems as though all of today’s relevant development languages have their share of such PDF toolkits. On the open source front, Perl has its PDF-API2 while Python has ReportLab with its PLATYPUS layout engine (Page Layout And TYPography Using Scripts). A commercial library is also available from PDFlib, and it supports these languages and others, from C++ through to Java, PHP and Ruby.

Here’s how it might work for you. Whether Perl, PHP, Python or Ruby, the development environment is the enabler that will do all of the work. From a Web interface, you can use it to establish the context for your targeted document. What is the person buying, or reading, or filling in? Does the person have a profile on the site aiding in determining their membership in one of your market segments? Have they visited before? These are all considerations in the scope of collective intelligence for another day.

Once you have gleaned as much context as possible, the next step is to gather all of the content that is required to reach out in a compelling way. These development environments are all accomplished at packaging information, whether it is boilerplate text, XML, database records or multimedia. Unlike the desktop approach to PDF creation, where the content needs to be assembled into a homogeneous view, these PDF toolkits are capable of rendering heterogeneous content. Text blocks, tables and image objects can all be placed programmatically into a PDF file object. In many respects, it works just like a funnel. You pass one block of content after another into the page template and let the formatting engine apply the ‘keep’ rules that you have specified for pagination.
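The funnel and its ‘keep’ rules can be sketched in a few lines of plain Python. This is a simplified model, not the algorithm of any particular toolkit: the `Block` class, its `keep_with_next` flag and the `paginate` function are hypothetical names, and heights are assumed to be known in advance. (ReportLab, for one, exposes the same idea through its `KeepTogether` flowable and the `keepWithNext` style attribute.)

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical content block: any text, table or image with a known height."""
    name: str
    height: int                   # in points
    keep_with_next: bool = False  # 'keep' rule: never strand this block at a page bottom


def paginate(blocks, page_height):
    """Pour blocks into pages in order, honouring keep-with-next rules."""
    # Group each block with the blocks it must stay with.
    groups, current = [], []
    for block in blocks:
        current.append(block)
        if not block.keep_with_next:
            groups.append(current)
            current = []
    if current:  # trailing block that asked for a successor that never came
        groups.append(current)

    pages, page, used = [], [], 0
    for group in groups:
        needed = sum(b.height for b in group)
        # Start a new page when the whole group does not fit on this one.
        # A group taller than a page simply overflows a page of its own here.
        if page and used + needed > page_height:
            pages.append(page)
            page, used = [], 0
        page.extend(b.name for b in group)
        used += needed
    if page:
        pages.append(page)
    return pages
```

With a 700-point page, an introduction of 300 points followed by a 40-point table heading marked `keep_with_next` and a 400-point table will push the heading to the second page along with its table, rather than leaving it stranded at the bottom of the first.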

This gets really personal when you consider that you can call different styles based on the context that you have discerned. If the audience is artistic, have the program apply your ‘artsy’ suite of styles with bold colours and funky borders. If the audience is professional or business oriented you might have the program stick with a default style suite based on Arial fonts. Everything and anything in the way of formatting can be mixed up to better address the audience at hand. I humbly suggest that you enlist the services of a creative designer as you build your library of styles and templates especially if you need to comply with a corporate brand.
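As a rough illustration of calling different styles based on context, the sketch below maps an audience segment to a suite of styles. Everything in it is an assumption made for the example: the segment names, the suite contents (apart from the Arial default mentioned above) and the `pick_suite` function are hypothetical, and a real library would be built with your designer.

```python
# Hypothetical style suites; a real library would hold full templates and styles.
STYLE_SUITES = {
    "artsy": {"font": "Futura", "accent_colour": "#e6007e", "border": "dashed"},
    "default": {"font": "Arial", "accent_colour": "#003366", "border": "none"},
}

# Which suite each (assumed) market segment should receive.
SEGMENT_TO_SUITE = {
    "artistic": "artsy",
    "professional": "default",
    "business": "default",
}


def pick_suite(context):
    """Return (suite name, suite) for a visitor context, falling back to the default."""
    segment = context.get("segment")
    suite_name = SEGMENT_TO_SUITE.get(segment, "default")
    return suite_name, STYLE_SUITES[suite_name]
```

An unknown or missing segment falls back to the default suite, which seems the safer choice when the brand has to be respected.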

The toolkits that I have investigated all provide incredible performance so that you can scale up the tool to meet your enterprise content formatting requirements. And it doesn’t end there; these toolkits offer superior methods for placing complex content like tables, images, auto-numbers, references and tables of contents. They typically provide additional facilities for including whole PDF pages and other PDF special processing.

The only difficulty that I encountered was working with content within a paragraph block level. But this might have been a function of my own learning curve. Once I got accustomed to the process of inserting content objects, I was able to use the other rich tools available to me in the development environment to process the blocks recursively in order to deal with in-line objects.
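For readers wondering what processing a block recursively might look like, here is one possible sketch in plain Python. It is not the code from my application; the nested dictionary structure and the `flatten_inline` function are assumptions chosen for the example. The idea is to walk a paragraph whose children are either plain strings or nested in-line objects, and to come out with a flat list of text runs, each carrying the styles that apply to it.

```python
def flatten_inline(node, active=()):
    """Recursively turn nested in-line content into flat (text, styles) runs.

    A node is either a plain string or a dict with a "tag" and a list of
    "children" (a hypothetical structure chosen for this sketch).
    """
    if isinstance(node, str):
        # Leaf: emit the text with every style collected on the way down.
        return [(node, active)] if node else []
    runs = []
    for child in node["children"]:
        runs.extend(flatten_inline(child, active + (node["tag"],)))
    return runs


paragraph = {
    "tag": "para",
    "children": [
        "Order ",
        {"tag": "bold", "children": ["now ", {"tag": "italic", "children": ["and save"]}]},
        ".",
    ],
}

for text, styles in flatten_inline(paragraph):
    print(repr(text), styles)
```

Each run can then be handed to the toolkit with the right font or colour, which is roughly how the in-line objects stop being a problem.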

My latest application generates personalized PDF documents, replacing a catalogue of 150 printed pages with typically 3 pages of dynamic content. It strikes me with wonder that we have made such incredible progress since we first separated our content from its form, and yet somehow we still want to have them both on our desktop in case we need to make changes. With a little time invested in planning and designing styles and pagination rules, we should be able to have the system do the work for us and move on.

Philippe Robitaille is an Information Management consultant, project manager and founder of Best Document Practices, a small independent Canadian firm helping organizations structure content and innovate business processes. He is an XML pioneer, as demonstrated by his early contributions in the SGML community, and continues to this day in efforts to break down the barriers to open and accessible information.
