KDE Developer's Journals

Researching the state of PDF manipulation tools in the world of Free Software (1)

pipitas

Readers of my blog will know it already: Linux printing is set to move towards PDF as its core spooling and job-processing format. (This won't happen overnight, and it won't make PostScript printing any harder, so don't worry.) That was the overall consensus at last year's Linux Desktop Printing Summit in Atlanta, where developers from CUPS, Linuxprinting.org, FreeStandards.org, Freedesktop.org, OpenPrinting.org, OpenUsability.org, Ghostscript, Scribus, KDE, Gnome, Red Hat, SUSE, Ricoh, Lanier, HP, Xerox, IBM, Mandriva, Debian, Mozilla and Sun sat together for three days, exchanged ideas and discussed how to move forward.

PDF is, in some respects, the offspring of PostScript anyway. The format was developed by the same company, Adobe, and it is based on the same graphics and imaging model as PostScript. PDF, though, has been stripped of the features that make PostScript a fully-fledged programming language.

On the other hand, PDF's handling of advanced graphic objects, fonts, colors, layers and transparency has been refined considerably over time.

The internals of a PDF file are quite complicated. The current PDF specification runs to some 1200 pages (...in PDF, what else?). A PDF is not something you can simply manipulate at will with a text editor, however much of a hacker you may be. Well, PDFs were designed to be uneditable in the first place. They are meant to pin down the page images they represent so that those pages print and display on screen with a high degree of fidelity across different devices, computers and operating systems.

That design goal was... hmm, not entirely reached in practice, as every prepress professional will tell you. PDF file processing *still* requires highly specialized knowledge, and a set of rules that must be followed to make the complete professional printing chain (from the page designer working on a Mac to the print engineer overseeing a high-speed digital offset press) work reliably: colors must match exactly the shade and tone they are intended to match, and fonts must look the way they should.

So, in practice, prepress and DTP people in the industry *do* have an assortment of highly specialized tools that *can* lift these restrictions. They routinely open and manipulate PDF files to repair anything that might prevent them from printing exactly as needed: swap fonts, remove shapes from individual pages, remove layers, correct typos and what not.

Anyway, as I already said above: PDF file internals are not straightforward. They are nothing like ASCII text files (rather "flat") or XML files (more like "trees"). Instead, they are organized into various objects that reference each other, and they contain "streams", specific parts that may describe the various graphic objects represented in the file. Even a "simple" PDF viewer is not easy to create, let alone a tool that manipulates a PDF without damaging its integrity...
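To make the "objects that reference each other" point concrete, here is a minimal sketch in Python that writes a one-page PDF by hand. It assumes PDF 1.4 syntax and uses only the standard Helvetica font, so nothing needs embedding; the function name `build_minimal_pdf` is made up for illustration and is not any library's API. Note the numbered objects, the `2 0 R`-style references between them, the content stream with its exact byte `/Length`, and the cross-reference (`xref`) table of byte offsets at the end.

```python
def build_minimal_pdf(text: str = "Hello") -> bytes:
    """Hypothetical helper: write a one-page PDF 1.4 file by hand.

    Assumes `text` contains no parentheses or backslashes (no escaping is done).
    """
    # The page content is a stream of drawing operators, not plain text:
    # begin text, select font F1 at 24 pt, move to (72, 720), show string, end text.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")

    # Objects 1..5; "N 0 R" is an indirect reference to object N.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        # A stream dictionary must state the exact byte length of its data.
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset at which "num 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: one fixed-width 20-byte entry per object.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # "%%%%" becomes "%%"
    return bytes(out)
```

Because the `xref` table records absolute byte offsets, inserting even a single character into object 1 with a text editor shifts every later object, and the table then points at the wrong places. Many viewers quietly try to repair such files, which is exactly the kind of integrity damage mentioned above.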

Now, we don't have many (or even any) Free+free tools for that task yet, do we? The utilities that access a PDF in the way described above (an operation on its beating heart, so to speak) are by and large only available for Mac OS X and MS Windows, and they are rather expensive. We in the FOSS world can extract pages from a PDF, yes. Ghostscript can convert PDFs into different formats. pdftk can do quite a few things when it comes to merging PDFs and adding watermarks to their pages. But that's it. No changing of strings. No changing of fonts. For users, no handling of layers. No scaling of individual objects on an arbitrary page. No moving of pictures from one place on a page to another. No rotating of text boxes. No filling in of forms. No digital signatures for document exchange. (I'm mixing a few different requirements here, and I'm also ignoring some rudimentary early efforts.)

No easy-to-use toolkit for developers either that allows the creation of high-quality PDF output...

I was not able to follow FOSS developments closely in the second half of last year, so I may have missed a lot of announcements and initiatives. So I turned to Konqui's handy little "gg:pdf manipulation" trick to find out. And boy, was I surprised.

In the end, two hits came up that look extremely promising. I'll describe them in my next two blog posts (I need time to go through my notes and do a proper write-up first, so stay tuned).

Comments
pinotree

PDFEdit

There's an application on kde-apps called PDFEdit (http://www.kde-apps.org/content/show.php?content=51831). It embeds xpdf internally, plus some code from KPDF (IIRC), and it can do some basic operations.
Maybe it would be worth asking the author to work on poppler, too.

martin

Moving, resizing, cropping

A simpler tool than the ones you describe, and one that also seems to be missing, is one that "just" lets you move page contents around, choose the paper size of the printed page, and resize and crop pages (with ease and precision). See Xerox FreeFlow Makeready.

pipitas

Moving, resizing, cropping

Martin,

I agree, some of the "simpler" manipulation tools are also missing.

  • move page contents around: I assume you mean the content of a *complete* page, in terms of creating a new layout? Something a bit more complicated than reordering pages and number-up imposition?
  • choose the paper size of the printed page: that can (to a degree) already be done by CUPS. Use this command line:
        lp -d printername -o fitplot -o media=Custom.17x24cm /path/to/PDF.pdf
    The trick is "fitplot", which tells CUPS to scale all page contents (up or down) to fill the available "ImageableArea" for that size. The "Custom.17x24cm" example shows that you can define (and use) any paper size in CUPS (if your printer supports it).
  • resize and crop pages (with ease and precision): surely missing, especially the "ease and precision" part
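For the curious, the core arithmetic behind a "fit to the printable area" option like `fitplot` can be sketched as follows. This is not CUPS's actual code: real filters take the ImageableArea from the printer's PPD and may also rotate or position pages differently. The 0.5 cm margin and the function name `fit_to_area` are assumptions for illustration. The idea is one uniform scale factor, the smaller of the width and height ratios, so the page keeps its aspect ratio, followed by centering in the leftover space.

```python
PT_PER_CM = 72 / 2.54  # PDF/PostScript points per centimetre


def fit_to_area(page_w: float, page_h: float, area_w: float, area_h: float):
    """Hypothetical helper: uniform scale plus centering offsets (all in points)."""
    # Use the tighter of the two ratios so nothing is clipped and the
    # aspect ratio is preserved; > 1 means scaling up, < 1 scaling down.
    scale = min(area_w / page_w, area_h / page_h)
    # Split the unused space evenly to center the scaled page;
    # max() clamps tiny negative floating-point rounding errors to zero.
    dx = max(0.0, (area_w - page_w * scale) / 2)
    dy = max(0.0, (area_h - page_h * scale) / 2)
    return scale, dx, dy


# Example: an A4 page (roughly 595 x 842 pt) onto 17x24cm media,
# assuming a hypothetical 0.5 cm unprintable margin on every side.
margin = 0.5 * PT_PER_CM
area_w = 17 * PT_PER_CM - 2 * margin
area_h = 24 * PT_PER_CM - 2 * margin
scale, dx, dy = fit_to_area(595, 842, area_w, area_h)
print(f"scale={scale:.3f} dx={dx:.1f}pt dy={dy:.1f}pt")
```

With these numbers the width ratio is the smaller one, so the page is scaled down to fill the width and centered vertically. A tool that offers more control than `fitplot` would essentially let the user override `scale`, `dx` and `dy` by hand.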

Thanks for the pointer to the Xerox tool. I've heard of it, but never had a chance to take a closer look. I'll try to do so soonish...

martin

tasks

Yes, I mean the contents of an entire page. The basic work flow and operations I have in mind are:

1) Load one or more PDFs
2) Define a page range ("odd pages 5-47")
3) Tweak for this page range

  • Affine transform (translation, scaling, rotation)
  • Cropping
  • The media size

4) Display the results on a "light table"
5) Repeat from 2) until happy
6) Output a new PDF
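Steps 2) and 3) map fairly directly onto PDF concepts, and a sketch can make that concrete. Assumptions: the helper names (`page_range`, `affine_matrix`, `wrap_page_content`, `page_boxes`) are hypothetical; the transform uses PDF's `cm` operator inside a `q` ... `Q` (save/restore graphics state) pair; cropping and media size map to the page's `/CropBox` and `/MediaBox` entries. A real tool would also have to cope with content stored as an array of streams, rotated pages and annotations, all of which this ignores.

```python
import math
from typing import List, Optional, Sequence


def page_range(first: int, last: int, parity: Optional[str] = None) -> List[int]:
    """Pages first..last inclusive, optionally only the "odd" or "even" ones."""
    pages = range(first, last + 1)
    if parity == "odd":
        return [p for p in pages if p % 2 == 1]
    if parity == "even":
        return [p for p in pages if p % 2 == 0]
    return list(pages)


def pdf_num(v: float) -> str:
    """Format a number as PDF syntax expects (no exponent, no "-0")."""
    v = round(v, 4) + 0.0  # adding 0.0 turns -0.0 into 0.0
    return f"{v:.4f}".rstrip("0").rstrip(".")


def affine_matrix(sx=1.0, sy=1.0, angle_deg=0.0, tx=0.0, ty=0.0) -> List[float]:
    """Scale, then rotate counter-clockwise, then translate, as PDF [a b c d e f].

    PDF maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f).
    """
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return [sx * cos_t, sx * sin_t, -sy * sin_t, sy * cos_t, tx, ty]


def wrap_page_content(content: str, matrix: Sequence[float]) -> str:
    """Apply a transform to a whole page: q, cm, original operators, Q."""
    nums = " ".join(pdf_num(v) for v in matrix)
    return f"q {nums} cm\n{content}\nQ"


def pdf_box(box: Sequence[float]) -> str:
    """A rectangle [llx lly urx ury] in PDF array syntax."""
    return "[" + " ".join(pdf_num(v) for v in box) + "]"


def page_boxes(media: Sequence[float], crop: Sequence[float]) -> str:
    """Page dictionary entries for the new media size and crop region (points)."""
    return f"/MediaBox {pdf_box(media)} /CropBox {pdf_box(crop)}"


# Usage: "odd pages 5-47", shrink to 95 % and shift 12 pt to the left.
selected = page_range(5, 47, "odd")
matrix = affine_matrix(sx=0.95, sy=0.95, tx=-12)
print(len(selected), "pages;", wrap_page_content("...", matrix).splitlines()[0])
```

Wrapping the untouched original operators in `q` ... `Q` keeps the edit reversible: the "repeat from 2) until happy" loop can simply swap in a different matrix instead of rewriting the page content itself.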

A "light table" view lets you see how odd and even pages line up on paper: is the right margin on odd pages equal to the left margin on even pages? An alternative is to have draggable guides that are "reflected" across the page (appearing both on the left and on the right).
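The light-table question itself is simple arithmetic once you know where the content sits on each page. A minimal sketch, assuming the content bounding boxes `(x0, y0, x1, y1)` in points come from some renderer (not shown), that all pages share one width, and the usual convention that odd pages are right-hand (recto) pages, so their outer margin is on the right. The function names and the 0.5 pt tolerance are hypothetical.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in points


def horizontal_margins(bbox: BBox, page_width: float) -> Tuple[float, float]:
    """Return (left, right) margins of the content on one page."""
    x0, _, x1, _ = bbox
    return x0, page_width - x1


def outer_margins_match(odd_bbox: BBox, even_bbox: BBox,
                        page_width: float, tol: float = 0.5) -> bool:
    """Is the right margin on an odd page equal to the left margin on an even one?"""
    _, odd_right = horizontal_margins(odd_bbox, page_width)
    even_left, _ = horizontal_margins(even_bbox, page_width)
    return abs(odd_right - even_left) <= tol


def mirror_guide(x: float, page_width: float) -> float:
    """Position of a guide at x when "reflected" onto the facing page."""
    return page_width - x
```

A draggable reflected guide would then just draw a line at `x` on odd pages and at `mirror_guide(x, page_width)` on even pages.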

The CUPS "fitplot" option is nice, but it does not give you much control. Something like it could be integrated into a hypothetical "KDE MakeFit" tool, though, to optionally provide a starting point for fitting the contents to the page.

jamesots

PDF Knowledge

By the way, in my last job (which I just left) I developed a PDF manipulation tool in Java. I know the spec pretty much inside out, so I might be able to help with PDF stuff on Linux.

pipitas

PDF Knowledge

jamesots,

that's really, really great! Looking at your nick, I assume your real name is easy to guess :-)

We definitely need people who are willing to contribute this type of know-how to Linux and to KDE.

In your case, does this also include more intimate knowledge of job ticketing (for printing)?

Cheers,
Kurt
