2008-01-20

Help needed with PDF Processing Library...

After two days of searching, clicking through hundreds of links, downloading dozens of files, trying, browsing, etc., I decided it was about time to give up and ask for help!

I need to process some PDFs, mostly "black-and-whitening" them, that is, converting colored boxes and text to black and white. This is not the same as "printing" in black and white, though: I want to replace color-filled boxes with unfilled ones and change the corresponding white or light-colored text to black.

Example:

From this: (image: Source) to this: (image: Destination)

I'd like to do this in Delphi, but I can do it in a million other languages such as Perl, PHP, Python, Java, C# or whatever. I have one restriction, though: the library has to be either free or open-source, but this restriction can be waived if there's nothing free or open-source that I can use to solve the problem...

Does anyone know of a library that will let me do that: open a PDF file, loop through all its objects changing their characteristics as I see fit, and finally save the edited file?
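For the "change characteristics" step, here is a minimal Python sketch of what it might look like. It assumes whatever library is chosen has already opened the file and handed over one page's content stream, decompressed; opening and saving are left to that library. The recoloring policy is a hypothetical simplification: every original color operator (`g`, `rg`, `k`, `cs`, `sc`, `scn` and their stroking counterparts `G`, `RG`, `K`, `CS`, `SC`, `SCN`) is dropped, fills default to white, strokes to black, and anything drawn between `BT` and `ET` (a text object) is painted black. The function names `tokenize`, `is_operand` and `black_and_white` are made up for illustration, and images, shadings and inline images (`BI`/`ID`/`EI`) are not handled.

```python
import re

# PDF color operators (fill and stroke) that this sketch strips out.
COLOR_OPS = {b"g", b"rg", b"k", b"cs", b"sc", b"scn",
             b"G", b"RG", b"K", b"CS", b"SC", b"SCN"}
WHITESPACE = b" \t\r\n\f\x00"
DELIMS = b"()<>[]{}/%"
NUMBER = re.compile(rb"[+-]?(\d+\.?\d*|\.\d+)")


def tokenize(data: bytes):
    """Split an uncompressed content stream into raw tokens.

    Simplified: assumes the stream contains no inline images (BI/ID/EI)."""
    i, n = 0, len(data)
    while i < n:
        c = data[i:i + 1]
        if c in WHITESPACE:
            i += 1
        elif c == b"%":  # comment: skip to end of line
            while i < n and data[i:i + 1] not in b"\r\n":
                i += 1
        elif c == b"(":  # literal string; may nest and contain escapes
            depth, j = 0, i
            while j < n:
                ch = data[j:j + 1]
                if ch == b"\\":
                    j += 2  # skip the escaped character
                    continue
                if ch == b"(":
                    depth += 1
                elif ch == b")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            yield data[i:j + 1]
            i = j + 1
        elif c == b"<" and data[i + 1:i + 2] == b"<":  # dictionary start
            yield b"<<"
            i += 2
        elif c == b"<":  # hex string
            j = data.index(b">", i)
            yield data[i:j + 1]
            i = j + 1
        elif c == b">":  # dictionary end ">>"
            yield b">>"
            i += 2
        elif c in b"[]{})":
            yield c
            i += 1
        else:  # name, number or operator: read up to the next delimiter
            j = i + 1
            while j < n and data[j:j + 1] not in WHITESPACE + DELIMS:
                j += 1
            yield data[i:j]
            i = j


def is_operand(tok: bytes) -> bool:
    """Numbers, names, strings, arrays, dicts and keywords are operands."""
    return (tok[:1] in b"/(<>[]{}"
            or tok in (b"true", b"false", b"null")
            or NUMBER.fullmatch(tok) is not None)


def black_and_white(stream: bytes) -> bytes:
    """Hypothetical policy: fills white, strokes black, text black."""
    out = [b"1 g 0 G"]  # start with white fill, black stroke
    operands = []
    for tok in tokenize(stream):
        if is_operand(tok):
            operands.append(tok)
            continue
        if tok in COLOR_OPS:
            operands = []  # drop the original color and its operands
            continue
        out.extend(operands)
        out.append(tok)
        operands = []
        if tok == b"BT":
            out.append(b"0 g")  # text objects are painted black
        elif tok == b"ET":
            out.append(b"1 g")  # back to white fills after the text
    out.extend(operands)
    return b" ".join(out)


page = b"0.2 0.4 0.8 rg 10 10 100 50 re f BT 1 1 1 rg /F1 12 Tf (Hello) Tj ET"
print(black_and_white(page))
# b'1 g 0 G 10 10 100 50 re f BT 0 g /F1 12 Tf (Hello) Tj ET 1 g'
```

Note that this turns *all* text black and *all* fills white, rather than only the light text sitting on colored boxes; a real tool would probably need to track which text overlaps which box before deciding what to change.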

If you know such a library/tool, please leave a comment... Thank you.

4 comments:

Cycle said...

After 2 days of googling you surely already knew it, but here is my 2 cents:
I'm very happy with this PDF viewer
http://blog.kowalczyk.info/software/sumatrapdf/index.html
which uses these two libraries, both open-source:
http://ccxvii.net/apparition/
http://poppler.freedesktop.org/

In particular, from the MuPDF description:
MuPDF also has an API to modify internal objects in the PDF files and write PDF files.

and...

pdftool is a commandline demo of this functionality; it is a portable pdf swiss army knife for fixing broken pdf files, changing permissions, merging and extracting pages, and examining the internal object structure of a PDF file.

Hope this helps.

Cycle

Fernando Madruga said...

"After 2 days of googling you surely already knew it,"

Nope, I didn't! :)

Googling stuff is an art, not a science: sometimes you're inspired and get it right the first time; other times inspiration just isn't there and you can't find things right under your nose! :)

Thanks for the links, I'll give it a try.

Yogi Yang said...

Check out ISEDQuickPDF: it is a gem of a library and is written in Delphi itself. It is not actively developed at the moment, but it is still very, very usable.

Fernando Madruga said...

Thanks, Yogi, but that seems to be a very *weird* library, to say the least!

The "official" web page hasn't been working since 2005; apparently one can buy it through some back doors and download it from someone else... VERY, VERY dubious stuff from a legal point of view. It may be the best library around, but this kills it... And on top of that, there's not even a trial version available!

Thanks anyway for the pointer, but this one is in far too weird a state to be worth investigating further...