2008-01-20

Help needed with PDF Processing Library...

After two days of searching, clicking through hundreds of links, downloading dozens of files, trying, browsing, etc., I decided it was about time to give up and ask for help!

I need to process some PDFs, mostly "black-and-whitening" them, that is, converting colored boxes and text into black and white. It's not the same as "printing" in black and white, though: I want to replace boxes filled with some color with boxes that have no fill, and change the corresponding white or light-colored text to black text.

Example:

From this: [Image: Source] to this: [Image: Destination]

I'd like to do this in Delphi, but I can do it in a million other languages such as Perl, PHP, Python, Java, C# or whatever. I have one restriction, though: the library has to be either free or open-source. This restriction can be waived if there's nothing free or open-source that I can use to solve the problem...

Does anyone know of a library that will allow me to do that? That is: open a PDF file, loop through all the objects, change characteristics as I see fit, and finally save the edited file.
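To make the request concrete, here is a minimal Python sketch of the kind of per-page edit meant here. It is an illustration under stated assumptions, not a library recommendation: it assumes the page's content stream has already been decompressed to text, that it holds one operator per line, and that fill colors are set with the RGB operator rg inside the BT ... ET text object they apply to. Real PDFs also use other color operators and may set the text color outside the text object, which this ignores. The function name black_and_whiten is made up for the example.

```python
import re

# "r g b rg" sets the non-stroking (fill) colour in a PDF content stream.
FILL_RGB = re.compile(r"(?<![\w.])([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+rg\b")


def black_and_whiten(stream: str) -> str:
    """Rewrite RGB fill colours: black inside text objects, white elsewhere.

    Assumption: one operator per line, and text colours are set between
    the BT and ET operators that delimit a text object.
    """
    out = []
    in_text = False
    for line in stream.splitlines():
        op = line.strip()
        if op == "BT":
            in_text = True
        elif op == "ET":
            in_text = False
        # Inside BT...ET the fill colour paints glyphs; outside it paints shapes.
        # "0 g" is black and "1 g" is white in the gray colour space.
        out.append(FILL_RGB.sub("0 g" if in_text else "1 g", line))
    return "\n".join(out)


sample = "1 0 0 rg\n10 10 100 20 re\nf\nBT\n1 1 1 rg\n/F1 12 Tf\n(Hi) Tj\nET"
result = black_and_whiten(sample)
```

In the sample, the red box fill becomes white and the white text becomes black. A real solution would still need a library to read, decompress and write back the streams.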

If you know such a library/tool, please leave a comment... Thank you.

2007-12-10

Even slimmer Delphi for Win32 2007

[Image: Slim] Following on from previous posts, I decided to remove the Windows SDK documentation from my Delphi 2007 installation. The main motivation was the space saved: with 250 MB less, I would be able to fit the whole C: drive into a compressed ghost image file of less than 2 GB, and that's with Windows XP SP2 + Office 2003 + Delphi 2007 + quite a few more programs and updates.

Why is it so important to be < 2 GB? Because I can easily upload that single file to any FTP server without worrying about the server or client being unable to transfer files bigger than 2 GB; because I can copy it to any disk partition or external drive without worrying about FAT file-size limits (2 GB on FAT16, 4 GB on FAT32); because it's a single file to keep track of, so I don't risk ending up with no backup at all just because one part of a multi-part image didn't get copied for whatever reason; and finally, because I can now fit the file onto a 2 GB USB flash drive.

So, rather than just delete the files, I decided to check out the Help registration files and modify them as well. I needed to modify 3 files: h2reg.ini, Master.HxT and RADStudioFilter.xml (get all 3 from here).

What did I gain?

  1. 250 MB less of disk space;
  2. Going from 14 to 4 seconds on the first help call of each session;
  3. Getting fewer extraneous results;
  4. Being able to now fit my whole C: drive onto a single < 2GB file.

What did I lose?

  1. Basically, the Dinkumware, C++, Delphi.NET and Windows Platform SDK help.

Of these, the most "troublesome" will be the PSDK, but that can be easily accessed online directly from MSDN Library, and has the advantage of being more up to date...

How did I do it:

  1. Go into your "%ProgramFiles%\CodeGear\RAD Studio\5.0\Help\Doc" folder;
  2. h2reg -u
  3. copy the supplied files over the ones with the same name there;
  4. h2reg -r (or run the install_and_view.cmd file that is already there)

Now you can delete the PSDK and Dinkumware folders.

If you want to "play safe", back up the whole DOC folder to a CD or external drive first, should you want to go back. To undo: run h2reg -u, restore the folder, then run install_and_view.cmd.
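The steps above can be sketched as a small Python script. This is only a sketch under assumptions: it assumes h2reg.exe sits in the Doc folder itself (the steps run it from there), and the function name slim_help and the patched_dir parameter are made up for illustration. The -u and -r switches and the three file names are the ones given above. By default it only returns the planned steps and touches nothing.

```python
import shutil
import subprocess
from pathlib import Path

# The three files named above that get replaced in the Doc folder.
PATCHED_FILES = ["h2reg.ini", "Master.HxT", "RADStudioFilter.xml"]


def slim_help(doc_dir: Path, patched_dir: Path, dry_run: bool = True) -> list[str]:
    """Unregister the help, copy the edited files over, then re-register."""
    steps = ["h2reg -u"] + [f"copy {name}" for name in PATCHED_FILES] + ["h2reg -r"]
    if dry_run:
        return steps  # nothing is touched, only the plan is reported

    # Assumption: h2reg.exe lives in the Doc folder.
    h2reg = str(doc_dir / "h2reg.exe")
    subprocess.run([h2reg, "-u"], cwd=doc_dir, check=True)
    for name in PATCHED_FILES:
        shutil.copy2(patched_dir / name, doc_dir / name)
    subprocess.run([h2reg, "-r"], cwd=doc_dir, check=True)
    return steps


plan = slim_help(Path("Doc"), Path("patched"))
```

Calling it with dry_run=False and the real Doc path performs the steps. Back up the folder first, as described above.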

BTW: disregarding the .NET pre-reqs, Delphi 2007 for Win32 uses around 400 MB of my disk space now, which is just about right for me!

Get rid of another 343 MB...

If you installed the Help Update, be aware that you now have an extra cached (yes, duplicate!) 343 MB worth of data in C:\Documents and Settings\All Users\Application Data\{15EDF4CD-698A-4E52-8278-2E25143AD95B} (change that to wherever you keep your user profiles). I don't know where that is in Vista, but if you scan your HDD for the folder named {15EDF4CD-698A-4E52-8278-2E25143AD95B}, you should find it easily!
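If you'd rather not scan by hand, here is a rough Python sketch that walks a drive looking for that folder name and adds up its size. The function names find_folders and folder_size_bytes are made up for the example, and on a real system you would pass something like "C:\\" as the root, which can take a while.

```python
import os

CACHE_NAME = "{15EDF4CD-698A-4E52-8278-2E25143AD95B}"


def find_folders(root, name=CACHE_NAME):
    """Yield the path of every directory called `name` below `root`."""
    for dirpath, dirnames, _ in os.walk(root):
        if name in dirnames:
            yield os.path.join(dirpath, name)


def folder_size_bytes(path):
    """Return the total size of all files below `path`, in bytes."""
    total = 0
    for dirpath, _, filenames in os.walk(path):
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total
```

For example, `for p in find_folders("C:\\"): print(p, folder_size_bytes(p) // 2**20, "MB")` lists each match with its size.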

Yep, that's CG again thinking that they ought to know better than us, developers, and treating us like "dumb users" and caching stuff without even asking for our consent...

[Image: 343MB]

For those thinking "when will this guy stop complaining about this?": you know the answer! When CG stops DOING it! :) I can understand some lack of choice in "normal" software, where the users may need to be protected from themselves, but as a developer, I like to be in control of my machine rather than have it control me or, in this case, rather than have software waste disk space for no good reason other than someone being lazy when creating the installer...

Sure, disk space is cheap, but what happens when you try to BACK UP that "cheap" disk space? You're left with no choice but to back up to ANOTHER disk, which has huge drawbacks, such as not allowing you to (easily) keep a backup off your premises so you can quickly recover from a building fire or something... And most 1-man shops just can't afford a fire-proof vault to keep data backups safe, because those things are just too expensive, not to mention bulky and heavy...

EDIT (for those that don't read the comments!): Chris Pattinson, from CodeGear, warns that if you delete this folder you won't be able to run future help patches and will need a full re-install. So, there are two things you can do if you still need the space:

1) Back up the folder to a CD/other disk prior to removing it;

2) If you have already deleted the folder, simply install on a Virtual Machine and copy the folder from there. Or, should you have multiple machines with Delphi 2007, just copy the folder from another one. As long as it's added to the same place, you should be fine.

2007-11-29

Delphi 2007 SP3 - Some quirks

During my help tests, I ran into some minor quirks and I decided to blog about them as well! (Whoa: 3 whole posts in one day? Don't worry, that probably means 3 more months without posting, so you'll have enough time to recover!)

Don't get me wrong, these are minor quirks, but it's also the 4th release of Delphi 2007 for Win32 (Release, SP1, SP2, SP3), so these should have been caught and dealt with by now, unless the quality control processes are seriously flawed... And before someone else goes "Oh, but the product is VERY good if this is all you can say about it", let me just point out that, no, this is not all (I filed over 40 QC reports back when I did care about doing it). These are just the ones that jump out at you seconds after you start using the product, which doesn't give a good impression of the overall product quality... First impressions usually take longer to disappear...

1) Even selecting "Just me", the shortcut for the RAD Studio Documentation is installed for All Users.

In all honesty, I can't say whether this is a bug from the default SP3 install or caused by the Help Update. In either case, it should have been child's play to both detect and fix this, so there's no real excuse for letting something as simple as this slip through...

2) Minor toolbar sizing errors:
[Image: Toolbars]

3) Personality Icon not showing (it appears to show only when a project is loaded, which is a bit odd for a single personality product)
[Image: Personality]

4) Help Improve Visual Studio. WHY? If you buy a Volvo, will you get a form to fill out and return to FORD about how pleased you are with their engine? It doesn't make sense and for a team of developers, it shouldn't be hard to determine what registry key is needed to stop that from showing. Creates a wrong impression, when one does NOT buy Visual Studio and instead opts for buying a CodeGear product and then sees that "Help Improve Visual Studio"!

5) What's with this dull launch screen? So much space and the only thing that changes is a couple lines at the bottom? I used to like the previous launch screen better. Maybe it gets "filled" when you have some optional dotNET "modules", but as it is, it's plain dull...

[Image: Dull]

Like I said above, all minor quirks, but also all first wrong impressions with a product...

Delphi 2007 Improved Help (?)

If you recall from previous posts (yes, I know it has been a while, but I've been busy!), I was frankly disappointed at the so-called "Improved Help".

The major flaws

Back on release of Delphi 2007 for Win32, the major flaws I found were the following (listed in no particular order, but numbered for easier reference):

  1. Crashing the IDE when requesting help on a menu;
  2. Failing to retrieve help for common Pascal/Delphi keywords;
  3. Giving "precedence" in searches to results pertaining to VB, VC++, anything else under the sun and, in the last spot, Delphi;
  4. Taking up a huge amount of space with non Delphi for Win32 help contents;
  5. Using a Help engine that would consistently remain in memory, leaking resources;
  6. Failing to retrieve help for components in forms, or giving too many pointless options to choose from.

What has changed in SP3

#1 was solved with SP1, IIRC;

#2 [Image: Help on End] I loaded a simple test program from the demos and pressed F1 on unit, interface and uses. All of these popped up a few options to choose from, but the "correct" one was among them. Then I pressed F1 over type: only two options given, and neither very useful... class, procedure: again a few choices; private: a few choices too, and among them the "good" one, but the one that I clicked first, because it looked the most promising, took me to the C++ reference section instead. No, that's not C++.net, but rather the RAD Studio C++ reference. Odd for a "Delphi for Win32" product. Yes, I know that they have a common base for all their products, but that's THEIR choice and WE, the END-USERS, should NOT be bothered with that. If I'm programming in Delphi (Pascal), I should NOT need C++ help on my system, and I certainly should not be offered that help when I press F1 on a Delphi keyword... The image on the right shows the list of choices given when pressing F1 on end (a similar one shows up for begin as well)...

#3 From the tests above, it appears that this has been nicely worked out: many VC++, VB or J# entries still show up, but usually towards the END of the list, rather than being at the top of it.

#4 remains the same and I don't expect it to change seeing that it's a common base for their Win32+dotNET products;

#5 I couldn't get a single DEXPLORER instance to stay in memory. Maybe the conditions that cause that are rather peculiar, but in all the opening and closing I did, not once did I see one left behind, or a second instance being loaded while another one was already in memory.

#6 I had to try REALLY hard to find a component that would not get me to that component's reference help. Almost every single component, from those in a form, to even those in the component selector, got me to the proper page after pressing F1 with no further questions asked.

Conclusions

1) Definitely a much-welcome improvement, but it still needs a lot of work. It's sad that 9 months after the initial release, which touted "Improved Help" as one of the key reasons for purchasing the new version, it still fails to live up to that promise. Maybe by the time they get to Delphi 2008 the help will be at the standard they said it was at back on release of Delphi 2007. Let's just hope they keep improving it, because it certainly looks as if they still have a lot of work to do. As long as they keep doing it, maybe in the future we will get a decent help!

2) There are still quite a few "place-holders" like the one in this example:
[Image: Place Holder]

3) I was, towards the end of my tests, prompted with this:
[Image: Local Help]

I opted to use Local Help as the primary source and haven't tested with the default option shown above. It seemed to work pretty much the same, but I only did a few more tests.

Delphi 2007 for Win32 SP3

This week I've been a bit sick. Nothing serious, just a nasty cold with the nasty side effect of going through paper tissues like there was no tomorrow.

So, rather than trying really hard to stay focused on debugging tasks at hand when I had to interrupt every 30 seconds to wipe my nose, I decided it was about time to do something I've been wanting to do for a long time and just couldn't afford the time to do it: upgrade my Delphi 2007 SP1 with both the SP3 and the improved help files that were released after that. That's something that doesn't require much concentration and is "compatible" with using paper tissues every 30 seconds...

So, I start by going to the Delphi registered user's downloads and download two files:

Armed with those files, I reset my computer to a ghost file with nothing but Windows XP SP2 + patches + drivers. After a few hours of installing more patches and all the software I use *except* Delphi (a process that is now much faster as I keep many utilities pre-installed on my D:\Utils folder), it was time to tackle the main procedure: Install Delphi 2007 SP3.

So, I start by creating a new ghost image to have a more recent "fall-back" should something fail, and I mount the ISO on a Virtual CloneDrive.

My first impression is not good when I double-click the drive:

[Images: Error01, Error02, Error03]

I then proceed to open the drive instead and manually click the install file which brings me to this:

[Image: InstallLauncher]

Pre-Reqs

[Image: Pre-Reqs Disk Space] From this point on, everything runs much smoother: I proceed to install the pre-reqs, which I intentionally had not installed before, and then a reboot is in order.

The pre-reqs took an astonishing 2 GB of my C: drive as you can see from the image on the left. After closer scrutiny, you can easily see that it still suffers from the same "bugs" as the original install, namely, caching the same install files in several different places. That's 1.1 GB worth of .NET SDK install files in those two folders marked in the image.
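One way to check for this kind of duplication yourself, rather than eyeballing a disk-space report, is to group files by content hash. The Python sketch below is only an illustration: it assumes the cached copies are byte-identical, and the function name find_duplicates is made up. It reads every file under the given root, so point it at the suspect folders rather than a whole drive.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Group files under `root` by SHA-256 digest; return groups of 2+ files."""
    by_digest = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large installers don't fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists paths whose contents are identical, which is what a doubly cached installer would look like.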

This whole process completed without any problems in approximately 10 minutes, after which I had to reboot to proceed with the install.

Install

[Image: Delphi Disk Space] Then, on to installing Delphi 2007 for Win32 SP3. After entering my registration data, I chose to install everything available to my PRO SKU, including Rave Reports. The whole process took around 15 minutes and again proceeded with no problems.

As you can see, Delphi itself, discounting the pre-reqs, requires around 1.5 GB in the Professional Win32 SKU. That's roughly 450 MB for cached install files, 340 MB for help files and 690 MB for CodeGear files in either the program folder or common files.
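As a quick sanity check on those figures, the three parts add up to 1480 MB, which is where the "around 1.5 GB" comes from. A tiny Python sketch of the arithmetic, using only the numbers quoted above (the labels are just shorthand):

```python
# Figures (in MB) quoted above for the Professional Win32 SKU.
usage_mb = {
    "cached install files": 450,
    "help files": 340,
    "CodeGear program/common files": 690,
}

total_mb = sum(usage_mb.values())
print(f"total: {total_mb} MB (~{total_mb / 1000:.1f} GB)")

# Share of the total taken by each part.
for name, mb in usage_mb.items():
    print(f"{name}: {mb / total_mb:.0%}")
```

The cached install files alone account for close to a third of the total.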

I could have run the IDE, but instead I opted for installing the Help Update. The one minor quirk about it is that you need to go to a command line and type HelpSetup /upgrade. But the upgrade ran smoothly. This time I didn't time it, but it took around 10 or 15 minutes. It's not worth another space screenshot, as it only differed by a couple of MB.

Full space report

[Image: Full Disk Space] To the left you can find a FULL disk space report, from before installing even the pre-reqs to after installing Delphi and the help update. As you can see, that's 3.7 GB worth of stuff, from pre-reqs to cached files to more cached files to the proper program files. Not really an improvement over the original setup, IIRC...

However, since I believed that at least 1.1 GB of this could be safely shaved, that's what I did next. I removed the first of the two folders marked in the Pre-Reqs image above and was surprised to find out that the 2nd one was no longer there. However, the total space used was not consistent with that 2nd folder having been removed, and after a quick search I found that it had instead been moved into my own Local Settings rather than the All Users' one. That's consistent with the choice I was given of installing for myself only (which I chose) or for all users.

So, I deleted that 2nd folder and then ran Windows Update to download the latest .NET 2 security updates. As expected, it worked flawlessly, and I don't expect to run into problems by not having those files around. Of course, Windows Update doesn't update the SDK itself, so this was not a big test. But if you want to play safe, just burn those two folders to a DVD should they be required at some upgrade point in the future...

So, the (current) final space used is down to 2.5 GB, but that's without having yet followed my own guide on how to trim some more MBs out of it. That's intentional, as I want to test the help file and see if it has been REALLY improved without using my own tweaks. That will be the subject of my next post which should be posted either later today or tomorrow, if all goes as planned.

Conclusion

The whole process ran very smoothly, even if CodeGear still thinks that, just because disk space is cheap, they can waste as much as they want. Sure, disk space IS cheap, but if you want to create a ghost image of your main working drive and burn that to a DVD, having 2 GB worth of data or having 2 GB worth of data + 2 GB of worthless cached files DOES make a big difference to whether it all fits onto a DVD or not. The way I do it, if anything other than a hardware failure goes very wrong, I can be up and running with a working configuration by popping a DVD into my drive and restoring my system to a full working condition in under 20 minutes, instead of re-installing everything for the better part of 1 or 2 days. The same problem applies to creating and backing up a Virtual Machine with a full working install. So please, CodeGear, stop wasting one's disk space just because it's "apparently" cheap!

2007-09-24

Windows Live Writer and Pascal Code Formatting (updated)

Since Steve Dunn has updated his WLW add-in, I checked it and found out that it still doesn't use the fixed Pascal definition file that I sent Actipro some months ago.

So, if you want proper Pascal syntax highlighting, download his control, follow his instructions and afterwards copy this file over the one with the same name in the languages folder.

Among the fixes that I recall are some missing keywords (unit, for instance), support for // comments, and proper nested comment / string support.