The project to devise a way to clean up the initial scans of a box of slides is described, along with the sometimes inordinate difficulties encountered before success was achieved. In summary, it’s easy when you know how.


What needs to be done

After processing a box of slides, we have a set of images with all the imperfections of the scanning intact; the scanned part of the mount is dark, but definitely not black, and the ragged edge of the mount (if cardboard) seems much more conspicuous on the computer screen than it did when the slides were projected. It’s time to set some image processing software to work. We need to be able to do at least the following:


Choosing the software

Candidates for the job are, in the closed-source world, Photoshop, and in the open-source world, the GIMP. The former gets ruled out early because of its high price. By all accounts, it’s a substantial piece of work, but I don’t want to spend that much money to find out down the road that it doesn’t suit me. Anyway, all I have is Windows 98, where recent versions won’t run. So it’s the GIMP that is going to be looked at, and I don’t need to do anything to get hold of it; it comes with most major Linux distros. That means there are four versions on my multiboot machine at the time… version 1.2.3 on Red Hat 9, version 2.2.13 on SuSE 10.2, version 2.4.0-rc3 on Kubuntu 7.10 and version 2.6.1 on Ubuntu 8.10.

The obvious one to use would be the latest version, but there is a problem. Ubuntu 8.10 isn’t working that well for me. The GUI goes down sporadically, leaving only the mouse pointer reacting; the keyboard is disabled, and clicking doesn’t work any more. I have to go to another machine and do a remote shutdown. Of the other distros’ versions, 1.2.3 can be dismissed because its scripting has no file-glob procedure, so processing of multiple files is impossible. Also, it pops up no fewer than six windows when activated on an image (…what were they thinking of?). 2.2.13 has gotten over that, but its version of Scheme is old, and has some technical details I don’t fully understand. So 2.4.0-rc3 it’s going to be. Kubuntu 7.10 is my workhorse distro, the only one I have had up to that point which does everything I need to do, and never crashes. It’s running more than ninety per cent of the time, so I won’t have to reboot.

For documentation, we have the in-program help, the GIMP project’s Web documentation and tutorials and whatever else we can find on the Net.

It is the work of a few minutes to fire up GIMP 2.4.0-rc3 and verify that the first three requirements in the task list above are satisfied. You can do all kinds of selections within the image, you can “bucket fill” black into a selected area and you can store text wherever you want. In fact, using the GIMP interactively is quite impressive, if not very intuitive. Evidently a lot of work has been put in to provide a huge array of features. What about batch processing, though?


Initial Scripting Attempts

We need first of all to get to know the scripting language. Go to the GIMP Web tutorials to see what help there is. Sure enough, there is something, though meagre. Two examples only, the second being an example of how to do the first repeatedly over a set of images. That’s the one of interest. Here it is, copied verbatim from their site…

  (define (batch-unsharp-mask pattern
                              radius
                              amount
                              threshold)
  (let* ((filelist (cadr (file-glob pattern 1))))
    (while (not (null? filelist))
           (let* ((filename (car filelist))
                  (image (car (gimp-file-load RUN-NONINTERACTIVE
                                              filename filename)))
                  (drawable (car (gimp-image-get-active-layer image))))
             (plug-in-unsharp-mask RUN-NONINTERACTIVE
                                   image drawable radius amount threshold)
             (gimp-file-save RUN-NONINTERACTIVE
                             image drawable filename filename)
             (gimp-image-delete image))
           (set! filelist (cdr filelist)))))

Good, let’s get hold of that second one and run it. That means copying it into the GIMP’s script directory under the user’s home directory so that the GIMP can “see” it when it starts up.

The language in which the script is written is Scheme, a close relative of LISP. LISP is a relatively technical language, popular among academics and software theoreticians because they can prove theorems about it. The choice of a language like this for scripting closes off the writing of scripts to all but a tiny minority of prospective GIMP batch users in my opinion. All the same, taking a closer look at the script, using a bit of imagination and some familiarity with Unix concepts such as “globbing”, an only moderately software-savvy person can guess that file-glob is going to give us some kind of list of files matching a pattern like "*.png", then filelist will hold this list and the while loop will process one file, chop it off the list and continue until the list is empty. So… go to a directory with some images in it and tell the GIMP to execute the command exactly as given on the Web page. Oops.

> gimp -i -b '(batch-unsharp-mask "*.png" 5.0 0.5 0)' -b '(gimp-quit 0)'
batch command: experienced an execution error.

Very helpful, eh? Well, there is a script-fu “console” which can be called up from the GIMP itself, and where the error reporting is said to be better, so fire up the console, dig the main command out of the first pair of quotes above, and enter it. Result:

> (batch-unsharp-mask "*.png" 5.0 0.5 0)
Error: car: argument 1 must be: pair 

Which “car”? There are lots of them. Here we go again… start removing parts of the job until the error goes away. Well, the end result of a lot of groping around can be summarised by the following command results on the console, starting with the file-glob excerpt from the script:

> (file-glob "*.jpg" 1)
(2 #( "Richq.jpg" "GateFS.jpg" ))

> (cadr (file-glob "*.jpg" 1))
#( "Richq.jpg" "GateFS.jpg" )           <-- filelist gets this

> (car (cadr (file-glob "*.jpg" 1)))
Error: car: argument 1 must be: pair    <-- (car filelist) fails

So file-glob is returning a list, but not directly of matching file names. Rather, it is a list containing a count and then another list, this one with the file names. But… I thought that lists were defined in Scheme by (item1 item2 …). What is this hash character? More groping around with Google, unproductive because the word “hash” generates too many false positives. Finally I got lucky with a site which talks about vectors, of whose existence I was unaware. The author cleverly hid from my searches by referring to # as a “mesh” character! Now we can see what is happening. file-glob returns a list; cadr returns the item which is at the head of the rest of that list, namely the thing with the hash, and that’s not a list, but a vector. car functions expect a list, so the one inside the while loop fails as above.
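For anyone wanting to poke at the difference without a directory of images to hand, here is a small sketch for the console. The value glob-result is a hand-built stand-in (my assumption, copied from the console output above) for what file-glob returned on 2.4.0-rc3; vector?, vector-ref, vector-length and vector->list are standard Scheme procedures.

```scheme
; Stand-in for the result of (file-glob "*.jpg" 1) on GIMP 2.4.0-rc3:
; a list holding a count and then a VECTOR of names.
(define glob-result (list 2 (vector "Richq.jpg" "GateFS.jpg")))

; cadr picks out the second item of the list, i.e. the vector.
(define files (cadr glob-result))

(vector? files)             ; true: it is a vector, so car will refuse it
(vector-length files)       ; number of names held in the vector
(vector-ref files 0)        ; vectors are indexed, starting from 0
(car (vector->list files))  ; convert to a list first, and car is happy again
```

The last line is the essence of the cure: once the vector has been turned into a list, the loop in the tutorial script can carry on using car and cdr unchanged.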


Help Wanted

So now I know why the script doesn’t work. But in another sense, I don’t know at all. Why does the flagship example of a batch processing script given on the GIMP’s own Web site not work? A little more research with Google reveals the truth: it did work on previous releases of the GIMP, but I’ve got version 2.4.0-rc3, which came with the distro I’m currently using. As is evident from the version, it is not a final release, but “release candidate 3”, and somebody did something to file-glob, so that it returns a vector instead of a list. Nobody updated the example batch script, or at least made a remark on the Web page to warn users, and my distro has this as the latest version in its repository. Thus the waste of a couple of days.

The procedure browser provided is a good idea, sadly let down by failure to carry things through. Some of the descriptions contain useful information, but there are others like that for gimp-floating-sel-rigor. Its action is described as “Rigor the floating selection”. Its two parameters are listed, then the Additional Information is “This procedure rigors the floating selection”. Aha, so that’s what it does! The “additional information” sections are often fatuous anyway. Here are a few examples (in their individual totality):

The Script-Fu Console has its own help button. It leads to a page in the bundled GIMP help file entitled "Appendix D. Eeek! There is missing help", and exhorting you to "feel free to join us and fill the gap by writing documentation for the GIMP." Right... feel free to wash up and put out the empties after we’ve drunk all the beer.

This is sadly typical of the Open Source world in my experience.

Faced with the above tribulations, the instinct is to start Googling. That leads mostly nowhere either. The problem is best illustrated with an example. Let’s pick a procedure at random, say gimp-layer-add-mask, (which is actually better documented than some), and search for occurrences on the Web. Running Google with just the name as above gives 517 results (20 Jan 2009).

Another thing which held me up was that the Nikon software includes two images inside each TIFF file, the real one and a thumbnail, and my script was by default returning me the thumbnail. By the time I had guessed how to specify the layer with the real image, I realised that the image-get-layers procedure was returning a vector too, although the procedure browser entry talks of a “list”. Perhaps every function which returns a vector once returned a list? No, a little research with the Script-fu Console in older GIMP versions shows that image-get-layers at least used to return something like (2 #(3 2)#2"0300"). Eh?? Oh hell, let’s not go there…
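One way of getting at the real image rather than the thumbnail can be sketched as follows. This is not necessarily how my own script ended up doing it; it assumes that the real image is simply the layer with the largest pixel area, and that gimp-image-get-layers returns a count followed by a vector of layer IDs, as it did for me on 2.4.0-rc3. The helper names vector-max-by and largest-layer are made up for the illustration.

```scheme
; Return the element of the non-empty vector v for which
; (measure element) gives the greatest value.
(define (vector-max-by v measure)
  (let loop ((i 1) (best (vector-ref v 0)))
    (cond ((>= i (vector-length v)) best)
          ((> (measure (vector-ref v i)) (measure best))
           (loop (+ i 1) (vector-ref v i)))
          (else (loop (+ i 1) best)))))

; Hypothetical use inside the GIMP: choose the layer with the largest
; area, on the assumption that the thumbnail is the smaller one.
(define (largest-layer image)
  (vector-max-by
    (cadr (gimp-image-get-layers image))   ; the vector of layer IDs
    (lambda (layer)
      (* (car (gimp-drawable-width layer))
         (car (gimp-drawable-height layer))))))
```

The first procedure is plain Scheme and can be tried out on its own in the console with any vector of numbers; only largest-layer needs an image loaded.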


Getting it working

Anyway, you have somehow finally worked out enough about some procedures to compose a first script, so you run it. In the normal course of events, it fails the first time. If you are a little lucky, the message may tell you that, e.g., you tried to do something with a layer that doesn’t exist, and it may helpfully suggest that, in this case, the reason could be because the image was deleted, but most of the time what you see is:
batch command: experienced an execution error.
Not a word about what happened, or where. There’s only one way to cope with this… creep up on your solution by making the smallest changes possible between tests. That way you know exactly what little thing it was that stopped your script working.

Of course, just understanding the little example given above doesn’t mean you are able to write Scheme. The lightweight introduction to writing scripts given in the GIMP online manual is nowhere near enough; string manipulation is completely absent, for example. I was trying to guess the names of functions like string-append until I stumbled on the MIT Scheme Reference. This references a much broader implementation of Scheme, but the string manipulation and conversion routines it lists are defined in the GIMP Scheme specification also. Once I had found this, the way was open for processing file names the way I needed to.
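To give a flavour of the kind of file name processing meant here, the following is a minimal sketch, not an excerpt from my script. It uses only the standard string procedures string-length, string-ref, substring and string-append; the helper names last-dot-index and output-name, and the “-clean” suffix, are invented for the example.

```scheme
; Return the index of the last "." in name, or #f if there is none.
(define (last-dot-index name)
  (let loop ((i (- (string-length name) 1)))
    (cond ((< i 0) #f)
          ((char=? (string-ref name i) #\.) i)
          (else (loop (- i 1))))))

; Build an output file name: drop the old extension (if any), then
; append a suffix and a new extension.
(define (output-name name suffix extension)
  (let* ((dot  (last-dot-index name))
         (stem (if dot (substring name 0 dot) name)))
    (string-append stem suffix extension)))

(output-name "GateFS.tif" "-clean" ".png")   ; gives "GateFS-clean.png"
```

Writing the result under a new name, rather than over the original as the tutorial script does, also means a failed experiment doesn’t cost you the scan.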

It would be tedious to list all the steps on the journey to a working script; a lot of trial and error was needed. I was struck by the number of times when I thought I understood a procedure, only to get an unexpected result when I used it. This wasn’t entirely due to native stupidity; many of the GIMP’s modes of operation in conventional interactive use are less than intuitive. For example, the operator actions required to delete a layer’s transparency channel are hardly obvious, and this translates directly into the odd-looking procedure calls used for the purpose in the script. In the end, it wasn’t cleverness that was needed to complete the job, but patience.

With the script at last polished off and comprehensively tested on the first version of the GIMP, it’s time to try to make it useful to the maximum number of people by seeing if it works on some other versions. Reboot into Ubuntu 8.10 with the significantly newer version 2.6.1, run the script on the same collection of test images and…
batch command: experienced an execution error.
Groan. Are we going to start all over again? But luckily the first thing I try is to run file-glob in the GIMP console, and bingo!, it’s once again returning a list. A quick adjustment to the script to cope with both cases and, mirabile dictu, it works! A positive software experience for once.
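For the record, the sort of adjustment involved can be sketched like this, using the tutorial’s unsharp-mask script as the example rather than my own. The helper name as-list is an invention for the illustration; it relies only on the standard vector? and vector->list, and the rest of the script is as given on the GIMP site.

```scheme
; Accept either a vector or a list and always hand back a list.
(define (as-list x)
  (if (vector? x) (vector->list x) x))

; The tutorial's batch script, with only the file list normalised,
; so that it no longer matters which kind file-glob returns.
(define (batch-unsharp-mask pattern radius amount threshold)
  (let* ((filelist (as-list (cadr (file-glob pattern 1)))))
    (while (not (null? filelist))
           (let* ((filename (car filelist))
                  (image (car (gimp-file-load RUN-NONINTERACTIVE
                                              filename filename)))
                  (drawable (car (gimp-image-get-active-layer image))))
             (plug-in-unsharp-mask RUN-NONINTERACTIVE
                                   image drawable radius amount threshold)
             (gimp-file-save RUN-NONINTERACTIVE
                             image drawable filename filename)
             (gimp-image-delete image))
           (set! filelist (cdr filelist)))))
```

Testing for the type at run time, rather than for the GIMP version, has the merit that the script needn’t know which release changed what.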