Wednesday, 16 May 2018

Automatic processing of Imatest charts

It turns out that people sometimes want to process an image of an Imatest SFRplus type chart in MTF Mapper. Or at the very least, I have received requests about this.

When I first released MTF Mapper, I took it as a given that people would just print out the MTF Mapper charts if they wanted to use MTF Mapper. In the meantime I have gotten wiser, and I now know how hard (or expensive?) it is to print high-quality test charts. So it actually makes perfect sense to use a good quality chart that you already own (e.g., an SFRplus chart) with MTF Mapper.

Unfortunately the design of the SFRplus-style charts includes a black bar that runs through the top of the top row of target squares, like in this example:
An example of an SFRplus chart. I blatantly copied this example from Imatest's website (please don't sue me).

This black bar causes MTF Mapper to see the entire top row of squares plus the black bar as a single object, and since this compound object does not resemble a square, it ignores it. The same thing happens with the black bar at the bottom of the chart. As a result, MTF Mapper only processes the interior squares, like so:
Ignore the actual MTF50 values, but do note that only the interior square targets were detected automatically

Other than the obvious spurious detections on the non-target squares (which you can cover up with post-its or such if necessary) the output is usable, but you lose the top and bottom rows of squares, which is not ideal.

A simple solution is to just crop your image to exclude the bars, and then to process the cropped image with MTF Mapper's "-b" option. This works, but it is rather clunky. So I added a convenience feature that will do this automatically.

You can choose the new File/Open Imatest image(s) option in the GUI, or you can add the --imatest-chart option if you use the command-line interface. Because the gray target squares of the SFRplus chart cover more of the white background than a typical MTF Mapper chart, you probably have to adjust the "Threshold" value (-t on the CLI, or under Settings/Preferences/Advanced in the GUI) a little to detect all the target squares. The default Threshold is 0.55, and bumping it down to 0.4 should work a little better. For our test image above, we then get this:
Much better; all target squares detected
Note that the top edges of the top row of squares are tagged with "N/A" rather than MTF50 values; this is just MTF Mapper's way to indicate that these edges do not represent valid measurements. If you are parsing the "edge_mtf_values.txt" file produced by the "-q" output option, these edges will have an MTF50 value of 1.0 (which is an impossible / invalid MTF50 value). Or you could identify them by their pixel coordinates, which is probably the better way.

This feature is available from MTF Mapper 0.6.21 onward.

Sunday, 13 May 2018

Cropped single edge images now handled more elegantly in the GUI

A while back I wrote a post that described the "--single-roi" option of MTF Mapper. The "--single-roi" mode specifically allows you to feed MTF Mapper with images that have already been cropped to contain only a single edge, i.e., they look like this:

From MTF Mapper 0.6.20 onwards, you can now use the menu option File/Open single edge image(s) to load images that look like the one pictured above. Note that by opening an image using this new menu option MTF Mapper will automatically enable the outputs that make sense (Annotated image output, with the ability to click on edges to view the SFR curve), while silently ignoring all the output types that do not make sense if you only have one edge.

Monday, 19 February 2018

Journal paper on MTF Mapper's deferred slanted edge analysis algorithm now available!

A new paper on MTF Mapper's deferred slanted edge analysis algorithm has been published in the Optical Society of America's JOSA A journal. The paper describes one of the methods that MTF Mapper can use to compensate for radial lens distortion. The paper also covers the technique that MTF Mapper uses to process Bayer CFA subsets, e.g., when processing just the green Bayer channel of a raw mosaiced image.

The full reference:
F. van den Bergh, Deferred slanted-edge analysis: a unified approach to spatial frequency response measurement on distorted images and color filter array subsets, Journal of the Optical Society of America A, Vol. 35, Issue 3, pp. 442-451 (2018).

You can see the on-line abstract here. The full article is paywalled, but if you contact me by email I can send you an alternative document that covers  the same topic. I will probably post some articles on this topic here on the blog sometime too.

Saturday, 10 February 2018

Improved user documentation

If you are a long-time MTF Mapper user, then you are probably just as surprised by this unexpected turn of events as I am, but I have updated the docs. Actually, it gets better: I completely rewrote the user documentation to produce the new and improved (yes, it is new, and yes, it is an improvement on the old docs) MTF Mapper user guide.

You can grab a PDF copy of the user guide here. I have discovered a way to produce a decent-looking HTML version of the PDF documentation; by selecting Help in the GUI (MTF Mapper version 0.6.14 and later) you should see a copy of the user guide open in your system web browser. This is probably a better way to ensure that you are reading the latest version of the user guide.

I have tried to make the user guide more task-focused so that new users will be able to have a better idea of what they can do with MTF Mapper, as well as how they can get started. However, it took me about two weeks to write the new user guide, and it weighs in at 50+ pages, so it is probably still a little intimidating at first glance. If you are a new user, and you have any suggestions on ways in which I can improve the user guide, please let me know.

Unfortunately, even 50+ pages are not enough to really cover all the functionality, and the sometimes non-intuitive behaviour, of MTF Mapper so there is still some work to be done.

Thursday, 8 February 2018

Device profile support

From version 0.6.16 onwards, MTF Mapper now supports device profiles embedded in input image files. If you normally feed MTF Mapper with raw camera files (via the GUI), then this new feature will not affect you in any way.

If you have been feeding MTF Mapper with JPEG files, or perhaps TIFF files produced by your preferred raw converter, then this new feature could have a meaningful impact on your results. You can jump ahead to the section on backwards compatibility if you want the low-down.

To explain what device profiles are, and why they affect MTF Mapper, I first have to explain what linear light is.

Linear light

At the sensor level we can assume that both CCD and CMOS sensors have a linear response, meaning that a linear increase in the amount of light falling on the sensor will produce a linear increase in the digital numbers (DNs) we read from the sensor. This is true up to the point where the sensor starts to saturate, where the response is likely to become noticeably non-linear.

The slanted-edge method at the heart of MTF Mapper expects that the image intensity values (DNs) are linear. If your intensity values are not linear, then the resulting SFR you produce using the slanted-edge method is incorrect.

Gamma and Tone Reproduction Curves

Rather than exploring the history of gamma correction in great detail, I'll try to summarize: Back in the day of Cathode Ray Tube (CRT) displays they found that a linear change in the signal (voltage) sent to the tube did not produce a linear change in the display brightness. If you were to produce a graph of the input signal vs brightness, you would obtain something that looks like this:
Figure 1: A non-linear display response
If you fast-forward to the digital era, you can see how this non-linearity in the brightness response can become rather tiresome if you want to display, say, a grayscale image. If you took a linear light image from a CCD sensor, which we assume produced linear values in the range 0 to 255, and put that in the display adaptor frame buffer, then your image would appear to be too dark. The solution was to pre-distort the image with the inverse of the non-linear response of the display, i.e., using this function:
Figure 2: The inverse of Figure 1
If you take a linear signal and apply the inverse curve of Figure 2, then take the result and apply the non-linear display response curve of Figure 1, you end up with a linear signal again. The process of pre-distorting the digital image intensity values to compensate for the non-linear display response is called gamma correction.

This scheme worked well enough when you knew what the approximate non-linear display response was: on PCs the curve could be approximated as f(x) = x2.2. Things became a lot more complicated when you wanted to display an image on different platforms, for example, early Macintosh systems were characterized by a gamma of 1.8, i.e., f(x) = x1.8.

The solution the platform interoperability problem was to attach some metadata to your image to clearly state whether the digital numbers were referring to linear light intensity values, or whether they were pre-distorted non-linear values chosen to produce a perceived linear brightness image on a display. So how do you specify what your actual image represents? Do you specify the properties of the non-linear pre-distortion you applied to produce the values found in the image file, or do you instead specify the properties of the display device for which you image has been corrected? It turns out that both strategies were followed, with the PNG specification choosing the former, and most of the other formats (including ICC profiles) choosing the latter.

To cut to the chase: One important component of a device profile is its Tone Reproduction Curve (TRC); Figure 1 can be considered to be a TRC of our hypothetical CRT display device. Depending on the metadata format, you can either provide a single gamma parameter (γ) to describe the TRC as f(x) = xγ, or you can provide a look-up table to describe the TRC.

Colour spaces

The other component of a device profile that potentially has an impact on how MTF Mapper operates is the definition of the colour space. The colour space matters because MTF Mapper transforms an RGB input image to a luminance image automatically. The main reason for this is that the slanted-edge method does not naturally apply to colour images; you have to choose to either apply it to each of the R, G and B channels separately, or you have to synthesize a single grayscale image from the colour channels. For MTF Mapper I chose the luminance (Y) component of the image, as represented in the CIE XYZ colour space, because this luminance component should correlate well with how our human vision system perceives detail.

So what is a colour space? To keep the explanation simple(r), I will just consider tristimulus colour spaces; in practice, those that describe a colour as a combination of three primary colours such as RGB (Red, Green, and Blue). Now consider the digital numbers associated with a given pixel in a linear 8-bit RGB colour space, e.g., (0, 255, 0) would represent a green pixel. Sounds straightforwards, right? The catch is that we have not defined what we mean by "green". Two common RGB colour spaces that we encounter are sRGB and Adobe RGB; they have slightly different TRCs, but here we focus on their colour spaces. The difference between sRGB and Adobe RGB is that they (literally) have different definitions of the colour "green": Our "green" pixel with linear RGB values (0, 255, 0) in the sRGB colour space would have the values (73, 255, 10) in the linear Adobe RGB colour space because the Adobe RGB colour space uses a different green primary compared to sRGB. Note that the actual colour we wanted has not changed, but the internal representation has changed.

The nub of the matter is that our image file may contain the value (0, 255, 0), but we only really know what colour that refers to once we know in which colour space we are working. I hope you can see the parallel to the TRC discussion above: the image file contains some numbers, but we really do have to know how to interpret these numbers if we want consistent results.

ICC profiles

A valid ICC profile always contains both the TRC information and the colour space information that MTF Mapper requires to produce a linear luminance grayscale image. In fact, the ICC profile contains the matrix that tells use how to transform linear RGB values into CIE XYZ values adapted for D50 illumination (that is very convenient if you do not want to get into chromatic adaptation).

So if you provide MTF Mapper with a TIFF, PNG* or JPEG file with an embedded ICC profile, you can be sure that the resulting synthesized grayscale image will be essentially identical regardless of which colour profile you saved your image in.
*MTF Mapper 0.6.17 and later.

JPEG/Exif files

If your JPEG file has no ICC profile, and no Exif metadata, then MTF Mapper will just assume that your image is encoded in the sRGB profile. Most cameras appear to at least add Exif metadata, but that only helps a little bit, since the Exif standard only really has a definitive way of indicating that the image is encoded in an sRGB profile. If your JPEG file is encoded in the Adobe RGB space (most DSLRs allow you to configure the JPEG output this way), then MTF Mapper will try to infer this from the clues in the Exif data. 

MTF Mapper will use the appropriate TRC (either sRGB, or Adobe RGB), and the appropriate D50-adapted RGB-to-XYZ matrix will be selected for the luminance conversion.

Backwards compatibility (or the lack thereof)

Unfortunately the addition of device profile support has encouraged me to change the way in which JPEG files are converted to grayscale luminance images. In MTF Mapper version 0.6.14 and earlier, the JPEG RGB-to-YCrCb conversion was used to obtain a luminance image; from version 0.6.16 onwards the device profile conversion is used. In practice, this means that older versions would use
Y = 0.299R + 0.587G + 0.114B
regardless of whether the JPEG file was encoded in sRGB or Adobe RGB, which is clearly incorrect in the case of Adobe RGB files (regardless of the TRC differences). A typical weighting for an sRGB profile in version 0.6.16 and later would be
Y = 0.223R + 0.717G + 0.061B.

The practical implication is that results derived from JPEG files will be different between versions <= 0.6.14 and 0.6.16 and later. Figure 3 illustrates this difference on a sample sRGB JPEG file. To make matters worse, the difference will be exacerbated by lenses with significant chromatic aberration because the relative weight of the RGB channels have changed.
Figure 3: SFR difference on the same edge of a JPEG image (Green is v0.6.14, Blue is v0.6.16)
In Figure 3 the difference is small, but noticeable. For example, when rounded to two decimal places this edge will display as an MTF50 of 0.19 c/p on version 0.6.16, but 0.20 c/p on version 0.6.14. I expect that there will be examples out there that will exhibit larger differences than what we see here, but I do not expect to see completely different SFR curves.

More importantly, MTF Mapper's behaviour regarding TIFF files has changed. In version 0.6.14 and earlier, all 8-bit input images were treated as if they were encoded in the sRGB profile; this probably produced the desired behaviour most of the time. If, however, a 16-bit TIFF file is used as input in version 0.6.14 and earlier, then MTF Mapper assumed the file contained linearly coded intensity values (i.e., gamma = 1.0). This behaviour worked fine if you used "dcraw -4 ..." to produce the TIFF (or .ppm) file, but would not work on 16-bit TIFF files produced by most raw converters or image editors. From version 0.6.16 onwards all TIFF files with embedded ICC profiles will work correctly, whether they are encoded in linear, sRGB, Adobe RGB or ProPhoto profiles. Figure 4 illustrates the difference on a 16-bit sRGB encoded TIFF file.
Figure 4: SFR difference on the same edge of a 16-bit sRGB TIFF image (Green is v0.6.14, Blue is v0.6.16)

In Figure 4 we see much larger differences between the SFR curves produced by versions 0.6.14 and 0.6.16; this larger difference is because 0.6.14 incorrectly interpreted the sRGB encoded (roughly gamma = 2.2) values as if they were linear.

One last important rule: MTF Mapper version 0.6.16 still interprets all other 16-bit input files without ICC profiles (PNG, PPM) as if they have a linear (gamma = 1.0) encoding.

Summary recommendations

Overall, you should now be able to obtain more consistent results with MTF Mapper now that it supports embedded device profiles. For best results, choose to embed an ICC profile if your raw converter or image editor supports it. Given a choice, I still recommend using raw camera files directly with the GUI, or doing the raw conversion using "dcraw -4 -T ..." to convert your raw files if you use the command-line interface.

Tuesday, 22 August 2017

A brief overview of lens distortion correction

Before I post an article on the details of MTF Mapper's automatic lens distortion correction features, I would like to describe in some detail the lens distortion model adopted by MTF Mapper.

The basics

Radial lens distortion is pretty much as the name suggests: the lens distorts the apparent radial position of an imaged point, relative to its ideal position predicted by the simple pinhole model. The pinhole model tells us that the position of a point in the scene, P(x, y, z) [assumed to be in the camera reference frame], is projected onto the image plane at position p(x, y) as governed by the focal length f, such that
    px = (Px - Cx) * f/(Pz - Cz)
    py = (Py - Cy) * f/(Pz - Cz)
where C(x, y, z) represents the centre of projection of the lens (i.e., the apex of the imaging cone).

We can express the point p(x, y) in polar coordinates as p(r, theta), where r2 = px2 + py2; the angle theta is dropped, since we assume that the radial distortion is symmetrical around the optical axis.

Given this description of the pinhole part of the camera model, we can then model the observed radial position rd as 
    rd = ru * F(ru)
where the function F() is some function that describes the distortion, and ru is the undistorted radial position, which we simply called "r" above in the pinhole model.

Popular choices of F() include:
  • Polynomial model (simplified version of Brown's model), with
    F(ru) = 1 + k1 * ru2 + k2 * ru4
  • Division model (extended version of Fitzgibbon's model), with
    F(ru) = 1 / (1 + k1 * ru2 + k2 * ru4)
Note that these models are really just simple approximations to the true radial distortion function of the lens; these simple models persist because they appear to be sufficiently good approximations for practical use.

I happen to prefer the division model, mostly because it is reported in the literature to perform slightly better than the polynomial model [1, 2].

Some examples of radial distortion

Now for some obligatory images of grid lines to illustrate the common types of radial lens distortion we are likely to encounter. First off, the undistorted grid:

Figure 1: What the grid should look like on a pinhole camera
Add some barrel distortion (k1 = -0.3, k2 = 0 using division model) to obtain this:
Figure 2: Barrel distortion, although I think "Surface of an inflated balloon distortion" would be more apt.
Note how the outer corners of our grid lines appear at positions closer to the centre than we saw in the undistorted grid. We can instead move those corners further outwards from where they were in the undistorted grid to obtain pincushion distortion (k1 = 0.3, k2 = 0 using division model):
Figure 3: Pincushion distortion, although I would prefer "inaccurate illustration of gravitationally-induced distortion in space-time".
If we combine these two main distortion types, we obtain moustache distortion (k1 = -1.0, k2 = 1.1 using division model):
Figure 4: Moustache distortion.
We can swap the order of the barrel and pincushion components to obtain another type of moustache distortion, although I do not know if any extant lenses actually exhibit this combination (k1 = 0.5, k2 = -0.5 using division model):
Figure 5: Alternative (inverted?) moustache distortion.

Quantifying distortion

Other than using the k1 and k2 parameters (which might be a bit hardcore for public consumption), how would we summarize both the type and the magnitude of a lens' radial distortion? It appears that this is more of a rhetorical question than we would like it to be. There are several metrics currently in use, most of them unsatisfying in some respect or another.

One of the most widely used metrics is SMIA "TV distortion", which expresses distortion as a percentage in accordance with the following diagram:
Figure 6: Slightly simplified SMIA TV distortion
The SMIA TV distortion metric is just 100*(A - B)/B. If the value is negative you have barrel distortion, and positive values imply pincushion distortion. If you have moustache distortion like shown in Figures 4 and 5, then you could very likely obtain a value of 0% distortion. Whoops!

I only show SMIA TV distortion here to make a concrete link to the k1 parameter, and to highlight that SMIA TV distortion is not useful in the face of moustache distortion.

Using the division model

There is one subtlety that is worth pondering a while: are we modelling the forward distortion, i.e, the distortion model maps our undistorted pinhole projected points to their distorted projected points, or are we modelling the reverse mapping, i.e., we model the correction required to map the distorted projected points to their undistorted pinhole projected points?

The important point to note is that neither the polynomial model, nor the division model, compels us to choose a specific direction, and both models can successfully be applied in either direction by simply swapping rd and ru in the equations above. I can think of two practical implications of choosing a specific direction:
  1. If we choose the forward direction (such as presented above in "The basics") where rd = ru * F(ru), then we must have a way of inverting the distortion if we want to correct an actual distorted image as received from the camera. If we undistort an entire image, then we would prefer to have an efficient implementation of the reverse mapping, i.e., we require an efficient inverse function F-1() so that we may calculate F-1(rd) = rd/ru. In this form it is not immediately clear that we can find a closed-form solution to the reverse mapping, and we may have to resort to an iterative method to effect the reverse mapping. Depending on how we plan to obtain our distortion coefficients k1 and k2, it may be that the forward distortion approach could be far more computationally costly than the reverse distortion approach. To summarize: inverting the distortion model for each pixel in the image can be costly.
  2. The process of estimating k1 and k2 typically involves a non-linear optimization process, which can be computationally costly if we have to compute the reverse mapping on a large number of points during each iteration of the optimization algorithm. I have a strong aversion to using an iterative approximation method inside of an iterative optimization process, since this is almost certainly going to be rather slow. To summarize: inverting the distortion model during non-linear optimization of k1 and k2 can be costly.
Just how costly is it to compute the undistorted points given the distorted points and a forward distortion model?
  • Polynomial model:
    rd = r_u * (1 + k1 * ru2 + k2 * ru4), or after collecting terms,
    ru + ru * k1 * ru2 + ru * k2 * ru4 - rd = 0
    k1 * ru3 + k2 * ru5 + ru - rd = 0
    Since we are given rd, we can compute potential solutions for ru by finding the roots of a 5th-order polynomial.
  • Division model:
    rd = ru / (1 + k1 * ru2 + k2 * ru4), or
    rd * k1 * ru2 + rd * k2 * ru4 - ru + rd = 0
    This looks similar to the polynomial model, but at least we only have to find the roots of a 4th-order polynomial, which we can do using Ferrari's formula because the ru3 term has already been deflated.

In both cases we have to find the all the roots, including the complex ones, and then choose the appropriate real root to obtain ru given rd (I assume here that the distortion is invertible, which we can enforce in practice by constraining k1 and k2 as proposed by Santana-Cedres et al. [3]).
Alternatively, we could try a fixed-point iteration scheme, i.e., initially guess that ru = rd, substitute this into the equation ru  = rd / F(ru) to obtain a new estimate of ru, rinse and repeat until convergence (this is what OpenCV does). Both of these approaches are far too computationally demanding to calculate for every pixel in the image, so it would appear that we would be better off by estimating the reverse distortion model.

But there is a trick that we can employ to speed up the process considerably. First, we note that our normalized distorted radial values are in the range [0, 1], if we normalize such that the corner points of our image have r = 1, and the image centre has r = 0. Because the interval is closed, it is straightforward to construct a look-up table to give us ru for a given rd, using, for example, the root-finding solutions above. If we construct our look-up table such that rd is sampled with a uniform step length, then we can use a pre-computed quadratic fit to interpolate through the closest three rd values to obtain a very accurate estimate of ru. The combination of a look-up table plus quadratic interpolation is almost as fast as evaluating the forward distortion equation. The only limitation to the look-up table approach, though, is that we have to recompute the table whenever k1 or k2 changes, meaning that the look-up table method is perfect for undistorting an entire image for a given k1 and k2, but probably too expensive to use during the optimization task to find k1 and k2.

So this is exactly what MTF Mapper does: the forward distortion model is adopted so that the optimization of k1 and k2 is efficient, with a look-up table + quadratic interpolation implementation for undistorting the entire image.

Some further observations on the models

If you stare at the equation for the inversion of the division model for a while, you will see that 
    rd * k1 * ru2 + rd * k2 * ru4 - ru + rd = 0
neatly reduces to
    rd * k1 * ru2  - ru + rd = 0
if we assume that k2 = 0. This greatly simplifies the root-finding process, since we can use the well-known quadratic formula, or at least, the numerically stable version of it. This is such a tempting simplification of the problem that many authors [1, 2] claim that a division model with only a single k1 parameter is entirely adequate for modeling radial distortion in lenses.
That, however, is demonstrably false in the case of moustache distortion, which requires a local extremum or inflection point in the radial distortion function. For example, the distortion function that produces Figure 4 above looks like this:
Figure 7: The distortion function F() corresponding to Figure 4.
It is clear that the division model with k2 = 0 cannot simultaneously produce the local minimum observed at the left (rd = 0) and the local maximum to the right (rd ~ 0.65).

Similar observations apply to the polynomial model, i.e., we require k2 ≠ 0 to model moustache distortion.

Wrapping up

I think that covers the basics of radial distortion modelling. In a future article I will demonstrate how one would go about determining the parameters k1 and k2 from a sample image.

References

  1. Fitzgibbon, A.W., Simultaneous linear estimation of multiple view geometry and lens distortion, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001.
  2. Wu, F, Wei, H, Wang, X, Correction of image radial distortion based on division model, SPIE Optical Engineering, 56(1), 2017.
  3. Santana-Cedres, D, et al., Invertibility and estimation of two-parameter polynomial and division lens distortion models, SIAM Journal on Imaging Sciences, 8(3):1574-1606, 2015.

Tuesday, 1 August 2017

Image interpolation: Fighting the fade, part 1

Over the last six months or so I kept on bumping into a particularly vexing problem related to image interpolation: the contrast in an interpolated image drops to zero in the worst case of interpolating at an exact half-pixel shift relative to the original image.

Consider the case where you are trying to co-register (align) two images, such as two images captured by the same camera, but with a small translation of the camera between the two shots. If we translate the camera purely in the horizontal direction, then the shift between the two images will be h pixels, where h can be any real number. The integer part of h will not cause us any trouble for reasonable values of h, such that the two images still overlap, of course. The trouble really lies in the fractional part of h, since this forces us to interpolate pixel values from the moving image if we want it to line up correctly with the fixed image.

The worst case scenario, as mentioned above, is if the fractional part of h is exactly 0.5 pixels, since this implies that the value of a pixel in the interpolated moving image will be the mean of the two closest pixels from the original moving image. Figure 1 illustrates what such an interpolated moving image will look like for a half-pixel shift; the edges are annotated with their MTF50 values.
Figure 1: Scaled (Nearest Neighbour interpolation) view of an interpolated moving image that experienced a half-pixel shift. The numbers are the measured MTF50 values, in cycles/pixel.
Looking closely at the vertical edges of the gray square, we can see that there are some visible interpolation artifacts manifesting as overshoot and undershoot. This image was interpolated using OMOMS cubic spline interpolation [2], which is best method that I am aware of. Linear interpolation would produce much more blurring (but no overshoot/undershoot). And of course we see a marked drop in MTF50 on the vertical edges!

At any rate, the MTF curves for the edges are illustrated in Figure 2. The blue curve corresponds to the horizontal edge (i.e., the direction that experienced no interpolation), and the orange curve corresponds to the vertical edge (an 0.5-pixel horizontal shift interpolation). The green curve was obtained from another simulation where the moving image was shifted by 0.25 pixels.
Figure 2: MTF curves of interpolated moving images corresponding to fractional horizontal shifts of zero (blue), 0.25 pixels (green), and 0.5 pixels (orange).

Certainly the most striking feature of the orange curve is how the contrast drops to exactly zero at the Nyquist frequency (0.5 cycles/pixel). The smaller 0.25-pixel shift (green curve) shows a dip in contrast around Nyquist, but this would probably not be noticeable in most images.

In Figure 3 we can see that this loss of contrast around Nyquist follows a smooth progression as we approach a fractional shift of 0.5 pixels.
Figure 3: MTF curves of interpolated moving images corresponding to fractional horizontal shifts of 0.25 pixels (blue), 0.333 pixels (green), and 0.425 pixels (orange).
The conclusion from this experiment is that we really want to avoid interpolating an image with a fractional shift of 0.5 pixels (in either/both horizontal and vertical directions), since this will produce a very noticeable loss of contrast at higher frequencies, i.e., we will lose all the fine details in the interpolated image.

Radial distortion lens correction

An applied example of where this interpolation problem crops up is when we apply a radial distortion correction model to improve the geometry of images captured by a lens exhibiting some distortion (think barrel or pincushion distortion). I aim to write a more thorough article on this topic soon, but for now it suffices to say that our radial distortion correction model specifies for each pixel (x, y) in our corrected image where we have to go and sample the distorted image.

I prefer to use the division model [1], which implies that for a pixel (x, y) in the corrected image, we go and sample the pixel at
    x' = (x - xc) / (1 + k1r2 + k2r4) + xc
where
    r = sqrt((x - xc)2 + (y - yc)2)
and (xc, yc) denotes the centre of distortion (which could be the centre of the image, for example).
The value of y' is calculated the same way. The actual distortion correction is then simply a matter of visiting each pixel (x, y) in our undistorted image, and setting its value to the interpolated value extracted from (x', y') in the distorted image.

The important part to remember here is that the value (x', y') can assume any fractional pixel value, including the dreaded half-pixel shift.

An example of mild pincushion distortion

In order to illustrate the effects of radial distortion correction, I thought it best to start with synthetic images with known properties. Figure 4 illustrates a 100% crop near the top-left corner of the reference image, i.e., what we would have obtained if the lens did not have any distortion.
Figure 4: The pure, undistorted reference image. Note that the closely-spaced black lines blur into gray bars because of the simulated Gaussian Point Spread Function (PSF) with an MTF50 of 0.35 c/p. If you squint hard enough, you can see some traces of the original black bars. Rendered at 400% size with nearest-neighbour upscaling. (click for 100% view)

I simulated a very mild pincushion distortion with k1 = 0.025 and k2 = 0, which produces an SMIA lens distortion figure of about -1.62%. This distortion was applied to the polygon geometry, which was again rendered with a Gaussian PSF with an MTF50 of 0.35 c/p. The result is shown in Figure 5. Keep in mind that you cannot really see the pincusion distortion at this scale, since we are only looking at the top-left corner of a much larger image.
Figure 5: Similar to Figure 4, but with about 1.62% pincushion distortion applied to the polygon geometry. Rendered at 400% size with nearest-neighbour upscaling. (click for 100% view)

We can see the first signs of trouble in Figure 5: Notice how the black/white bars appear to "fade out" at regular intervals. The straight lines of Figure 4 are no longer perfectly straight, nor are they aligned with the image rows and columns. The lines thus cross from one row (or column) to the next, and the gray patches correspond to the regions where the lines fell halfway between two rows (or columns), leading to the apparent loss of contrast.

It is important to understand at this point that the fading in Figure 5 is not a processing artifact; this is exactly what would happen if you were to photograph similar thin bars that are not aligned with the image rows/columns.

Finally, we arrive at the radial distortion correction phase. Figure 6 illustrates what the corrected image would look like if we used standard cubic interpolation to resample the image.
Figure 6: The undistorted version of Figure 5. Resampling was performed using standard cubic interpolation. Rendered at 400% size with nearest-neighbour upscaling. (click for 100% view).
We see some additional fading that appears in Figure 6. If you flick between Figures 5 and 6 (after clicking for 100% view) you will notice that an extra set of fading patches appear in between the original fading patches. These extra fades are the manifestation of the phenomenon illustrated in Figure 2: the contrast drops to zero as the interpolation sample position approaches a fractional pixel offset of 0.5. The interesting thing about these additional fades is that they are not recoverable using sharpening --- once the contrast reaches zero, no amount of sharpening will be able to recover it.

A potential workaround

The aim of radial distortion correction is to remove the long range (or large scale) distortion, since the curving of supposedly straight lines (e.g., building walls) is only really visible once the distortion produces a shift of more than one pixel. Unfortunately we cannot simply ignore the fractional pixel shifts --- this would be equivalent to using nearest-neighbour interpolation, with its associated artifacts.

Perhaps we can cheat a little: what if we pushed out interpolation coordinates away from a fractional pixel shift of 0.5? Let x' be the real-valued x component of our interpolation coordinate obtained from the radial distortion correction model above. Further, let xf be the largest integer less than x' (the floor of x'). If x' - xf < 0.5, then let d = x' - xf. (We can deal with the d > 0.5 case by symmetry).

Now, if d > 0.375, we compress the value of d linearly such that 0.375 <= d' <= 0.425. We can obtain the new value of x', which we can call x", such that x" = xf + (x' - xf ) * 0.4 + 0.225. Looking back at Figure 3, we see that a fractional pixel shift of 0.425 seems to leave us with at least a little bit of contrast; this is where the magic numbers and thresholds were divined from.

Does this work? Well, Figure 7 shows the result of the above manipulation of the interpolation coordinates, followed by the same cubic interpolation method used in Figure 6.
Figure 7: The undistorted version of Figure 5. Resampling was performed using the modified interpolation coordinates followed by cubic interpolation. Rendered at 400% size with nearest-neighbour upscaling. (click for 100% view).
Careful squinting reveals that the additional fading patches observed in Figure 6 have been reduced noticeably. This looks promising. Of course, one might argue that I have just added some more aliasing to the image. Which might be the case.

Further testing will be necessary, especially on more natural looking scenes. I might be able to coax sufficient distortion from one of my lenses to perform some real-world experiments.

Further possibilities

Using the forced geometric error method proposed above, we can now extract at least some contrast at the frequencies near Nyquist. We also know what the fractional pixel shift was in both x and y, so we know what the worst-case loss-of-contrast would be. By combining these two bits of information we can sharpen the image adaptively, where the sharpening strength is adjusted according to the expected loss of contrast.

Stay tuned for part two, where I plan to investigate this further.

References

  1. Fitzgibbon, A.W.: Simultaneous linear estimation of multiple view geometry and lens distortion. In: Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 125–132 (2001).
  2. Thevenaz, P., Blu T. and Unser, M.: Interpolation revisited, IEEE Transactions on medical imaging, 19(7), pp. 39–758, 2000.