## Friday, 2 November 2012

### Importance sampling: How to simulate diffraction and OLPF effects

So how exactly does one render a synthetic image that simulates both the effects of diffraction (circular aperture) and the 4-dot blur of a Lithium Niobate optical low-pass filter (OLPF)?

Before jumping into the combined effect, I will start with a method of rendering the effects of diffraction only.

### Diffraction simulation using weights on a regular grid

The most intuitive method of rendering diffraction effects is direct convolution. This involves convolution in the spatial domain, i.e., for each pixel in the output image, sample the underlying scene (usually a black square on a white rectangle in my examples) at many points arranged on a fine, sub-pixel-spaced regular grid. Each of these samples is then multiplied with an Airy disc function centred on that pixel, and added to the running total for each pixel.

This works reasonably well because it is very simple to sample the underlying scene: you just have to determine whether a sample point is inside the black rectangle, or not. The appropriate weights for the Airy disc function are obtained by directly evaluating the appropriately scaled jinc function (see here for an overview).

This regular grid sampling strategy is described in some detail in a previous post. It works well enough for some functions, but it turns out to be a poor choice for the Airy disc function, for which the weights are close to zero almost everywhere outside of the central peak:
 The Airy disc function (jinc squared)
Those peripheral rings have very low magnitude compared to the central peak, which means that samples from those regions will have a much smaller influence on our answer than samples from the centre.

Looking at the problem from another angle we see that this discrete sampling and convolution is really just a discrete approximation to the continuous convolution of the Airy function and the underlying scene, sampled at the pixel positions of our synthetic output image. This implies that other techniques, such as Monte Carlo integration, could potentially be applied to evaluate the integral.

### Monte Carlo integration

It may sound exotic, but MC integration is a very straightforward concept. Imagine that you want to compute the surface area of a unit circle (radius=1). Fit a square tightly around this circle (say, width=2, centred on circle), and generate (x,y) coordinates randomly (uniform distribution) within this square. For each of these random coordinates that fall inside the circle, increment a counter A.
After a fair number of samples (N), the value A/N will approximate π/4, which is the ratio of the area of the circle to that of the square.

This may seem like a roundabout method of computing the area of a circle, but the method is very general. For example, instead of simply incrementing the counter A when a point is inside the circle, we could have evaluated some function f(x,y) at each point inside the circle. This means that we can approximate complex double integrals over complex, non-convex regions, without breaking a sweat.

Note that this method is very similar to the direct convolution approach described above, and if we assume that our "random" coordinates just happened to fall on a regular grid, then we can see that these two approaches are really very similar. So why would you choose random coordinates over a regular grid?

I can think of two disadvantages to a regular grid: 1) you have to know how finely spaced your grid should be in advance, and 2) your regular grid may interfere with the shape of the region you are integrating over. If you have some way of measuring convergence (say, the variance of your estimate), then you can keep on generating random samples until convergence; with a regular grid, you must sample all the points, or your integral will be severely biased.

Sampling using random (x,y) coordinates is not optimal, though, since random numbers have a tendency of forming clumps, and leaving gaps (they must, or they would not be random!). What works better in practice is a quasi-random sequence of numbers, such as a Sobol sequence or the Halton sequence. These quasi-random sequences will fill the space more evenly, and you can still stop at any point during the integration if convergence has been achieved; they also tend to produce a lower variance in the integral than random sampling for the same number of samples.

### Importance sampling

While uniform random number sequences (and uniform quasi-random sequences) provide a convenient way of choosing new sampling positions for our Monte Carlo integrator, they can be wasteful if the function we are integrating has large regions over which the function value is small compared to the function's maximum, such as the Airy disc function above. What would happen if we could choose our sampling points in a biased manner, such that we choose sampling positions proportional to the function value at those positions?

This is the essence of importance sampling. If we can choose our sampling points according to a distribution that better fits the shape of the function we are integrating, then we can concentrate our samples in the areas that have a larger function value (weight), and are thus more important. This strategy reduces the variance of our MC integration procedure significantly.

Take the Airy disc function as an example: we could choose our sampling positions according to a Gaussian distribution. This will concentrate samples closer to the centre, but we have to be careful keep the standard deviation of the Gaussian wide enough so that sufficient samples can be generated from the outer rings of the Airy disc function. There is one very important thing to remember when applying importance sampling: you must divide the function value at each sampling position by the probability of observing that sampling position.

Thus, if generate our sampling points using a Gaussian distribution, then we have a Gaussian pdf p(x,y) = exp(-x2/(2sx2) - y2/(2sy2)) where sx and sy denotes the respective standard deviations in x and y, which means we must add the value f(x,y)/p(x,y) to our accumulator, rather than f(x,y)/N, as we would normally do with a regular grid or with uniform sampling. This is what it looks like when we are rendering a grey rectangle on a white background:
The gridlines denote the boundaries of the discrete pixels of our synthetic image. The red dots represent the sampling points of one pixel (the one the arrow points to).
Notice how the density of the sampling points decreases as we move away from the centre of the pixel --- this density is exactly Gaussian.

Since we are really computing the product of the weighting function and the underlying scene, we are accumulating I(x,y)*f(x,y)/p(x,y), where I(x,y) measures whether a point is inside or outside our target object (black rectangle).

What happens if the distribution of our sampling points match the distribution of f(x,y) exactly? Then f(x,y) = p(x,y), and we are effectively weighting each point equally, with sampling density effectively achieving the desired weighting. This strategy is optimal, since it makes the most effective use of every single sample. The only way to improve on this is to stratify according to the scene content as well, but that makes things a bit complicated.

### Importance sampling and the Airy disc

So how do you generate quasi-random (x,y) coordinates with a distribution that matches the Airy disc function? Same way you generate points from a Gaussian: by inverting the cumulative distribution. This technique is called the "inverse transform sampling method". For a Gaussian, you can use Moro's inversion, but I am not aware of any fitted polynomials for inverting the cumulative distribution of the Airy disc function. What now?

Well, I decided to use a look-up table to approximate the cumulative distribution of the Airy disc function. Since the function is radially symmetrical, this is just a 1-dimensional look-up table, which I have implemented as a piecewise-linear function. Thus, given a pair of uniform variates (x,y) in the range [0,1][0,1], you can obtain a sample following the Airy disc function density by choosing an angle θ = 2π * x, and a radial distance r by looking up the value y in the cumulative distribution of the unit Airy disc function. Scaling for wavelength, pixel pitch and f-number can be performed on r afterwards.

There is only small trick, though: If you generate a polar 2D coordinate as [r cos(θ), r sin(θ)], where r has a uniform distribution, you will end up with more points close to the centre than on the outer rim. You want to partition the circular disc into parts of equal area as a function of radius, which means that your r must first be transformed to r' = √r. This is critical, or your true distribution of points will differ from your assumed distribution, and your weighting of samples will be biased.

To apply this to the Airy disc function cumulative distribution table, we just go back to basics. The cumulative distribution as a function of the radius r can be approximated as a finite sum:
F(rn) = F(rn-1) + (2 jinc(rn))2 * (πrn2 - πrn-12)
where rn is simply our discrete sample along the radius (something like rn = n/N). This looks like a simple Riemann sum, with the important change being that our "width" parameter is not a linear function of rn, but in fact quadratic. This small change ensures that outer radii are assigned an area-proportionally larger weight, so that we can generate our sampling positions in polar coordinates without biasing them towards the centre.

### Summary of Airy disc function sampling

To briefly recap, here is the recipe for simulating the effects of diffraction:
1. Generate a cumulative distribution of the unit Airy disc function and store it in a look-up table.
2. Generate N (x,y) coordinate pairs in the range [0,1][0,1] using a quasi-random sequence such as the Halton sequence.
3. Transform these coordinates to an Airy disc distribution by
θ = 2π * x
r =
LUT[sqrt(y)]
(x',y') = [r cos(θ), r sin(θ)]
4.  For each pixel, add the pixel centre coordinates to each sampling point (x',y') to obtain (x",y").
5. Evaluate the scene (x",y"), thus accumulating I(x",y") * f(x",y")/p(x",y").
6. repeat steps 4-5 for all pixels in target image.
You may wonder about the scaling term f(x",y")/p(x",y"), which seems superfluous given that we expect this value to be equal to 1.0. Well, since I have a discrete approximation of the density (the LUT), I decided to use the actual probabilty from the LUT as p(x",y"), and the Airy disc function as f(x",y"). This way, if there are any residual errors in the approximation, the weighting should correct for it.

This algorithm can be illustrated as follows:
Notice the typical Airy pattern rings that are formed by the higher-density regions of our sampling point distribution.

One tiny detail has been omitted so far: the diameter of the region that we will sample. The magnitude of the Airy pattern drops off fairly rapidly, and it is tempting to only build our look-up table for the range 0 <= r <= 5. This yields very good sampling of the centre of the Airy pattern, and thus the higher frequencies in the MTF curve of our synthetic image. Unfortunately, such severe truncation distorts the lower frequencies of the MTF curve noticeably. I have obtained reasonable results with 0 <= r <= 45, storing roughly 20000 points in my look-up table.

### Convolution of Airy PSF and OLPF

Unfortunately, we are not done yet. For a sensor without an OLPF, we must still convolve the Airy PSF with the pixel PSF (typically a box function) to obtain the desired value for a given pixel. There are two ways of doing this: 1) convolve a sampled Airy PSF with a sampled box PSF to produce a sampled combined PSF, or 2) sample the scene using importance-sampled points, but perform the box function convolution with the scene at each sampling point.

The first method is simple and straightforward, but suffers from all the usual disadvantages of regular grid sampling. It requires a very fine grid to produce good results; somewhere around 1050625 samples per pixel in my experience. The second method is really quite efficient if we have a good method of performing the box function convolution efficiently.

As it turns out, the convolution of a box function centred as a specific coordinate with our target object is just the area of intersection between the polygon defining the box function, and the rectangle defining our target object (provided, of course, that our target is a rectangle). I relied on the Sutherland-Hodgman polygon clipping routine to clip the box function polygon with the target rectangle's polygon, which is quite efficient. Here is an illustration of such a box function intersecting our polygon, with the box function (in blue) just happening to align with the pixel grid:

The importance sampling algorithm from the previous section remains largely unchanged: in step 5, the evaluation of I(x",y") now simply denotes the result of the box function convolution, i.e., the area of overlap between a box function (width 1 pixel) centred at (x",y"), and the target rectangle.

Finally, to render a 4-dot OLPF blur, such as that effected by a Lithium Niobate AA filter, you simply take the average of four samples at the coordinates (x" ± 0.375, y" ± 0.375), assuming of course a split distance of 0.375 pixels (or total spread of 0.75 pixels). Each sample thus required four polygon intersection calculations, like this:

This approach is conceptually simple, and fairly flexible. The main disadvantage is that rendering times will increase by roughly a factor four. Fortunately, the larger support of the 4-dot OLPF PSF means that the synthetic image rendered using it will be smoother, which means we can use reduce the number of samples required to obtain a reasonable result.

One more advantage: since this rendering approach implements the photosite aperture as a polygon intersection, it is trivial to model different aperture designs. For example, the default choice of a "gap less" photosite aperture is not entirely realistic, since practical sensors typically do not have 100% fill factors. As pointed out by one of the MTF Mapper blog readers, modern "gap less" microlens designs still suffer from attenuation in the corners, resulting in a near-circular photosite aperture.

### Demonstration

We have time for a quick demonstration of the various PSF types, using a photosite pitch of 4.73 micron, i.e., like the Nikon D7000, assuming green light at 0.55 micron wavelenghts. Here are some synthetic images:
 a) Gaussian PSF with sd=0.57 pixels, mtf50=0.33
 b) f/8 circular aperture + square pixel aperture, mtf50=0.337

 c) f/8 circular aperture diffraction + 4-dot OLPF + square pixel aperture, mtf50=0.26
Note that the MTF50 values of examples (a) and (b) above are almost the same, and unsurprisingly, the images also look very much the same. Sample (c) looks just a tad softer --- exactly what we would expect image (b) to look like after adding an OLPF.

It seems like quite a lot of effort to simulate images with PSFs that correspond to diffraction effects, only to end up with images that look like those generated with Gaussian PSFs.

### Conclusion

That is probably enough for one day. In a future post I will provide more information on rendering time and accuracy.

All the algorithms discussed here have been implemented in the mtf_generate_rectangle tool included in the MTF Mapper package from version 0.4.12 onwards. See the documentation on the "-p" option, which now includes  "gaussian", "airy", "airy-box" and "airy-4dot-olpf" PSF types.