Mathematical explanation behind a picture posted (lifted from Facebook)

What you are seeing is a physical manifestation of the mathematical operation known as the convolution.

First let me show you some pictures; we'll get into the mathematics afterwards. We start with the original

[image: the original picture]

I took the image, desaturated the colours, duplicated it onto a second layer, and pixel-wise added the layers after some translation. With a horizontal translation that is half the "wavelength" of the black bars, we get

[image: result of averaging with a horizontal translation]

With a translation of the same number of pixels, but vertically, we get

[image: result of averaging with a vertical translation]

and finally, with a diagonal translation at -45 degrees, we get

[image: result of averaging with a diagonal translation]
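A rough sketch of this kind of processing with NumPy and Pillow (not necessarily the exact steps used above; the file name `bars.png` and the 8-pixel half-wavelength are placeholder values):

```python
import numpy as np
from PIL import Image

# Load and desaturate the original picture ("bars.png" is a placeholder name).
img = np.asarray(Image.open("bars.png").convert("L"), dtype=float)

def shift_and_average(a, dy, dx):
    """Average the image with a copy of itself translated by (dy, dx) pixels."""
    shifted = np.roll(a, shift=(dy, dx), axis=(0, 1))  # periodic translation
    return (a + shifted) / 2.0

half_wavelength = 8  # assumed half-period of the black bars, in pixels

horizontal = shift_and_average(img, 0, half_wavelength)
vertical = shift_and_average(img, half_wavelength, 0)
diagonal = shift_and_average(img, half_wavelength, half_wavelength)

for name, out in [("horizontal", horizontal), ("vertical", vertical), ("diagonal", diagonal)]:
    Image.fromarray(out.astype(np.uint8)).save(f"bars_{name}.png")
```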


So what is going on? Why did I say that this is a manifestation of convolution?

Recall that the convolution of two functions defined on (say) the real line $\mathbb{R}$ is defined to be

$$ f * g (x) = \int_{\mathbb{R}} f(y) g(x-y) dy $$

In a course on Fourier analysis, one learns that this is the dual operation to multiplication: convolution in physical space corresponds to (point-wise) multiplication in Fourier space. This immediately gives the following interpretation of a convolution in signal processing:

Convolving a signal $f$ by a function $\psi$ is the same as applying a frequency-dependent filter $\hat{\psi}$ to the signal $f$.
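As a quick numerical sanity check of this statement, here is a minimal NumPy sketch: a circular convolution computed directly from the definition agrees with point-wise multiplication of the Fourier transforms. The signal and filter below are arbitrary random vectors, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
f = rng.standard_normal(N)    # a "signal" f
psi = rng.standard_normal(N)  # a "filter" psi

# Circular convolution computed directly from the definition ...
direct = np.array([sum(f[n] * psi[(k - n) % N] for n in range(N)) for k in range(N)])

# ... and via the Fourier side: multiply the transforms, then invert.
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(psi)).real

print(np.allclose(direct, via_fft))  # True
```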

Another way of looking at the convolution, however, after staring at the above definition for a bit, is that

A convolution is a way of taking a weighted average of a signal with its translates. The weight depends on the amount of translation.
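To see this concretely, here is a small NumPy sketch (with made-up toy numbers) showing that a circular convolution with a short list of weights is literally the same thing as summing translated copies of the signal, each copy weighted according to its translation:

```python
import numpy as np

f = np.array([0., 1., 4., 9., 16., 25., 36., 49.])  # a toy signal
w = np.array([0.5, 0.25, 0.25])                     # weights for translations 0, 1, 2

# Weighted sum of the signal with its translates: weight w[j] for a shift by j.
weighted_translates = sum(w[j] * np.roll(f, j) for j in range(len(w)))

# The same thing written as a circular convolution f * w.
N = len(f)
w_padded = np.zeros(N)
w_padded[:len(w)] = w
convolution = np.array([sum(f[n] * w_padded[(k - n) % N] for n in range(N))
                        for k in range(N)])

print(np.allclose(weighted_translates, convolution))  # True
```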

It is in this second sense that we will first look at the phenomenon you asked about. In the second image of this post, I averaged the signal with its horizontal translation by half the wavelength of the black bars. Hence this is a convolution. Similarly, in the third/fourth image of this post, I averaged the original with a vertical/diagonal translation. These are also convolutions. And you see that this reproduces your observation that the direction in which you shake your head or camera affects the image seen or captured.

So how is the process of shaking your head or shaking a camera a process of convolution? The idea is that the image you see with your eyes or capture with a camera does not come from photons all emitted at the same instant in time (special relativity notwithstanding). In your vision, there is the well-known phenomenon of persistence of vision, which posits that the perceived image is actually made up of photons arriving over a roughly 40-millisecond interval. Similarly, the shutter speed of a camera determines how long the camera registers light, so a camera with the shutter speed set to 1/25 will "open its eye" for 40 milliseconds, and the image registered on the CCD or on film is made of photons arriving in that window.

Now, if you shake your head or camera so that the retina, the CCD, or the film moves significantly during those 40 milliseconds, each of your retinal cells, each of the photoelements on the CCD, and each of the dye pigments on the film will be exposed to photons originating from different spatial positions. (I am grossly simplifying here, but that's the moral of the story.)

To summarise: your eyes and cameras already take a convolution of the incoming signal in time when they compose the image. By shaking the apparatus you convert this temporal convolution into a spatial convolution. This means that you are taking a weighted average of the image and its spatial translates, which is why what you see and capture on camera can be described analogously by digitally manipulating the image via an averaging/convolution procedure.
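Here is a small simulation of that temporal-to-spatial conversion, assuming (purely for illustration) a uniform horizontal drift during the exposure: averaging the frames the sensor sees over the exposure window gives exactly the same result as convolving the static scene with a line-shaped spatial kernel. All sizes and speeds below are made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
scene = rng.random((64, 64))  # a stand-in for the static scene
frames = 10                   # samples within the exposure window
shift_per_frame = 1           # horizontal pixels moved between samples

# Temporal convolution: average what the sensor sees at each instant
# (translation is periodic here, to keep the example short).
exposure = sum(np.roll(scene, t * shift_per_frame, axis=1)
               for t in range(frames)) / frames

# Equivalent spatial convolution: a horizontal "line" kernel of uniform weights.
N = scene.shape[1]
kernel = np.zeros(N)
kernel[:frames * shift_per_frame:shift_per_frame] = 1.0 / frames
blurred = np.real(np.fft.ifft(np.fft.fft(scene, axis=1) * np.fft.fft(kernel), axis=1))

print(np.allclose(exposure, blurred))  # True
```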

Note that this corresponds somewhat with Henning's comment to your question. The "eye's edge detection" he mentions is, roughly speaking, a description of how the eye is sensitive to different spatial frequencies of a signal (not to be confused with the actual electromagnetic frequencies which determine the colour). By shaking your head you apply a convolution operator, which in frequency space introduces a cut-off for high spatial frequency components. By reducing the high spatial frequency components, your eye is forced to get its information from the lower-frequency components in which the image of the Indian actor hides. (There are some technical inaccuracies in this paragraph about how human physiology works and how it interacts with the shaking of the head, but I think this simpler picture illustrates the idea better.)
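The half-wavelength translation used in the pictures above is an extreme case of such a cut-off: the two-point averaging kernel has a Fourier-space zero exactly at the bars' spatial frequency, so that frequency is removed completely. A short numerical check, with an arbitrarily chosen period:

```python
import numpy as np

N, period = 256, 16
x = np.arange(N)
bars = np.cos(2 * np.pi * x / period)         # an idealized black-and-white bar pattern

shift = period // 2                            # half the "wavelength" of the bars
averaged = (bars + np.roll(bars, shift)) / 2   # averaging kernel (delta_0 + delta_shift)/2

print(np.max(np.abs(averaged)))                # ~0: the bar frequency is removed entirely
```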


At this point I should mention that the idea of taking spatial convolutions of images, and the exchange between temporal and spatial convolutions under motion of the camera, is not only useful for optical illusions. It has real industrial applications in automatic image deblurring.
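To give a flavour of how that works, here is a toy one-dimensional deblurring sketch based on the convolution theorem. Real deblurring systems must deal with noise and with estimating the blur kernel, neither of which is modelled here; everything below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
sharp = rng.random(128)        # stand-in for the unknown sharp signal
N = sharp.size

kernel = np.zeros(N)
kernel[:8] = 1.0 / 8           # an 8-sample uniform "motion blur"

K = np.fft.fft(kernel)
blurred = np.real(np.fft.ifft(np.fft.fft(sharp) * K))

# Wiener-style division; eps keeps frequencies where K is tiny from blowing up.
eps = 1e-3
restored = np.real(np.fft.ifft(np.fft.fft(blurred) * np.conj(K) / (np.abs(K) ** 2 + eps)))

# The error is not zero: frequencies the blur wipes out entirely cannot be recovered.
print(np.max(np.abs(restored - sharp)))
```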