What is "energy" in image processing?

The energy is a measure the localized change of the image.

The energy gets a bunch of different names and a lot of different contexts but tends to refer to the same thing. It's the rate of change in the color/brightness/magnitude of the pixels over local areas. This is especially true for edges of the things inside the image and because of the nature of compression, these areas are the hardest to compress and therefore it's a solid guess that these are more important, they are often edges or quick gradients. These are the different contexts but they refer to the same thing.

The seam carving algorithm uses determinations of energy (uses gradient magnitude) to find the least noticed if removed. JPEG represents the local cluster of pixels relative to the energy of the first one. The Snake algorithm uses it to find the local contoured edge of a thing in the image. So there's a lot of different definitions but they all refer to the sort of oomph of the image. Whether that's the sum of the local pixels in terms of the square of absolute brightness or the hard bits to compress in a jpeg, or the edges in Canny Edge detection or the gradient magnitude:

The important bit is that energy is where the stuff is.

The energy of an image more broadly is the distances of some quality between the pixels of some locality.

We can take the sum of the LABdE2000 color distances within a properly weighted 2d gaussian kernel. Here the distances are summed together, the locality is defined by a gaussian kernel and the quality is color and the distance is LAB Delta formula from the year 2000 (Errata: previously this claimed E stood for Euclidean but the distance for standard delta E is Euclidean but the 94 and 00 formulas are not strictly Euclidean and the 'E' stands for Empfindung; German for "sensation"). We could also add up the local 3x3 kernel of the local difference in brightness, or square of brightness etc. We need to measure the localized change of the image.

skull default

In this example, local is defined as a 2d gaussian kernel and the color distance as LabDE2000 algorithm.

skull energy

If you took an image and moved all the pixels and sorted them by color for some reason. You would reduce the energy of the image. You could take a collection of 50% black pixels and 50% white pixels and arrange them as random noise for maximal energy or put them as two sides of the image for minimum energy. Likewise, if you had 100% white pixels the energy would be 0 no matter how you arranged them.


It depends on the context, but in general, in Signal Processing, "energy" corresponds to the mean squared value of the signal (typically measured with respect to the global mean value). This concept is usually associated with the Parseval theorem, which allows us to think of the total energy as distributed along "frequencies" (and so one can say, for example, that a image has most of its energy concentrated in low frequencies).

Another -related- use is in image transforms: for example, the DCT transform (basis of the JPEG compression method) transforms a blocks of pixels (8x8 image) into a matrix of transformed coefficients; for typical images, it results that, while the original 8x8 image has its energy evenly distributed among the 64 pixels, the transformed image has its energy concentrated in the left-upper "pixels" (which, again, correspond to "low frequencies", in some analagous sense).


Energy is a fairly loose term used to describe any user defined function (in the image domain).

The motivation for using the term 'Energy' is that typical object detection/segmentation tasks are posed as a Energy minimization problem. We define an energy that would capture the solution we desire and perform gradient-descent to compute its lowest value, resulting in a solution for the image segmentation.