The future of the camera is computational. Computational imaging technology, AI, and consumer mobile computing are the real drivers of innovation in digital imaging.
What is computational imaging? How does it work and what does it mean? These are the questions I want to answer. More than just another trendy tech buzzword, this is an entirely new way of recording and reproducing images. It is a total reinvention of the camera as we know it.
The Evolution of Digital Imaging
Every year at NAB in Las Vegas and IBC in Amsterdam, I look at all the latest professional video technology, hoping to see real innovation. What’s become clear to me in recent years is that I’m looking in the wrong place. The imaging technology of tomorrow won’t be developed for these trade shows. It won’t take the form of traditional broadcast cameras, camcorders, or even mirrorless compact cameras.
While professional camera technology seems to be stagnating, the consumer technology space is taking the lead.
It’s no secret that sales of traditional photo cameras, especially digital compacts but also DSLRs, are falling. Mirrorless cameras are in many ways the pinnacle of large sensor photo (and video) camera technology. Stripped of redundant mechanical components, the camera has been simplified to a sensor and software.
From Hardware to Software
The next evolutionary step for all digital cameras is software. Computational imaging leverages mathematical algorithms, AI and machine learning to indirectly synthesize images. This is achieved by contextual inference: creating new image information based on the intelligent analysis of real, but limited, source image data. It’s a bit like looking at an incomplete drawing, recognizing what it should be, and filling in the gaps yourself. The process is far closer to how your brain processes visual information than to how any conventional camera works.
Digital imaging is all about fidelity. For example, the fidelity between the output of an image sensor and physical reality. Also, the fidelity of encoded, recorded data to the output of an image sensor. An ideal camera system faithfully captures and records light so it can be faithfully reproduced and displayed with minimal perceivable loss.
All of this assumes that the conversion of light to voltage, and voltage to data, in an image sensor sets the absolute upper limit of fidelity. In a direct imaging system involving the kind of conventional signal processing that cameras typically employ, this is absolutely true.
Human Vision is Computational
The human visual system, on the other hand, is not a direct imaging system. We don’t see complete exposures of our full field of view like photographs or frames of video. Instead, the cohesive, photograph-like image you perceive is the result of a highly computed composite of spatially and temporally arbitrary visual information and stimuli. The source of this visual information is an image focussed on our retinas, but they supply only a fraction of the information required to build our full field of view at any given moment. For more on current research in this field see: A Mathematical Model Unlocks the Secrets of Vision – Quanta Magazine
Limited Source Information
There are two types of photoreceptive cells that make up the retina. These are called rods and cones. Rods are responsible for low light vision and are not sensitive to color. Cones are responsible for color vision and have a high spatial acuity. A small area at the center of the retina called the fovea contains cones packed in sufficient density to see high resolution information. This represents only about the central 2 degrees of our field of view. The fovea is less than 1% of our retina but takes up over 50% of the visual cortex in the brain.
A Real-Time Composite
Of course, we perceive far more at any moment than this small fraction of high resolution information. This is because the brain constructs a complete picture in real-time from a stream of visual information.
There is a disconnect between our perception of the whole and the source input. We perceive reality as it is, in glorious wide gamut color, ultra-high dynamic range and with stereoscopic depth, but the image in our mind’s eye is synthesized indirectly.
The Whole is Greater than the Sum of its Parts
This common saying has never been true of conventional digital imaging. When dealing with a conventional digital imaging system, the whole is exactly equal to, but never greater than, the sum of its parts. It has been impossible to create pixels that were never captured.
One of the fundamental pillars of conventional digital imaging systems is the idea that it's impossible to get out more than you put in. This is simply no longer true.
By employing computational processes to contextually analyze a limited source of captured image data, it is possible to infer and create brand new image information with real world photographic accuracy.
This is different to what can be seen in some existing post production and signal conversion processes. Motion interpolated slow motion effects create new frames in between captured frames. Sophisticated scaling algorithms upscale HD to UHD, but there is always compromise. An upscaled HD image is not comparable to a native UHD image when they are directly compared.
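The limit of conventional upscaling can be seen in a minimal Python sketch. It assumes a single scanline, a hypothetical half-resolution sensor that averages pairs of pixels, and a plain linear upscale. The names `capture_hd` and `upscale_linear` are illustrative and do not come from any real scaler.

```python
def capture_hd(uhd_line):
    """Simulate a half-resolution sensor: each HD pixel averages two UHD pixels."""
    return [(uhd_line[i] + uhd_line[i + 1]) / 2 for i in range(0, len(uhd_line), 2)]


def upscale_linear(hd_line):
    """Conventional 2x upscale: keep each pixel, then insert the average
    of that pixel and its right-hand neighbour (the last pixel repeats)."""
    out = []
    for i, value in enumerate(hd_line):
        neighbour = hd_line[min(i + 1, len(hd_line) - 1)]
        out.extend([value, (value + neighbour) / 2])
    return out


# A "native UHD" scanline with fine alternating detail (assumed test pattern).
native_uhd = [0, 100] * 4

hd = capture_hd(native_uhd)      # what the lower-resolution sensor records
upscaled = upscale_linear(hd)    # same pixel count as native_uhd

print("native  :", native_uhd)
print("upscaled:", upscaled)
```

The upscaled line has the right number of pixels, but the alternating detail of the native line is gone. Averaging neighbours can only redistribute what was captured.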
However, true computational imaging algorithms have the potential to change this.
What is Computational Imaging?
Computational imaging, according to Wikipedia:
Computational Imaging is the process of indirectly forming images from measurements using algorithms that rely on a significant amount of computing.
It is the ‘indirectly’ part of this description that is most important to understand.
Direct Imaging
As a simple example, imagine we are imaging a curved line. Let’s assume the resolution of the direct imaging system we employ limits us to sampling and recording this line using only three points. Point A is the start of the line and Point C is the end of the line. Point B is halfway along the curve between Point A and Point C.
Of course, a system which simply records the position of three points along the curve will not accurately reproduce the original curve. Resolution limits the fidelity of our direct recording.
So, we increase resolution. Instead of three points, we measure and record the positions of five points, or even ten points along the line.
However, this simple system of passive direct reproduction still only connects the few dots we’ve recorded. The recorded information can be displayed exactly as it was captured, but cannot deliver an accurate reproduction of the original line. Resolution still limits the fidelity of the recording.
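The connect-the-dots behaviour described above can be sketched in a few lines of Python. The curve (`math.sin` between 0 and π) and the helper names are assumptions chosen for illustration, not part of any real camera pipeline.

```python
import math


def sample_curve(f, start, end, n):
    """Record n evenly spaced (x, y) points along the curve f."""
    step = (end - start) / (n - 1)
    return [(start + i * step, f(start + i * step)) for i in range(n)]


def connect_the_dots(points, x):
    """Direct reproduction: straight lines between the recorded points."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)


def max_error(f, points, probes=1000):
    """Largest gap between the original curve and its connect-the-dots copy."""
    start, end = points[0][0], points[-1][0]
    xs = [start + (end - start) * i / probes for i in range(probes + 1)]
    return max(abs(f(x) - connect_the_dots(points, x)) for x in xs)


# Worst-case reproduction error for 3, 5, 10 and 100 recorded points.
errors = {
    n: max_error(math.sin, sample_curve(math.sin, 0.0, math.pi, n))
    for n in (3, 5, 10, 100)
}
for n, err in errors.items():
    print(f"{n:>3} points -> max error {err:.5f}")
```

Each increase in the number of points shrinks the error, but for a curved line it never reaches zero. Fidelity is bought only with more samples.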
At this point we can further increase the resolution, by measuring and recording hundreds of points along the line. Or we can take a different approach.
Indirect Imaging
Instead of increasing resolution, we can perform a bit more analysis on the five points we have. For instance, we can recognize that the positions of these points relative to each other probably represent a continuous curve. How do we recognize this? Of course, this is an assumption, but it is an assumption based on previous training that this world of lines is made up mostly of continuous curves.
Based on this assumption we can build a virtual model of the original line. We can project a mathematically continuous curve passing through the five measured points. This model allows us to calculate the likely position of any number of virtual points along the curve. Our input resolution is limited to only five points, but the computed output resolution is, in principle, unlimited.
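One way to sketch this idea in Python is to stand in for the "continuous curve" assumption with a single polynomial through the five measured points (Lagrange interpolation). This is an illustrative choice, not how any particular camera works. The test curve (`math.sin` between 0 and π) and the function names are assumptions.

```python
import math


def sample_curve(f, start, end, n):
    """Record n evenly spaced (x, y) points along the curve f."""
    step = (end - start) / (n - 1)
    return [(start + i * step, f(start + i * step)) for i in range(n)]


def lagrange_model(points):
    """Build a continuous model: the unique polynomial through all points."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            weight = 1.0
            for j, (xj, _) in enumerate(points):
                if i != j:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return model


# Input resolution: only five measured points.
measured = sample_curve(math.sin, 0.0, math.pi, 5)
model = lagrange_model(measured)

# Output resolution: as many virtual points as we care to compute.
virtual_xs = [math.pi * i / 999 for i in range(1000)]
virtual_points = [(x, model(x)) for x in virtual_xs]

model_error = max(abs(math.sin(x) - y) for x, y in virtual_points)
print(f"{len(virtual_points)} virtual points, max error {model_error:.5f}")
```

For this smooth test curve, the model built from five points stays within 0.01 of the original everywhere. The result is only as good as the assumption, though. Feed the same model points from a line with a sharp corner and it will confidently draw a smooth curve that was never there.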
Mathematics informed by contextual pattern recognition
This analytic and interpretive process is an example of a contextually aware algorithm. The algorithm relies on underlying training, on facts about the data it will receive, and on knowing how to interpret that data correctly. These algorithms can be rule-based, or trained on known data sets to recognize patterns and build a contextual reference. This is the power of machine learning.
More Than Pixels
Our world, and therefore images of our world are full of predictable patterns. Light behaves in predictable ways. It is possible to predictably and accurately model objects, edges, colors, contrast, patterns, texture and movement.
Real world scenes can be analysed and modelled virtually, projected beyond the limits of direct recording. A computational image of a projected model can be created at virtual resolutions, virtual color bit depths, even virtual frame rates. This projected model becomes the source of the computed pixels that make up the image you see.
From Pixels to Vectors
Groups of simple pixel values can be analysed to calculate vectors describing the changes between them in all directions both spatially, and temporally. These vectors can be mapped and further analysed. Vectors can be mathematically projected and resampled in ways pixel values cannot. This projected information can be translated into brand new pixel values.
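As a rough Python sketch of the idea, the rate of change around one pixel can be estimated with central differences in x, y and time, then used to project a value that was never sampled. This is a generic first-order illustration and not any product's algorithm. The frame layout (lists of rows), the synthetic ramp and all function names are assumptions.

```python
def ramp_frame(t, size=3):
    """Synthetic test frame: brightness rises 10 per column, 5 per row, 2 per frame."""
    return [[10 * col + 5 * row + 2 * t for col in range(size)] for row in range(size)]


def gradient_vector(prev_frame, frame, next_frame, row, col):
    """Central-difference rates of change at an interior pixel: (d/dx, d/dy, d/dt)."""
    dx = (frame[row][col + 1] - frame[row][col - 1]) / 2.0
    dy = (frame[row + 1][col] - frame[row - 1][col]) / 2.0
    dt = (next_frame[row][col] - prev_frame[row][col]) / 2.0
    return dx, dy, dt


def project_value(frame, vector, row, col, off_x=0.0, off_y=0.0, off_t=0.0):
    """First-order projection to a position between pixels and/or between frames."""
    dx, dy, dt = vector
    return frame[row][col] + dx * off_x + dy * off_y + dt * off_t


prev_frame, frame, next_frame = ramp_frame(0), ramp_frame(1), ramp_frame(2)

vector = gradient_vector(prev_frame, frame, next_frame, row=1, col=1)

# A "virtual" sample half a pixel to the right and half a frame later.
new_value = project_value(frame, vector, row=1, col=1, off_x=0.5, off_t=0.5)
print(vector, new_value)
```

On this perfectly linear ramp the projection is exact. Real scenes are not linear, which is where the more sophisticated modelling described in this article comes in.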
This kind of approach to creating better images by intelligently inferring information is already happening. You can read about FiLMiC Pro’s “Cubiform” computational imaging engine and FiLMiC LogV2 here: What is FiLMiC LogV2? and learn how to get the most from it using a color managed workflow here: Shoot and Color Grade FiLMiC LogV2 with the X-Rite Colorchecker Video.
Cubiform specifically models luminance and chrominance, calculating mathematical vectors based on the relative changes in sampled input pixel values over a certain spatial image area. It’s a very specific algorithm, but the principle is there nonetheless.
Simply capturing light is no longer the end; it is just the beginning.
The Rise of Mobile Photo and Video
Nowhere is this evolution more apparent than in the smartphone camera.
Today’s implementations are incomplete and admittedly flawed. Features such as Smart HDR, dynamic tone mapping, portrait mode and simulated depth effects are only a small step in this direction.
As this technology develops, it will be responsible for more than just augmenting an otherwise conventional camera. Much of the camera as we know it will be virtualized. The physics of light can be simulated by sophisticated algorithms that are fed a constant stream of visual information from multiple precision imagers. This is already implemented in the iPhone 11, iPhone 11 Pro and iPhone 11 Pro Max.
It is the consumer electronics giants, specifically the largest players in mobile technology, that have the incentive and profit potential to develop radical next generation computational imaging hardware and software. It requires a potential market on a mass consumer scale to justify the kind of development required.
Nobody fights harder to bring the best quality mobile imaging technology to market than smartphone manufacturers.
The Camera That's Always With You
Consumer technology no longer simply solves a problem or performs a task. We assimilate technology into the very fabric of our lives. There is no better example of this than the smartphone.
Mobile photo and video is all around us. We record, document and share every important moment of our lives with each other. The camera is an extension of how we experience and make sense of our lives and the world around us.
Many of us take this a step further. The rise of YouTube shows us the massive potential of consumer generated video content. Video production, distribution and consumption have become decentralized. I believe the future of professional imaging is developing in the consumer technology space.
The Power of Mobile Computing
Your smartphone, of course, is more than just a camera. Serious computational imaging requires serious mobile computing power.
Thankfully, computing power, memory and functionality across desktop, laptop, tablet and mobile are converging towards a common point.
Desktop workstation performance has largely topped out over the past decade. We’re seeing incremental improvements to efficiency, multi-core processing and on-chip integration. This is in contrast to the brute force increase in clock speed we saw in the mid-to-late 2000s.
Chip makers continue to pack more transistors in higher densities than ever before. This enables the devices we hold in our hands to come ever closer to established desktop computing benchmarks. Powerful integrated GPUs and neural engines can process large data sets using complex algorithms rapidly and efficiently.
Our devices are no longer simply becoming faster; they are becoming more sophisticated, smarter, and smaller.
You can’t ignore the radical potential of computational imaging technology and ever more powerful mobile computing. Combine this with a massive ready consumer market and you can understand why I’m so excited.
Watch this space, because this is the space to watch.