Apple has introduced several computational imaging technologies to the iPhone imaging system. What is dynamic tone mapping, what does it do and how does it work?
If you’re shooting photos or video with a recent-generation iPhone, then you’re seeing images that have been finessed into their final form by a lot of clever AI post-processing. Much of this processing is a highly automated form of the manual adjustments a professional photographer will make to raw images or film scans, normally using Lightroom or Photoshop. For video, these are the adjustments made by an editor or colorist.
But exactly how is this automated? Does it actually work? What does it mean for you as an author of your images and video? These are some of the questions and issues I want to explore in this article.
From the very beginning of analog photography there has been a need to adjust tone and exposure when creating a finished print from a negative. The print is what everyone sees. It might appear in a magazine or be framed on the wall of a gallery. The negative is the original exposed film directly from the camera. The same is true of digital imaging.
An image seen purely as it was exposed may not be very pretty. At the very least, a few basic adjustments to white level, black level, and contrast are always necessary.
Some of these adjustments are global, applied to an entire image, but many are more selective. Selective adjustments are made only to specific parts of an image, leaving other regions unaffected.
These decisions are normally made by the photographer or retoucher or, in decades past, by technicians in a lab. In the case of video this is done in post by an editor or colorist. This is still the case for professional high-end production. However, for smartphone image capture these decisions and adjustments are being entirely automated thanks to carefully trained AI and neural processing.
What is Dynamic Tone Mapping?
Dynamic tone mapping is a computational process that segments and analyzes a source image in real time. Based on scene contrast ratios, it automatically re-maps (shifts) the shadow, midtone and highlight levels in specific areas of the image.
The goal of this process is to make a good looking, display ready image as easily as possible with no human input. This is automated image editing for consumer photography and videography on a mass scale.
The easiest way to think about this is to imagine the captured source image (an image that is the direct result of debayered raw image sensor data) and a target destination image (an image that looks pretty). Tone mapping is what we call the automated process that transforms the source image into the destination image.
The source image is made up of the light values actually captured by the camera image sensor (raw Bayer sensor data). The destination image is an altered version made to perceptually look “better” to a person looking at it, typically on a smartphone, tablet or laptop display.
The process is dynamic because whether you are capturing photos or video, the camera is moving and light is always changing. As the source image changes, it is continuously analyzed. The output changes in response.
There are a few important points to consider here.
- The actual source (raw) image data is the only true, real representation of the captured scene.
- The final altered image may look “better” but is also no longer true to reality.
- “Better” is entirely subjective and may or may not convey the artistic intent of the image.
How Does Dynamic Tone Mapping Work?
Technically there are two things to consider.
First is the amount of actual real-world light in the scene, and more importantly the difference in brightness between the brightest and darkest areas. This is the scene contrast ratio.
Second is how a camera actually records those light levels relative to each other, as well as to the maximum measurable white point, and minimum discernible black point.
The difference or ratio between maximum measurable white point and minimum discernible black point of any camera or imaging system is also known as dynamic range.
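Both of these quantities are usually expressed in stops, where each stop is a doubling of light. As a rough sketch of the arithmetic, the snippet below compares a scene contrast ratio with a camera's dynamic range. The luminance figures and the 10-stop camera are made-up illustrative values, not measurements of any iPhone.

```python
import math

def contrast_ratio(brightest, darkest):
    """Ratio between the brightest and darkest luminance values (same units)."""
    if brightest <= 0 or darkest <= 0:
        raise ValueError("luminance values must be positive")
    return brightest / darkest

def ratio_to_stops(ratio):
    """Each stop is a doubling of light, so stops = log2(ratio)."""
    return math.log2(ratio)

# Hypothetical scene: a sunlit wall at 8000 cd/m^2 and deep shade at 2 cd/m^2.
scene_ratio = contrast_ratio(8000.0, 2.0)    # 4000:1
scene_stops = ratio_to_stops(scene_ratio)    # just under 12 stops

# Hypothetical camera that can separate a 1024:1 range, which is 10 stops.
camera_stops = ratio_to_stops(1024.0)

# Stops of scene contrast the camera cannot hold without clipping something.
overflow_stops = max(0.0, scene_stops - camera_stops)
```

When the scene contrast exceeds the dynamic range, as it does here by roughly two stops, something has to give: either highlights clip, shadows crush, or the levels are re-mapped to fit.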
A dynamic tone mapping algorithm is constantly reading how bright the brightest parts of the image are compared to the darkest parts. The algorithm has been trained using machine learning on a vast data set of images, to decide what it thinks is the best way to record those light levels. It may choose to lift dark areas to be brighter, and to reduce the level of the brightest areas to be slightly darker. It may shift the mid tones brighter or darker as well.
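To make the idea concrete, here is a deliberately simple stand-in. Apple's algorithm is trained with machine learning and works on segmented regions of the image, and none of that is public. This sketch is a single hand-written curve applied to the whole frame, and the function names, the `target_stops` parameter and the constants are all my own assumptions. It only illustrates the basic behavior: measure the contrast of the frame, then lift shadows and pull down highlights more strongly as the contrast increases.

```python
import math

def scene_stops(pixels, floor=1e-4):
    """Measured contrast of the frame in stops (log2 of brightest / darkest)."""
    lo = max(min(pixels), floor)
    hi = max(max(pixels), floor)
    return math.log2(hi / lo)

def tone_map(pixels, target_stops=6.0):
    """Lift shadows and pull down highlights, more strongly for contrasty frames.

    pixels: linear luminance values in the range 0..1.
    target_stops: hypothetical contrast below which the frame is left alone.
    """
    stops = scene_stops(pixels)
    # strength is 0 for low-contrast frames and is capped at 1 for extreme ones
    strength = max(0.0, min(1.0, (stops - target_stops) / target_stops))
    gamma = 1.0 - 0.5 * strength    # an exponent below 1 lifts shadows and midtones
    out = []
    for p in pixels:
        lifted = p ** gamma
        # soft shoulder: the brighter the value, the more it is pulled down
        out.append(lifted / (1.0 + 0.25 * strength * lifted))
    return out

flat_frame = [0.40, 0.50]                # low contrast: passes through untouched
backlit_frame = [0.001, 0.18, 1.0]       # deep shadow, mid gray, bright source
mapped = tone_map(backlit_frame)
```

In the backlit frame the shadow value is raised, the mid gray is raised, and the brightest value is lowered, so the output spans fewer stops than the input. The low-contrast frame is returned unchanged because its measured contrast is below `target_stops`.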
This is difficult for a number of reasons.
The Challenges of Dynamic Tone Mapping
With the iPhone XS, XS Max and XR, this algorithm was particularly aggressive. If there was a bright light source behind a subject, instead of letting it overexpose naturally, the algorithm would attempt to simultaneously bring down the level of extreme highlights while raising the shadow levels. The highlights were still blown out but were pulled down towards the mid tones, and the shadows were lifted. The result was an awful muddy mess.
This is something a human photo editor would never do to an image like that, but it’s an example of the algorithm failing under extreme but very common conditions.
There were additional issues, such as oversaturated highlights and terrible skin tones. All of this together led to photos and video that had a definite and often very bad “iPhone XS” look.
The challenge for any system employed to automatically manipulate images is to make the correct adjustments, by the correct amount. It also needs to create consistent results over a wide range of possible input values.
A Question of Authorship and Artistic Intent
Manipulating images to look nice is nothing new. Photographers have been giving instructions to editors and lab technicians forever. This is part of the photo-chemical process of making prints from film negatives, and of course we do it digitally all the time.
This is the essence of the immensely popular and widely taught Ansel Adams Zone System. This is basic photo editing 101, but the difference here is who, or what, has control and makes the decisions.
Leaving these decisions up to machine learning is not what videographers, colorists and post professionals like me are used to, or expect. By nature we crave that control, and don’t like giving up a big part of what helps author our artistic intent.
However, Apple builds and markets aspirational creative technology that appeals to non-professionals. By comparison, only a small niche of professionals has latched onto the iPhone, or other smartphones, as tools with serious potential. We can’t expect Apple to alienate its core market just because a few professionals want control.
We can only hope that they choose to accommodate both. I believe this is now happening with the iPhone 11 generation, especially the iPhone 11 Pro and iPhone 11 Pro Max.
Don't Mess With My Image
Ideally, we want an image where the recorded brightness levels (in video we call this luminance) are scene referred.
Scene referred means that a fixed mathematical transform, called a gamma encoding curve, determines exactly what luminance levels are recorded depending on how much light is exposing the image sensor.
I’m not going to get into linear vs nonlinear gamma encoding in this article, but here is a real-world example. Below is the ARRI Alexa LogC gamma encoding curve at various EI (Exposure Index) values. Don’t worry about the EI values right now. Just understand that any relative scene brightness, or luminance level (given in exposure stops along the X axis), is encoded to a very specific value on the Y axis according to the curve.
An increase of one stop along the X axis is a doubling of light and a reduction of one stop is a halving of light. The higher the luminance (the brighter an object in the scene is), the higher the encoded value in the video file. The lower the luminance (the darker an object in the scene is), the lower the encoded value in the video file.
The important thing to understand is the curve determines the encoded value, and the curve doesn’t change. It will always be possible to trace back the real world brightness of any object in the scene relative to any other object in the scene based on the encoded luminance values.
A certain amount of light at the sensor will always result in the same value being recorded to the file. This is a fixed relationship determined by the curve.
This also means that if you have all the relevant exposure and photometric information, it’s possible to mathematically calculate exactly how much light was exposing the sensor from the scene, based on the recorded luminance level in the image, and therefore even how much light was illuminating the scene when it was shot.
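The fixed relationship described above can be sketched in a few lines. Note that this is not the ARRI LogC formula. It is a plain logarithmic curve with constants I picked for illustration (8 stops either side of 18% mid gray), and real camera curves add a linear toe and use different constants. The point is only that a fixed curve can be inverted, so a recorded value always traces back to the same relative scene luminance.

```python
import math

MID_GRAY = 0.18      # relative scene luminance placed at "0 stops"
STOPS_BELOW = 8.0    # hypothetical range of the curve below mid gray
STOPS_ABOVE = 8.0    # hypothetical range of the curve above mid gray
TOTAL_STOPS = STOPS_BELOW + STOPS_ABOVE

def encode(linear):
    """Map relative scene luminance to a 0..1 code value with a fixed log curve."""
    stops = math.log2(max(linear, 1e-10) / MID_GRAY)
    value = (stops + STOPS_BELOW) / TOTAL_STOPS
    return min(1.0, max(0.0, value))    # clip anything outside the curve's range

def decode(value):
    """Invert encode(): recover relative scene luminance from a code value."""
    stops = value * TOTAL_STOPS - STOPS_BELOW
    return MID_GRAY * 2.0 ** stops

gray = encode(0.18)           # mid gray sits in the middle of the code range
one_stop_up = encode(0.36)    # doubling the light adds one fixed step
one_stop_down = encode(0.09)  # halving the light subtracts the same step
recovered = decode(encode(0.05))
```

With this curve every stop moves the encoded value by the same amount (one sixteenth of the range), and `decode` returns the original luminance for any value that was not clipped. That round trip is what a colorist relies on when matching shots.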
This fixed, real world luminance to recorded value relationship is the basis of everything that happens in post when balancing levels and color, shot matching and creating consistent color grades.
Of course once a colorist makes adjustments and shifts levels around, it is no longer scene referred. However, the underlying fixed relationship exists in the source video files.
Apple decided it was better to throw out a fixed mathematical relationship altogether when recording video files. Instead an algorithm decides how best to encode luminance levels within a scene dynamically. The algorithm is automatically doing the job that would be done very carefully with a lot of consideration by a colorist.
Not only does this introduce a huge unknowable variable into the mix, it changes all the time, even within a single video clip.
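A small sketch shows why this is a problem for shot matching. The fixed curve here is an ordinary 1/2.2 power function. The adaptive curve is a hypothetical one I invented for the example, whose exponent follows the contrast of the frame, and it is not Apple's actual behavior. The same gray card, under the same light, is encoded in two frames that differ only in what else is in the shot.

```python
import math

def fixed_encode(linear):
    """Fixed curve: a plain 1/2.2 power function, identical for every frame."""
    return linear ** (1.0 / 2.2)

def adaptive_encode(linear, frame):
    """Hypothetical frame-dependent curve: the exponent follows frame contrast."""
    lo = max(min(frame), 1e-4)
    hi = max(max(frame), 1e-4)
    stops = math.log2(hi / lo)
    exponent = 1.0 / (2.2 + 0.1 * stops)    # more contrast means more lift
    return linear ** exponent

GRAY_CARD = 0.18
frame_a = [0.05, GRAY_CARD, 0.40]     # soft interior light, about 3 stops
frame_b = [0.002, GRAY_CARD, 1.00]    # same card, bright window now in shot

fixed_a = fixed_encode(GRAY_CARD)
fixed_b = fixed_encode(GRAY_CARD)
adaptive_a = adaptive_encode(GRAY_CARD, frame_a)
adaptive_b = adaptive_encode(GRAY_CARD, frame_b)
```

With the fixed curve the card records the same value in both frames. With the adaptive curve the card records a noticeably higher value in the second frame even though the light falling on it has not changed, and nothing in the file says which curve was applied.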
It may make the average point and shoot user happy to have video that always looks bright and colorful. But it’s a nightmare for anyone intending to professionally color correct the video and match shots.
Dynamic Tone Mapping and the iPhone 11
Thankfully Apple have refined the whole image processing pipeline for the iPhone 11, iPhone 11 Pro and iPhone 11 Pro Max. Dynamic tone mapping is still active, but it now works far more consistently. In fact the iPhone 11 Pro Max is the best iPhone I’ve ever used for image capture and creation.
Of course there is still no fixed gamma curve determining what video luminance value will be encoded, but the range of possibilities is far more limited than it was in the iPhone XS generation devices. This makes it possible again for a colorist to match shots and adjust levels with some degree of accuracy. The most consistent conditions are bright, daylight exteriors, where I notice very few, if any, shifts in most situations.
The FiLMiC Pro team have done an excellent job in optimizing for the iPhone 11 devices. I have found the FiLMiC LogV2 processing and encoding works very well on the iPhone 11 and delivers stable and consistent images I can easily manipulate myself.
That said, there are still problems, especially in low light and indoor environments, where levels will shift despite ISO and shutter speed being locked down in FiLMiC Pro. This is caused purely by tone mapping that can’t be bypassed. Unfortunately, if you happen to notice these shifts in your shot, there’s nothing you can do about it.
For most conditions, at least where there is plenty of light, I believe Apple have reached a good working compromise where consumers can enjoy crisp, bright, colorful video but professionals can still work with the files.