First published by the University of Utah John and Marcia Price College of Engineering.
Much modern technology relies on the explosion of tiny, inexpensive sensors that can be placed almost anywhere. For example, the first infrared imaging system, built in the 1960s, was the size of a small car; now, infrared cameras are just one of dozens of sensors self-driving cars need to understand their environment.
To keep pace with this progress, however, researchers are always looking for ways to make each sensor do more. With the growing power of AI-based tools, one approach is automatically translating sensor data from its native format to one that’s more relevant to a given application.
Researchers at the University of Utah’s Price College of Engineering are tackling this problem with “deep learning,” an artificial intelligence technique that can train computer systems to automatically recognize similarities within massive datasets. In a new study, they have shown a way for traditional cameras to produce simulated thermal infrared (TIR) images with an accuracy comparable to that of commercial-grade infrared sensors.
Published in the journal Infrared Physics & Technology, the study was led by Rajesh Menon, USTAR Professor in the Department of Electrical & Computer Engineering, and Emma Wadsworth, an undergraduate researcher in his lab. They collaborated with fellow Menon lab members Advait Mahajan and Raksha Prasad.
Wadsworth, Mahajan, and Prasad were all undergraduates in Menon’s 2022 Computational Photography class, where they conducted this research.
At the core of their work is a “generative adversarial network” (GAN) known as “pix2pix.” In machine learning, GANs pit two networks against each other: a generator that produces fake images and a discriminator trained to distinguish them from real images drawn from massive datasets. Over many rounds of this contest, the generator “learns” the physics and context-aware correlations between paired images, producing outputs that become increasingly difficult to tell apart from real ones.
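To make that adversarial setup concrete, here is a minimal sketch of one pix2pix-style training step in PyTorch. The tiny single-layer networks, loss weights, and tensor sizes are placeholders for illustration only; they are not the authors’ actual model or training code.

```python
# Minimal sketch of one pix2pix-style (conditional GAN) training step.
# The generator and discriminator below are toy stand-ins, not the real
# U-Net / PatchGAN architectures used in pix2pix.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Conv2d(3, 1, kernel_size=3, padding=1))      # RGB (3 ch) -> simulated TIR (1 ch)
discriminator = nn.Sequential(nn.Conv2d(4, 1, kernel_size=3, padding=1))  # judges (RGB, TIR) pairs

bce = nn.BCEWithLogitsLoss()   # adversarial loss
l1 = nn.L1Loss()               # pix2pix also penalizes per-pixel error
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(rgb, real_tir, l1_weight=100.0):
    """One adversarial update on a paired (RGB, TIR) batch."""
    fake_tir = generator(rgb)

    # Discriminator: real pairs should score 1, generated pairs 0.
    d_opt.zero_grad()
    real_score = discriminator(torch.cat([rgb, real_tir], dim=1))
    fake_score = discriminator(torch.cat([rgb, fake_tir.detach()], dim=1))
    d_loss = bce(real_score, torch.ones_like(real_score)) + \
             bce(fake_score, torch.zeros_like(fake_score))
    d_loss.backward()
    d_opt.step()

    # Generator: fool the discriminator while staying close to the real TIR image.
    g_opt.zero_grad()
    fake_score = discriminator(torch.cat([rgb, fake_tir], dim=1))
    g_loss = bce(fake_score, torch.ones_like(fake_score)) + l1_weight * l1(fake_tir, real_tir)
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()

# Random tensors standing in for one paired batch of 64x64 frames.
rgb_batch = torch.rand(4, 3, 64, 64)
tir_batch = torch.rand(4, 1, 64, 64)
print(train_step(rgb_batch, tir_batch))
```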
The training data included approximately 75,000 TIR images, captured in different lighting conditions from the roof of a traveling car, each paired with a visible light (RGB) image of the same scene, captured by a traditional camera.
These two parallel databases — now the largest real-world paired image dataset in the public domain — allowed the researchers to train the pix2pix network to do image translation; rather than generating images that look exactly like real ones from the same database, it can generate ones that look like they were produced by the other kind of sensor.
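The pairing itself is the key ingredient: every RGB frame must line up with a TIR frame of the same scene so the network can learn the mapping between them. The sketch below shows one way such a paired dataset might be organized for training; the directory names, file formats, and image sizes are assumptions for illustration and do not describe the published dataset’s actual layout.

```python
# Hedged sketch of a paired RGB/TIR dataset for image-to-image translation,
# assuming matching filenames under data/rgb and data/tir (hypothetical layout).
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedRGBTIRDataset(Dataset):
    """Yields (rgb, tir) tensor pairs captured of the same scene."""
    def __init__(self, root="data", size=256):
        self.rgb_files = sorted(Path(root, "rgb").glob("*.png"))
        self.tir_files = sorted(Path(root, "tir").glob("*.png"))
        assert len(self.rgb_files) == len(self.tir_files), "every RGB frame needs a TIR partner"
        self.to_tensor = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.rgb_files)

    def __getitem__(self, i):
        rgb = self.to_tensor(Image.open(self.rgb_files[i]).convert("RGB"))  # 3-channel visible light
        tir = self.to_tensor(Image.open(self.tir_files[i]).convert("L"))    # 1-channel thermal
        return rgb, tir

# At inference time, translation is just a forward pass through the trained generator:
#   simulated_tir = generator(rgb.unsqueeze(0))
```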
“Most experiments in this area have exclusively studied the thermal to visual direction for this translation,” says Wadsworth. “We wanted to find out if thermal image generation was possible, and if so, how good it was.”
“By increasing the variety in the dataset, we were hoping to solve a more general case of the problem than has been seen before, as well as provide the tools for future researchers to build on that,” they said.
“And by solving the problems of data diversity,” adds Menon, “we’re opening a new chapter for this problem: moving from proofs of concept into models that could be useful for real-life applications. These kinds of generalized image-to-image translations could be the solution to extracting more data from any kind of camera.”
The researchers’ deep-learning approach is what allows their model to extract relevant data from an RGB image and translate it into an image that looks like it was taken by an IR sensor.
Thermal infrared imaging (left) and color (middle) cameras captured the same night-driving scene and were used to train the researchers’ image translation model. The simulated TIR image on the right was generated from the RGB image in the middle; it depicts the outlines of trees and cars that are invisible to the naked eye, even though the model has no information about their relative temperatures.
While the translated images don’t actually represent the temperature of what is in frame, training on real TIR images enables the model to pick up on subtle features of RGB images that are imperceptible to the human eye. In the night-driving scenario seen above, TIR sensors can detect the slightly warmer temperature of objects and obstacles, producing the skyline and distinct car outlines that are missing in the RGB version.
But while the upper half of that RGB image may look uniformly black, it only takes a few pixels of slight color variations to give the model a sense of where those edges are, producing an output image that is usefully accurate.
“Enhancing and colorizing low-light RGB images only goes so far, but simulating a high-quality thermal image could provide a lot of information about what is going on,” says Wadsworth.
The study also highlighted the known errors that stem from the inherent limitations of this approach. While real IR sensors can reveal obstacles in complete darkness — especially living ones — RGB cameras need at least some visible features to feed the algorithm. As a result, pedestrians and cyclists that appeared in the original TIR images were sometimes missing from the simulated ones.
While RGB cameras can’t fully replace IR sensors, the ability for a single device to switch between those modes enables a host of potential applications. It also represents a new way of thinking about the data cameras and sensors can produce.
Menon has long advocated the concept of “non-anthropocentric cameras,” or the idea that the data cameras collect doesn’t need to resemble anything that humans would recognize as a visual image.
“Cameras have been designed solely for humans for more than a century,” Menon says. “But the vast majority of images are now recorded solely for machines! If we optimized the output of those cameras based on how computers process images, rather than how our vision works, we could improve their performance and unlock new capabilities.”
Humans can only see a small range of wavelengths of the electromagnetic spectrum, and there are other properties of light, such as polarization, that we have no natural way of sensing. Integrating streams of otherwise invisible information could have applications in any number of fields, from monitoring plant health and detecting mechanical strain in bridges to seeing through hazardous weather conditions.