Neural Network Helped Upscale a Vintage 1896 Film to 4K

When the 50-second silent short film L’Arrivée d’un train en gare de La Ciotat premiered in 1896, some theatregoers reportedly ran for safety at the sight of a projected approaching train, thinking that a real one would burst through the screen at any moment.

A wild thought, given the blurry, low-resolution quality of the original film. Thankfully those panicky cinephile pioneers never saw the AI-enhanced upscaled version released by Denis Shiryaev, or else they would have flipped their lids.

Shiryaev leveraged a pair of publicly available enhancement programs, DAIN and Topaz Labs’ Gigapixel AI, to transform the original footage into a 4K 60FPS clip. Gigapixel AI uses a proprietary interpolation algorithm that “analyzes the image and recognizes details and structures and ‘completes’ the image,” according to Topaz Labs’ website.

Effectively, Topaz taught an AI to accurately sharpen and clarify images even after they’ve been enlarged by as much as 600 percent. DAIN, on the other hand, imagines and inserts frames between the keyframes of an existing video clip. It’s the same concept as the motion smoothing feature on 4K TVs that nobody but your parents uses. In this case, however, it added enough frames to increase the rate to 60 FPS.
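To make the idea of inserting frames concrete, here is a minimal sketch in Python. It is not how DAIN works: DAIN is a learned model, while this simply crossfades between two neighbouring frames, which is about the crudest form of frame interpolation there is. The function names and the representation of a frame as a list of rows of grayscale values are assumptions made for illustration.

```python
def blend_frames(frame_a, frame_b, t):
    """Linearly blend two equally sized grayscale frames; t=0 gives frame_a, t=1 gives frame_b."""
    return [
        [(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def insert_frames(frames, extra_per_gap):
    """Return a new list with `extra_per_gap` blended frames between each original pair."""
    if not frames:
        return []
    result = []
    for frame_a, frame_b in zip(frames, frames[1:]):
        result.append(frame_a)
        for i in range(1, extra_per_gap + 1):
            # Space the new frames evenly between the two originals.
            t = i / (extra_per_gap + 1)
            result.append(blend_frames(frame_a, frame_b, t))
    result.append(frames[-1])
    return result


# Two tiny 1x2 "frames"; one new frame is inserted halfway between them.
clip = [[[0, 100]], [[100, 0]]]
smoother = insert_frames(clip, extra_per_gap=1)
print(len(smoother))   # 3
print(smoother[1])     # [[50.0, 50.0]]
```

A plain crossfade like this tends to leave ghost images on anything that moves, which is why dedicated tools try to work out where objects are going rather than averaging pixel values.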

These are both examples of upscaling technology, which has been an essential part of broadcast entertainment since 1998, when the first high-definition televisions hit the market. Old-school standard-definition televisions displayed at 720×480 resolution, a total of 345,600 pixels’ worth of content that can be shown at one time. High-definition televisions display at 1920×1080, or 2,073,600 total pixels (six times the pixel count of SD), while 4K sets, with their 3840×2160 resolution, need 8,294,400 pixels.
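The pixel counts above are easy to check. The short Python sketch below multiplies out each resolution and compares them; the names `RESOLUTIONS` and `pixel_count` are made up for this example.

```python
# Width and height in pixels for each format mentioned in the article.
RESOLUTIONS = {
    "SD": (720, 480),
    "HD": (1920, 1080),
    "4K": (3840, 2160),
}


def pixel_count(name):
    """Total number of pixels on screen for the named format."""
    width, height = RESOLUTIONS[name]
    return width * height


sd, hd, uhd = (pixel_count(name) for name in ("SD", "HD", "4K"))

print(f"SD: {sd:,}")                    # SD: 345,600
print(f"HD: {hd:,}")                    # HD: 2,073,600
print(f"4K: {uhd:,}")                   # 4K: 8,294,400
print(f"HD vs SD: {hd / sd:.0f}x")      # HD vs SD: 6x
print(f"4K vs HD: {uhd / hd:.0f}x")     # 4K vs HD: 4x
print(f"4K minus HD: {uhd - hd:,}")     # 4K minus HD: 6,220,800
```

The last line is the gap an upscaler has to fill when stretching an HD picture across a 4K panel.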

One needs to fill in roughly 6.2 million additional pixels to enlarge an HD image to fit on a 4K screen, so the upscaler has to figure out what those extra pixels should display. This is where the interpolation process comes in. Interpolation estimates what each of those new pixels should display based on what the pixels around them are showing; however, there are several different ways to make that estimate.

The “nearest neighbour” method fills each blank pixel with the same colour as its nearest neighbour. It’s simple and effective but results in a jagged, overtly pixelated image. Bilinear interpolation requires a bit more processing power, but it enables the TV to analyze each blank pixel based on its nearest neighbours in each direction (a 2×2 block of four pixels) and generate a gradient between them, which smooths the image. Bicubic interpolation, on the other hand, samples from the 16 nearest neighbours (a 4×4 block). This results in accurate colouring but a blurry image. By combining the results of bilinear and bicubic interpolation, however, TVs can account for each process’s shortcomings and generate upscaled images with minimal loss of optical quality (sharpness and the occasional artifact) compared to the original.
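The difference between the first two methods is easier to see in code than in prose. The Python sketch below works on a tiny grayscale image stored as a list of rows. It is only an illustration, not what any particular TV does, and it uses the textbook form of bilinear interpolation, which blends a 2×2 block of source pixels (two neighbours along each axis). Bicubic is left out to keep the example short, and both function names are invented for this example.

```python
def upscale_nearest(image, factor):
    """Upscale by an integer factor, copying each source pixel into a factor x factor block."""
    return [
        [image[y // factor][x // factor] for x in range(len(image[0]) * factor)]
        for y in range(len(image) * factor)
    ]


def upscale_bilinear(image, new_w, new_h):
    """Resize to new_w x new_h, blending the 2x2 block of source pixels around each sample."""
    src_h, src_w = len(image), len(image[0])
    out = []
    for y in range(new_h):
        # Map the output row back to a fractional source row (corners stay aligned).
        sy = y * (src_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = x * (src_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Blend horizontally on the two rows, then vertically between them.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


tiny = [[0, 100],
        [100, 200]]

for row in upscale_nearest(tiny, 2):
    print(row)
# [0, 0, 100, 100]
# [0, 0, 100, 100]
# [100, 100, 200, 200]
# [100, 100, 200, 200]

for row in upscale_bilinear(tiny, 3, 3):
    print(row)
# [0.0, 50.0, 100.0]
# [50.0, 100.0, 150.0]
# [100.0, 150.0, 200.0]
```

The nearest-neighbour output is made of hard-edged blocks, which is where the jagged look comes from, while the bilinear output steps gradually from one value to the next.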
