Neural Network Based Rendering: The Road Ahead
Real-time, neural-net-based rendering is still at an early stage of its development life cycle. This means there is still room for bold, speculative innovation. In this blog we will make a sustained attempt to deliver on that potential.
There are several long-term trends working in our favor.
❶ Semiconductor Trends
While Dennard scaling has slowed substantially and clock-frequency increases have all but ceased, Moore’s Law itself is less dead than many had assumed. The density increases are not quite on schedule, but they are not too far off either. (*1a)
The first fallout from this sudden imbalance was the proliferation of multi-core architectures. The second is an overabundance of compute power, mostly wasted in idle clock cycles for lack of data. For a time this was partially counteracted by radically increased SRAM cache sizes. (*1b)
But yet again the balance shifts: logic density improved by 80% from N7 to N5, while SRAM density improved by only 35%. From N5 to N3, logic density is expected to improve by 70% and SRAM density by only 20%. We are moving further toward a world where compute is readily available, but data is not.
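To make the growing imbalance concrete, here is a quick back-of-the-envelope compounding of the figures quoted above; the per-node percentages are as stated, and the cumulative numbers are simply their product:

```python
# Back-of-the-envelope compounding of the quoted node-to-node density gains.
# The inputs are the approximate figures cited above, not exact measurements.
logic_gain = 1.80 * 1.70   # N7 -> N5 -> N3: roughly 3.06x logic density
sram_gain  = 1.35 * 1.20   # N7 -> N5 -> N3: roughly 1.62x SRAM density

print(f"Cumulative logic density gain, N7 -> N3: {logic_gain:.2f}x")
print(f"Cumulative SRAM density gain,  N7 -> N3: {sram_gain:.2f}x")
print(f"Compute-to-SRAM imbalance grows by:      {logic_gain / sram_gain:.2f}x")
```

Two node transitions are enough for the amount of on-chip SRAM available per unit of logic to roughly halve.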
Neural nets are able to make good use of abundant computation, and often have data dependency patterns with highly local characteristics. This is unusual and advantageous. It makes neural nets a natural fit for the coming generation of semiconductor devices.
Current neural net implementations do not take full advantage of this, but all the pieces are there.
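As a rough illustration of why this matters, consider the arithmetic intensity (FLOPs per byte moved) of a dense layer. The sketch below is a standard roofline-style estimate, not a measurement; the matrix sizes are arbitrary, the operands are assumed to be fp16, and each tensor is assumed to be touched exactly once:

```python
# Roofline-style estimate of arithmetic intensity for a dense (matmul) layer:
# C[M,N] = A[M,K] @ B[K,N], fp16 operands, each tensor moved on/off chip once.
def matmul_intensity(M, N, K, bytes_per_elem=2):
    flops = 2 * M * N * K                                   # multiply-accumulates
    bytes_moved = bytes_per_elem * (M * K + K * N + M * N)  # ideal case, no re-reads
    return flops / bytes_moved

# A batched layer reuses every weight it loads, keeping the ALUs busy...
print(f"1024x1024x1024 matmul: {matmul_intensity(1024, 1024, 1024):.0f} FLOPs/byte")
# ...while unbatched matrix-vector work barely reuses anything it fetches.
print(f"matrix-vector (M=1):   {matmul_intensity(1, 1024, 1024):.1f} FLOPs/byte")
```

The gap between those two numbers is, roughly, the gap between a compute-bound workload and a bandwidth-bound one; the former is exactly what the coming hardware has in surplus.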
❷ Cultural Trends
AI program logic (its weight matrix) is allowed to be incomprehensible, even though other programs are required to be readable and make sense. This inconsistency in our cultural expectations grants neural nets many advantages over traditional programs.
The program logic learned and expressed by neural net weights is not merely a form of spaghetti code; it is branchless, vectorized, multi-threaded spaghetti code. To a software engineer, such code is forbidden to the point of sacrilege. It is also the holy grail of performance optimization.
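As a toy illustration of that contrast, consider the two functions below. The shapes, the NumPy usage, and the decision rules are purely illustrative, not taken from any particular renderer:

```python
import numpy as np

# Traditional program logic: readable, but full of data-dependent branches.
def classify_branchy(x):
    if x[0] > 0.5 and x[1] < 0.2:
        return 1
    elif x[2] > 0.9:
        return 2
    return 0

# "Neural" program logic: the decisions live in the weights. One branch-free
# expression, trivially vectorized across thousands of inputs at once.
def classify_neural(X, W1, b1, W2, b2):
    h = np.maximum(X @ W1 + b1, 0.0)        # ReLU layer, no branches
    return np.argmax(h @ W2 + b2, axis=-1)  # the "decision" is an argmax

rng = np.random.default_rng(0)
X = rng.standard_normal((4096, 8))          # 4096 inputs classified in one call
W1, b1 = rng.standard_normal((8, 32)), rng.standard_normal(32)
W2, b2 = rng.standard_normal((32, 3)), rng.standard_normal(3)
labels = classify_neural(X, W1, b1, W2, b2) # zero branches, fully vectorized
```

Nobody would accept the second function as a hand-written replacement for the first, yet as learned logic it is exactly the shape the hardware wants.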
AI as a socio-cognitive concept represents a reset of expectations, a return to the optimism of the 1950s; many things we could do, but previously would not, are now possible. (*2)
❸ Asset Trends
Explicit categorization and standardization are part of the engineering tradition, and a contributing factor to its historical success. However, such practices also introduce scalability challenges in asset creation and utilization.
Machine learning has the potential to be more flexible in this regard, synthesizing millions of partial or irregular assets into a coherent inner representation. This also applies at runtime: when data storage (and program logic) is no longer separated into neatly labeled units, data latencies that we have previously taken for granted can be reduced or eliminated.
As stronger hardware allows for more ambitious asset usage, affording and managing such assets at scale becomes increasingly challenging. Machine learning might, at the end of the day, be easier than the alternatives.
❹ Investment Trends
When generative models are at their best, the images they produce are already of photographic quality. The issue is that, for real-time rendering to work well, the generative model would need to be at its best every frame, at 120 frames per second.
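The per-frame budget implied by that target is unforgiving; the arithmetic below is trivial, but worth keeping in view:

```python
# Per-frame time budget at common real-time refresh rates.
for fps in (60, 120, 240):
    print(f"{fps:>3} fps -> {1000.0 / fps:5.2f} ms per frame, end to end")
# At 120 fps roughly 8.3 ms must cover everything: the generative model,
# the rest of the frame, and whatever latency hiding sits in between.
```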
Government-funded machine learning academia pursues goals somewhat different from ours. Our needs are not their needs, and this shows in the neural nets they architect. Even so, our goals have perhaps never been closer than they are now.
The national governments of our world have, in the last 15 years, remarkably and perhaps unprecedentedly, adapted to us rather than the other way around. In a standardization power struggle between the video-game industry and society, we won. They use our GPUs and build on top of our standards. (*4)
As a side effect of this, the fruits of their research efforts are readily available to us. This is tremendously valuable, and we should do our best to stay as compatible as possible going forward.
We are taking advantage of all these trends, though some parts of the design are further along than others. The contents of this blog will range from GPU hardware analysis and machine learning architecture to practical neural net training optimizations. Data dependencies and memory latency will be used as a basis for rethinking architectural assumptions. Even though the general case can be restrictive, the world ultimately consists of special cases.
(*1a) Going by TSMC, the 20nm➞16nm transition in 2015 was the obvious low point at 0% density improvement, despite what the two node names would imply. Because the power-related issues of 20nm were convincingly resolved, 16nm was still an extremely well-received node.
(*1b) 3D stacking introduces some further nuance to the situation. The sweet-spot cost per transistor does not truly go down, but somewhat-worse-than-linear cost scaling for SRAM now extends further, even beyond what would have been the reticle limit. This has not been the case historically, and it will be a boon for L3-starved, high-budget devices.
(*2) Access to higher quality tools and hardware is also an important factor. As for the optimism, it is mostly limited to the rate of technological progress itself; public expectations regarding the societal effects of AI are more in line with the sense of anxiety of early-20th-century Europe.
(*4) The significance of this was not lost on scientists at the time. Early in the transition, commentary like “video games are essentially physics simulations, so it is only natural that GPUs are a good fit for quantum chemistry simulations” was rather common. They could adapt to us because we had already adapted to them.