RTX Explained: How the RTX Cards Achieve Real-Time Raytracing

In 2018, Nvidia released the RTX line of graphics cards, which are capable of raytracing in real time. Rasterization, the rendering method used by nearly all games since the '90s, does not produce physically accurate images; raytracing does.
Raytracing determines the color of a pixel by tracing the path of light from the camera to the object to the light source. It calculates light as it behaves in the real world: light bounces through the scene, producing reflections when it bounces from one object to another, shadows when it is blocked by an object, and refraction when it passes through transparent or semi-transparent objects.
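The core operation behind all of this is the intersection test: where does a ray first hit the scene? As a minimal illustration (not Nvidia's implementation), here is a ray-sphere intersection in Python; real renderers mostly test ray-triangle intersections, but the idea is the same:

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    # Solve |origin + t*direction - center|^2 = radius^2 for the
    # nearest positive t (the first point where the ray hits the sphere).
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                        # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t > 0 else None            # hit must lie in front of the ray

# A camera ray from the origin looking down -z, toward a unit sphere
# centered 5 units away: the near surface is hit at t = 4.
t = ray_sphere_hit((0, 0, 0), (0, 0, -1), (0, 0, -5), 1.0)
```

Once the hit point is known, the renderer shades it by tracing further rays toward lights and along reflection/refraction directions.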

How do the RTX cards raytrace at more than 30 fps in games?

1. Architecture

For the RTX line of cards, Nvidia dropped the Pascal architecture used in the GTX cards and created a new architecture called Turing.
Turing fuses rasterization, raytracing, AI, and simulation by using new hardware-based accelerators and a hybrid rendering approach. On the hardware front, Turing takes advantage of the latest GDDR6 memory technology and a new GPU processor, the streaming multiprocessor (SM), which delivers a boost in shading efficiency with a 50% improvement in delivered performance.
The new GPUs also feature RT Cores: accelerator units dedicated to raytracing operations ("RT" stands for raytracing), and the brains of Turing's raytracing capability. RT Cores accelerate Bounding Volume Hierarchy (BVH) traversal and ray/triangle intersection testing (ray casting).
Without hardware acceleration, BVH traversal must be performed by shader operations, taking thousands of instruction slots per ray cast: each ray is tested against successively smaller bounding boxes in the BVH until it finally hits a triangle, whose color at the point of intersection contributes to the final pixel color (if no triangle is hit, a background color may be used to shade the pixel). This is extremely computationally intensive.
The RT Cores perform the BVH traversal and ray-triangle intersection testing in hardware, saving the SM from spending those thousands of instruction slots per ray. Consider rendering just a 1000 × 1000 pixel image: that is already 1,000,000 primary rays to cast and trace.
Now the SM only has to launch a ray probe; the RT Core performs the BVH traversal and ray-triangle tests and returns a hit or miss to the SM, leaving the SM free to do other work. Using RT Cores, Turing delivers many more gigarays per second than the old Pascal architecture could.
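As a rough sketch of what the RT Core does in hardware, BVH traversal against axis-aligned bounding boxes can be expressed as follows (the node layout, field names, and triangle labels are hypothetical illustrations, not Nvidia's actual data structures):

```python
def slab_hit(ray_o, inv_d, lo, hi):
    # Standard slab test: does the ray pass through the box (lo, hi)?
    tmin, tmax = 0.0, float("inf")
    for o, inv, l, h in zip(ray_o, inv_d, lo, hi):
        t0, t1 = (l - o) * inv, (h - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return tmin <= tmax

def traverse(node, ray_o, inv_d):
    # Descend the tree, skipping whole subtrees whose boxes the ray
    # misses; only triangles in leaf boxes the ray enters get tested.
    if not slab_hit(ray_o, inv_d, node["lo"], node["hi"]):
        return []
    if "tris" in node:                      # leaf: candidate triangles
        return node["tris"]
    return (traverse(node["left"], ray_o, inv_d)
            + traverse(node["right"], ray_o, inv_d))

# A tiny two-leaf BVH and a ray traveling along +x through both boxes.
leaf_a = {"lo": (0, 0, 0), "hi": (1, 1, 1), "tris": ["tri_a"]}
leaf_b = {"lo": (4, 0, 0), "hi": (5, 1, 1), "tris": ["tri_b"]}
root = {"lo": (0, 0, 0), "hi": (5, 1, 1), "left": leaf_a, "right": leaf_b}
hits = traverse(root, (-1, 0.5, 0.5), (1.0, float("inf"), float("inf")))
```

Every box test and recursion step here costs shader instructions when done in software; the RT Core collapses the whole loop into a fixed-function unit that simply returns the hit.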


Put more simply, NVIDIA significantly bumped up the specs of the new cards.
Also included in Turing are Tensor Cores: specialized execution units for performing tensor/matrix operations, the computations that make deep learning work.
A significant capability the new Turing Tensor Cores add over the older Volta Tensor Cores is Deep Learning Super Sampling (DLSS): a new technique that uses a deep neural network to extract multidimensional features of the rendered scene and intelligently combine details from multiple frames to construct a high-quality final image.
In early 2018, Microsoft announced the DirectML for AI and DirectX Raytracing (DXR) APIs. Turing combines the power of its RT Core and Tensor Core hardware with Microsoft's new raytracing APIs.

2. Hybrid Rendering

Raytracing, as we know, is very computationally intensive. Rasterization is not, but it does not produce great results for every effect. The best way to achieve real-time raytracing is to mix rasterization and raytracing together to form hybrid rendering.
Within a frame, rasterization renders only the elements it can render to look photoreal, while raytracing renders the elements that rasterization simply fails to render photorealistically.
In a scene, rasterization renders direct shadows (which can also be raytraced) and deferred shading, while raytracing renders lighting, reflections, global illumination, ambient occlusion, transparency, and translucency, all with help from AI and the DirectX Raytracing (DXR) APIs; the Tensor Cores handle post-processing.
Developers can also tweak what gets raytraced and what gets rasterized to optimize performance. One technique is to gate raytracing on material properties: for example, only materials above a certain reflectivity level get raytraced.
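A toy sketch of such a reflectivity cutoff (the material list and threshold value are invented purely for illustration):

```python
# Hypothetical material table: each entry carries a reflectivity in [0, 1].
materials = [
    {"name": "mirror",   "reflectivity": 0.95},
    {"name": "chrome",   "reflectivity": 0.80},
    {"name": "concrete", "reflectivity": 0.05},
]

THRESHOLD = 0.5  # tunable cutoff chosen by the developer

# Highly reflective surfaces go down the raytraced path; everything
# else is rasterized, keeping the expensive ray budget small.
ray_traced = [m["name"] for m in materials if m["reflectivity"] >= THRESHOLD]
rasterized = [m["name"] for m in materials if m["reflectivity"] < THRESHOLD]
```

Raising or lowering the threshold trades image quality against frame rate, which is exactly the knob hybrid rendering hands to developers.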

3. Post-Processing

Raytracing typically suffers from noise. To avoid this, an image would have to be rendered with many samples per pixel and many allowed ray bounces. The Turing architecture instead uses only about 1-2 samples per pixel, followed by denoising, to achieve real-time rendering in games.
Real-time raytracing/denoising solutions must be able to support dynamic scenes: a moving camera, moving light sources, and moving objects.
The denoiser uses information such as surface roughness and light hit distance to guide the filter. Denoising is most needed for rendering soft shadows, which noise affects the most, and for removing overall noise.
The denoised images come remarkably close to the ground-truth images (images rendered with many hundreds of samples per pixel). The fine detail is also close to ground truth, though not identical; there is still some loss of detail.
Denoising is handled after rendering. The aim of the denoising process is to denoise a 1080p image in approximately 1 ms or less on gaming-class GPUs.
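Real denoisers are feature-guided neural or bilateral filters, but the core idea of trading spatial neighbors for missing samples can be sketched with a plain box blur (purely illustrative; this is not NVIDIA's denoiser):

```python
def box_blur(img):
    # Replace each pixel with the average of its 3x3 neighborhood,
    # clamped at the image borders.
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[cy][cx]
                    for cy in range(max(0, y - 1), min(h, y + 2))
                    for cx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

# A flat grey region rendered at 1 sample per pixel: each pixel is a
# noisy 0-or-1 estimate of the true value 0.5. One blur pass pulls
# every pixel much closer to that ground truth.
noisy = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0],
         [1.0, 0.0, 1.0]]
smooth = box_blur(noisy)
```

A naive blur like this destroys edges along with noise, which is exactly why the real denoiser needs guide features such as roughness and hit distance to decide where smoothing is safe.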

The right side of the image above shows the output of the NVIDIA® OptiX™ AI-Accelerated Denoiser.


This step combines input from multiple rendered frames, trying to remove visual artifacts such as aliasing while preserving detail. NVIDIA found this to be the perfect place to apply AI.
They developed a deep neural network (DNN) to solve this aliasing challenge.
Deep Learning Super-Sampling (DLSS) produces a much better result than TAA (Temporal Anti-Aliasing), a shader-based algorithm that combines two frames using motion vectors to determine where to sample the previous frame.
The DNN produces a much higher-quality image than TAA from the same level of input samples, which improves overall performance.
DLSS uses AI, while TAA uses hand-written algorithms that can yield unpredictable results.
TAA renders at the final target resolution and then combines frames, losing detail in the process. DLSS instead renders faster at a lower input sample count and then infers a result that, at the target resolution, is of similar quality to the TAA result, but with half the shading work.
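A stripped-down sketch of the temporal accumulation idea behind TAA (the motion-vector reprojection step is omitted here, and the blend weight is an arbitrary illustration):

```python
def temporal_blend(history, current, alpha=0.1):
    # Exponentially blend the current frame into the accumulated
    # history: mostly history, a little new sample. Real TAA first
    # reprojects the history buffer using per-pixel motion vectors.
    return [h * (1 - alpha) + c * alpha for h, c in zip(history, current)]

frame_a = [0.0, 1.0]          # accumulated history (two pixels)
frame_b = [1.0, 1.0]          # the newly rendered frame
blended = temporal_blend(frame_a, frame_b)
```

The small blend weight is what smooths aliasing over time, and also what causes TAA's characteristic ghosting when reprojection fails; DLSS replaces this fixed heuristic with a learned network.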
The information in this article was sourced from the Nvidia Developer Blog.