v4.11.0:
HIGHLIGHTS:
- NRD: added "GetMaxAccumulatedFrameNum" helper, emphasizing the importance of time-based accumulation, also added "DEFAULT_ACCUMULATION_TIME" constants for all denoisers
- NRD: added "cameraAttachedReflectionMaterialID" to "CommonSettings" (currently used by REBLUR only), helping to deal with mirror reflections of objects attached to the camera. It's a suboptimal solution; future releases will address this problem better
- REBLUR/SIGMA: "stabilizationStrength" replaced with "maxStabilizedFrameNum" for consistency with other accumulation related settings and "GetMaxAccumulatedFrameNum" helper. The progression "0, 1, 2, 3... 9" is more understandable than "1.00, 0.50, 0.33, 0.25... 0.10"
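
The two progressions quoted above line up if a stabilized frame count N corresponds to an old-style strength of 1 / (1 + N). The sketch below illustrates that mapping together with a time-based frame count. Both function names are hypothetical, and the real "GetMaxAccumulatedFrameNum" signature is not visible in this diff (the "NRDSettings.h" changes are not rendered), so treat this as an illustration under those assumptions rather than the NRD implementation.

```cpp
#include <cstdint>

// Hypothetical sketch (not the NRD helper): number of frames that fit into
// an accumulation time at a given frame rate, rounded to the nearest frame.
// Non-positive products are clamped to 0 to avoid an invalid float-to-unsigned cast.
constexpr uint32_t FramesFromAccumulationTime(float accumulationTimeSec, float fps)
{
    return accumulationTimeSec * fps > 0.0f
        ? static_cast<uint32_t>(accumulationTimeSec * fps + 0.5f)
        : 0u;
}

// Hypothetical sketch: old-style strength implied by the progression in the
// commit message (0 -> 1.00, 1 -> 0.50, 2 -> 0.33, 3 -> 0.25, 9 -> 0.10).
constexpr float StrengthFromFrameNum(uint32_t maxStabilizedFrameNum)
{
    return 1.0f / (1.0f + static_cast<float>(maxStabilizedFrameNum));
}
```

With a fixed accumulation time, the frame count scales with the frame rate: half a second is 30 frames at 60 FPS and 15 frames at 30 FPS, which is the point of time-based accumulation.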

BREAKING CHANGES:
- settings related (please read UPDATE.md)
- "NRD_FrontEnd_UnpackNormalAndRoughness" returns "materialID" in the same range as it was passed to "NRD_FrontEnd_PackNormalAndRoughness"

DETAILS:
- NRD: added "GetMaxAccumulatedFrameNum" helper to the main header
- NRD: added "cameraAttachedReflectionMaterialID" to "CommonSettings" (currently used by REBLUR only, WIP longer term)
- NRD: "NRD_FrontEnd_UnpackNormalAndRoughness" returns "materialID" in the same range as it was passed to "NRD_FrontEnd_PackNormalAndRoughness"
- NRD: improved code reuse between REBLUR and RELAX, reduced divergence
- NRD: improved per frame rotators (better noise properties)
- NRD: interface polishing (no functional changes)
- RELAX: added clamping to user provided "max accumulated frame num" values
- RELAX: reduced memory usage of DIFFUSE denoiser (had a dummy allocation)
- RELAX: diff-spec lobe fraction replaced with one shared fraction (simplified settings)
- RELAX: partial refactoring to match overall NRD style
- REBLUR/RELAX: hot/cold branch sorting
- REBLUR/RELAX: improved parallax computations, affecting many aspects of specular tracking
- REBLUR/RELAX: improved curvature computations
- REBLUR/RELAX: added forgotten initialization of "gStrandThickness" constant (oops!)
- REBLUR: minor performance opts
- REBLUR: fixed minor regressions related to migration to stochastic bilinear filter for RGBA1010102 oct-packed normals (since prev data repacking is gone)
- REBLUR: stochastic bilinear gets replaced with native bilinear if normals are interpolate-able
- REBLUR: slightly decreased hit distance sensitivity in case of 0 roughness
- REBLUR: slightly reduced "lobeAngleFraction" default value to squeeze more normal details (using the old default is OK)
- REBLUR: "lobeHalfAngle" uses code from "GetNormalWeightParam" (reduced code entropy)
- REBLUR: adjusted virtual parallax based confidence to be a bit more aggressive
- REBLUR: improved screen space sampling (now used in more shaders without IQ loss, improves perf, WIP)
- REBLUR: "GetGeometryWeightParams" should not become more strict if history is long (the rest is handled by normal weights)
- REBLUR: minimized deprecated "GetBilateralWeight" usage
- REBLUR: relaxed maximum value of slope scaling for disocclusion threshold
- REBLUR: improved antilag (which didn't do much last year), also made it almost FPS independent, tuned to "not hurt"
- REBLUR: finally fixed ancient and ugly issue leading to "bright lighting crawling", caused by unbiased (bright) fast history selectively accelerating the biased (fireflies suppressed) main history
- REBLUR/SIGMA: "stabilization strength" settings replaced with "max stabilized frame num"
- SIGMA: removed unnecessary checks in spatial filters
- SIGMA: "closest viewZ" replaced with "longest MV" logic in the temporal stabilization pass
- SIGMA: history copying decoupled into a separate pass
- SIGMA: added disocclusion tracking to get rid of ghosting
- SIGMA: fixed ugly white outlines (suboptimal weighting scheme between dense and sparse passes)
- SIGMA: tuned default settings
- SIGMA: introduced screen space sampling to better respect low-discrepancy based per frame rotators
- updated docs
- updated dependencies
- refactoring
dzhdanNV committed Nov 5, 2024
1 parent b680731 commit dc0ba40
Showing 67 changed files with 1,822 additions and 1,401 deletions.
2 changes: 1 addition & 1 deletion External/MathLib
10 changes: 4 additions & 6 deletions Include/NRD.h
@@ -28,14 +28,12 @@ license agreement from NVIDIA CORPORATION is strictly prohibited.
#include <cstddef>

#define NRD_VERSION_MAJOR 4
#define NRD_VERSION_MINOR 10
#define NRD_VERSION_MINOR 11
#define NRD_VERSION_BUILD 0
#define NRD_VERSION_DATE "9 October 2024"
#define NRD_VERSION_DATE "5 November 2024"

#if defined(_MSC_VER)
#define NRD_CALL __fastcall
#elif !defined(__aarch64__) && !defined(__x86_64) && (defined(__GNUC__) || defined (__clang__))
#define NRD_CALL __attribute__((fastcall))
#if defined(_WIN32)
#define NRD_CALL __stdcall
#else
#define NRD_CALL
#endif
18 changes: 9 additions & 9 deletions Include/NRDDescs.h
@@ -11,7 +11,7 @@ license agreement from NVIDIA CORPORATION is strictly prohibited.
#pragma once

#define NRD_DESCS_VERSION_MAJOR 4
#define NRD_DESCS_VERSION_MINOR 10
#define NRD_DESCS_VERSION_MINOR 11

static_assert(NRD_VERSION_MAJOR == NRD_DESCS_VERSION_MAJOR && NRD_VERSION_MINOR == NRD_DESCS_VERSION_MINOR, "Please, update all NRD SDK files");

@@ -167,9 +167,9 @@ namespace nrd
- Optional inputs are in ()
*/

// =============================================================================================================================
//=============================================================================================================================
// REBLUR
// =============================================================================================================================
//=============================================================================================================================

// INPUTS - IN_DIFF_RADIANCE_HITDIST (IN_DIFF_CONFIDENCE, IN_DISOCCLUSION_THRESHOLD_MIX)
// OUTPUTS - OUT_DIFF_RADIANCE_HITDIST
@@ -211,9 +211,9 @@ namespace nrd
// OUTPUTS - OUT_DIFF_DIRECTION_HITDIST
REBLUR_DIFFUSE_DIRECTIONAL_OCCLUSION,

// =============================================================================================================================
//=============================================================================================================================
// RELAX
// =============================================================================================================================
//=============================================================================================================================

// INPUTS - IN_DIFF_RADIANCE_HITDIST (IN_DIFF_CONFIDENCE, IN_DISOCCLUSION_THRESHOLD_MIX)
// OUTPUTS - OUT_DIFF_RADIANCE_HITDIST
@@ -239,9 +239,9 @@ namespace nrd
// OUTPUTS - OUT_DIFF_SH0, OUT_DIFF_SH1, OUT_SPEC_SH0, OUT_SPEC_SH1
RELAX_DIFFUSE_SPECULAR_SH,

// =============================================================================================================================
//=============================================================================================================================
// SIGMA
// =============================================================================================================================
//=============================================================================================================================

// INPUTS - IN_PENUMBRA, OUT_SHADOW_TRANSLUCENCY
// OUTPUTS - OUT_SHADOW_TRANSLUCENCY
@@ -251,9 +251,9 @@ namespace nrd
// OUTPUTS - OUT_SHADOW_TRANSLUCENCY
SIGMA_SHADOW_TRANSLUCENCY,

// =============================================================================================================================
//=============================================================================================================================
// REFERENCE
// =============================================================================================================================
//=============================================================================================================================

// INPUTS - IN_SIGNAL
// OUTPUTS - OUT_SIGNAL
146 changes: 90 additions & 56 deletions Include/NRDSettings.h

Large diffs are not rendered by default.

21 changes: 15 additions & 6 deletions Integration/NRDIntegration.hpp
@@ -567,6 +567,7 @@ void Integration::Dispatch(nri::CommandBuffer& commandBuffer, nri::DescriptorPoo
{
const ResourceDesc& nrdResource = dispatchDesc.resources[n];

// Get texture
nri::TextureBarrierDesc* nrdTexture = nullptr;
if (nrdResource.type == ResourceType::TRANSIENT_POOL)
nrdTexture = &m_TexturePool[nrdResource.indexInPool + instanceDesc.permanentPoolSize];
@@ -578,13 +579,19 @@ void Integration::Dispatch(nri::CommandBuffer& commandBuffer, nri::DescriptorPoo
NRD_INTEGRATION_ASSERT(nrdTexture && nrdTexture->texture, "'userPool' entry can't be NULL if it's in use!");
}

const nri::AccessBits nextAccess = nrdResource.descriptorType == DescriptorType::TEXTURE ? nri::AccessBits::SHADER_RESOURCE : nri::AccessBits::SHADER_RESOURCE_STORAGE;
const nri::Layout nextLayout = nrdResource.descriptorType == DescriptorType::TEXTURE ? nri::Layout::SHADER_RESOURCE : nri::Layout::SHADER_RESOURCE_STORAGE;
bool isStateChanged = nextAccess != nrdTexture->after.access || nextLayout != nrdTexture->after.layout;
bool isStorageBarrier = nextAccess == nri::AccessBits::SHADER_RESOURCE_STORAGE && nrdTexture->after.access == nri::AccessBits::SHADER_RESOURCE_STORAGE;
// Prepare barrier
nri::AccessLayoutStage next = {};
if (nrdResource.descriptorType == DescriptorType::TEXTURE)
next = {nri::AccessBits::SHADER_RESOURCE, nri::Layout::SHADER_RESOURCE, nri::StageBits::COMPUTE_SHADER};
else
next = {nri::AccessBits::SHADER_RESOURCE_STORAGE, nri::Layout::SHADER_RESOURCE_STORAGE, nri::StageBits::COMPUTE_SHADER};

bool isStateChanged = next.access != nrdTexture->after.access || next.layout != nrdTexture->after.layout;
bool isStorageBarrier = next.access == nri::AccessBits::SHADER_RESOURCE_STORAGE && nrdTexture->after.access == nri::AccessBits::SHADER_RESOURCE_STORAGE;
if (isStateChanged || isStorageBarrier)
transitions[transitionBarriers.textureNum++] = nri::TextureBarrierFromState(*nrdTexture, {nextAccess, nextLayout}, 0, 1);
transitions[transitionBarriers.textureNum++] = nri::TextureBarrierFromState(*nrdTexture, next);

// Create descriptor
uint64_t resource = m_NRI->GetTextureNativeObject(*nrdTexture->texture);
uint64_t key = CreateDescriptorKey(resource, isStorage);
const auto& entry = m_CachedDescriptors.find(key);
@@ -607,6 +614,9 @@ void Integration::Dispatch(nri::CommandBuffer& commandBuffer, nri::DescriptorPoo
}
}

// Barriers
m_NRI->CmdBarrier(commandBuffer, transitionBarriers);

// Allocating descriptor sets
uint32_t descriptorSetSamplersIndex = instanceDesc.constantBufferSpaceIndex == instanceDesc.samplersSpaceIndex ? 0 : 1;
uint32_t descriptorSetResourcesIndex = instanceDesc.resourcesSpaceIndex == instanceDesc.constantBufferSpaceIndex ? 0 : (instanceDesc.resourcesSpaceIndex == instanceDesc.samplersSpaceIndex ? descriptorSetSamplersIndex : descriptorSetSamplersIndex + 1);
@@ -664,7 +674,6 @@ void Integration::Dispatch(nri::CommandBuffer& commandBuffer, nri::DescriptorPoo
m_NRI->UpdateDescriptorRanges(*descriptorSets[descriptorSetResourcesIndex], instanceDesc.samplersSpaceIndex == instanceDesc.resourcesSpaceIndex ? 1 : 0, pipelineDesc.resourceRangesNum, resourceRanges);

// Rendering
m_NRI->CmdBarrier(commandBuffer, transitionBarriers);
m_NRI->CmdSetPipelineLayout(commandBuffer, *pipelineLayout);

nri::Pipeline* pipeline = m_Pipelines[dispatchDesc.pipelineIndex];
38 changes: 19 additions & 19 deletions README.md
@@ -1,4 +1,4 @@
# NVIDIA REAL-TIME DENOISERS v4.10.0 (NRD)
# NVIDIA REAL-TIME DENOISERS v4.11.0 (NRD)

[![Build NRD SDK](https://github.com/NVIDIAGameWorks/RayTracingDenoiser/actions/workflows/build.yml/badge.svg)](https://github.com/NVIDIAGameWorks/RayTracingDenoiser/actions/workflows/build.yml)

@@ -15,11 +15,11 @@ For quick starting see *[NRD sample](https://github.com/NVIDIAGameWorks/NRDSampl
- *RELAX* - A-trous based denoiser, has been designed for *[RTXDI (RTX Direct Illumination)](https://developer.nvidia.com/rtxdi)*
- *SIGMA* - shadow-only denoiser

Performance on RTX 4080 @ 1440p (native resolution, default denoiser settings):
- `REBLUR_DIFFUSE_SPECULAR` - 2.45 ms
- `RELAX_DIFFUSE_SPECULAR` - 2.90 ms
- `SIGMA_SHADOW` - 0.35 ms (0.25 ms if temporal stabilization is off)
- `SIGMA_SHADOW_TRANSLUCENCY` - 0.45 ms (0.32 ms if temporal stabilization is off)
Performance on RTX 4080 @ 1440p (native resolution, default denoiser settings, `NormalEncoding::R10_G10_B10_A2_UNORM`):
- `REBLUR_DIFFUSE_SPECULAR` - 2.40 ms (2.15 in performance mode)
- `RELAX_DIFFUSE_SPECULAR` - 2.95 ms
- `SIGMA_SHADOW` - 0.40 ms
- `SIGMA_SHADOW_TRANSLUCENCY` - 0.50 ms

Supported signal types:
- *RELAX*:
@@ -322,14 +322,14 @@ The *Persistent* column (matches *NRD Permanent pool*) indicates how much of the
| | REBLUR_DIFFUSE_SPECULAR_OCCLUSION | 67.94 | 38.12 | 29.81 |
| | REBLUR_DIFFUSE_SPECULAR_SH | 266.12 | 105.62 | 160.50 |
| | REBLUR_DIFFUSE_DIRECTIONAL_OCCLUSION | 84.56 | 42.25 | 42.31 |
| | RELAX_DIFFUSE | 99.25 | 63.31 | 35.94 |
| | RELAX_DIFFUSE | 90.81 | 54.88 | 35.94 |
| | RELAX_DIFFUSE_SH | 158.31 | 88.62 | 69.69 |
| | RELAX_SPECULAR | 101.44 | 63.38 | 38.06 |
| | RELAX_SPECULAR_SH | 168.94 | 97.12 | 71.81 |
| | RELAX_DIFFUSE_SPECULAR | 168.94 | 97.12 | 71.81 |
| | RELAX_DIFFUSE_SPECULAR_SH | 303.94 | 164.62 | 139.31 |
| | SIGMA_SHADOW | 15.00 | 0.00 | 15.00 |
| | SIGMA_SHADOW_TRANSLUCENCY | 33.94 | 0.00 | 33.94 |
| | SIGMA_SHADOW | 31.88 | 8.44 | 23.44 |
| | SIGMA_SHADOW_TRANSLUCENCY | 50.81 | 8.44 | 42.38 |
| | REFERENCE | 33.75 | 33.75 | 0.00 |
| | | | | |
| 1440p | REBLUR_DIFFUSE | 150.06 | 75.00 | 75.06 |
@@ -342,14 +342,14 @@ The *Persistent* column (matches *NRD Permanent pool*) indicates how much of the
| | REBLUR_DIFFUSE_SPECULAR_OCCLUSION | 120.06 | 67.50 | 52.56 |
| | REBLUR_DIFFUSE_SPECULAR_SH | 472.56 | 187.50 | 285.06 |
| | REBLUR_DIFFUSE_DIRECTIONAL_OCCLUSION | 150.06 | 75.00 | 75.06 |
| | RELAX_DIFFUSE | 176.31 | 112.50 | 63.81 |
| | RELAX_DIFFUSE | 161.31 | 97.50 | 63.81 |
| | RELAX_DIFFUSE_SH | 281.31 | 157.50 | 123.81 |
| | RELAX_SPECULAR | 180.06 | 112.50 | 67.56 |
| | RELAX_SPECULAR_SH | 300.06 | 172.50 | 127.56 |
| | RELAX_DIFFUSE_SPECULAR | 300.06 | 172.50 | 127.56 |
| | RELAX_DIFFUSE_SPECULAR_SH | 540.06 | 292.50 | 247.56 |
| | SIGMA_SHADOW | 26.38 | 0.00 | 26.38 |
| | SIGMA_SHADOW_TRANSLUCENCY | 60.12 | 0.00 | 60.12 |
| | SIGMA_SHADOW | 56.38 | 15.00 | 41.38 |
| | SIGMA_SHADOW_TRANSLUCENCY | 90.12 | 15.00 | 75.12 |
| | REFERENCE | 60.00 | 60.00 | 0.00 |
| | | | | |
| 2160p | REBLUR_DIFFUSE | 318.88 | 159.38 | 159.50 |
@@ -362,14 +362,14 @@ The *Persistent* column (matches *NRD Permanent pool*) indicates how much of the
| | REBLUR_DIFFUSE_SPECULAR_OCCLUSION | 255.06 | 143.44 | 111.62 |
| | REBLUR_DIFFUSE_SPECULAR_SH | 1004.12 | 398.44 | 605.69 |
| | REBLUR_DIFFUSE_DIRECTIONAL_OCCLUSION | 318.88 | 159.38 | 159.50 |
| | RELAX_DIFFUSE | 374.69 | 239.12 | 135.56 |
| | RELAX_DIFFUSE | 342.81 | 207.25 | 135.56 |
| | RELAX_DIFFUSE_SH | 597.81 | 334.75 | 263.06 |
| | RELAX_SPECULAR | 382.69 | 239.12 | 143.56 |
| | RELAX_SPECULAR_SH | 637.69 | 366.62 | 271.06 |
| | RELAX_DIFFUSE_SPECULAR | 637.69 | 366.62 | 271.06 |
| | RELAX_DIFFUSE_SPECULAR_SH | 1147.69 | 621.62 | 526.06 |
| | SIGMA_SHADOW | 56.19 | 0.00 | 56.19 |
| | SIGMA_SHADOW_TRANSLUCENCY | 127.81 | 0.00 | 127.81 |
| | SIGMA_SHADOW | 119.94 | 31.88 | 88.06 |
| | SIGMA_SHADOW_TRANSLUCENCY | 191.56 | 31.88 | 159.69 |
| | REFERENCE | 127.50 | 127.50 | 0.00 |

# INTEGRATION VARIANTS
@@ -636,7 +636,7 @@ When denoising reflections in pure mirrors, some advantages can be reached if *N
Notes, requirements and restrictions:
- the primary hit (0th bounce) gets replaced with the first "non-pure mirror" hit in the bounce chain - this hit becomes *PSR*
- all associated data in the g-buffer gets replaced by *PSR* data
- the camera "sees" PSR like the mirror surface in-between don't exist. This space is called virtual world space
- the camera "sees" PSR like the mirror surface in-between doesn't exist. This space is called virtual world space
- virtual space position lies on the same view vector as the primary hit position, but the position is elongated. Elongation depends on `hitT` and curvature at hits, starting from the primary hit
- virtual space normal is the normal at *PSR* hit mirrored several times in the reversed order until the primary hit is reached
- *PSR* data is NOT always data at the *PSR* hit!
@@ -674,7 +674,7 @@ IN_MV = GetMotionAt( B );
## INTERACTION WITH FRAME GENERATION TECHNIQUES
Frame generation (FG) techniques boost FPS by interpolating between 2 last available frames. *NRD* works better when framerate increases, because it gets more data per second. It's not the case for FG, because all rendering pipeline underlying passes (like, denoising) continue to work on the original non-boosted framerate.
Frame generation (FG) techniques boost FPS by interpolating between 2 last available frames. *NRD* works better when frame rate increases, because it gets more data per second. It's not the case for FG, because all rendering pipeline underlying passes (like, denoising) continue to work on the original non-boosted framerate. `GetMaxAccumulatedFrameNum` helper should get a real FPS, not a fake one.
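
A minimal sketch of the point above, assuming the application knows how many frames are presented per rendered frame: divide the presented frame rate by that factor before converting an accumulation time into a frame count. `RenderedFps` and `FrameNumForTime` are hypothetical names introduced for illustration; the actual `GetMaxAccumulatedFrameNum` signature is not shown in this diff.

```cpp
#include <cstdint>

// Hypothetical helper: frame rate of actually rendered (denoised) frames when
// frame generation presents `presentedPerRendered` frames per rendered frame.
// A factor of 0 is treated as "no frame generation".
constexpr float RenderedFps(float presentedFps, uint32_t presentedPerRendered)
{
    return presentedPerRendered == 0u
        ? presentedFps
        : presentedFps / static_cast<float>(presentedPerRendered);
}

// Hypothetical stand-in for a time-based frame count: frames that fit into
// `accumulationTimeSec` at `fps`, rounded to the nearest frame, clamped at 0.
constexpr uint32_t FrameNumForTime(float accumulationTimeSec, float fps)
{
    return accumulationTimeSec * fps > 0.0f
        ? static_cast<uint32_t>(accumulationTimeSec * fps + 0.5f)
        : 0u;
}
```

For example, with 120 presented FPS and 2 presented frames per rendered frame, the denoiser sees 60 frames per second, so half a second of history is 30 frames. Feeding the presented 120 FPS instead would ask for 60 frames, doubling the effective accumulation time.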
## HAIR DENOISING TIPS
@@ -704,7 +704,7 @@ Hair strands tangent vectors *can't* be used as "normals guide" for *NRD* due to
**[NRD]** *NRD* requires non-jittered matrices.
**[NRD]** Most of denoisers do not write into output pixels outside of `CommonSettings::denoisingRange`.
**[NRD]** Most denoisers do not write into output pixels outside of `CommonSettings::denoisingRange`.
**[NRD]** When upgrading to the latest version keep an eye on `ResourceType` enumeration. The order of the input slots can be changed or something can be added, you need to adjust the inputs accordingly to match the mapping. Or use *NRD integration* to simplify the process.
@@ -771,7 +771,7 @@ maxAccumulatedFrameNum > maxFastAccumulatedFrameNum > historyFixFrameNum
**[SIGMA]** Using "blue" noise helps to minimize shadow shimmering and flickering. It works best if the pattern has limited number of animated frames (4-8) or it is static on the screen.
**[SIGMA]** *SIGMA* can be used for multi-light shadow denoising if applied "per light". `SigmaSettings::stabilizationStrength` can be set to `0` to disable temporal history. It provides the followinmg benefits:
**[SIGMA]** *SIGMA* can be used for multi-light shadow denoising if applied "per light". `SigmaSettings::stabilizationStrength` can be set to `0` to disable temporal history. It provides the following benefits:
- light count independent memory usage
- no need to manage history buffers for lights
2 changes: 1 addition & 1 deletion Resources/Version.h
@@ -22,7 +22,7 @@ Versioning rules:
*/

#define VERSION_MAJOR 4
#define VERSION_MINOR 10
#define VERSION_MINOR 11
#define VERSION_BUILD 0

#define VERSION_STRING STR(VERSION_MAJOR.VERSION_MINOR.VERSION_BUILD encoding=NRD_NORMAL_ENCODING.NRD_ROUGHNESS_ENCODING)
13 changes: 7 additions & 6 deletions Shaders.cfg
@@ -230,14 +230,15 @@ RELAX_DiffuseSpecularSh_PrePass.cs.hlsl -T cs
RELAX_DiffuseSpecularSh_SplitScreen.cs.hlsl -T cs
RELAX_DiffuseSpecularSh_TemporalAccumulation.cs.hlsl -T cs
RELAX_Validation.cs.hlsl -T cs
SIGMA_ShadowTranslucency_Blur.cs.hlsl -T cs
SIGMA_ShadowTranslucency_ClassifyTiles.cs.hlsl -T cs
SIGMA_ShadowTranslucency_PostBlur.cs.hlsl -T cs
SIGMA_ShadowTranslucency_SplitScreen.cs.hlsl -T cs
SIGMA_ShadowTranslucency_TemporalStabilization.cs.hlsl -T cs
SIGMA_Copy.cs.hlsl -T cs
SIGMA_SmoothTiles.cs.hlsl -T cs
SIGMA_Shadow_Blur.cs.hlsl -T cs
SIGMA_Shadow_ClassifyTiles.cs.hlsl -T cs
SIGMA_Shadow_PostBlur.cs.hlsl -T cs
SIGMA_Shadow_SmoothTiles.cs.hlsl -T cs
SIGMA_Shadow_SplitScreen.cs.hlsl -T cs
SIGMA_Shadow_TemporalStabilization.cs.hlsl -T cs
SIGMA_ShadowTranslucency_Blur.cs.hlsl -T cs
SIGMA_ShadowTranslucency_ClassifyTiles.cs.hlsl -T cs
SIGMA_ShadowTranslucency_PostBlur.cs.hlsl -T cs
SIGMA_ShadowTranslucency_SplitScreen.cs.hlsl -T cs
SIGMA_ShadowTranslucency_TemporalStabilization.cs.hlsl -T cs