Authors: Utkarsh Dwivedi, Saksham Nagpal, Linda Zhu
TerrainX is an implementation of the paper *Large-scale terrain authoring through interactive erosion simulation* (published 28th July 2023), which aims to combine interactive large-scale terrain authoring with geomorphologically accurate erosion simulation.
- Parallelized Stream Power Erosion
- Raymarched Terrain
- Interactive Authoring
- Real-World Data Integration
- Performance Analysis
- Building
- Milestones
- Credits
To solve the stream power equation incrementally and interactively, the authors target its most computationally expensive term: the drainage area. They propose a parallel approximation of the drainage area that gives the stream power equation a fast convergence rate. We started off by writing a compute shader that simulates this approximated version of the equation; our first results are shown below.
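For reference, the stream power law that drives the erosion is commonly written as follows (our summary of the standard formulation; the paper's exact notation may differ):

$$\frac{\partial h}{\partial t} = u - k\,A^{m}\,s^{n}$$

where $h$ is terrain elevation, $u$ is the uplift rate, $A$ is the drainage area, $s$ is the local slope, and $k$, $m$, $n$ are erosion parameters. The drainage area $A$ is the term whose parallel approximation makes interactive update rates possible.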
To visualise the 3D terrain from the computed height map, we follow the authors' approach of sphere tracing, and the results looked like this:
| ![]() | ![]() |
| --- | --- |
| Normals as color | Lambertian Shading |
One of the main goals of the paper is to bridge the gap between interactive terrain authoring and geomorphologically accurate erosion simulation. For this, the authors present several tools, of which we implemented a subset:
Users can use `Ctrl + Mouse-Click` and drag the mouse to uplift the terrain by an amount driven by the `brushStrength` parameter, within a radius controlled by the `brushRadius` parameter.
Using the same controls as the painting tool, users can instead erase the terrain by checking the `eraseTerrain` option on the GUI.
Checking the `customBrush` box enables users to paint and erase using a texture-based brush. The reference textures are color gradients taken from the original paper's demo video. We sample the `g` channel value and add it onto the terrain's uplift map. You can recognize the pattern of the original texture in the brush shape.
The `brushScale` parameter corresponds to the mipmap levels of the original texture. Because mip level and texture size are inversely correlated (the smaller the mip level, the larger the texture), on the backend we calculate `brushScale = MAX_SIZE - totalMipCount` so that at mip level 0 the brush is at its largest scale.
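As a rough illustration of how a texture-based brush can be applied in a compute shader, here is a minimal WGSL sketch embedded in a TypeScript string. Binding and uniform names such as `brushTexture`, `upliftIn`, `upliftOut`, and `BrushParams` are placeholders, not the repository's actual identifiers:

```ts
// Hedged sketch: add the brush texture's green channel onto the uplift map
// around the clicked point. Names, bindings, and layouts are illustrative only.
const brushWGSL = /* wgsl */ `
struct BrushParams {
  center : vec2<f32>,   // click position in texel coordinates
  radius : f32,         // driven by brushRadius
  strength : f32,       // driven by brushStrength
  mipLevel : f32,       // derived from brushScale
};

@group(0) @binding(0) var brushTexture : texture_2d<f32>;
@group(0) @binding(1) var brushSampler : sampler;
@group(0) @binding(2) var upliftIn : texture_2d<f32>;
@group(0) @binding(3) var upliftOut : texture_storage_2d<rgba8unorm, write>;
@group(0) @binding(4) var<uniform> params : BrushParams;

@compute @workgroup_size(8, 8)
fn applyBrush(@builtin(global_invocation_id) id : vec3<u32>) {
  let coord = vec2<i32>(id.xy);
  let current = textureLoad(upliftIn, coord, 0);
  let offset = vec2<f32>(id.xy) - params.center;
  if (length(offset) > params.radius) {
    textureStore(upliftOut, coord, current);   // outside the brush: copy through
    return;
  }
  // Map the texel into the brush texture's [0, 1] UV space.
  let uv = offset / (2.0 * params.radius) + vec2<f32>(0.5);
  let brush = textureSampleLevel(brushTexture, brushSampler, uv, params.mipLevel);
  // Use the g channel as the brush intensity, as described above.
  let uplift = current.r + params.strength * brush.g;
  textureStore(upliftOut, coord, vec4<f32>(uplift, current.gba));
}`;
```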
| Pattern 1 | Pattern 2 | Pattern 3 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
While our application provides some basic height maps to play around with, users can also upload their own height map using the `Upload Custom Height Map` button on the GUI controls:
Once a user clicks on the button, a file upload dialogue pops up and the user can select any height map of their choice:
Once selected, the height map name is reflected on the GUI name placeholder and the erosion simulation picks up the custom height map:
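Under the hood, turning a user-selected image file into a GPU texture can look roughly like this (a sketch using standard browser and WebGPU APIs; `loadCustomHeightMap` and the chosen usage flags are illustrative, not the repository's exact code):

```ts
// Hedged sketch: decode an uploaded height map file and copy it into a texture.
async function loadCustomHeightMap(device: GPUDevice, file: File): Promise<GPUTexture> {
  // Decode the uploaded file into an ImageBitmap.
  const bitmap = await createImageBitmap(file);

  const heightMapTexture = device.createTexture({
    size: [bitmap.width, bitmap.height],
    format: 'rgba8unorm',
    usage:
      GPUTextureUsage.TEXTURE_BINDING |
      GPUTextureUsage.COPY_DST |
      GPUTextureUsage.RENDER_ATTACHMENT,
  });

  // Upload the decoded pixels to the GPU.
  device.queue.copyExternalImageToTexture(
    { source: bitmap },
    { texture: heightMapTexture },
    [bitmap.width, bitmap.height]
  );
  return heightMapTexture;
}
```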
After establishing a usable model based on the paper, we wanted to test its applicability on some real-world data. We used Earth Explorer to get the height field for a certain part of the world. This video was a helpful walkthrough that showed us how to use Earth Explorer to get the required height maps.
First, we select a section of the terrain on Earth Explorer whose data we want to retrieve. We selected a section of the Himalayas:
Next, Earth Explorer shows the sections for which data is available, chopped up into individual textures of 1201x1201 pixels. We selected the one shown below:
Finally, we use this texture in our application, and we can see a visualization of the Himalayas based on the height map, running at a solid 80+ FPS:
Performance data captured on Windows 11 Home, AMD Ryzen 7 5800H @ 3.2GHz 16 GB, Nvidia GeForce RTX 3060 Laptop GPU 6 GB, running on Google Chrome
For analysing the impact of workgroup size on performance, a 2250x2250 texture was used with varying workgroup sizes.
| FPS vs workgroup size |
| --- |
| ![]() |
8x8 is the optimal workgroup size, as there are no performance gains after that. This is the workgroup size used for the rest of the analysis.
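The workgroup size is declared in the compute shader's entry point, and the dispatch dimensions are derived from it on the TypeScript side, roughly like this (a sketch; `erosionPass`, `erosionPipeline`, `erosionBindGroup`, and `textureSize` are illustrative names, not the repository's actual identifiers):

```ts
// Workgroup size declared in the shader (8x8 threads per workgroup):
//   @compute @workgroup_size(8, 8)
//   fn erode(@builtin(global_invocation_id) id : vec3<u32>) { ... }
const WORKGROUP_SIZE = 8;

// Dispatch enough 8x8 workgroups to cover every texel of the simulation texture.
erosionPass.setPipeline(erosionPipeline);
erosionPass.setBindGroup(0, erosionBindGroup);
erosionPass.dispatchWorkgroups(
  Math.ceil(textureSize / WORKGROUP_SIZE),
  Math.ceil(textureSize / WORKGROUP_SIZE)
);
```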
Performance data captured on Windows 11 Home, AMD Ryzen 7 5800H @ 3.2GHz 16 GB, Nvidia GeForce RTX 3060 Laptop GPU 6 GB, running on Google Chrome
The steepest flow for the erosion is calculated not once, but twice, for each pixel. This calculation involves sampling the neighbours of the current pixel in a 25x25 matrix area, which becomes costly as texture resolution increases.
Calculating this steepest flow only once, and then storing it in a buffer for reuse later, boosts performance significantly. This performance boost is more apparent with higher resolution textures, as with lower resolution textures the total number of neighbourhood samples across all pixels during a frame are not very high.
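Conceptually, the optimisation splits the work into a flow pass that writes the steepest flow into a storage buffer once per frame, and an erosion pass that reads it back. A rough sketch (the buffer layout, pipelines, and bind group names are illustrative, not the repository's actual code):

```ts
// One f32 flow value per texel, cached for reuse within the frame.
const steepestFlowBuffer = device.createBuffer({
  size: textureSize * textureSize * Float32Array.BYTES_PER_ELEMENT,
  usage: GPUBufferUsage.STORAGE,
});

const pass = commandEncoder.beginComputePass();

// Pass 1: compute the steepest flow once and store it.
pass.setPipeline(steepestFlowPipeline);   // writes into steepestFlowBuffer
pass.setBindGroup(0, steepestFlowBindGroup);
pass.dispatchWorkgroups(workgroupsX, workgroupsY);

// Pass 2: erosion reads the cached values instead of re-sampling neighbours.
pass.setPipeline(erosionPipeline);        // reads steepestFlowBuffer
pass.setBindGroup(0, erosionBindGroup);
pass.dispatchWorkgroups(workgroupsX, workgroupsY);

pass.end();
```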
| FPS: Steepest flow calculation optimisation using storage buffer | Memory Usage: Steepest flow optimisation using storage buffer |
| --- | --- |
| ![]() | ![]() |
This does have an impact on memory with greater texture sizes (due to the additional steepest flow buffer), but this is acceptable for the performance gains.
Performance data captured on Windows 11 Home, AMD Ryzen 7 5800H @ 3.2GHz 16 GB, Nvidia GeForce RTX 3060 Laptop GPU 6 GB, running on Google Chrome
Our first iteration of the terrain raymarching used `textureSample`, which requires texture sampling to happen in uniform control flow. This meant the raymarcher could not use a bounding box optimisation to sample the heightfield only when the ray hit the terrain's bounding box: the texture had to be sampled for every ray (every pixel!) every frame. It also meant that many calculations that should only happen when the ray actually hits the terrain were happening all the time, which slowed down performance significantly.

`textureSampleLevel` is still hardware accelerated and, because it samples from an explicitly specified mip level, does not have the same uniform control flow requirement as `textureSample`. After making these changes we noticed significant performance boosts.
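The core of the change is that the heightfield is only sampled after the ray enters the terrain's bounding box, which is legal because `textureSampleLevel` may be called in non-uniform control flow. A trimmed-down WGSL sketch embedded as a TypeScript string (the bounds, bindings, and function names are illustrative, not the repository's actual shader):

```ts
const raymarchWGSL = /* wgsl */ `
@group(0) @binding(0) var heightfield : texture_2d<f32>;
@group(0) @binding(1) var heightfieldSampler : sampler;

const BOX_MIN = vec3<f32>(-1.0, 0.0, -1.0);   // illustrative terrain bounds
const BOX_MAX = vec3<f32>( 1.0, 1.0,  1.0);

// Standard slab test: returns (tNear, tFar); no hit when tNear > tFar.
fn rayBoxIntersect(origin : vec3<f32>, invDir : vec3<f32>) -> vec2<f32> {
  let t0 = (BOX_MIN - origin) * invDir;
  let t1 = (BOX_MAX - origin) * invDir;
  let tNear = max(max(min(t0.x, t1.x), min(t0.y, t1.y)), min(t0.z, t1.z));
  let tFar  = min(min(max(t0.x, t1.x), max(t0.y, t1.y)), max(t0.z, t1.z));
  return vec2<f32>(tNear, tFar);
}

fn marchTerrain(origin : vec3<f32>, dir : vec3<f32>) -> f32 {
  let hit = rayBoxIntersect(origin, 1.0 / dir);
  if (hit.x > hit.y || hit.y < 0.0) {
    return -1.0;                        // ray misses the terrain AABB entirely
  }
  var t = max(hit.x, 0.0);
  for (var i = 0; i < 256; i = i + 1) {
    let p = origin + t * dir;
    let uv = (p.xz - BOX_MIN.xz) / (BOX_MAX.xz - BOX_MIN.xz);
    // Explicit-LOD sample: legal here even though only rays that entered the
    // bounding box reach this point (non-uniform control flow).
    let h = textureSampleLevel(heightfield, heightfieldSampler, uv, 0.0).r;
    if (p.y <= h) {
      return t;                         // hit the terrain surface
    }
    t = t + max(p.y - h, 0.005);        // conservative step toward the surface
    if (t > hit.y) {
      break;                            // left the bounding box without a hit
    }
  }
  return -1.0;
}`;
```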
| FPS: Axis-Aligned Bounding Box optimisations |
| --- |
| ![]() |
While both the steepest flow buffer and bounding box optimisations improve performance, their real benefit shows when both are combined.
| FPS: both optimisations |
| --- |
| ![]() |
We're able to get ~40 fps on the web even with 4.5k size textures when both these optimisations are enabled!
Brandon Jones has a great article on WebGPU Render Bundles, and talked about them in his guest lecture for our class, CIS 5650 GPU Programming Fall 2023. According to the article, Render Bundles can help with CPU-side optimization by reducing the overhead of a large number of repeated JavaScript calls. Although the majority of our project's processing happens on the GPU, we wanted to see whether Render Bundles would affect our performance. Rendering with Render Bundles can be toggled ON/OFF using the `Use Render Bundles` checkbox on the UI:
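For reference, recording and replaying a render bundle looks roughly like this (a sketch; `pipeline`, `bindGroup`, `vertexBuffer`, `vertexCount`, and the pass descriptors are placeholders for our actual terrain render resources):

```ts
// Record the draw calls once into a render bundle.
const bundleEncoder = device.createRenderBundleEncoder({
  colorFormats: [presentationFormat],
  depthStencilFormat: 'depth24plus',
});
bundleEncoder.setPipeline(pipeline);
bundleEncoder.setBindGroup(0, bindGroup);
bundleEncoder.setVertexBuffer(0, vertexBuffer);
bundleEncoder.draw(vertexCount);
const renderBundle = bundleEncoder.finish();

// Per frame: replay the pre-recorded bundle instead of re-issuing each call.
const pass = commandEncoder.beginRenderPass(renderPassDescriptor);
pass.executeBundles([renderBundle]);
pass.end();
```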
A performance comparison with/without using Render Bundles for our application across varying texture resolutions yielded the following result:
| FPS: With VS Without Render Bundles |
| --- |
| ![]() |
As expected, we do not see any major performance improvement from Render Bundles: our application appears to be GPU bound, whereas Render Bundles shine with CPU-side optimizations. In fact, on a high-resolution 4K texture, Render Bundles seem to take a slight performance hit for our use case. We also asked about this in the WebGPU Matrix Chat, and our theory was corroborated with the same explanation in this thread.
Performance data captured on Windows 11, Intel i7-12800H @ 2.40GHz 16GB, NVIDIA GeForce RTX 3070 Ti, running on Google Chrome
In our compute pass, we have three pairs of ping-pong buffers for different data: heightfields, uplifts and stream area. These ping-pong buffers are all storage textures, which means they have to be in `rgba8unorm` format to be able to use the `GPUTextureUsage.STORAGE_BINDING` flag. However, most of our inputs only have greyscale values, so it is wasteful to utilize only one channel (`r`) out of the four `rgba` channels while going through frequent texture binding for ping-ponging. In theory, we only need an array of floats to hold the greyscale values, which matches perfectly the concept of a storage buffer in WebGPU.

We tried switching to storage buffers for the stream area, because it is the only data that is not affected by the GUI (i.e. no runtime changes) and is not used in any rendering stage, making it the most manageable to refactor. We ran both methods, stream area in `rgba8unorm` `GPUTexture`s and in `array<f32>` storage buffers, on various texture resolutions to see if the latter improves performance.
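For illustration, the two storage options can be set up roughly like this (a sketch; `simResolution` and the usage flags shown are illustrative, not the repository's exact configuration):

```ts
// Option A: rgba8unorm storage texture (only the r channel is meaningful).
const streamTexture = device.createTexture({
  size: [simResolution, simResolution],
  format: 'rgba8unorm',
  usage: GPUTextureUsage.STORAGE_BINDING | GPUTextureUsage.TEXTURE_BINDING,
});

// Option B: a storage buffer holding one f32 per texel (array<f32> in WGSL).
const streamBuffer = device.createBuffer({
  size: simResolution * simResolution * Float32Array.BYTES_PER_ELEMENT,
  usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
```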
| Memory Usage | FPS |
| --- | --- |
| ![]() | ![]() |
As we can see from the charts above, there is no big difference in FPS, because changing the data structure mainly affects how we store the data, not how we compute it. Regarding allocated memory, it is evident that as texture resolution increases, the storage buffer generally saves more memory. The reason for the exception at 4K is unclear, but our guess is that it is due to either the `stats-js` library we use or some Chrome browser setting.

Note that the memory allocation here is at peak usage. No matter what heightfield and uplift resolution combination we have, memory usage eventually drops to ~30 MB because the simulation converges.
Performance data captured on Windows 11 Home, AMD Ryzen 7 5800H @ 3.2GHz 16 GB, Nvidia GeForce RTX 3060 Laptop GPU 6 GB, running on Google Chrome
There is an unused implementation of parallel reduction in the `fixHeightrange` branch. It calculates the maximum height across all pixels in the heightfield. We compare a simple CPU-side version against the parallel reduction version; none of the other optimisations mentioned above were active during this analysis. The parallel reduction is unused because a later change in the project made this calculation redundant.
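As a rough sketch of the technique, here is one step of a parallel max-reduction in WGSL, embedded as a TypeScript string. Each workgroup reduces 64 values to a single partial maximum; repeated dispatches shrink the array until one value remains. Binding and variable names are illustrative, not the code in the `fixHeightrange` branch:

```ts
const reduceMaxWGSL = /* wgsl */ `
@group(0) @binding(0) var<storage, read> input : array<f32>;
@group(0) @binding(1) var<storage, read_write> output : array<f32>;

var<workgroup> sharedVals : array<f32, 64>;

@compute @workgroup_size(64)
fn reduceMax(@builtin(global_invocation_id) gid : vec3<u32>,
             @builtin(local_invocation_id) lid : vec3<u32>,
             @builtin(workgroup_id) wid : vec3<u32>) {
  // Load one element per thread (or a very small value past the end).
  var v = -1.0e30;
  if (gid.x < arrayLength(&input)) {
    v = input[gid.x];
  }
  sharedVals[lid.x] = v;
  workgroupBarrier();

  // Tree reduction within the workgroup's shared memory.
  for (var stride = 32u; stride > 0u; stride = stride / 2u) {
    if (lid.x < stride) {
      sharedVals[lid.x] = max(sharedVals[lid.x], sharedVals[lid.x + stride]);
    }
    workgroupBarrier();
  }

  // Thread 0 writes this workgroup's partial maximum.
  if (lid.x == 0u) {
    output[wid.x] = sharedVals[0];
  }
}`;
```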
| FPS: CPU calculation vs parallel reduction |
| --- |
| ![]() |
It is evident that parallel reduction is significantly faster for texture resolutions up until 2k textures. Around 4.5k texture resolution the benefits gained from parallel reduction are not enough by themselves, and must be combined with other optimisations mentioned above.
`webgpu-erosion-simulation` is built with TypeScript and compiled using Next.js. Building the project requires an installation of Node.js.

- Install dependencies: `npm install`.
- For development, start the dev server, which will watch and recompile sources: `npm start`. You can navigate to http://localhost:3000 to view the project.
- To compile the project: `npm run build`.
Below are the documents that keep track of our work progress for each milestone. Each milestone is roughly one week.
- Authors of Large-scale terrain authoring through interactive erosion simulation
- WebGPU Samples
- 3d-view-controls npm package
- WebGPU Matrix Chat
- Brandon Jones
- WebGPU - All of the cores, none of the canvas - article on compute shaders in WebGPU by Surma
- WebGPU Fundamentals
- WebGPU Manual