Casually Pathtracing with WebGPU Part 1

This is the first post part of a series spread across a somewhat long period of time and several WebGPU API versions. I started working on this out of passion and curiosity. I’ve been working in computer graphics for a while now but it’s been mostly rasterization, hardware accelerated or otherwise. Besides a toy raytracer I wrote way back in the day with Java (don’t ask), I never got to write my own true pathtracer/raytracer. Additionally, my graphics experience so far was limited to older generation graphic APIs like OpenGL, so I never had the chance to delve into modern APIs like Vulkan.

Now, maybe the biggest question here is “Why WebGPU?!”. I mean, it’s an ever-changing standard (currently), it’s more limited and restrictive, it’s running in a web-browser for Christ’s sake! While this may be true, there are also some theoretical advantages to it, like portability, ease of access and use, and finally, the fact that it’s running inside a web browser is an advantage as much as it a disadvantage, but from different points of view.

“OK, so what are we doing?”

Gilbert Strang

Ok, so we want to write a pathtracer using WebGPU, we first need to establish some primary goals:

One thing that I don’t want to make is another toy tracer which works exclusively with parametric shapes. There are tons of toy tracers out there, some written in WebGL too, and don’t get me wrong, some of them are really cool. However they are somewhat limited in what they can actually display on the screen, and another thing I want to avoid is to come up with complex combinations of SDFs in order to get more complex shapes to display. That being said, we just determined that we’ll be tracing polygon meshes, probably triangles. This way, we can edit our geometry using regular third party applications like Blender, import them, and render them directly.

Since it’s going to be WebGPU, we’ll do our tracing using compute shaders. Normally, software which specializes in hardware accelerated ray tracing, uses CUDA or OpenCL to achieve it. However, that’s off the table for us.

What about the material system and rendering? It’s a bit early to say, however, I do intend to have physically based materials. We’ll start really simple with basic shading and move upwards form there.

What about all the rest of the features any raytracer should have, like reflections, refraction, caustics, GI, etc? Again, I’d rather not get ahead of myself and just let things flow naturally. We’ll see which features we’ll have time to implement, and when.

And also… we want it in real time. It’s a bit of a stretch I know, so let’s just say we want it to run at an interactive frame rate.

The quote from above, belongs to prof. Gilbert Strang. I watched several of his lectures on calculus here, which I highly recommend. During his lectures, he sometimes asks “OK, so what are we doing?”. He explained why, saying that he finds it very important and practical, as it helps to re-focus and cement the identity of the real task at hand. We’re at this point now.

Starting out

Before trying to start on actual implementations, I wanted to check out other simple ray tracing implementations using compute shaders. I found several. Most of them were written in C++ (like sane people do) and provided good insight. I found a single implementation for WebGPU here. Which is largely based on a simple raytracer built using webgl2-compute here. The latter is a very good starting example, regardless of the fact that webgl2-compute is no longer a thing.

With a minimal example available, next thing on the list was to choose a solid dev environment. This was happening really early 2021. The browser I chose is Chromium with Dawn as the implementation for WebGPU. I first tried on Windows, where most of WebGPU’s implementation was working more or less, however it had an issue with storage textures. The WebGPU context was breaking whenever I tried using a storage texture. This made it virtually impossible for me to continue development. I skipped Apple completely and moved to my old Linux laptop. With some effort and a little pain (I recently updated the OS version and was having issues with the nvidia drivers), I managed to have WebGPU fully working in Chromium on Linux. (Note: Later on in 2021, the issue with Windows and storage textures got fixed)

What’s the first thing you do, when you first try out a new graphics API? You render a triangle, of course!

With this out of the way, it was time to do some actual tracing. I wanted to start simple. Really simple. So the obvious choice was to try and trace some parametric shapes. I went for spheres since they’re the easiest.

Note: I didn’t post any code regarding the triangle nor the spheres, since they’re just for testing purposes. For the next stage, where we trace actual geometry I will post small snippets which I find more important, however the best source code you can use for reference is Oktomus’s Implementation (1) which my code is based on for now. Additionally I found these WebGPU samples(2) very useful for a quick practical reference to the WebGPU API for various use cases.

Now that I have some path tracing going on, it’s time to throw some real geometry at it. I still want to keep it simple, so my scene of choice is the Cornell box. Oktomus’s implementation (1) was also using the Cornell box, which was great because I could easily compare results.

32 triangles scene, brute force

Right, so at this point we have something. We’re path tracing a scene composed of real triangle geometry. No more parametric shapes. Up till now, the code is more or less the same as Oktomus’s(1) . Let’s try to expand a bit upon it.

More triangles

The Cornell scene is nice and all, however it contains a meager 32 triangles. Real scenes contain thousand, tens of thousands, hundred thousands triangles, millions of triangles. Let’s add a mesh with a few more triangles than a cube

1k triangles scene, brute force

The Suzanne mesh has 968 triangles. In total, we have a scene of 1000 triangles. It’s so slow that it hurts! No surprise really. Let’s see why this happens.

This is the function that computes the resulting color for a given primary ray:

// Compute the color for a given ray.
vec3 color(Ray r, inout float seed, vec2 pixel)
   float t;
    vec3 n;
    int max_depth = 2;
    int depth = 0;
    int mesh_indice;

    vec3 res = vec3(0.0);

    int light_mesh_indice = 0;
    Mesh light = meshes[light_mesh_indice];
    int light_count = light.triangle_count;

    while (depth < max_depth 
        && hit_world(r, EPSILON, MAX_FLOAT, t, mesh_indice, n) 
        && t > 0.0)
        Mesh mesh = meshes[mesh_indice];
        vec3 surface_normal = n;

        // Primary ray hit a light, stop.
        if (mesh.emission != vec3(0.0) && depth == 0)
            return mesh.emission;

        vec3 hit_point = ray_at(r, t);

        // Consider hit.
        if (mesh.emission == vec3(0.0))
            float light_pdf = 0.0;

            vec3 light_point = random_point_on_mesh(light, seed, pixel, light_pdf);

            vec3 lh = light_point - hit_point;
            float dist = length(lh);

            // Trace a shadow ray.
            Ray shadow_ray;
            shadow_ray.origin = hit_point;
            shadow_ray.direction = normalize(lh);

            if (!hit_world(shadow_ray, EPSILON, dist, t, mesh_indice, n) 
                || mesh_indice == light_mesh_indice)
                // Direct lighting contribution.
                res += light_pdf * mesh.diffuse * light.emission * abs(dot(surface_normal, shadow_ray.direction));

        // Bounce.
        vec2 s = rand2(seed, pixel);
        vec3 target = hit_point + n + sample_sphere_uniform(s);
        r.origin = hit_point;
        r.direction = normalize(target - r.origin);


    return res / float(depth);

This function is executed for each pixel on the screen where we trace a ray (primary ray) from. There are several things going on, but we won’t get into each of them now. We’re currently interested in the intersections testing that’s happening between the rays and the scene geometry. Each time we want to know whether a ray hit something in the scene we call the hit_world function.

bool hit_world(Ray r, float t_min, float t_max, inout float t, inout int mesh_indice, inout vec3 n) {
    bool does_hit = false;
    t = 0.0;
    float best_min_t = t_max;

    mesh_indice = -1;

    for (int i = 0; i < meshes.length(); ++i)
        Mesh mesh = meshes[i];

        for (int j = 0; j < mesh.triangle_count * 3; j += 3)
            vec3 v0 = vertices[triangles[mesh.offset + j]];
            vec3 v1 = vertices[triangles[mesh.offset + j + 1]];
            vec3 v2 = vertices[triangles[mesh.offset + j + 2]];

            if (hit_triangle_mt(r, v0, v1, v2, t) && t >= t_min && t < t_max && t < best_min_t)
                best_min_t = t;
                does_hit = true;
                mesh_indice = i;
                n = normalize(cross(v1 - v0, v2 - v0));


    if (does_hit)
        t = best_min_t;

    return does_hit;

We immediately see that we check the ray’s intersection with all the triangles in the scene. For the Cornell scene, with it’s 32 triangles, it wasn’t such a big deal. But now that we have almost 1000, things aren’t working out.

Before we start on improving our intersection testing approach, it would be good to have an idea of the current time it takes to render one frame, and other useful statistics. Unfortunately, back when I was working on this, timestamp queries were not supported in WebGPU (and still aren’t at the time of this writing), however atomic operations were (and currently aren’t anymore at the time of this writing). To simplify, we’ll set the depth of ray tracer to 1, meaning that light will not bounce.

Average Frame Time: 85.03 ms
Primary rays: 262144
Triangles tested: 380979720
Triangles hit: 240782
Shadow rays hit: 91110

Right, so for our scene consisting of almost 1k triangles, at a resolution of 512×512, in each frame we’re testing 380+ million triangles. That’s a bit excessive to say the least. We need to seriously cut down on the number of ray-triangle intersections.

Simple Acceleration

Ray and Kajiya wrote an iconic paper(6) back in ’86 with a simple solution for accelerating ray tracing of more complex scenes. We can learn from it, try to adopt some ideas from it and see how it goes. We can define a bounding box for each of our objects. Each time we want to test a mesh with a ray, instead of going blindly through all it’s triangles, we first test it against this bounding box. We will implement the ray-box intersection using the slab method defined by (6)

bool rayIntersectWithBox(vec3 boxMin, vec3 boxMax, Ray r) {

    vec3 tbot = (boxMin - r.origin) * r.invDir;
    vec3 ttop = (boxMax - r.origin) * r.invDir;
    vec3 tmin = min(ttop, tbot);
    vec3 tmax = max(ttop, tbot);
    vec2 t = max(tmin.xx, tmin.yz);
    float t0 = max(t.x, t.y);
    t = min(tmax.xx, tmax.yz);
    float t1 = min(t.x, t.y);

    return t1 >= max(t0, 0.0);

Now let’s look at the new stats:

Average Frame Time: 14.5 ms
Primary rays: 262144
Triangles tested: 39451030
Triangles hit: 237662
Shadow rays hit: 91156

The improvement is substantial, almost an order of magnitude. In fact, let’s increase the ray tracing depth back to 2.

1k triangles scene, with bounding box testing

With a simple improvement like this, we’re back in interactive land. We’re confident enough now to increase our scene’s triangle count.Let’s make a new scene, ditch the Cornell box setup and ramp up the triangles a bit. Let’s say we’ll do around 6k triangles now. I mean, the bounding box improvement seems to be doing the trick, right?

6k triangles scene, with bounding box testing

Dead wrong! At only a little over 6k triangles we’re back where we started off. The frame time here is over 500 ms and we’re back in the hundreds of millions of triangle tests. The truth is that our simple bounding boxes are not nearly efficient enough. In this last scene, the Suzanne mesh has almost 6k triangles. We put a box around it and test it against the incoming rays. Quite a few of those rays will intersect the box but not any of the triangles, since the box is not a tight fit around the mesh. Moreover, even if we get a true positive hit on the box, and the ray does indeed intersect one of triangles of the mesh, our ray tracer still goes through the rest of the mesh’s triangles before it chooses the closest hit.

There are probably a few other improvements we can implement for our bounding boxes approach, however they still won’t be enough. We need a much better acceleration structure for our geometry. One such acceleration structure is a Bounding Volume Hierarchy (BVH) which we will be implementing in the next post

Hold up

There’s something wrong with our rendering.

Flat Normals visible

The Suzanne mesh and curved plane in the background have flat normals. That’s because in the original reference we’ve been working with, we only send the position attribute for each vertex in our geometry buffer, and the normal is compute simply by taking the cross product of two edges.

normal = normalize(cross(tri.vertices[1].position - tri.vertices[0].position, tri.vertices[2].position - tri.vertices[0].position));

We’re going to need to add the per vertex normals too. Additionally we’ll be adding uvs, for later use.

meshes.forEach((mesh) => {
    const vertices = mesh.vertices;
    const normals = mesh.normals;
    const uvs = mesh.uvs;
    for (let i = 0; i < vertices.length; i += 3) {
        _verticesBuffer[index++] = vertices[i];
        _verticesBuffer[index++] = vertices[i + 1];
        _verticesBuffer[index++] = vertices[i + 2];
        _verticesBuffer[index++] = 0.0; // padding

        _verticesBuffer[index++] = normals[i];
        _verticesBuffer[index++] = normals[i + 1];
        _verticesBuffer[index++] = normals[i + 2];
        _verticesBuffer[index++] = 0.0; // padding

        _verticesBuffer[index++] = uvs[i];
        _verticesBuffer[index++] = uvs[i + 1];
        _verticesBuffer[index++] = 0.0; // padding
        _verticesBuffer[index++] = 0.0; // padding

Right, so now we have normals coming in our compute shader. Now what? We need a way to interpolate them across the faces.

We will be using barycentric coordinates(7) (described very nicely in the indicated reference) to determine the factors needed to interpolate the intersection point normal.

/* tri -> Triangle that we've hit
    pt -> Ray's intersection point with the triangle
void getNormal(Triangle tri, in vec3 pt, out vec3 normal) { 
    // get the vector from the point to the triangle verts
    vec3 vector0 = tri.vertices[0].position - pt;
    vec3 vector1 = tri.vertices[1].position - pt;
    vec3 vector2 = tri.vertices[2].position - pt;

    // calculate areas as well as factors
    // We don't divide by 2 ( like the triangle area formula says ) because
    // we're dividing the area by the smaller interior triangles areas which
    // we don't divide by two either. We're interested in the factors only
    float a0 = length(cross(tri.vertices[0].position - tri.vertices[1].position, tri.vertices[0].position - tri.vertices[2].position)); // main triangle area
    float a1 = length(cross(vector1, vector2)) / a0; // v0 area / a0
    float a2 = length(cross(vector2, vector0)) / a0; // v1 area / a0
    float a3 = length(cross(vector0, vector1)) / a0; // v2 area / a0

    // find the normal
    normal = tri.vertices[0].normal * a1
	    + tri.vertices[1].normal * a2
	    + tri.vertices[2].normal * a3;

The above works fine, however we can achieve the same thing faster. If we look at our original triangle intersection function

bool hit_triangle_mt(Ray r, vec3 v0, vec3 v1, vec3 v2, out float t)
    vec3 e1 = v1 - v0;
    vec3 e2 = v2 - v0;
    vec3 h = cross(r.direction, e2);
    float a = dot(e1, h);

    if (a < EPSILON && a > EPSILON)
        return false;

    float f = 1.0 / a;
    vec3 s = r.origin - v0;
    u = f * dot(s, h);

    if (u < 0.0 || u > 1.0)
        return false;

    vec3 q = cross(s, e1);
    v = f * dot(r.direction, q);
    if (v < 0.0 || u + v > 1.0)
        return false;
    t = f * dot(e2, q);
    if (t > EPSILON)
        return true;

    return false;

We can see that the barycentric coordinates are already computed here as u and v. The third barycentric coordinates w can be extrapolated as 1 – u – v. All we need to do is output u and v from the triangle intersection function, store them in our intersection structure and use them in our normal interpolation function like so:

/* tri -> Triangle that we've hit
    pt -> Ray's intersection point with the triangle
void getNormal(Triangle tri, in vec3 pt, out vec3 normal) {
    normal = normalize(tri.vertices[1].normal * tri.u + tri.vertices[2].normal * tri.v + tri.vertices[0].normal * (1 - tri.u - tri.v));

Now our normals are smoothly interpolated

Smooth normal interpolation

Next up, proper ray tracing acceleration using BVH!


(1) Writing a ray tracer for the web:

(2) WebGPU Samples:




(6) Ray Tracing Complex Scenes:

(7) Barycentric Coordinates: