Writing Custom GLSL Shaders for Scatter Plots

You need to render 100k+ scatter points at 60fps, but DOM nodes thrash layout and Canvas 2D bottlenecks the CPU — so the points must move to GLSL shaders.

This is the implementation companion to WebGL shader optimization: the exact vertex and fragment code, attribute layout, and picking pass for a production scatter plot.

Done well, this pipeline renders a million-point scatter in a single draw call at full refresh rate, with smooth circular marks, correct alpha compositing, and instant hover — work that would melt a DOM-based chart and bottleneck a Canvas 2D one. The cost is the GLSL itself, which is unforgiving: a missing precision qualifier, a wrong stride, or a silent shader-compile error renders nothing, with no exception thrown. The discipline below front-loads the checks that catch those failures before they reach a user.

Diagnostic checklist

Mapping a dataset to the GPU

A scatter plot is point-sprite rendering: each row becomes one vertex, the vertex shader projects it to clip space and sets gl_PointSize, and the fragment shader clips the square sprite into an anti-aliased circle. Coordinates and sizes live in pre-allocated Float32Array buffers uploaded once, so per-frame work is a single draw call.

Getting the buffer layout right is where silent bugs hide. Pack X and Y interleaved into one Float32Array, upload with bufferData(gl.ARRAY_BUFFER, data, gl.STATIC_DRAW) for a static dataset, and describe it to the shader with vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0) — a stride and offset of zero because the two floats are contiguous. The normalize flag must be false for floating-point position data; setting it true would remap your coordinates into the [0,1] range and silently corrupt the plot. If you also pack a per-point size or category, the stride and offsets change, and a mismatch there produces the classic “renders but in the wrong places” failure. During initialization, never call gl.finish() — it forces a CPU-GPU sync that blocks the event loop and inflates Time to Interactive for no benefit.

Scatter point shader stages Vertex attributes project to clip space and set point size, then the fragment shader clips each square sprite into an anti-aliased circle. aPosition, aSize Float32Array VBO vertex shader gl_Position, gl_PointSize fragment shader circle clip + alpha anti-aliased One draw call renders the whole point cloud
Vertex attributes project and size each point; the fragment shader clips the sprite into a smooth circle.

The fragment shader is where a scatter plot earns its visual quality. WebGL point sprites rasterize as squares by default, so without intervention every point is a hard-edged box. The fix is radial-distance math: gl_PointCoord gives normalized [0,1] coordinates across the sprite, so length(gl_PointCoord - vec2(0.5)) is the distance from the sprite’s center, zero at the middle and about 0.707 at the corners. Feeding that through smoothstep produces a soft, anti-aliased edge for free on the GPU, and discarding fragments beyond the radius clips the square into a circle. Because dense scatter plots overlap heavily, enable alpha blending (SRC_ALPHA, ONE_MINUS_SRC_ALPHA) so overlapping points composite rather than overwrite, and pass alpha as a uniform rather than computing it per fragment to keep fill-rate low on crowded clusters.

Broken vs fixed

// ❌ BROKEN fragment shader: no circle clip, no precision — renders hard squares.
varying vec4 vColor;
void main() {
  gl_FragColor = vColor; // Fills the entire square point sprite; edges are aliased squares.
}
// ✅ FIXED fragment shader: radial clip + smoothstep edge = anti-aliased circle.
#version 300 es
precision highp float; // A11Y: highp avoids coordinate truncation that distorts small marks.

in vec4 vColor;
uniform float uAlpha;
out vec4 fragColor;

void main() {
  float dist = length(gl_PointCoord - vec2(0.5));   // 0 at center, ~0.707 at corner.
  float alpha = smoothstep(0.5, 0.45, dist) * uAlpha; // PERF: branchless soft edge.
  if (alpha < 0.01) discard;                          // Drop fully transparent fragments.
  fragColor = vec4(vColor.rgb, alpha);
}

Step-by-step fix

  1. Compile and link with logs. After compiling each shader, read gl.getShaderInfoLog(); after linking, gl.getProgramInfoLog(). Silent failures otherwise render black.
  2. Upload coordinates once. Pack X/Y into a Float32Array, bufferData(..., STATIC_DRAW), then vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0).
  3. Project in the vertex shader. gl_Position = uProjection * vec4(aPosition, 0.0, 1.0) with a CPU-computed orthographic mat4.
  4. Size for HiDPI. Set gl_PointSize = clamp(uZoom * uBaseSize * uDPR, 1.0, 64.0) so points stay crisp on Retina without branching.
  5. Clip to a circle. In the fragment shader, compute length(gl_PointCoord - 0.5) and smoothstep the edge; discard near-zero alpha.
  6. Enable blending. gl.enable(gl.BLEND) and gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA) before drawing so overlaps composite correctly.
  7. Add GPU picking for hover. Render unique RGB IDs to an offscreen framebuffer, readPixels one pixel under the cursor on rAF, and map back to the array index.
  8. Cull off-screen points by checking gl_Position bounds in the vertex shader to cut fragment load on zoomed views.
// ✅ FIXED vertex shader: projection + branchless HiDPI sizing.
#version 300 es
precision highp float;

in vec2 aPosition;
in float aSize;
uniform mat4 uProjection;
uniform float uZoom;
uniform float uBaseSize;
uniform float uDPR;

void main() {
  gl_Position = uProjection * vec4(aPosition, 0.0, 1.0);
  // PERF: clamp instead of if/else — smooth size scaling with no divergence.
  gl_PointSize = clamp(uZoom * aSize * uBaseSize * uDPR, 1.0, 64.0);
}
// PERF: throttled GPU pick — readPixels once per frame, O(1) hit-test vs O(n) CPU scan.
let pickScheduled = false;
canvas.addEventListener('pointermove', (e: PointerEvent): void => {
  if (pickScheduled) return;
  pickScheduled = true;
  requestAnimationFrame(() => {
    gl.bindFramebuffer(gl.FRAMEBUFFER, pickFBO);
    renderPickPass(); // Each point outputs its index packed as RGB.
    const px = new Uint8Array(4);
    gl.readPixels(e.offsetX, canvas.height - e.offsetY, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, px);
    const index = (px[0] << 16) | (px[1] << 8) | px[2];
    // A11Y: route the same highlight to keyboard focus so screen readers announce it.
    if (index < pointCount) highlightPoint(index);
    pickScheduled = false;
  });
});

Verification

// PERF/correctness: fail fast on shader and program errors before the first draw.
gl.compileShader(fragShader);
console.assert(
  gl.getShaderParameter(fragShader, gl.COMPILE_STATUS) === true,
  gl.getShaderInfoLog(fragShader) ?? 'fragment compile failed'
);
gl.linkProgram(program);
console.assert(
  gl.getProgramParameter(program, gl.LINK_STATUS) === true,
  gl.getProgramInfoLog(program) ?? 'link failed'
);
console.assert(gl.getUniformLocation(program, 'uProjection') !== null, 'uProjection unused/optimized out');

Visually, points should render as soft circles, not squares, and stay sharp on a 2× display. Use EXT_disjoint_timer_query to confirm GPU frame time stays under 16.67ms at full point count, and verify hover highlights the correct point via the picking pass.

The GPU-picking pass deserves its own verification because it fails silently. After rendering the pick pass, read back a known point’s pixel and assert that decoding (R << 16) | (G << 8) | B returns that point’s index. If it returns garbage, the usual causes are a Y-axis flip (gl.readPixels uses bottom-left origin, so you must read at canvas.height - offsetY), an index that overflows 24 bits past ~16.7M points, or a pick shader that writes interpolated rather than flat colors. Throttle the readback to one readPixels per animation frame; calling it on every raw mousemove creates a GPU-CPU synchronization stall that freezes the UI precisely when the dataset is large enough to need picking in the first place.

Edge cases & gotchas

  • Beyond framebuffer resolution. Past ~4M points, RGB IDs can collide with sub-pixel coverage; fall back to a CPU quadtree for coarse picking.
  • NaN coordinates. A single NaN in the buffer corrupts the projection; guard with isnan() before division or filter on upload.
  • Z-fighting on transparency. For overlapping translucent points, disable depth writes (gl.depthMask(false)) during the transparent pass or sort by depth before upload, and avoid writing gl_FragDepth in a transparent fragment shader.
  • Contrast and color encoding. Alpha-blended points can fall below WCAG 1.4.11 non-text contrast against the background when stacked thinly; expose a high-contrast toggle that overrides the alpha uniform with opaque fallback colors for low-vision users, and never encode a category by hue alone — pair it with a redundant channel.
  • Black-screen-after-resize. Recreating the context on resize without rebinding buffers and re-resolving uniform locations renders nothing; rebuild GPU state on webglcontextrestored and after any deliberate context recreation rather than assuming the old handles survive.