Reducing Draw Calls with Instanced Rendering

Your scatter plot renders one drawArrays call per point, and at a few thousand markers the GPU sits idle while the CPU drowns in draw-call overhead and FPS falls off a cliff.

The fix is instanced rendering: one draw call that stamps the same geometry thousands of times, with per-point data supplied through instanced attributes. This guide sits under WebGL shader optimization, part of the high-performance animation and GPU acceleration overview.

Diagnostic checklist

How instancing collapses the calls

[Figure: per-marker draw calls versus one instanced draw call. Left panel, "N draw calls": one drawArrays per point (point 1, point 2, point 3, ... ×N), each adding CPU overhead. Right panel, "1 instanced call": drawArraysInstanced(quad, N) with per-instance pos, color, and size at divisor = 1.]
One instanced draw call reuses a single quad N times, reading per-point attributes advanced once per instance.

Why draw calls, not triangles, are the bottleneck

A modern GPU can rasterize tens of millions of triangles per frame without breaking a sweat. What it cannot tolerate is being told to do so one tiny batch at a time. Every drawArrays call carries a fixed overhead on the CPU side: validating state, binding buffers, flushing the command to the driver, and synchronizing. When you issue one draw call per data point, that per-call overhead — measured in microseconds individually — multiplies by the number of points until the CPU spends the entire frame preparing draws rather than the GPU spending it executing them. The telltale signature in a profiler is high scripting/CPU time, low GPU utilization, and a frame rate that scales inversely with point count even though the total triangle count is trivial.
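The arithmetic behind that claim can be sketched in a few lines. The 5 µs per-call figure below is an assumed value chosen for illustration, not a measurement; real overhead varies with browser, driver, and how much state changes between calls. The function names are hypothetical.

```typescript
// Rough CPU-side cost model: total = number of calls x per-call overhead.
// perCallMicros is an ASSUMED illustrative figure, not a measured one.
const FRAME_BUDGET_MS = 1000 / 60; // about 16.67 ms at 60 FPS

function drawCallCostMs(calls: number, perCallMicros: number): number {
  return (calls * perCallMicros) / 1000;
}

function fitsFrameBudget(calls: number, perCallMicros: number): boolean {
  return drawCallCostMs(calls, perCallMicros) <= FRAME_BUDGET_MS;
}

// With the assumed 5 microseconds per call:
const perPointMs = drawCallCostMs(5000, 5); // 25 ms, over budget before any GPU work
const instancedMs = drawCallCostMs(1, 5);   // 0.005 ms, independent of point count
```

Under that assumption, 5,000 per-point calls already exceed a 60 FPS frame on dispatch cost alone, while the single instanced call costs the same whether it draws five markers or five thousand.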

Instancing breaks the link between “number of objects” and “number of draw calls.” Instead of telling the GPU “draw this quad, now draw this quad, now draw this quad,” you tell it once: “draw this quad N times, and here is a buffer of per-instance data to vary each copy.” The GPU’s vertex shader runs once per vertex per instance, reading the shared geometry attributes the normal way and the per-instance attributes via a divisor that controls how often the attribute pointer advances. A divisor of 0 (the default) advances the attribute every vertex — that is your base geometry. A divisor of 1 advances it once per instance — that is your per-point position, color, and size. The single call replaces the entire loop.

The role of the vertex attribute divisor

The divisor is the one concept that trips people up, so it is worth stating precisely. For a quad drawn as a 4-vertex triangle strip across N instances, the vertex shader executes 4 × N times. For an attribute with divisor 0, the GPU reads a new value for each of the 4 vertices and then repeats that same set of 4 for every instance. For an attribute with divisor 1, the GPU reads a single value and holds it constant across all 4 vertices of one instance, then advances to the next value for the next instance. That is exactly the semantics you want: the corner offsets vary per vertex (to form the quad shape), while the marker’s center, color, and size are constant within a marker and vary between markers.
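The divisor rule can be modeled on the CPU to see which buffer element each vertex shader invocation reads. This is a minimal sketch of the indexing rule only; the function names are hypothetical and nothing here touches the GPU.

```typescript
// Which element of an attribute's buffer is read for a given vertex of a given instance.
// divisor 0: the index follows the vertex. divisor d > 0: the index advances every d instances.
function attributeElementIndex(vertex: number, instance: number, divisor: number): number {
  return divisor === 0 ? vertex : Math.floor(instance / divisor);
}

// Trace every vertex shader invocation for a quad drawn across several instances.
function traceReads(vertexCount: number, instanceCount: number): string[] {
  const rows: string[] = [];
  for (let i = 0; i < instanceCount; i++) {
    for (let v = 0; v < vertexCount; v++) {
      const corner = attributeElementIndex(v, i, 0); // aCorner, divisor 0
      const pos = attributeElementIndex(v, i, 1);    // aInstancePos, divisor 1
      rows.push(`instance ${i} vertex ${v}: corner[${corner}] pos[${pos}]`);
    }
  }
  return rows;
}

const rows = traceReads(4, 3); // 4 x 3 = 12 invocations
```

In the trace, the corner index cycles 0, 1, 2, 3 within every instance while the position index stays fixed for all four vertices and then steps to the next instance's value.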

Broken vs fixed

// ❌ BROKEN: a draw call per point. CPU-bound at a few thousand markers.
for (const p of points) {
  gl.uniform2f(uPos, p.x, p.y);        // per-call uniform
  gl.uniform4f(uColor, p.r, p.g, p.b, 1);
  gl.uniform1f(uSize, p.size);
  gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4); // PERF: N draw calls = N CPU dispatches
}

// ✅ FIXED: one instanced call; per-point data in instanced attributes.
// Geometry (a unit quad) is reused; pos/color/size advance once per instance.
// WebGL1: the VAO comes from OES_vertex_array_object (vaoExt), as in setupInstancing below.
vaoExt.bindVertexArrayOES(vao); // WebGL2 equivalent: gl.bindVertexArray(vao)
// PERF: a single dispatch draws all N markers
ext.drawArraysInstancedANGLE(gl.TRIANGLE_STRIP, 0, 4, points.length); // WebGL1
// WebGL2 equivalent: gl.drawArraysInstanced(gl.TRIANGLE_STRIP, 0, 4, points.length);

Step-by-step fix

function setupInstancing(
  gl: WebGLRenderingContext,
  program: WebGLProgram,
  quad: Float32Array,       // 8 floats: a unit TRIANGLE_STRIP
  instances: Float32Array,  // N * 7: x,y, r,g,b, size, alpha (alpha is stored but not bound below)
): WebGLVertexArrayObjectOES {
  const ext = gl.getExtension('ANGLE_instanced_arrays'); // Step 1
  const vaoExt = gl.getExtension('OES_vertex_array_object');
  if (!ext) throw new Error('ANGLE_instanced_arrays unavailable; use the per-call fallback');
  if (!vaoExt) throw new Error('OES_vertex_array_object unavailable; rebind attributes before each draw');

  const vao = vaoExt.createVertexArrayOES()!;
  vaoExt.bindVertexArrayOES(vao);

  // Step 2: base geometry, advances per vertex (divisor 0, the default)
  const quadBuf = gl.createBuffer()!;
  gl.bindBuffer(gl.ARRAY_BUFFER, quadBuf);
  gl.bufferData(gl.ARRAY_BUFFER, quad, gl.STATIC_DRAW);
  const aCorner = gl.getAttribLocation(program, 'aCorner');
  gl.enableVertexAttribArray(aCorner);
  gl.vertexAttribPointer(aCorner, 2, gl.FLOAT, false, 0, 0);

  // Step 3: per-instance interleaved buffer
  const stride = 7 * 4; // bytes
  const instBuf = gl.createBuffer()!;
  gl.bindBuffer(gl.ARRAY_BUFFER, instBuf);
  gl.bufferData(gl.ARRAY_BUFFER, instances, gl.DYNAMIC_DRAW);

  const aPos = gl.getAttribLocation(program, 'aInstancePos');
  gl.enableVertexAttribArray(aPos);
  gl.vertexAttribPointer(aPos, 2, gl.FLOAT, false, stride, 0);
  ext.vertexAttribDivisorANGLE(aPos, 1); // Step 4: advance once per instance

  const aColor = gl.getAttribLocation(program, 'aInstanceColor');
  gl.enableVertexAttribArray(aColor);
  gl.vertexAttribPointer(aColor, 3, gl.FLOAT, false, stride, 2 * 4);
  ext.vertexAttribDivisorANGLE(aColor, 1); // Step 4

  const aSize = gl.getAttribLocation(program, 'aInstanceSize');
  gl.enableVertexAttribArray(aSize);
  gl.vertexAttribPointer(aSize, 1, gl.FLOAT, false, stride, 5 * 4);
  ext.vertexAttribDivisorANGLE(aSize, 1); // Step 4

  vaoExt.bindVertexArrayOES(null);
  return vao;
}

// vertex shader: expand the unit quad to a sized, positioned marker
attribute vec2 aCorner;        // per-vertex, divisor 0
attribute vec2 aInstancePos;   // per-instance, divisor 1
attribute vec3 aInstanceColor; // per-instance, divisor 1
attribute float aInstanceSize; // per-instance, divisor 1
uniform vec2 uResolution;
varying vec3 vColor;
void main() {
  // PERF: all per-point variation comes from instanced attributes, no uniforms per call
  vec2 world = aInstancePos + aCorner * aInstanceSize;
  vec2 clip = (world / uResolution) * 2.0 - 1.0;
  gl_Position = vec4(clip * vec2(1.0, -1.0), 0.0, 1.0);
  vColor = aInstanceColor;
}
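
The shader's placement math is easy to get subtly wrong, so it can help to mirror it on the CPU and check a few known points. This is a sketch under two assumptions: the unit quad's corners run from 0 to 1, and positions and sizes are in pixels with the origin at the top left. The function name is hypothetical.

```typescript
type Vec2 = [number, number];

// CPU mirror of the vertex shader: offset the corner, normalize to clip space, flip Y.
function cornerToClip(pos: Vec2, corner: Vec2, size: number, resolution: Vec2): Vec2 {
  const worldX = pos[0] + corner[0] * size;
  const worldY = pos[1] + corner[1] * size;
  const clipX = (worldX / resolution[0]) * 2 - 1;
  const clipY = (worldY / resolution[1]) * 2 - 1;
  return [clipX, -clipY]; // pixel rows grow downward, clip space grows upward
}

// A 10 px marker at the top-left pixel, and one whose far corner hits the bottom right.
const topLeft = cornerToClip([0, 0], [0, 0], 10, [800, 600]);         // [-1, 1]
const bottomRight = cornerToClip([790, 590], [1, 1], 10, [800, 600]); // [1, -1]
```

If the quad were instead defined from -0.5 to 0.5, the same math would center each marker on its position rather than anchoring it at the top-left corner.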

Verification

// Assert one draw call covers the whole dataset by wrapping the GL call.
let drawCalls = 0;
const realDraw = ext.drawArraysInstancedANGLE.bind(ext);
ext.drawArraysInstancedANGLE = ((...a: Parameters<typeof realDraw>) => {
  drawCalls++; return realDraw(...a);
}) as typeof realDraw;
renderFrame();
console.assert(drawCalls === 1, `expected 1 instanced draw call, got ${drawCalls}`);

In the DevTools Performance panel, scripting time per frame should drop sharply and GPU utilization rise; a tool like Spector.js will show a single instanced draw replacing the previous flood of calls.

The most convincing verification is a scaling test. Render 1,000 markers, then 10,000, then 100,000, and record the frame time at each. With per-call drawing, frame time grows roughly linearly with point count and crosses the 16.67 ms budget early. With instancing, the draw-submission cost is constant, since you issue one call regardless of N, so frame time stays nearly flat across that range while the GPU has headroom for the geometry. If your instanced version also scales steeply with N, you have probably reintroduced per-point CPU work somewhere, most often by rebuilding the entire instance buffer from scratch every frame instead of updating it in place.
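A small harness for that scaling test might look like the following sketch. The render callback and the clock are injected, so `renderWithCount` is a hypothetical hook into your own renderer and `now` would be `() => performance.now()` in a browser. Because WebGL submits work asynchronously, this times CPU-side submission only, which is the cost that draw-call overhead inflates; it does not measure GPU execution time.

```typescript
interface Sample {
  count: number;
  frameMs: number;
}

// Times one render per point count using the injected clock.
function measureScaling(
  renderWithCount: (count: number) => void,
  counts: number[],
  now: () => number,
): Sample[] {
  return counts.map((count) => {
    const start = now();
    renderWithCount(count);
    return { count, frameMs: now() - start };
  });
}

// Frame-time growth divided by point-count growth, first sample to last.
// Near 1 suggests linear scaling; near 0 suggests flat frame time.
function growthRatio(samples: Sample[]): number {
  const first = samples[0];
  const last = samples[samples.length - 1];
  if (!first || !last || samples.length < 2) {
    throw new Error('need at least two samples');
  }
  return (last.frameMs / first.frameMs) / (last.count / first.count);
}
```

Averaging several frames per count would give steadier numbers than the single sample taken here.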

Performance notes

The win from instancing is real, but it has a shape worth understanding. Setup cost is paid once: uploading the static quad and configuring the attribute pointers and divisors happens at initialization, not per frame. Per-frame cost is dominated by whatever portion of the instance buffer changes. If positions are static and only colors animate, re-upload only the colors. Note that in the interleaved layout above the color fields are strided across the buffer, so a single bufferSubData call can cover a contiguous run of whole instances but not the color fields alone. If everything moves every frame, you still pay an O(N) buffer update on the CPU plus the upload to the GPU. That is a bulk memory copy, though, which is typically far cheaper than N draw-call dispatches, and on most hardware it stays inside the frame budget for hundreds of thousands of points.

Memory layout matters. The interleaved buffer in the example packs position, color, and size contiguously per instance, which is cache-friendly when you rebuild the whole buffer. If you only ever animate one attribute, a separate buffer per attribute can be cheaper because you re-upload only the bytes that changed, with bufferSubData targeting exactly that buffer. Either way, allocate the typed array once and overwrite it each frame; allocating a fresh Float32Array per frame creates GC pressure that surfaces as periodic dropped frames, the same pattern surfaced by the dropped-frame diagnostics in this section’s guides.
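The allocate-once pattern can be sketched as follows, assuming the 7-float interleaved layout from the setup code. `BufferUploader` is a hypothetical minimal interface standing in for the WebGL context, and the sketch assumes the instance buffer is already bound to ARRAY_BUFFER and was sized once with bufferData.

```typescript
const FLOATS_PER_INSTANCE = 7; // x, y, r, g, b, size, alpha

interface MarkerPoint {
  x: number; y: number;
  r: number; g: number; b: number;
  size: number; alpha: number;
}

// Hypothetical slice of the WebGL API that this sketch needs.
interface BufferUploader {
  readonly ARRAY_BUFFER: number;
  bufferSubData(target: number, dstByteOffset: number, srcData: ArrayBufferView): void;
}

// Overwrites a preallocated array in place, so no per-frame allocation occurs.
function writeInstances(target: Float32Array, points: readonly MarkerPoint[]): void {
  if (target.length < points.length * FLOATS_PER_INSTANCE) {
    throw new RangeError('instance array too small for point count');
  }
  let o = 0;
  for (const p of points) {
    target[o++] = p.x;
    target[o++] = p.y;
    target[o++] = p.r;
    target[o++] = p.g;
    target[o++] = p.b;
    target[o++] = p.size;
    target[o++] = p.alpha;
  }
}

// Uploads only instances [first, first + count) from the shared array.
function uploadInstanceRange(
  gl: BufferUploader, data: Float32Array, first: number, count: number,
): void {
  const start = first * FLOATS_PER_INSTANCE;
  const view = data.subarray(start, start + count * FLOATS_PER_INSTANCE); // a view, not a copy
  gl.bufferSubData(gl.ARRAY_BUFFER, start * Float32Array.BYTES_PER_ELEMENT, view);
}
```

Create the Float32Array once at startup, sized for the maximum point count, then call writeInstances and uploadInstanceRange each frame with only the range that changed.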

Edge cases and gotchas

  • Reset divisors when sharing a context. A leftover divisor of 1 on a shared attribute will corrupt later non-instanced draws in the same context — set the divisor back to 0 on those attributes, or, better, isolate instanced state behind its own vertex array object so binding the VAO restores the correct divisors automatically.
  • Update only what changes. For animated positions, bufferSubData into the instance buffer rather than reallocating; reuse a single typed array to avoid GC churn, as covered in WebGL shader optimization. Re-creating the buffer with bufferData every frame gives up much of the benefit, because the driver may have to allocate fresh GPU storage on each call.
  • WebGL1 needs the extension; WebGL2 does not. On WebGL1 you must obtain ANGLE_instanced_arrays and call its drawArraysInstancedANGLE/vertexAttribDivisorANGLE methods; on WebGL2 the same operations are core methods on the context (drawArraysInstanced, vertexAttribDivisor). Feature-detect and fall back to the per-call loop only as a last resort, since that is exactly the slow path you are escaping. For shader-side fundamentals see WebGL shader basics for 2D data points.
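
The feature detection described in the last bullet can be wrapped so the rest of the renderer calls one API regardless of context version. This is a minimal sketch: the interfaces are hypothetical structural stand-ins so the example does not depend on DOM typings, and `getInstancingApi` is an illustrative name. The method and extension names are the ones given above.

```typescript
interface InstancingApi {
  drawArraysInstanced(mode: number, first: number, count: number, instanceCount: number): void;
  vertexAttribDivisor(index: number, divisor: number): void;
}

interface AngleInstancedArrays {
  drawArraysInstancedANGLE(mode: number, first: number, count: number, primcount: number): void;
  vertexAttribDivisorANGLE(index: number, divisor: number): void;
}

// Hypothetical stand-in for either a WebGL1 or a WebGL2 context.
interface GlLike {
  getExtension(name: string): unknown;
  drawArraysInstanced?: InstancingApi['drawArraysInstanced'];
  vertexAttribDivisor?: InstancingApi['vertexAttribDivisor'];
}

// Returns a uniform instancing API, or null when the caller must use the per-call loop.
function getInstancingApi(gl: GlLike): InstancingApi | null {
  const coreDraw = gl.drawArraysInstanced;
  const coreDivisor = gl.vertexAttribDivisor;
  if (typeof coreDraw === 'function' && typeof coreDivisor === 'function') {
    // WebGL2: core methods; call through gl so `this` stays bound to the context.
    return {
      drawArraysInstanced: (mode, first, count, n) => coreDraw.call(gl, mode, first, count, n),
      vertexAttribDivisor: (index, divisor) => coreDivisor.call(gl, index, divisor),
    };
  }
  const ext = gl.getExtension('ANGLE_instanced_arrays') as AngleInstancedArrays | null;
  if (ext) {
    // WebGL1: same operations exposed with the ANGLE suffix on the extension object.
    return {
      drawArraysInstanced: (mode, first, count, n) => ext.drawArraysInstancedANGLE(mode, first, count, n),
      vertexAttribDivisor: (index, divisor) => ext.vertexAttribDivisorANGLE(index, divisor),
    };
  }
  return null;
}
```

A null result is the signal to take the per-call fallback; every other code path can use the returned object without caring which context it came from.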