Picking using the Depth Buffer

Picking makes 3D applications interactive.  It allows users to select and interact with objects in the 3D scene.  In applications built with Point Break, common uses of picking include double clicking on an object to zoom to it and right clicking on an object to bring up a context menu with operations that act on the object.

In this blog, I will describe the algorithm Point Break uses to implement picking.  The depth buffer, in combination with multipass rendering, is used to identify picked objects.  The algorithm works in many cases when others do not, is almost trivial to implement, and provides good performance for scenes with moderate object depth complexity given its use of culling and scissoring.


Before describing the algorithm, first consider the goals for picking in Point Break.

  • Picking should actually work.  Seriously.  Some picking algorithms do not consider per fragment operations such as the alpha test.  If a model has a hole in it because the alpha test failed, picking on the hole should not return the model.
  • It should be trivial for developers implementing primitives to make their primitives pick-able.  In addition, individual parts of a primitive should be pick-able.  For example, it should be possible to return the exact mesh of a picked model primitive.  Although, this is actually more of an architecture issue than an algorithm issue.
  • Picking should be fast enough to allow roll-over picking, that is, continuously executing a pick as the mouse moves across the screen.  This is useful for reporting the current latitude/longitude/altitude or highlighting primitives as the mouse rolls over them.
  • Drill picking should be possible.  That is, the first pick returns the top object, then successive picks at the same screen location return the next object beneath the last one picked. 

Many of the well known picking algorithms do not meet these goals or do not meet them cleanly.  OpenGL selection and feedback ignores per fragment operations, is a pain to use, and I believe is not hardware accelerated.  Furthermore, it is not part of OpenGL 3.0  or OpenGL ES.  A attractive alternative is to use the Color Buffer for picking.  This can be difficult for primitives that use shaders or textures with alpha channels.  The stencil buffer can also be used for picking but this prevents using it for rendering.  Finally, casting a ray thru the picked pixel and finding the intersected objects is widely used.  Besides requiring a copy of objects in CPU memory, this approach does not consider per fragment operations.


In designing an algorithm to meet our picking goals, the first observation to make is:

We are going to read the depth value of the picked pixel to compute the world space position of the pick.

Furthermore, we should observe:

For most picks, the depth complexity of the picked pixel is relatively low.

To be precise, we don't care so much about the exact depth complexity; we care about the number of objects behind the pixel, which is likely to be far less than the actual depth complexity.  Sometimes, only a single model is behind a pixel, e.g. when a satellite is picked.  Other times, a model and the earth are behind a pixel.  Although a scene with the viewer on the ground looking toward the horizon may have a high object depth complexity.

Since we need to read the depth buffer at least once, we ask:

If we only render the objects behind the picked pixel, can we afford to read the depth buffer after rendering each object?  If we clear the depth buffer before each object, an object is picked if it wrote to the depth buffer.

"No, of course not." the naysayers exclaim.  glReadPixels forces a pipeline flush and moving data from GPU to CPU memory is slow on all but PCIe buses.  How slow?  Too slow when 3 reads are required per pick?  10 reads?  1,000?

I didn't know; so I implemented it to find out.  The algorithm works as following.

   modify the view frustum to just cover the picked pixel;
   scissor out everything except the picked pixel;
   for each visible object
      clear depth buffer;
      render object;
      read depth buffer at pick location;
      if (depth value != clear value)
         add object to list of picked objects;
   sort list of picked objects near to far;

Similar to picking with the color buffer, the scene is rendered into the back buffer, which is never swapped to the front.  In addition, it assumes that all primitives write to the depth buffer, even translucent objects (which are rendered back to front without depth buffering for normal scenes).  Its hard to come up with a case when an object could not write to the depth buffer.  Perhaps, a post-process, per-pixel effect such as fog.  Even, then it could write depth just when rendered for a pick.  Who wants to pick fog anyway?

The first line of the algorithm need some explanation.

modify the view frustum to just cover the picked pixel;

This is crucial for performance.   A view frustum is created that still starts at the viewer position and intersects the normal near and far planes but the frustum only intersects the near plane thru the picked pixel.  This is used for culling so only objects behind the picked pixel are rendered.  A 2D view is shown below.  The original frustum is in black and the modified frustum is in red.

Pick Frustum

When this frustum is used for culling in a scene when the entire earth would be rendered, only a fraction of the earth is rendered for picking as shown below.

Culling for Picking

This actually shows a bad case; the bounding spheres of three tiles intersect the frustum since the picked pixel is approximately in the center of the image.  To reduce the fill rate on both the depth buffer clear and object rendering, the second line of the algorithm scissors everything out except the picked pixel.  This means, although all the vertices are processed in the image above, fragments are only processed for the picked pixel.

Odds and Ends 

There's still a few more details to consider.

First, objects that write the same depth as the clear value are not picked.  In most sane cases, this means objects rendered onto the far plane can be missed.  Thankfully, this is almost never an issue and can be fixed by also reading the color buffer.

Second, if the viewer is far away, its possible for a large amount of the scene to be behind a single pixel.  For example, the entire earth and everything on it could be behind a single pixel.  What does the user think they are picking on anyway?  Regardless, view frustum culling is not effective so we turn to a simple solution.  Objects that are farther from the viewer then a user defined threshold or do not meet a user defined pixel size are not rendered.

Finally, we've ignored the problem of rubberband picking, where the user drags a box in screen space and wants to pick every object in it.  Since the frustum may be large, culling is now less effective.  The good news is this doesn't have to execute at highly interactive rates like a single pick used for roll-over picking.

Summary and Future Work

In summary, the depth buffer, in combination with multipass rendering, is used to identify picked objects.  This algorithm considers per fragment operations, is straightforward to implement, and provides good performance for most scenes.  In a future entry, I may post some performance numbers.

This algorithm is still ripe with potential optimizations.  For example, when drill picking doesn't matter and only the front object needs to be picked, can hardware occlusion queries help?  Can PBOs improve the performance of reading the depth buffer?  If the object depth complexity is high, can each object be written to a different pixel and read back all at once?

In my eyes, the ideal picking algorithm has the ease of use of this algorithm and the speed of traditional color buffer picking with a single read back.

9 Responses to “Picking using the Depth Buffer”

  1. 1 qz0rz

    this seems to work well especially if you’re able to effectively cull your objects before running the algorithm…

    in case performance is an issue… it sounds like you might be able to get away w/ a 1×1 size z-buffer… how about this… have a pool of 1×1 z-buffers, and for each object, render it to its own z-buffer. after rendering all objects, do all the read-backs in one shot. would this be faster? I dunno, cuz u might have to stall the pipeline each time you switch z-buffers, anyway. but it’s an interesting idea to consider. If anything, maybe using a 1×1 size buffer is faster than the full frame buffer.

    one problem w/ “picking” algorithms is that they always assume the ray is coming from the camera, which can be inconvenient =/ it’d be particularly annoying if u have to do tens (or hundreds) of them per frame, because each one would require up to a full scene redraw… I guess if you have that problem, you can think about processing multiple picking requests in the same pass…

  2. 2 Cozzi

    Yes, hierarchical culling is a must for this algorithm to be efficient. In addition, this algorithm benefits from tight bounding volumes.

    I agree with using a single read-back for performance improvements. I don’t know if a separate z-buffer for each object is the way to go; switching FBOs might be a killer, as you suggest.

    In terms of raw rendering, using a 1×1 z-buffer is probably faster since the scissor test, used in the current algorithm, executes after the fragment program. How much faster? Who knows? When objects are rendered for a pick, their fragment programs are almost always very simple.

    I’m considering just rendering each object to a different pixel by changing the viewport then reading back the depth buffer once. I’m pretty sure it will work. It has little overhead and still only a single read-back (assuming there are enough pixels for each object).

    What use cases do you have in mind for a ray that doesn’t come from the camera? Perhaps an orthogonal projection?

  3. 3 Cozzi

    Looks like I was wrong. The scissor test can happen before a fragment program, even though the Direct3D documentation may lead you to think otherwise. Given this, I can’t see using 1×1 z-buffers being faster then rendering to different pixels in the same buffer.

  4. 4 qz0rz

    ya, ur idea of using a different pixel per object sounds good.

    a ray that doesn’t come from camera… actually not a big deal unless you’re writing a game on pointbreak ;-)

  5. 5 Robert Basler

    I was wondering how you set up glViewport for this? I tried setting it to be the same coordinates as glScissor, but that’s not working for me. glScissor seems to work as expected. Without calling glViewport, picking works great, but I’d like that optimization.

  6. 6 Patrick Cozzi


    I actually don’t change the viewport at all. Changing the view frustum for culling and scissor region is all that is needed to get the optimization.


  7. 7 Robert Basler

    I tried using gluPickMatrix which seemed to work, is that what you mean by changing the view frustum?

  8. 8 Patrick Cozzi

    By view frustum, I mean the view frustum that you are using for culling in your application code. If you’re not doing any culling, this optimization will not be as effective (although the scissor test will still help). When you render only a single pixel or small square of pixels for picking, there’s a good chance that fewer objects need to be rendered. Read up on view frustum culling and you’ll see what I mean.

  1. 1 Dark Flow Studios » XNA mouse picking using a deferred depth buffer
Comments are currently closed.