Rendering Text Fast



Our new text rendering code is 5x faster (in a bad case) than our STK code, and it uses the same algorithm! How does that work? The algorithm is the same but the implementation is vastly different.  In this post, I'll describe the new implementation, which offloads work from the CPU to a vertex shader running on the GPU, enabling the use of static vertex buffer objects.

Classic Algorithm

The algorithm STK uses to render 2D text in a 3D scene is to render a textured quad for each character.  First, each character from a font is written into one or more texture atlases as shown below:

Font Texture Atlas

The bounding texture coordinates for each character are computed.  To render a string, its world position is converted to window coordinates on the CPU (now is a good time to review your transformations).  Then, a textured quad is rendered for each character with the corresponding texture coordinates.  The window coordinate is translated on the CPU after each character so that characters aren't rendered on top of each other. 
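
As a rough illustration of how the bounding texture coordinates might be computed, here is a minimal Python sketch of a simple "shelf" packer.  The names `GlyphRect` and `pack_atlas` are hypothetical, and the packing strategy is an assumption made for illustration; the post does not say how the atlases are actually laid out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphRect:
    """Normalized texture coordinates bounding one character in the atlas."""
    u0: float
    v0: float
    u1: float
    v1: float


def pack_atlas(glyph_sizes, atlas_width, atlas_height):
    """Place glyphs left to right in rows and return their texture coordinates.

    glyph_sizes maps a character to its (width, height) in pixels.
    Raises ValueError when a glyph does not fit, which is the point at
    which a second atlas texture would be needed.
    """
    rects = {}
    x = y = row_height = 0
    for char, (w, h) in glyph_sizes.items():
        if w > atlas_width:
            raise ValueError(f"glyph {char!r} is wider than the atlas")
        if x + w > atlas_width:  # row is full, start a new one below it
            x = 0
            y += row_height
            row_height = 0
        if y + h > atlas_height:
            raise ValueError("atlas is full; another texture is needed")
        rects[char] = GlyphRect(x / atlas_width, y / atlas_height,
                                (x + w) / atlas_width, (y + h) / atlas_height)
        x += w
        row_height = max(row_height, h)
    return rects


rects = pack_atlas({"A": (8, 16), "B": (8, 16), "C": (8, 16)}, 16, 32)
```

With a 16x32 atlas, "A" and "B" share the first row and "C" wraps to the second.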

This is probably the most widely used text rendering algorithm.  It works fine but has performance problems: each character is processed by the CPU every frame.  Even worse, pretty much any example code you are going to find uses immediate mode to render the quad!  Today's GPUs are programmable and massively parallel, so let's offload this work to the GPU.

GPU-based Implementation

The GPU implementation is a straightforward extension to the CPU implementation.  Instead of translating each vertex on the CPU one after another, each vertex is translated on the GPU in parallel.  The mesh (e.g. one quad per character) is now static as far as the CPU is concerned, so it can be stored in fast video memory, and the CPU to GPU bus traffic can be avoided by using static vertex buffer objects.

Each vertex contains its window space translation.  For example, the first character's quad will have four vertices: (0, 0), (w, 0), (w, h), and (0, h), where w and h are the width and height of the character, respectively, in pixels.  The job of the vertex shader is to take the text's origin in world coordinates and the vertex's pixel translation and output the transformed vertex in clip coordinates.
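
To make the per-vertex translation concrete, here is a minimal Python sketch that builds the static mesh for one string.  The function name `build_string_mesh` is hypothetical, and it assumes each character advances the pen by exactly its own width; a real font would supply separate advance (and possibly kerning) values.

```python
def build_string_mesh(text, glyph_sizes):
    """Build static quad vertices and triangle indices for a string.

    Each vertex is an (x, y) translation in pixels relative to the text's
    origin.  glyph_sizes maps a character to its (width, height) in pixels.
    """
    vertices = []
    indices = []
    pen_x = 0  # running horizontal offset, baked into the vertices once
    for char in text:
        w, h = glyph_sizes[char]
        base = len(vertices)
        vertices += [(pen_x, 0), (pen_x + w, 0), (pen_x + w, h), (pen_x, h)]
        # Two triangles per quad.
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
        pen_x += w  # assumption: advance equals glyph width
    return vertices, indices


vertices, indices = build_string_mesh("Hi", {"H": (10, 12), "i": (4, 12)})
```

The second character's quad starts at x = 10, so the glyphs sit next to each other even though every vertex is expressed relative to the same origin.  This work happens once when the string is created, not every frame.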

The following GLSL vertex shader is one possible implementation.  The shader we use in production is quite a bit different due to techniques to improve precision and other features that are unimportant to us here.

uniform vec3 uTextOrigin;   // In world space
uniform vec4 uViewport;     // [left, bottom, width, height] 
 
void main(void)
{
    //
    // The application loads the model-view-projection matrix into the
    // model-view matrix slot, so v is in clip coordinates.  The projection
    // slot holds the orthographic matrix used at the end of the shader.
    //
    vec4 v = gl_ModelViewMatrix * vec4(uTextOrigin, 1); 
 
    //
    // Perspective divide to get normalized device coordinates.
    //
    v.xyz /= v.w; 
 
    //
    // Viewport transform to get window coordinates.
    //
    v.x = uViewport.x + uViewport.z * (v.x + 1.0) * 0.5;
    v.y = uViewport.y + uViewport.w * (v.y + 1.0) * 0.5;
    v.z = (v.z + 1.0) * -0.5; 
 
    //
    // v is now the text's origin in window coordinates.  Translate
    // to get this character's position.
    //
    gl_Position = gl_ProjectionMatrix * (vec4(v.xyz, 0) + gl_Vertex); 
 
    //
    // Pass through color and texture coordinates
    //
    gl_FrontColor = gl_Color;
    gl_TexCoord[0] = gl_MultiTexCoord0;
}

First, the text's origin is transformed by the perspective model-view-projection matrix.  This is the "standard" vertex shader output.  Since we need to operate in window coordinates, the vertex goes through perspective divide and a viewport transform. Now we simply add the translation to position the vertex for its character. Finally, the translated vertex is transformed by an orthographic projection matrix and a few inputs are passed through to the fragment shader (which can be fixed function, in this case).
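
For checking the math on the CPU, the same steps can be mirrored in a few lines of Python.  This is only a sketch: `transform` and `character_window_position` are hypothetical helper names, the matrix is assumed to be row-major, and the sketch stops at window coordinates rather than applying the final orthographic projection, whose parameters are not given here.

```python
def transform(matrix, vector):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def character_window_position(mvp, text_origin, viewport, pixel_offset):
    """Mirror the vertex shader up to the per-character translation.

    viewport is (left, bottom, width, height); pixel_offset is the
    vertex's (x, y) translation in pixels.
    """
    left, bottom, width, height = viewport

    # World coordinates to clip coordinates.
    x, y, z, w = transform(mvp, [*text_origin, 1.0])

    # Perspective divide to get normalized device coordinates.
    x, y, z = x / w, y / w, z / w

    # Viewport transform to get window coordinates (same z mapping as the shader).
    win_x = left + width * (x + 1.0) * 0.5
    win_y = bottom + height * (y + 1.0) * 0.5
    win_z = (z + 1.0) * -0.5

    # Translate to this character's position.
    return (win_x + pixel_offset[0], win_y + pixel_offset[1], win_z)


identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
position = character_window_position(
    identity, (0.0, 0.0, 0.0), (0.0, 0.0, 800.0, 600.0), (10.0, 0.0))
```

With an identity matrix and an 800x600 viewport, an origin at the center of clip space lands in the middle of the window, and the vertex ends up 10 pixels to its right.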

Optimizations

Now that we have pushed a bunch of work to the GPU, there are other details to consider.

First, not all fonts will fit in a single texture, even if the texture is big.  When a font spans multiple textures, sort the draw calls by texture to reduce the number of texture binds.  It is not guaranteed that one string can be rendered in a single draw call since the string may span multiple textures.  Texture arrays could make it possible but they require a GeForce 8 or better.

Having one or more draw calls per string isn't terrible (in OpenGL at least).  The thing to avoid is setting the text's origin uniform every draw call.  I found it to be very expensive, so I removed it by duplicating the text's origin across all vertices and modifying the vertex shader accordingly.  Besides not having to set the uniform every draw call, this enables draw calls from multiple strings to be batched into a single draw call as long as they share the same texture.  In some cases, I got a 250% performance gain on a GeForce 8800 GTX.  I'm still not sure if this is a good idea - I've heard setting uniforms may cause an expensive shader recompilation to allow for optimizations.  But I've also heard that an improvement this drastic could imply a driver bug in setting uniforms.
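
Here is a minimal Python sketch of what duplicating the origin across vertices could look like on the CPU side.  The function name `batch_strings` and the interleaved layout (origin x, y, z followed by the pixel offset x, y) are assumptions for illustration, as is advancing the pen by the glyph width.

```python
def batch_strings(strings, glyph_sizes):
    """Merge several strings that share a texture into one vertex/index list.

    strings is a list of (text, (origin_x, origin_y, origin_z)) pairs.
    Each vertex carries its string's world-space origin, so no per-string
    uniform is needed and all strings can go out in one draw call.
    """
    vertices = []
    indices = []
    for text, origin in strings:
        pen_x = 0
        for char in text:
            w, h = glyph_sizes[char]
            base = len(vertices)
            corners = ((pen_x, 0), (pen_x + w, 0), (pen_x + w, h), (pen_x, h))
            for px, py in corners:
                # Origin duplicated per vertex, then the pixel translation.
                vertices.append((*origin, px, py))
            indices += [base, base + 1, base + 2, base, base + 2, base + 3]
            pen_x += w  # assumption: advance equals glyph width
    return vertices, indices


vertices, indices = batch_strings(
    [("a", (1.0, 2.0, 3.0)), ("a", (4.0, 5.0, 6.0))], {"a": (6, 8)})
```

The cost is three extra floats per vertex; the benefit is that the two strings above end up in one index list that can be drawn at once.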

Regardless, once draw calls are batched and sorted by texture, the rendering code is trivial.

// load matrices, enable blending, alpha test, texturing, etc.       
 
useProgram(...);       // activate vertex shader
uniform(viewport);       
 
bind(vertexBuffer);    // one vertex buffer for all strings       
 
for each texture
    bind(texture);
    for each drawCall   // should only be one or a few per texture
        draw(drawCall.indexCount, drawCall.indexOffset);     
 
// clean up
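
The draw call list consumed by that loop could be prepared along these lines.  This is a Python sketch under assumptions: `build_draw_calls` is a hypothetical name, each quad is assumed to be tagged with the id of the texture its character lives in, and offsets are counted in indices rather than bytes.

```python
from collections import defaultdict


def build_draw_calls(quads):
    """Group quad indices by texture and lay them out contiguously.

    quads is a list of (texture_id, quad_indices) in any order.
    Returns (index_buffer, draw_calls), where each draw call is
    (texture_id, index_count, index_offset).
    """
    by_texture = defaultdict(list)
    for texture_id, quad_indices in quads:
        by_texture[texture_id].extend(quad_indices)

    index_buffer = []
    draw_calls = []
    for texture_id in sorted(by_texture):
        texture_indices = by_texture[texture_id]
        # One contiguous range per texture means one draw call per texture.
        draw_calls.append((texture_id, len(texture_indices), len(index_buffer)))
        index_buffer.extend(texture_indices)
    return index_buffer, draw_calls


index_buffer, draw_calls = build_draw_calls([
    (1, [0, 1, 2, 0, 2, 3]),
    (0, [4, 5, 6, 4, 6, 7]),
    (1, [8, 9, 10, 8, 10, 11]),
])
```

Three quads spread over two textures collapse into two draw calls, one per texture bind.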

To be fair, sorting and batching make the rendering code elegant and efficient but make dynamic updates difficult.  What if a string changes? Batched draw calls may need to be broken apart.  What if the length of the string changes? Not fun. What if changing the string means different textures are required? This is a lot of bookkeeping and not interesting enough to discuss here.

Optimizations I May Never Implement

Here are two ideas that may provide modest gains but could also be more trouble than they are worth.

Remove Geometry for Spaces

When a space is rendered, a quad is sent down the pipeline but the frame buffer is never written to because all the fragments fail the alpha test.  That's a lot of work to do nothing.  If you know which characters are spaces for a given font, you can simply not create a quad for those characters.  I'm pretty sure you can't guarantee that ' ' is a space in every font so you'll have to inspect the bitmap for each character.  If you're going to that much trouble, you might as well reduce the size of the quad and adjust the translation accordingly for every character - you'll save even more fragments from failing the alpha test.
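
Inspecting the bitmap could be as simple as the following Python sketch.  The function name `ink_bounds` and the bitmap representation (a list of rows of alpha values, top row first) are assumptions; I haven't implemented this optimization.

```python
def ink_bounds(bitmap, alpha_threshold=0):
    """Return the tight bounds of the visible pixels in a glyph bitmap.

    The result is (left, top, right, bottom) with right and bottom
    exclusive, or None when no pixel passes the threshold, in which case
    no quad needs to be created for the character.
    """
    rows = [y for y, row in enumerate(bitmap)
            if any(alpha > alpha_threshold for alpha in row)]
    if not rows:
        return None  # blank glyph, e.g. a space
    cols = [x for x in range(len(bitmap[0]))
            if any(row[x] > alpha_threshold for row in bitmap)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


space = [[0, 0, 0],
         [0, 0, 0]]
dot = [[0, 0, 0, 0],
       [0, 255, 255, 0],
       [0, 0, 0, 0]]
space_bounds = ink_bounds(space)
dot_bounds = ink_bounds(dot)
```

A blank glyph yields `None` and can be skipped; any other glyph yields a smaller rectangle that can replace the full character cell, with the quad's translation and texture coordinates adjusted to match.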

Cache-Coherent Texture Layout

There has been a ton of recent work in cache-coherent vertex layouts, e.g. OpenCCL, Tom Forsyth's algorithm, and the algorithm we use: Fast Triangle Reordering for Vertex Locality and Reduced Overdraw.  I'm unaware of any work in cache-coherent texture atlas layouts, although there are plenty of heuristics for texture atlas layouts that minimize wasted space.  Perhaps a cache-coherent layout can be computed using knowledge of the language (e.g., the letter e is the most common letter in the English language) or using a machine learning algorithm with the expected set of strings to be rendered.  I suppose such a layout could improve texture cache hits.  Given the high degree of GPU threading used to hide latencies and the fact that this algorithm is already very light in fragment processing, I didn't explore this any further.  This might be a good project for a graduate student but I won't promise that it will work.

Results

I rendered 13,691 strings, with a total of 192,073 characters using both Point Break's GPU-based implementation and STK's CPU-based implementation.  It is not exactly an apples-to-apples comparison but at least we can get a feel for the GPU implementation's speed.

Text Batch Primitive

In an empty 3D window with just the text, the GPU version was 545% faster.  Not bad - but I expected better.  This test doesn't capture the full potential of the algorithm.  Since the vertex crunching is all done on the GPU, the CPU is now free to do other things.  This is really the use case we care about: an application built with Point Break is likely to have lots of CPU-intensive analysis computations (potentially on multiple threads).

To emulate this, I put a dummy loop that counts to 50,000,000 and outputs the value to the console window at the start of each frame.  Without any text rendering, the frame rate was 17.31 fps - obviously all CPU load.  The frame rate dropped to 10.86 fps when text was added using the CPU code.  With the GPU code, the frame rate dropped only slightly, to 17.28 fps.  The difference is almost "in the noise."
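
Frame rates can hide how large the gap is, so it helps to convert them to time per frame.  The short Python sketch below does that arithmetic with the three measurements above; the helper name `frame_time_ms` is just for illustration.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


baseline_ms = frame_time_ms(17.31)                   # dummy loop only
cpu_text_cost_ms = frame_time_ms(10.86) - baseline_ms  # added by CPU text
gpu_text_cost_ms = frame_time_ms(17.28) - baseline_ms  # added by GPU text

print(f"CPU text adds {cpu_text_cost_ms:.2f} ms per frame")
print(f"GPU text adds {gpu_text_cost_ms:.2f} ms per frame")
```

By this measure the CPU implementation adds roughly 34 ms to each frame, while the GPU implementation adds about a tenth of a millisecond.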

This is an approach we are implementing throughout Point Break - utilizing the programmable GPU to free up the CPU for more important things, such as your application.

Interested in this type of stuff? Good. We want to hire you.

27 Responses to “Rendering Text Fast”


  1. 1 Etienne

    Really interesting.

    I wondered why text effects like “drop shadow” or “outline” weren’t implemented.
    Sometimes it’s difficult to read texts and these effects would be useful and attractive.

    I think that I’ve a start of answer.
    If you have a “textured quad rendered for each character” it’s quite difficult to obtain a global effect.

  2. 2 Cozzi

    Yes, text can be difficult to read without a shadow or border. We implemented shadows and a few other features. I just didn’t discuss everything to keep the post to a reasonable size.

    When shadow text is rendered, each character has a 1 pixel shadow surrounding it. Both the text and shadow color are configurable. This is implemented with multitexturing, so two textures are used: the texture for the original font, and a texture with each character scaled for the shadow.

    Text will be available in our next alpha release, r6. The overview in the help covers its usage, including how to turn on shadows.

  3. 3 vexator

    Hi! You posted a link to this article as a reply to my question here:

    http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=278075#Post278075

    I am, however, not sure how this would work. All the glyphs in the vertex buffer would be positioned at the origin, right? So how would adding gl_Vertex in the computation of the final position place the glyphs next to each other? and how would we account for the glyphs’ different advance values?

    Thank you!

    Matthias

  4. 4 Patrick Cozzi
