Kuro5hin.org: technology and culture, from the trenches

What are these Pixel Shaders of which you speak?

By polyglot in Technology
Tue Oct 28, 2003 at 08:12:38 PM EST
Tags: Technology (all tags)

If you program or play computer games or even recently attempted to purchase a video card, then you will have no doubt heard the terms "Vertex Shader" and "Pixel Shader". What do they mean, other than that marketdroids will never cease to invent crappy technology jargon? Hear all the noise about "Hardware T&L" a couple of years ago? It's all related and this article will attempt to describe it all neatly for you.

Warning: mathematics ahead; this devolves into a basic 3D graphics tutorial. It's nothing more complicated than what you'll see in your final year of high school (linear algebra). If you're totally non-technical, run screaming now.


First of all, you need to have a basic understanding of how a consumer render pipeline (stop panicking already) works and how this is modelled by Direct3D or OpenGL. I shall concentrate on OpenGL here since it's a tad more portable and it's what I'm familiar with. If you're not a programmer, skip down to the bottom where there are links to demos and pretty pictures.

Pipeline Overview

The information & logic that makes up what you see on the screen in a 3D game (or other visualisation) is composed of:

  • Vertex data for models
  • (optional) Index data into vertex arrays
  • Object positioning logic: vertex transformations
  • Textures
  • Yet more textures (eg lightmaps, bumpmaps)
  • Texture combining logic (add, multiply, dependent, etc)
  • Fragment operations (stencil, depth test, alpha test)
  • Blending logic (for transparency)

For each object (eg teapot, knife, player) drawn, the following occurs:

  1. The application tells the card where to find a chunk of memory containing all the vertex data
  2. The geometry pipeline transforms each vertex from model space to clip space and may perform some lighting calculations
  3. (optional) the geometry pipeline generates texture coordinates from mathematical formulae
  4. Primitives (triangles, points, quadrilaterals) are rasterized, ie converted from a set of corner vertices into a huge pile of fragments (a piece of information representing what will become a pixel)
  5. The colour of each fragment is determined by passing it through the fragment pipeline (which performs texture lookups, amongst other things)
  6. Some tests are performed on the fragment to decide if it should be kept or discarded
  7. The pixel colour is calculated based on the fragment's colour, the current pixel colour and some logical operations involving the fragment or pixel's alpha channel
  8. You see a pixel

This article will give a bit of an overview of what happens at each stage and show how a programmable pipeline ("shaders") is more powerful than the equivalent fixed-function pipelines.

Model Mesh Data

It all starts with a 3D model - a collection of vertices and knowledge of how they form a mesh of triangles or quadrilaterals. A vertex typically contains the following properties, with the Position field being mandatory:

  • Position
  • Normal (vector perpendicular to face)
  • Colour (1 or 2)
  • Texture coordinates (1 to 4 for each of 1 to 8 texture units)
  • Other properties for advanced functionality (eg matrix skinning)

The position and normal vectors are in model space, meaning that they are relative to some origin that is significant to the model, eg for a hominid it might be between their feet, a teapot might have its origin at the centre of the base, etc. The colours are used to specify simple material properties. Texture coordinates specify which part of a texture each vertex corresponds to; collectively they describe how the texture is stretched over the model.

The typical method of getting this data to the renderer is to put it all in a big array and tell it how large (in bytes) each vertex is and how many of them there are. The vertices are numbered from 0 upwards. The vertex soup is organised into triangles by passing a list of indices to the renderer that tells it what combinations of vertices make up triangles. For example, the index list {0, 10, 1, 11, 2, 12} in triangle strip mode will cause 4 triangles to be rendered: {0, 10, 1}, {10, 1, 11}, {1, 11, 2}, {11, 2, 12}. Organising the mesh into long strips and fans means you need to pass (on average) about 1 index per triangle rather than 3 because after the first two indices, every new index causes a triangle to be rendered.
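The strip expansion described above can be sketched in a few lines of Python. This is only an illustration of the indexing rule: the function name `strip_to_triangles` is made up for this example, and it ignores the alternating winding order that a real renderer tracks for strips.

```python
def strip_to_triangles(indices):
    """Expand a triangle-strip index list into one index triple per triangle."""
    # After the first two indices, every new index closes one more triangle,
    # so a strip of n indices yields n - 2 triangles.
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]


# The index list used in the text above.
triangles = strip_to_triangles([0, 10, 1, 11, 2, 12])
for tri in triangles:
    print(tri)
```

Six indices give four triangles here; sending the same four triangles as an independent triangle list would need twelve indices.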

All vectors are in homogeneous coordinates, a form that deals neatly with projective geometry. A nice oversimplification is that there is an extra (4th) coordinate called w; the effective position of a vector is {x/w, y/w, z/w} rather than just {x, y, z}. Directional vectors (eg normals) have w=0, so they are (sort-of) infinitely far away in some direction and are unaffected by translation. Position vectors (eg vertex positions) typically have w=1, so their x, y and z are unchanged by the division. All vectors therefore have 4 elements and all matrices are 4x4. The application needn't always supply all 4 fields of a vector; the latter ones have defaults.
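To make the role of w concrete, here is a minimal Python sketch. The helper names (`mat_vec`, `effective_position`) and the sample numbers are assumptions for illustration only. It shows that a translation matrix moves a position (w=1) but leaves a direction (w=0) alone, and that the effective position is found by dividing through by w.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-element vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def effective_position(v):
    """Return (x/w, y/w, z/w); only meaningful when w is non-zero."""
    x, y, z, w = v
    return (x / w, y / w, z / w)


# Translation by (5, 0, 0): the offset lives in the last column,
# so it is scaled by the w of whatever vector it multiplies.
translate = [
    [1, 0, 0, 5],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

point = [1, 2, 3, 1]      # a position: w = 1
direction = [0, 1, 0, 0]  # a normal or direction: w = 0

moved_point = mat_vec(translate, point)
moved_direction = mat_vec(translate, direction)
print(moved_point, moved_direction)
print(effective_position([2, 4, 6, 2]))
```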

Geometry Pipeline

The job of the geometry pipeline is to convert vertex data from model space to world space, view space and then clip space. Each of these transformations is a matrix*vector multiplication, so all of the matrices are pre-multiplied together before vertex processing begins; this works because matrix multiplication is associative.

Pworld = Mm Pmodel
Pview = Mv Pworld
Pclip = Mp Pview
Pclip = (Mp Mv Mm) Pmodel
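As a quick check of the associativity argument, the following Python sketch applies three matrices one at a time and then as a single pre-multiplied matrix. The matrices are simple stand-ins chosen for this example (two translations and a uniform scale in place of a real projection), and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-element vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scale(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


m_model = translation(2, 0, 0)    # model space -> world space
m_view = translation(0, 0, -10)   # world space -> view space
m_proj = scale(0.5)               # stand-in for a projection matrix

p_model = [1, 1, 1, 1]

# One matrix*vector multiplication per stage...
step_by_step = mat_vec(m_proj, mat_vec(m_view, mat_vec(m_model, p_model)))

# ...or pre-multiply once, then one multiplication per vertex.
combined = mat_mul(m_proj, mat_mul(m_view, m_model))
all_at_once = mat_vec(combined, p_model)

print(step_by_step, all_at_once)
```

The saving is that `combined` is computed once per object, while the per-vertex cost drops from three multiplications to one.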

World space has the origin at some globally unique position - the corner of a map, the centre of a solar system, whatever. The camera viewing the scene must also typically move, so another transformation is necessary - from world space to view space where the camera is at the origin and (in OpenGL) looking along the negative Z axis.

The modelview matrix is the product of the model transform and view transform and is directly manipulatable by an application.

So, to draw a mesh object moving through space, the only varying state is (probably) the modelview matrix. In the case of software "T&L", the transformation and lighting is performed by software on the host CPU. In other words, the CPU performs a matrix*vector multiplication for every vertex in every model drawn, as well as calculating lighting at each vertex (probably in view space). Hardware T&L provides silicon with exactly this functionality, relieving the CPU of this task so that it may concentrate on other things like AI.

The conversion from view space to clip space is performed by the projection matrix; this is what defines the shape of your view: is it a cuboid (orthographic view / parallel projection, as used in CAD programs) or a frustum (perspective view that divides object screen positions by their depth)? Remember that it's all in homogeneous coordinates; for perspective division the projection matrix contains a -1 in the bottom row, second from the right. The result is that when a vertex position is multiplied by this matrix, w is set to -z, thereby causing the x and y screen positions to be divided by depth.
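Here is a small Python sketch of that perspective matrix. It uses the layout documented for OpenGL's glFrustum; the function name `frustum` and the sample frustum bounds are just choices for this example. Note the -1 in the bottom row, which copies -z into w.

```python
def frustum(left, right, bottom, top, near, far):
    """Perspective projection matrix in the layout documented for glFrustum."""
    return [
        [2 * near / (right - left), 0, (right + left) / (right - left), 0],
        [0, 2 * near / (top - bottom), (top + bottom) / (top - bottom), 0],
        [0, 0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0, 0, -1, 0],  # the -1 that sets w_clip = -z_view
    ]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-element vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


projection = frustum(-1, 1, -1, 1, 1, 3)

# A point two units in front of the camera (the camera looks down -Z).
p_view = [1, 1, -2, 1]
p_clip = mat_vec(projection, p_view)

# Perspective division: x, y and z are all divided by w.
ndc = tuple(c / p_clip[3] for c in p_clip[:3])
print(p_clip, ndc)
```

With these numbers w comes out as 2 (that is, -z), so x and y are halved: an object twice as far away as the near plane appears half the size.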

There are yet more (optional) matrices in operation:

  • Colour matrix: transforms vertex colours
  • Texture matrix (one per texture unit): transforms texture coordinates

It may not be necessary to store texture coordinates in the model if they are planar (vary linearly across some plane in model space or view space). The geometry pipeline can generate the coordinates at render time; this process is known as TexGen.

Having been transformed to clip space, primitives (triangles, etc) are clipped against the view volume. If they're outside, they're discarded; if partially inside then new primitives are generated that lie entirely within the clip volume and coplanar with the original primitives.

Still, the transformation and lighting is fixed - there is only one way to do it and the hardware does just that. Positions are transformed by the modelview matrix, normals are transformed by its inverse transpose, colours and textures are transformed by the colour and texture matrices (if they're not identity), lighting is calculated with a standardised formula based on light positions, vertex positions and a simple model involving dot products.
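To give a feel for the "simple model involving dot products", here is a Python sketch of just the diffuse term for one light. It is not the full standardised formula (which also has ambient, specular and attenuation terms); the function names and sample values are assumptions for illustration.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diffuse(vertex_pos, normal, light_pos, material, light_colour):
    """Diffuse-only lighting: material * light * max(0, N . L)."""
    to_light = normalize([l - p for l, p in zip(light_pos, vertex_pos)])
    n_dot_l = max(0.0, dot(normalize(normal), to_light))
    return [m * c * n_dot_l for m, c in zip(material, light_colour)]


material = [1.0, 0.5, 0.25]
white = [1.0, 1.0, 1.0]

# Light directly above an upward-facing vertex: fully lit.
lit = diffuse([0, 0, 0], [0, 1, 0], [0, 5, 0], material, white)

# Light directly below the same vertex: the surface faces away, so it is black.
unlit = diffuse([0, 0, 0], [0, 1, 0], [0, -5, 0], material, white)
print(lit, unlit)
```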

Programmable Geometry Pipeline

To gain flexibility of transformation, modern (since about GeForce3 for consumers) hardware has provided a mechanism whereby the hardware behaves more like a CPU than a fixed-function pipeline of matrix operations. The programmer has the opportunity to write simple assembly programs that define exactly what transformations are to be applied to each vertex, completely bypassing the default fixed-function vertex pipeline. The deformations are of course not limited to position; effects can also be implemented that modify colours, texture coordinates and normals. The programmable pipeline encompasses modelview transformation of the positions and normals, and texture and colour transformation (or generation), but not projection, perspective division or clipping.

Initially it's not obvious why you would bother to do such a thing, but what it gives you is flexibility in a very high performance part of the system. Common applications of this flexibility are:

  • Custom lighting models
  • Matrix skinning: transforming a vertex using multiple modelview matrices and blending between them; this gives realistic elbows in models
  • Procedural geometry: waves, grass, moving blobs, etc
  • Deformable objects: dents, elastic stretching/bending, etc
  • Hair, cloth
  • Arbitrary scrolling, twisting, bending, pinching and swirling of objects & their textures
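As one example from the list above, matrix skinning boils down to a weighted blend of the same vertex transformed by several matrices. The Python sketch below emulates that on the CPU; the names (`skin_vertex`, `upper_arm`, `forearm`) and the 50/50 weights are hypothetical, and a real vertex program would do this per vertex on the card.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-element vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def skin_vertex(position, matrices, weights):
    """Blend the position as transformed by each matrix, using per-matrix weights."""
    blended = [0.0, 0.0, 0.0, 0.0]
    for matrix, weight in zip(matrices, weights):
        transformed = mat_vec(matrix, position)
        blended = [b + weight * t for b, t in zip(blended, transformed)]
    return blended


# Two "bones": the upper arm stays put, the forearm is shifted up by 2 units.
upper_arm = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
forearm = [[1, 0, 0, 0], [0, 1, 0, 2], [0, 0, 1, 0], [0, 0, 0, 1]]

# A vertex at the elbow is influenced equally by both bones.
elbow = skin_vertex([1, 0, 0, 1], [upper_arm, forearm], [0.5, 0.5])
print(elbow)
```

The elbow vertex ends up halfway between where each bone alone would put it, which is what stops the mesh from tearing or creasing at the joint.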

Grass is a good example of the use of vertex programs. The CPU holds a model of a single blade of grass (plus textures, etc) and sends a few thousand copies of it to the renderer. The vertex program can use the position of each vertex as the argument to sin() functions and use the result to translate the vertex to one side. By varying the phase of the sin() with each frame drawn, the grass appears to wave in a grasslike fashion; each blade sways side to side and the net effect has waves travelling through the grass. All with zero effort from the CPU except to update a couple of numbers each frame. Because the renderer bends each blade of grass, it can accurately generate the normal for each vertex and have the grass accurately lit so that it changes colour as it bends and interacts differently with the light.
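The swaying logic can be emulated on the CPU in Python as a rough sketch of what such a vertex program computes. The function name and the `amplitude` and `wavelength` parameters are made up here, and scaling the offset by height so that the root stays planted is an assumption of this example rather than something stated above.

```python
import math


def sway_vertex(position, phase, amplitude=0.1, wavelength=4.0):
    """Shift a grass vertex sideways by a sine of its x position.

    The offset grows with height (y), so the base of the blade stays put
    while the tip moves the most. Advancing `phase` each frame makes the
    wave travel through the field.
    """
    x, y, z = position
    offset = amplitude * y * math.sin(2 * math.pi * x / wavelength + phase)
    return (x + offset, y, z)


# The only per-frame work for the CPU is updating the phase.
for frame in range(3):
    phase = frame * 0.2
    print(sway_vertex((1.0, 1.0, 0.0), phase))
```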

In OpenGL, vertex program functionality is currently provided by the ARB_vertex_program extension; Direct3D calls these things Vertex Shaders.

So any time you see lots of objects with similar dynamics, objects bending, objects with multiple parts and skinned elbows, etc, chances are you're looking at vertex shaders in action.


For each primitive in screen space (ie the screen coordinates of the vertices are known), the screen coordinates for each pixel are determined and values for position, colour, texture coordinates, etc are determined by interpolation between the vertex values of the primitive. This means that colour, texture, etc, all vary continuously over a primitive, (mostly) as expected. Having determined where in view-, texture- and colour-space each pixel will come from, a fragment is evaluated from that information by the fragment pipeline.
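The interpolation step can be sketched with barycentric weights, as in the Python below. This is plain linear interpolation in screen space; real hardware also applies perspective correction, which is left out here. The function names and the example triangle are assumptions for illustration.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle a, b, c."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    wb = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / area
    wc = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / area
    return (1.0 - wb - wc, wb, wc)


def interpolate(weights, values):
    """Blend one per-vertex attribute (eg a colour) using the weights."""
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


# Screen-space corners of a triangle and a colour at each corner.
corners = [(0, 0), (4, 0), (0, 4)]
colours = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

weights = barycentric((1, 1), *corners)
fragment_colour = interpolate(weights, colours)
print(weights, fragment_colour)
```

The same weights are reused for every attribute of the fragment, so texture coordinates and normals are interpolated in exactly the same way as the colour.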

Fragment Pipeline

The purpose of the fragment pipeline is to determine what colour (r, g, b, a) a fragment will be, based upon lighting calculations, texturing, etc. In the fixed-function pipeline, a set of textures are downloaded (one per texture unit) and a texture environment (texenv) specified for each texture unit. A texenv specifies how to combine (eg add, multiply) a texture sample with some other colour (iterated colour, result from other texture unit). Special-case texenv operations are available in extensions to perform things like bumpmapping (deflecting the normal vector per-pixel to modify lighting and reflection calculations); this must all be manually and explicitly supported by the card driver and the application.

Consider a knife model - it has a wooden handle and a chromed blade with a bloodstain on it. We want to draw the knife in a single pass (ie render it only once and not rely on slow framebuffer blending operations to give us extra texture layers) from a single model (not using separate meshes for the handle and blade). We want it to interact with lighting from the scene (colours and shadowing) as well as have the blade reflect its surroundings. A simple implementation might use four textures:

  • Diffuse colour: woodgrain texture, black on the blade
  • Lightmap: propagation of light through the scene, including shadows and colours
  • Glossmap: reflectivity of the knife, dark grey on the handle, white on the chrome and pale pink on the bloodstained chrome
  • Environment cubemap: 6 faces of a cube, contains view of world as seen from the knife

The lighting equation we'll use is:

Cfrag = lightmap * diffuse + glossmap * environment

Therefore the wooden handle interacts as expected with scene lighting (neon signs, candles, some static shadows) and you can see reflections of the room in the blade. Where there's blood on it, the reflection is tinted.
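Written out per colour channel, the lighting equation above looks like the Python sketch below. The texture samples are invented numbers standing in for real texture lookups, and the clamp to 1.0 is an assumption that mirrors how fixed-function colours saturate.

```python
def shade_fragment(lightmap, diffuse, glossmap, environment):
    """Cfrag = lightmap * diffuse + glossmap * environment, per channel."""
    return tuple(
        min(1.0, l * d + g * e)  # clamp so bright reflections saturate
        for l, d, g, e in zip(lightmap, diffuse, glossmap, environment)
    )


light = (0.5, 0.5, 0.5)        # lightmap sample: half-lit
room = (0.5, 0.5, 0.5)         # environment cubemap sample: grey room

# Chrome blade: black diffuse, white gloss, so only the reflection shows.
blade = shade_fragment(light, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), room)

# Bloodstained chrome: pale pink gloss tints the reflection.
stain = shade_fragment(light, (0.0, 0.0, 0.0), (1.0, 0.75, 0.75), room)

# Wooden handle: woodgrain diffuse with a weak, dark grey gloss.
handle = shade_fragment(light, (0.5, 0.25, 0.125), (0.25, 0.25, 0.25), room)

print(blade, stain, handle)
```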

Setting all of that up takes wads of code and a fair bit of effort. It's also quite inflexible because the texture environments typically only allow you to do arithmetic on outputs (colours) rather than inputs (texcoords) and therefore special effects (eg bumpmapping) must be performed with special and specific extensions.

Like a vertex program, a fragment program is a short program that replaces the old fixed-function pipeline, in this case for fragments. A fragment program can sample textures and do arithmetic, giving it at least the same capabilities as the old pipelines. However, the arithmetic is not restricted to operations on colours, it can be performed on the texture coordinates before the texture is sampled. This leads to dependent texture reads, whereby the result of one texture read is used as (or to modify) the texture coordinates to read from another; most forms of bumpmapping are special cases of dependent texture reads and are neatly implemented as such.
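A dependent texture read can be sketched in Python with two tiny textures stored as lists of rows. Everything here is illustrative: the nearest-neighbour `sample` helper, the 2x2 textures and the wrap-around addressing are assumptions, not how any particular card does it.

```python
def sample(texture, u, v):
    """Nearest-neighbour lookup; u and v in [0, 1) address a list-of-rows texture."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def dependent_read(offset_map, colour_map, u, v):
    """Use the result of one texture read to perturb the coordinates of a second."""
    du, dv = sample(offset_map, u, v)
    return sample(colour_map, (u + du) % 1.0, (v + dv) % 1.0)


# First texture: a (du, dv) offset per texel, like a crude bump or distortion map.
offset_map = [
    [(0.0, 0.0), (0.5, 0.0)],
    [(0.0, 0.5), (0.0, 0.0)],
]

# Second texture: the colours that finally get displayed.
colour_map = [
    ["red", "green"],
    ["blue", "white"],
]

print(sample(colour_map, 0.75, 0.25))                      # plain read
print(dependent_read(offset_map, colour_map, 0.75, 0.25))  # perturbed read
```

The arithmetic happens on the coordinates before the second lookup, which is exactly what the fixed-function texture environments could not do.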

But of course, having gained the flexibility, it is much more powerful than just reimplementations of the fixed-function pipeline. No longer is texturing restricted to just "drawing pictures on surfaces", it is now a generalised function sampling operation. Any function of 1, 2 or 3 variables can be stored at arbitrary resolution in a texture (lots of memory in the 3D case but 1D and 2D are quite tractable) and evaluated on a per-pixel basis. Uses for such technology include:

  • Advanced lighting models
  • Scattering
  • Cartoon rendering
  • Filtering & post-processing
  • Procedural textures

OpenGL exposes this functionality via the ARB_fragment_program extension; Direct3D refers to it as Pixel Shaders.
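The idea of a texture as a stored function can be sketched in Python as a 1D lookup table with linear filtering. The function being baked (x to the 16th power, the sort of curve used for specular highlights), the table size and the helper names are all choices made for this example, and the texel addressing is simplified compared with real hardware.

```python
def bake_texture(func, size):
    """Store func sampled at `size` evenly spaced points over [0, 1]."""
    return [func(i / (size - 1)) for i in range(size)]


def sample_linear(texture, u):
    """Linearly interpolate between the two nearest texels; u is clamped to [0, 1]."""
    u = min(max(u, 0.0), 1.0)
    pos = u * (len(texture) - 1)
    i = min(int(pos), len(texture) - 2)
    frac = pos - i
    return texture[i] * (1.0 - frac) + texture[i + 1] * frac


# Bake an expensive-ish function once...
specular_lut = bake_texture(lambda x: x ** 16, 256)

# ...then evaluate it per pixel with a cheap lookup instead of arithmetic.
for u in (0.5, 0.9, 1.0):
    print(u, sample_linear(specular_lut, u), u ** 16)
```

The resolution of the table sets the accuracy: more texels cost more memory but track the function more closely between samples.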

It's not always obvious that you're looking at the result of a fragment program; many programmers use them to implement shaders very similar to the default pipeline or a basic extension. However, they are often used to implement very subtle little lighting effects that enhance the realism of a scene without you consciously noticing their effect - things like scattering, peach-fuzz on faces (see the nVidia Dawn demo), etc. The effect doesn't have to be smack-you-in-the-face dramatic like heavy bumpmapping of a reflective surface. Of course, if you don't notice the effect and how it got there, it's probably done its job.

Per-Fragment Operations

Once the fragment colour has been decided, the fragment is passed (or not!) through the stencil, depth and alpha tests, each of which may decide to keep or discard it. If it is kept, it is combined with the framebuffer using operations specified by the program (eg replace, blend using source alpha, etc).
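A stripped-down version of this stage might look like the following Python sketch. It covers only an alpha test, a "nearer wins" depth test and a source-alpha blend; the stencil test is left out, and the function name, argument layout and threshold are assumptions made for this example.

```python
def process_fragment(frag_colour, frag_depth, pixel_colour, pixel_depth,
                     alpha_threshold=0.0):
    """Return the pixel's new (colour, depth); unchanged if the fragment is discarded."""
    r, g, b, a = frag_colour
    if a <= alpha_threshold:
        # Alpha test: throw away fragments that are (nearly) transparent.
        return pixel_colour, pixel_depth
    if frag_depth >= pixel_depth:
        # Depth test: keep only fragments nearer than what is already there.
        return pixel_colour, pixel_depth
    # Blend using source alpha: src * a + dst * (1 - a).
    blended = tuple(s * a + d * (1.0 - a)
                    for s, d in zip((r, g, b), pixel_colour))
    return blended, frag_depth


blue_pixel = (0.0, 0.0, 1.0)

# Half-transparent red fragment in front of a blue pixel.
kept = process_fragment((1.0, 0.0, 0.0, 0.5), 0.25, blue_pixel, 0.5)

# The same fragment behind the pixel fails the depth test.
hidden = process_fragment((1.0, 0.0, 0.0, 0.5), 0.75, blue_pixel, 0.5)

print(kept, hidden)
```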


This discussion would not be complete without a mention of Cg ("C for graphics"), a high-level language from nVidia that compiles down to the assembly instructions you would otherwise be required to write to use the above extensions or their D3D equivalents. Cg does indeed look like C, and its existence makes the writing of both vertex and fragment programs trivial - you just write down the equations you want and it "just works". Well, nearly - there is a certain amount of effort required but nothing onerous.

Cg supports a variety of target architectures; obviously NV20 and NV30 are first among them. However, it is capable of generating assembly opcodes for the two ARB [1] extensions described above as well as the DirectX equivalents. I am currently using it to write both vertex and fragment programs on an ATI card using the ARB targets ("arbvp1", "arbfp1").


All of this information is available from opengl.org and the manufacturers: ATI and nVidia. There's SGI too, but your average weekend hacker is not likely to have access to their kit; nevertheless this weekend hacker is grateful to SGI for the effort that they've put into making OpenGL what it is now. The manufacturers make available detailed specification documents, research papers and slides for presentations that they've recently given at conferences like SIGGRAPH and GDC. All of the same effects can be achieved through DirectX9.

Even if you're not interested in programming these video cards, looking at the manufacturers' demos can be enlightening or just plain cool: ATI Demos, nVidia OpenGL Demos, Cg examples and yet more Cg. If you're an artist/modeler rather than a programmer, it would appear that there now exists a free version of Maya for personal, non-commercial use (I haven't tried it); there is a Cg plugin for it that should allow you to fiddle with shaders without having to write your own 3D engine.

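Specifications & Documentation

  • OpenGL.org
  • OpenGL tutorials
  • OpenGL 1.4
  • Gamasutra
  • ATI OpenGL Support
  • nVidia OpenGL Support
  • nvSDK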

Thanks to pb:
GPUs used for general-purpose programming
FFT implemented in fragment program

If you can program already in C or C++ (or even sort of) then you will likely find that 3D programming is something you enjoy. Working on a project you enjoy is of course the fastest way to get good at anything, so get to it!


Usual [non]disclaimer - I am not associated with any graphics hardware or software companies, nor do I have any pecuniary interest in them. I am the proverbial satisfied customer - I recently purchased a Radeon 9600 Pro for the explicit purpose of playing with fragment programs. In the interests of fairness, the approximate nVidia equivalent is the FX5600.

[1] ARB = OpenGL Architecture Review Board




More technical stories like this?
o Bring it on! 86%
o Stick it where it fits! 2%
o my brain hurts 5%
o Reactionary-liberal greenie pinko! 2%
o Greedy ideological neo-con redneck! 0%
o YHBT. YHL. 4%

Votes: 95

Related Links
o homogenous coordinates
o ARB_vertex_program
o Advanced lighting models
o Scattering
o Cartoon rendering
o filtering & post-processing
o Procedural Textures
o ARB_fragment_program
o Cg
o opengl.org
o nVidia
o DirectX9
o ATI Demos
o nVidia OpenGL Demos
o Cg examples
o yet more Cg
o artist/modeler
o Maya
o Cg plugin
o OpenGL.org
o OpenGL tutorials
o OpenGL 1.4
o Gamasutra
o ATI OpenGL Support
o nVidia OpenGL Support
o nvSDK
o pb
o GPUs used for general-purpose programming
o FFT implemented in fragment program
o Radeon 9600 Pro
o FX5600
o Also by polyglot

What are these Pixel Shaders of which you speak? | 41 comments (18 topical, 23 editorial, 0 hidden)
Why I voted this up (1.00 / 21) (#25)
by tofubar on Tue Oct 28, 2003 at 08:03:59 PM EST

I'm not fucking bored enough to read about panda mathematics or whatever the fuck the Chinese have developed now but it looks like you did a good job because it's correctly formatted so I'll just vote you up.

Will we see this all back into CPU? (none / 3) (#26)
by Fen on Tue Oct 28, 2003 at 08:33:29 PM EST

I think it's an interesting question as to whether video graphics will be seen as a subset of general processing. Future CPUs may be able to customize themselves on the fly to deal with the specific requirements of 3D processing.
Bandwidth can be a problem (none / 3) (#28)
by richarj on Wed Oct 29, 2003 at 12:45:40 AM EST

The graphics card GPU is right next to the rest of the graphics card (memory, ramdac) and the connection between the two is not a specification like AGP. This enables the developers (hardware) to make the link between the components as fast as possible while still being able to fit in a non-proprietary system. If you want more you need to lose the PC architecture and go for something like an SGI system.

"if you are uncool, don't worry, K5 is still the place for you!" -- rusty
[ Parent ]
Rumors are (none / 0) (#35)
by Guybrush Threepwood on Wed Oct 29, 2003 at 04:07:10 PM EST

that NVidia and ATI are seriously considering the CPU market (the nForce chipset could be seen as evidence). I don't doubt it at all, but I'd say we have at least 3 or 4 years before we see a merge of CPU's and GPU's.

-- Dont eat me. I'm a mighty pirate!
[ Parent ]

Cell & Good Programming (2.25 / 4) (#27)
by turtleshadow on Tue Oct 28, 2003 at 09:10:04 PM EST

It will be interesting to see how the graphics industry will make use of new technology like Sony's Cell architecture and old-fashioned tweaked code like the scene coders can wring out.

While much of today's visual graphics engines are trying to model reality, it's often more about the artistic takes on what humans perceive as reality than about getting all the little triangles mathematically correct.

Indeed the peach fuzzing and such is nice; however, most artistes know that tricking the eye is often less expensive than getting the scene physically correct and square with mathematics.
Monet, Dali, Michelangelo all knew this... but they weren't coders

Math Sucks (1.16 / 6) (#29)
by Katt on Wed Oct 29, 2003 at 05:32:00 AM EST

I hate math. If I could go back in time, I'd figure out who invented Algebra and hire a local mercenary to punch that person in the nose.

Of course, I'd be screwed because time travel probably requires math, the space-time continuum would implode, and the world as we know it would vanish leaving us all sitting in caves trying to figure out how to start a fire with that pile of rocks. Sounds worth it to me...

Your explanation looks pretty cool, though.

Well, since you asked... (none / 2) (#30)
by lonesmurf on Wed Oct 29, 2003 at 10:30:37 AM EST

Since you asked, an Arabic mathematician, Al-Khorizmi "invented" algebra. He wrote a treatise centuries ago about solving problems using a step-by-step method. Algebra is an Anglicized version of an Arabic word in the title of his treatise. It literally means to solve for the unknown. Ol' Al's name has been Anglicized to the word algorithm which is a step-by-step plan to solve any problem, especially in writing a computer program.


Sic em, boy.


I am not a jolly man. Remove the mirth from my email to send.

[ Parent ]
Ah ha. (none / 1) (#37)
by Katt on Wed Oct 29, 2003 at 04:40:47 PM EST

He's on my list... I bumped him up to the spot just below "invent time travel."

[ Parent ]
al-Khwarizmi (none / 1) (#38)
by grendelkhan on Wed Oct 29, 2003 at 08:18:21 PM EST

The book in question is Hisab al-jabr w'al mugabalah, meaning "the science of reunion and reduction". When I was in high school, there was a sidebar in one of the textbooks explaining that this 'reduction' was the basic idea behind algebra: to solve a new problem, you transform---reduce---it to an older problem that you already know how to solve.

-- Laws do not persuade just because they threaten --Seneca
[ Parent ]

dependent texturing, FP limitations (3.00 / 5) (#31)
by Guybrush Threepwood on Wed Oct 29, 2003 at 02:18:43 PM EST

Great stuff.. It's a shame that I didn't have time to catch this in editing, though. Some stuff I'd have added goes here:

Dependent texturing: Since you already mention it, you could have also said that this allows textures to be treated as lookup tables, and allows texture values (colors) to be interpreted literally as pointers to other textures. This is the way current research work has implemented ordinary data structures in the GPU (for example, look at the Computation using GPU's section at SIGGRAPH 2003).

Limitations: There are major problems in using vertex and fragment programs (OpenGL name for vertex and pixel shaders) that could be mentioned. Fragment and vertex programs are very efficient because they are executed in parallel in the many fragment and vertex transformation units present in modern GPU's. To allow this sort of parallelism, they all must execute all the instructions in the exact same order (There's only one program counter for many activations of a fragment program). This places a lot of restrictions on shader code: there is no conditional branching. So, there are no loops, no flow control, and no recursion. In fact, all code compiled from Cg or HLSL is inlined into a huge list of instructions. There are 'if's, 'for's, and 'while's in Cg, but try looking at the assembly output: in the case of the 'if', for example, both sides are executed, the values are stored in temporaries, and finally an atomic 'test-and-set' instruction is executed.

Fragment programs alone are not Turing complete: you need other tricks involving the CPU. These tricks are, mainly, reading the output and interpreting it as a texture in a new pass, and using masks to simulate conditionals.

Great article, nevertheless.

-- Dont eat me. I'm a mighty pirate!

Conditionals in GPU programs (none / 1) (#32)
by Xtapolapocetl on Wed Oct 29, 2003 at 03:04:40 PM EST

This places a lot of restrictions in shader code: there is no conditional branching. So, there are no loops, no flow control, and no recursion. In fact, all code compiled from Cg or HLSL is inlined into a huge list of instructions. There are 'if's, 'for's, and 'while's in Cg, but try looking at the assembly output: in the case of the 'if', for example, both sides are executed, the values are stored in temporaries, and finally an atomic 'test-and-set' instruction is executed.

Just a note, while this is true in the current generation of GPUs, it won't be true forever, and Cg is a forward-looking language that is designed to be able to support future vertex/fragment programs in a Turing-complete manner.


zen and the art of procrastination

[ Parent ]
Yes, but (none / 1) (#34)
by Guybrush Threepwood on Wed Oct 29, 2003 at 04:03:00 PM EST

I'd expect at least another generation or two of GPU's before we have general turing-completeness in fragment and vertex programs. (Actually, this is sort of a running bet at our uni CG lab :)

By the time pixel-level programming achieves Turing-completeness, I'd say we're in for a big change in programming at large. There'll be a much larger emphasis on distributed computing, but instead of having your CPU's far away from each other on a gigabit network, they'll be only some microns away :) There's not much to be gained in the MHz arms race anymore.

-- Dont eat me. I'm a mighty pirate!
[ Parent ]

Yeah (none / 1) (#36)
by Xtapolapocetl on Wed Oct 29, 2003 at 04:14:18 PM EST

I'd expect at least another generation or two of GPU's before we have general turing-completeness in fragment and vertex programs. (Actually, this is sort of a running bet at our uni CG lab :)

Sure. Actually, I'd even be surprised if it happened that fast. There are other things GPU makers need to focus on first (reducing the other limitations of GPU programs, i.e. program length, instruction restrictions, texture access restrictions, etc.)

By the time pixel-level programming achieves Turing-completeness, I'd say we're in for a big change in programming at large. There'll be a much larger emphasis on distributed computing, but instead of having your CPU's far away from each other on a gigabit network, they'll be only some microns away :)

You're probably right. I wish the x86 world would take a cue from Apple and make multiprocessor boxes the norm (at least for everything but the low-end and portable markets), but I think that's a pipe dream, what with the intense price competition and so forth.

There's not much to be gained in the MHz arms race anymore.

Yeah. The rest of the computer really needs to catch up. It would really make my life a lot easier if the PC architecture was more like the Xbox (unified memory architecture). That way it would actually be possible to switch between the CPU and GPU and use whatever is fastest for the current task much more efficiently. That's just not feasible for most stuff these days.


zen and the art of procrastination

[ Parent ]
Minor nit (none / 1) (#40)
by Guybrush Threepwood on Thu Oct 30, 2003 at 09:42:54 AM EST

texture access restrictions

The GeforceFX series has no limits on dependent texturing. I think the Radeon 9800 hasn't, either, but since I don't do my work with the Radeon series, I can't be sure.
-- Dont eat me. I'm a mighty pirate!
[ Parent ]

Don't be so sure (none / 1) (#39)
by squigly on Thu Oct 30, 2003 at 04:48:59 AM EST

Proper conditional branching is difficult to achieve with these designs. Typically they operate on several fragments at once, which means that you can't branch based on a fragment value since different fragments will want to go in different directions.

You can have a certain amount of forking with conditional writes, but this still means all the instructions on both forks are being executed.  It is also possible that they'll switch to a multiprocessor style architecture, but we may never go in that direction.

[ Parent ]

I dunno (none / 1) (#41)
by Xtapolapocetl on Sun Nov 02, 2003 at 03:51:23 PM EST

Apparently nVidia's NV30-based stuff already has real variable looping and conditional branching, but I don't have one to verify that myself.


zen and the art of procrastination

[ Parent ]
I do, and they don't :) (none / 0) (#42)
by Guybrush Threepwood on Fri Nov 21, 2003 at 05:25:01 PM EST

It's all about compiler inlining.

-- Dont eat me. I'm a mighty pirate!
[ Parent ]

Sh: A high-level meta-programming language for GPU (none / 1) (#33)
by coleslaw on Wed Oct 29, 2003 at 03:57:03 PM EST



