Horde3D
http://horde3d.org/forums/

Tile-Based rendering GPU and the ideal rendering pipeline
http://horde3d.org/forums/viewtopic.php?f=7&t=1072

Author:  MistaED [ 15.01.2010, 04:08 ]
Post subject:  Tile-Based rendering GPU and the ideal rendering pipeline

Hi all,

I've been looking into how the new GPUs in mobile devices (like the AMD Z-series or PowerVR SGX) work, and they use a tile-based deferred rendering pipeline. Now I've been trying to work out how this would match Horde3D's rendering pipelines of forward and deferred lighting, but it sort of confuses me on what approach would benefit more.

From my understanding, the deferred part in the hardware is at the rasterisation level where it "bins" visible triangles to the screen into allocated tiles and can work out the overlapping parts and not need to render them. And with this approach, the absolute worst thing you could do is put alpha tested or "discard" in the shader which breaks all of this optimisation.
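
As a rough illustration of the "binning" step described above, here is a minimal Python sketch. It assumes screen-space triangles given as three (x, y) points and a hypothetical 32-pixel tile size; the function name and the bounding-box test are illustrative choices, not taken from any vendor's documentation, and real hardware is certainly more precise than a bounding box.

```python
from collections import defaultdict

TILE = 32  # hypothetical tile size in pixels; real hardware varies


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box touches it."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        tx0 = max(0, int(min(xs)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


triangles = [
    ((0, 0), (10, 0), (0, 10)),       # fits inside the top-left tile
    ((20, 20), (40, 20), (20, 40)),   # straddles four tiles
]
bins = bin_triangles(triangles, width=64, height=64)
```

Once every tile has its own triangle list, the chip can resolve visibility for that tile in fast on-chip memory before any pixel is shaded, which is where the overdraw saving comes from.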

I'm just wondering now, how would real-time shadows work for alpha-tested things like leaves? Do you render alpha tested/discard shaders as a separate pass after opaque or not do it at all in favour of alpha blends in a separate pass? So you'd have opaque rendering and alpha blend only, but that doesn't really work well for creating the shadows then. Maybe it is best to just go with static baked shadows with a very small number of forward rendered lights?

Thanks for your input!

Author:  DarkAngel [ 15.01.2010, 06:08 ]
Post subject:  Re: Tile-Based rendering GPU and the ideal rendering pipeline

We used to use tile-based GPUs at one of my old jobs (embedded devices), similar model to the one used in the Sega Dreamcast IIRC ;)

MistaED wrote:
Now I've been trying to work out how this would match Horde3D's rendering pipelines of forward and deferred lighting, but it sort of confuses me on what approach would benefit more.
The best pipeline still probably depends on the number (and size) of dynamic lights you want to have in your scene, regardless of the target hardware.
Quote:
From my understanding, the deferred part in the hardware is at the rasterisation level where it "bins" visible triangles to the screen into allocated tiles and can work out the overlapping parts and not need to render them.
Yep, it's supposed to cut down on the number of pixels that actually get shaded - in theory you can have lots of overdraw in your scene and still not be bound by fill-rate. On PC/360/PS3 GPUs, we achieve the same thing by using the "Z-pre-pass" technique (popularised by Doom 3).
Quote:
And with this approach, the absolute worst thing you could do is put alpha tested or "discard" in the shader which breaks all of this optimisation.
Do you have a reference for this? I wasn't aware of alpha-test/discard slowing down tile-algorithms (though on a regular PC/360/PS3 GPU it does slow down the Z buffer...).
Quote:
I'm just wondering now, how would real-time shadows work for alpha-tested things like leaves? Do you render alpha tested/discard shaders as a separate pass after opaque or not do it at all in favour of alpha blends in a separate pass? So you'd have opaque rendering and alpha blend only, but that doesn't really work well for creating the shadows then. Maybe it is best to just go with static baked shadows with a very small number of forward rendered lights?
By "real-time shadows", I assume you're talking about shadow maps?
As you say, alpha-blending is of no use when generating shadow maps, so you'd have to use alpha-testing...
Or you could simply not use alpha-testing for shadows, so instead of a quad of alpha-tested leaves casting a correct shadow, it would just cast a "quad" shadow...
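
To make the trade-off concrete, here is a small Python sketch of which texels of a leaf quad would write depth into a shadow map, with and without alpha-testing. The 4x4 alpha texture, the 0.5 threshold, and the function names are all made up for illustration; this is not Horde3D code.

```python
def shadow_coverage(alpha_texture, alpha_test, threshold=0.5):
    """Return a grid of booleans: True where the quad writes depth into the shadow map."""
    return [
        [(not alpha_test) or (alpha >= threshold) for alpha in row]
        for row in alpha_texture
    ]


def count_shadowed(mask):
    """Count the texels that end up casting a shadow."""
    return sum(cell for row in mask for cell in row)


# Hypothetical 4x4 leaf cut-out: 1.0 = leaf, 0.0 = transparent texel.
LEAF = [
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

leaf_shaped = count_shadowed(shadow_coverage(LEAF, alpha_test=True))
quad_shaped = count_shadowed(shadow_coverage(LEAF, alpha_test=False))
```

With the test enabled only the leaf texels cast a shadow; with it disabled the whole quad does, which is the "quad" shadow mentioned above.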

Author:  MistaED [ 15.01.2010, 06:44 ]
Post subject:  Re: Tile-Based rendering GPU and the ideal rendering pipeline

OK, I kind of sounded like a layman on some things, whoops, sorry about that. I'll try to sound more like I know what I'm talking about. Yep, I'm familiar with the Z pre-pass on PC/360/PS3, so I guess the tile-based hardware doesn't need to do this step.

I'm not too sure how Imagination Technologies feel about talking about their stuff in public forums, but the discard info is in their SDK package. It makes sense when you think about it, though: alpha-test would make those overdraw-reducing algorithms a bit difficult to deal with. I noticed in Sonic Adventure 2 on the Dreamcast that the trees were alpha-blended and not alpha-tested, and I think I read someone benchmarking blend vs test on the iPhone (MBX-based GPU), with test being massively slower in comparison, but I can't find that post anymore. If you've seen Bounce Evolution on the Nokia N900, the leaves on the trees look like they're alpha-blended: http://www.youtube.com/watch?v=V6JfxdWg0HI just they haven't got shadowmaps, just baked lighting :)
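
A toy Python model of why discard might hurt here. It assumes, as a simplification that is not taken from the SDK documentation, that the hardware can only skip fragments hidden behind an opaque fragment, and has to run the shader for any fragment that might discard because its coverage is unknown until then. The function name and the tuple layout are hypothetical.

```python
def shaded_fragments(fragments):
    """Count the fragments one pixel must shade under a simplified deferred HSR model.

    fragments: list of (depth, uses_discard) tuples, smaller depth = closer.
    """
    shaded = 0
    for depth, uses_discard in sorted(fragments):
        shaded += 1
        if not uses_discard:
            # An opaque fragment definitely hides everything behind it.
            break
    return shaded


all_opaque = shaded_fragments([(0.9, False), (0.5, False), (0.2, False)])
leaves_in_front = shaded_fragments([(0.9, False), (0.5, True), (0.2, True)])
```

In this model three opaque layers cost one shaded fragment, while two discard-capable layers in front of an opaque one cost three, so the overdraw saving disappears wherever alpha-tested geometry is stacked up.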

Yeah, rendering a quad to the shadowmap wouldn't look too pretty; http://www.youtube.com/watch?v=WGAc_szIX4s has an example of this 16 secs in. Thanks for the info, DarkAngel!

Author:  marciano [ 15.01.2010, 22:33 ]
Post subject:  Re: Tile-Based rendering GPU and the ideal rendering pipeline

Alpha test/discard can affect early z culling. That's why you need to explicitly enable it with a flag in our model shader. There are several more conditions that can break hierarchical and fine-grained z culling. A good overview for ATI hardware can be found here: http://developer.amd.com/media/gpu_assets/Depth_in-depth.pdf. Nvidia has a similar list in their GPU Programming Guide. PIX on the 360 can analyze the current HiZ benefit very well, but unfortunately I'm not aware of any tool that could do the same on PC (especially for GL).
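
For readers unfamiliar with why early z culling matters, here is a minimal Python sketch of the Z pre-pass idea mentioned earlier in the thread. It counts shader runs for a single pixel, assuming an idealised early z test with a less-or-equal compare; the function name is hypothetical and no real GPU or Horde3D behaviour is being measured.

```python
def shade_count(depths, prepass):
    """Count pixel-shader runs for one pixel, given fragment depths in draw order."""
    # A depth-only pre-pass leaves the nearest depth in the buffer before shading starts.
    nearest = min(depths) if prepass else float("inf")
    shaded = 0
    for depth in depths:
        if depth <= nearest:  # early z test, done before the shader runs
            shaded += 1
            nearest = depth
    return shaded


back_to_front = [0.9, 0.5, 0.2]
without_prepass = shade_count(back_to_front, prepass=False)
with_prepass = shade_count(back_to_front, prepass=True)
```

Drawn back to front without a pre-pass, every layer gets shaded; with the pre-pass only the visible one does. This only works while the early test stays enabled, which is why the conditions listed in the documents above matter.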

Author:  zoombapup [ 07.02.2010, 11:24 ]
Post subject:  Re: Tile-Based rendering GPU and the ideal rendering pipeline

I remember going down to meet the PowerVR people a loooong time ago. They had a demo with a dinosaur that you could cut out the skin of and see its bones, which was pretty neat. But in real games, you just never got anywhere near the speed you required from their chips.

The tiling idea is nice, but it's really, as far as I can recall, just a way of limiting the amount of memory required. I suspect they had a higher gate count because of it, but it ended up being better just using more memory. They might also have had a higher-precision depth scheme (not a z buffer, but something akin to it).

Weirdly enough, Larrabee ended up talking about using a tile-based scheme too (I guess it just helps organize your render pipeline). See Mike Abrash's talks from last GDC for more info on the details of it.

Always take these things with a pinch of salt though, because people claim a huge amount but don't always deliver (for instance a Creative Labs 3D Blaster was actually slower in our tests than a software renderer).

All times are UTC + 1 hour