Comments on: Stage3D stress test (b3d engine) 20,000 primitives ~ 15,000,000 triangles (updated with playable demo)
http://blog.bwhiting.co.uk/?p=314
...usually a bad idea
Thu, 27 Aug 2015 11:43:31 +0000

By: bwhiting – Fri, 30 Nov 2012 21:57:44 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-25624

Have done further optimizations since that was uploaded, so could probably squeeze a tiny bit more out of it.

“affiliate transformations on GPU”

Do you mean recompose the transformation matrix on the GPU? At the moment I don’t do that, although I have thought about it. It would munch up a fair few instructions though (which is why I had swerved it before), but I could test the performance comparison quite easily – might give that a bash next week.

By: devu – Fri, 30 Nov 2012 21:35:12 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-25619

Quite an old machine (about 10x cheaper than any iMac anyway ;) ). I managed to capture the output when the FPS dropped to 30 and about 85% of the scene was in my viewport.

AMD Phenom 2 955 x4, ATI Radeon HD 5800

cullingTime: 2
occlusionTime: 0
sortingTime: 0
renderTime: 29
totalTime: 32
totalObjects: 14000
totalTriangles: 10500000
totalRenderedObjects: 11891
totalRenderedTriangles: 8918250 (85%)
totalProgramChanges: 0
Direct :: hardware

Still impressive performance, but one question remains: did you try affiliate transformations on GPU?

By: bwhiting – Fri, 22 Jun 2012 11:10:19 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-13536

You could go deferred for sure, but because we don’t have proper support for multiple render targets, it’s going to require a complete pass over every object for each buffer you want, i.e. one pass of everything for depth, one for normals, one for colour, one for specular... and so on. That said, you can of course combine multiple outputs into one buffer by making use of the available channels, but it’s still going to be very memory taxing and a problem for slower graphics cards... but definitely doable!
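To make the channel-combining idea concrete, here is a CPU-side sketch (in TypeScript rather than AS3, purely for illustration) of the usual pack/unpack trick for squeezing one high-precision depth value into the four 8-bit channels of a single RGBA target; the function names are made up, not from the b3d engine:

```typescript
// Pack a depth in [0, 1) into four channel values, each in [0, 1), so one
// RGBA render target can carry high-precision depth without MRT support.
const SCALE = [1, 255, 255 * 255, 255 * 255 * 255];

function encodeDepth(depth: number): number[] {
  // Successive base-255 fractional "digits" of the depth value.
  const frac = SCALE.map(s => (depth * s) % 1);
  // Subtract each higher-order remainder so the channels sum back exactly.
  return [
    frac[0] - frac[1] / 255,
    frac[1] - frac[2] / 255,
    frac[2] - frac[3] / 255,
    frac[3],
  ];
}

function decodeDepth(rgba: number[]): number {
  // In a shader this is a single dot product against (1, 1/255, 1/255², 1/255³).
  return rgba[0] + rgba[1] / 255 + rgba[2] / (255 * 255) + rgba[3] / (255 * 255 * 255);
}
```

In a real setup the encode would run in the fragment shader writing the depth target, and the decode is one dot product when that target is sampled later.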

By: Dave – Fri, 22 Jun 2012 10:04:56 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-13532

Good stuff! I’ve just discovered the blend mode in the context docs this morning; I had been wondering how you’d do passes without rendering to BitmapDatas (yuk!). Passes (as you’re doing) make the whole process simpler, so I think that’s my next trick. I’m also wondering (in a similar line of thought) whether I should look at deferred rendering to sort out my lighting issues, but that again sounds like outputting images and then re-evaluating – sounds expensive... more thought required there I think :-)

Thanks!

Dave

By: bwhiting – Thu, 21 Jun 2012 21:09:25 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-13498

With respect to shaders, this is how my engine works:

I have a base material class that contains render state information such as culling, blending etc., as well as the shader program, comprising the fragment and vertex parts.

The material system allows you to write your shaders per material, but also to build them out of various parts, combining them automatically.

i.e.:

var mat:Material = new Material("my material");
mat.add(new BasePass());
mat.add(new TexturePass(texture));
mat.add(new DiffuseLightingPass(light), BlendMode.multiply);
mat.add(new SpecularLightingPass(light), BlendMode.add);
mat.add(new FogPass(0xFFFFFF));

The base pass is the one that transforms the vertex into world space and then clip space, ready to be rendered, but it can still be modified by any following pass.

So each pass has the option to modify the positions/normals or whatever and pass them on to its relevant fragment shader, and each fragment shader gets the current colour as an input from the one before, where applicable. Not sure if this makes any sense, but I think it warrants a blog post to explain it properly. It gives anyone a really easy way to build up complex materials from simple parts.

Currently I have quite a few effects, from lighting to normal mapping to detail mapping to fog to reflections and refractions to simple colours and texture stuff, plus loads more – so already there are many, many combinations that you could make thousands of materials out of if you wanted. And all the fragment parts support blending in the shader automatically, so the same effect can look different depending on the blending you want. It’s a pretty neat system, but I am in the process of making it optional to build them like that; previously it was the only way.
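A minimal sketch of what such a pass system might look like under the hood (TypeScript for illustration; Pass, Material and the AGAL-ish snippets here are hypothetical, not the actual b3d classes):

```typescript
// Each pass contributes a fragment snippet that leaves its colour in a
// temporary register, plus a blend op describing how that colour folds into
// the running result. The material concatenates them into one program.
type Blend = "replace" | "multiply" | "add";

interface Pass {
  name: string;
  fragment: string; // AGAL-ish snippet that writes this pass's colour to ft1
}

class Material {
  private passes: { pass: Pass; blend: Blend }[] = [];

  add(pass: Pass, blend: Blend = "replace"): this {
    this.passes.push({ pass, blend });
    return this;
  }

  // Concatenate snippets, inserting a blend instruction after each pass so
  // the running colour (ft0) is the input the next pass sees.
  buildFragmentSource(): string {
    const lines: string[] = [];
    for (const { pass, blend } of this.passes) {
      lines.push(`// pass: ${pass.name}`);
      lines.push(pass.fragment);
      if (blend === "multiply") lines.push("mul ft0, ft0, ft1");
      else if (blend === "add") lines.push("add ft0, ft0, ft1");
      else lines.push("mov ft0, ft1");
    }
    lines.push("mov oc, ft0"); // final colour out
    return lines.join("\n");
  }
}
```

The point of routing every pass through a blend op is exactly what the comment describes: the same effect can look different under a different blend mode without touching the pass itself.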

So far I am sticking to the mini assembler as well, although I am a big fan of Minko’s AS3 shaders – they look like a pretty neat way to build them.

Erm, went on a bit of a tangent there, but my advice is: try to keep your system simple, but easy to get hardcore with if the need arises.

If you want more intricate details then I’ll deffo write a post about it, as there are a couple of nifty tricks that could be useful and some that help speed things up. I’m also sure that someone could point out ways to improve it!

By: Dave – Thu, 21 Jun 2012 16:27:09 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-13489

I was the same. I wrote a simple fixed-function software 3D engine over the course of a number of years (all the maths, from scratch – took me ages!) and I’m now converting it to a hardware programmable pipeline. So the good news is I’ve already battled through the basics of how all this stuff fundamentally works; now I’m just wrapping my head around how best to use the hardware / programmable pipeline.

How do you feel about dynamically built shaders? I’ve got a few simple shaders on the go now, but they’re all hardcoded in AGAL (mini assembler) and I keep thinking: should I bother to make these dynamically generated?

The pairing between the vertex shaders and the fragment shaders makes that a little scary, because you’d have to manage the inputs and outputs. But I keep thinking: if the guy using it (ha! like anyone will ever use this thing :-P) doesn’t want fog, why am I making the shader do all the fog calculations only to decide it’s a value of 0?

Also, which AGAL assembler do you favour? As mentioned, I’m using the mini version. That’s because I’ve always wanted to dabble in some simple assembly – I’ve done that now, and it’s initially fun but ultimately painful, so I’m thinking about changing to something else! :-)

By: bwhiting – Thu, 21 Jun 2012 12:22:34 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-13480

No worries – I feel your pain though! I had no background in 3D/GPU programming, so it’s all self-taught (and that was a LOT of trial and error / having to answer most of my own questions... a painful process indeed). So I’m always happy to discuss with folks in similar situations.

Feel free to ask questions regarding engine structure, as I have been working on b3d for about two years now with quite a few iterations, and it’s pretty speedy whilst remaining flexible at the same time. (It’s midway through an overhaul at the moment, mind.)

Shame there isn’t a good forum (not necessarily an actual ‘forum’) of 3D Flash devs that one can post questions/ideas to, in order to get some educated feedback/help or just general discussion.

Good luck with your endeavours anyhow!!

b

By: Dave – Thu, 21 Jun 2012 11:38:47 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-13479

Hey Ben,

Thanks for the good solid reply there!

Sorry, I tend to be overly dramatic – yes, 1000 houses is a lot :-)

I’d pretty much arrived at the same conclusions, having been reading around for weeks but it’s always better to chat with someone about it I find. I was worried that I was missing some obviously better strategy in my esoteric trawling of the web but it seems not.

It’s very difficult to find people to discuss this stuff with: most Stage3D dabblers seem to be either doing small tech demos that don’t scale up, using a third-party engine (no use to me at all, since I’m building my own engine for my own amusement), or so far ahead that they don’t have time to talk down to people like me in the early/mid stages of building something substantial and asking meatier questions than “what’s a vertex buffer then?”.

I’ve been really focused on getting the context changes to a bare minimum, but I think I need to start sensibly trading in more draw calls to get the scene flexibility I want/need.

Hmmmm… more food for thought.

Many thanks though, you’ve reassured me that I’m not insane and that’s always nice to hear!

All the best,

Dave

By: bwhiting – Thu, 21 Jun 2012 07:34:05 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-13470

Wow, monster post!

I’ll do my best to answer your questions. There will be some speculation involved, with a hint of guesswork, but fingers crossed it will clarify some of the things you mentioned above.

Right, example 1:

1000 houses is pretty steep: with a limit of 65535 verts per buffer, that only leaves you about 65 vertices per house... so possible, but not much intricate detail there.

In terms of triangles, you are looking at a maximum index buffer size of 524287 (about 174762 triangles). But realistically, 1000 low-poly houses would probably be something more like 25,000 triangles.
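That budget arithmetic, spelled out (TypeScript for illustration; the limits are the ones quoted in the comment):

```typescript
// Stage3D-era limits as quoted above: 65535 vertices per vertex buffer
// (16-bit indices) and 524287 indices per index buffer.
const MAX_VERTICES_PER_BUFFER = 65535;
const MAX_INDICES_PER_BUFFER = 524287;

// How many vertices each instance can use if they all share one buffer.
function vertexBudgetPerInstance(instanceCount: number): number {
  return Math.floor(MAX_VERTICES_PER_BUFFER / instanceCount);
}

// Upper bound on triangles a single index buffer can describe (3 indices each).
function maxTrianglesPerBuffer(): number {
  return Math.floor(MAX_INDICES_PER_BUFFER / 3);
}
```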

So how best to render them (and this would probably apply even if you had only 100 houses)?
I think you would be right to store them all in one buffer, but that doesn’t mean you have to draw them all in one go: the drawTriangles command can also take a firstIndex and a numTriangles parameter. With that in mind, you could group your houses based on location into, say, 4-10 groups (numbers plucked out of the sky), keeping the houses nearest each other in one group. Then perform some culling on a bounding volume per group and only render the groups on screen. This leads to more draw calls (0-10, based on the numbers above) but no context changes at all. What you do gain is the chance not to blindly render the whole lot when they are not even going to be on screen. In some cases (where the triangle count is an issue) this will save you some time, but it would require testing to see whether the few added draw calls are worth the saving. Personally I like the idea of not sending things to be drawn if it’s easily avoidable.
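The grouping idea can be sketched as follows (TypeScript for illustration; the names are made up, and a crude distance check stands in for a proper frustum test):

```typescript
// All houses share one vertex/index buffer. Each group of nearby houses keeps
// a bounding sphere plus the (firstIndex, numTriangles) range it occupies in
// the shared index buffer; only groups within range earn a draw call.
interface HouseGroup {
  firstIndex: number;    // where this group's triangles start in the buffer
  numTriangles: number;  // how many triangles the group owns
  center: [number, number, number];
  radius: number;
}

function visibleDrawRanges(
  groups: HouseGroup[],
  camera: [number, number, number],
  viewDistance: number
): Array<{ firstIndex: number; numTriangles: number }> {
  const ranges: Array<{ firstIndex: number; numTriangles: number }> = [];
  for (const g of groups) {
    const dx = g.center[0] - camera[0];
    const dy = g.center[1] - camera[1];
    const dz = g.center[2] - camera[2];
    const dist = Math.sqrt(dx * dx + dy * dy + dz * dz);
    // Sphere test: visible if any part of the group is within view distance.
    if (dist - g.radius <= viewDistance) {
      // One drawTriangles(firstIndex, numTriangles) per visible group,
      // with zero context changes in between.
      ranges.push({ firstIndex: g.firstIndex, numTriangles: g.numTriangles });
    }
  }
  return ranges;
}
```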

1000 moving zombies... OK, this is trickier, but I think the solution would be to first upload your zombie mesh, then make one draw call for each zombie, updating any constants as you need (matrix information, colours etc.). Trying to modify buffers on the fly is something to avoid like the plague, so let the GPU do the work on this one. It would also allow you to very easily add zombies with almost no overhead, as no buffers need to change – only more draw calls.
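A sketch of that render loop (TypeScript for illustration; GpuLike is a stand-in for a Context3D-style object, not a real API):

```typescript
// One shared zombie mesh upload, then one draw call per instance. Per-zombie
// state travels as shader constants; the buffers are never rewritten.
interface Zombie {
  matrix: number[]; // 4x4 world transform, 16 floats
  tint: number[];   // per-zombie colour, 4 floats
}

interface GpuLike {
  setVertexConstants(firstRegister: number, data: number[]): void;
  drawTriangles(firstIndex: number, numTriangles: number): void;
}

function renderHorde(gpu: GpuLike, zombies: Zombie[], meshTriangles: number): number {
  let drawCalls = 0;
  for (const z of zombies) {
    gpu.setVertexConstants(0, z.matrix); // e.g. vc0..vc3: transform
    gpu.setVertexConstants(4, z.tint);   // e.g. vc4: colour
    gpu.drawTriangles(0, meshTriangles); // same shared buffers every time
    drawCalls++;
  }
  return drawCalls; // growing the horde only adds draw calls, never buffer work
}
```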

Example 2:

I’m not going to dwell on this one too much, but in essence yes: you want to avoid rendering distant stuff in great detail. I suggest looking into geomipmaps/geoclipmaps – there are a lot of references around, but I haven’t tackled it yet.
Proof it is possible in Flash though:
http://www.youtube.com/watch?v=a5lojhTl88o
Maybe track down this guy?!

Example 3:

Dynamic particles are harder in Stage3D, as we cannot read textures in the vertex shader :( so all the really cool stuff is out of reach. Again, I stress: avoid messing with the buffers on the CPU. Flash is slow at the best of times, so doing everything possible on the GPU is the rule here. With particles you can upload speeds and directions as attributes, but any dynamism will have to come from nifty tricks and formulas in the shaders. So proper collisions will be almost impossible, but you could emulate simple collisions with your shader code.
Here’s a link to Simo’s blog – he has a lot of info on particle rendering:
http://www.simppa.fi/blog/
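The stateless trick being described can be sketched on the CPU like this (TypeScript for illustration; in a real system the same formula runs per-vertex in the shader, with start position and velocity uploaded once as attributes and time as a constant – the names are made up):

```typescript
// A vertex shader cannot read back previous positions, so each particle's
// position must be a closed-form function of its fixed attributes and time.
function particleHeight(startY: number, velocityY: number, gravity: number, t: number): number {
  // Classic ballistic formula - cheap to evaluate per-vertex in a shader.
  return startY + velocityY * t + 0.5 * gravity * t * t;
}

// A shader-friendly fake "collision" with a ground plane at y = 0: mirror the
// height rather than simulate a real bounce (abs is a single AGAL opcode).
function particleHeightWithFloor(startY: number, velocityY: number, gravity: number, t: number): number {
  return Math.abs(particleHeight(startY, velocityY, gravity, t));
}
```

The mirror trick is exactly the kind of "emulated" collision the comment means: it is not physically correct, but it keeps particles above the floor without any CPU or buffer work.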

Hope that makes some sort of sense and helps clarify some of your questions. It was written in a mad rush so it might be a bit dodgy, but I’ll have a read over it later and correct anything.

Ben

By: Dave – Wed, 20 Jun 2012 18:50:36 +0000
http://blog.bwhiting.co.uk/?p=314&cpage=1#comment-13443

Hi!

Thanks for the tweet (you may regret that in a minute ;-)). I think one of my main questions is around this subject. I’ve tried to keep this short and totally failed... sorry!

I’ve just read your post on optimisation and, having read around, I’d arrived at the same conclusions about how to make the most of the GPU. Excellent start – we’re on the same page!

What I’m unsure about currently is the best way to handle lots of instances of the same geometry but with different properties and then adding more of them at some future time.

Three examples, all slightly different but homing in on the same problem I think…

Example #1
Rendering a village of 1000 houses – all the same model, never changing or animating – is probably most efficiently done by just pre-compiling all of the triangles into one big fat vertex buffer (assuming all the data would fit) and (excusing the big upload time) just getting the GPU to spit out all those triangles in one draw call with no context changes.

However, how would you go about having 1000 zombies running in between the houses? Just thinking about the simplest scenario: all of the basic transformation matrices are going to be different, they might each be at a different animation frame, they’ll all be following their own pathfinding results, and so on. Then what if I arbitrarily decided to add another 100 zombies to the horde?

Example #2
I have a landscape scene and I can just throw all the geometry at the GPU, but that’s pretty wasteful in the distance and I’d rather use lower-level-of-detail geometry there. The advantage so far is that I’ve uploaded one vertex buffer and just left the GPU to it, but if I wanted to start dynamically changing which mountains I’m rendering, or replacing them with lower-res versions, then I’m going to be doing a lot of context switching – right?

Example #3
I decide to write a particle system. I could pre-calculate all the animation and upload the animation frames to the vertex buffer, and that would look good in some scenarios. However, I want to apply collisions to the particles (for instance), and that involves, I guess, either injecting values into the vertex shader or having the CPU handle everything and re-uploading the vertex buffer like Starling does – ouch! No?

I can think of a few ways to achieve these effects but none of them seem to be in the spirit of minimising context changes and draw calls.

Do you know if there’s a preferred/recommended approach to this? Are the context switches just unavoidable? Feel free to point me at a link or something rather than write a big reply, if you know of any! :-)

Many thanks for your time in advance!

Dave
