Quick tip: go full screen in your browser to keep access to all the keyboard shortcuts (clicking the title bar and hitting F11 is the usual way, I think).
As before, any feedback on performance would be great. There is still room for gains, but for the time being I am happy. I would love to hear what someone with an i7 and a beast of a graphics card can achieve.
Having played with Stage3D for a while now, I thought I would write a small piece on optimisation.
With great power comes great responsibility!
Stage3D gives you GPU access, which can expose some serious rendering horsepower, but if you don’t treat it with respect you’re going to run into limitations pretty quickly!
So what follows is a (very) rough guide on how to squeeze the most out of the new 3D APIs.
Rule 1 (of 1)
CPUs are fast and GPUs are faster, but communication between the two is probably the biggest bottleneck you will face.
Therefore: reduce this wherever possible.
This means minimising calls to the following Context3D functions:
Context3D.drawTriangles();
Context3D.setProgram();
Context3D.setProgramConstants…();
(actually most of Context3D’s functions but the above are the real doozies)
drawTriangles()
GPUs can draw triangles fast, and lots of them: millions every second without breaking a sweat.
So you might think, “I can call drawTriangles() 50,000* times, no sweat, as long as I am only drawing a few triangles in each call.”
WRONG!! This command is a mighty expensive one so use it very wisely!
How: When you call drawTriangles() you pass it an index buffer whose indices, taken three at a time, each describe one triangle. Given that the call is expensive, it makes sense to pass as many triangles as possible in one call.
Sadly this doesn’t quite mean you can just group your geometry into big chunks: you cannot change state (alter the material or any parameters) during the call, so everything sent through will be rendered with the same program and set of constants. It does mean, however, that static (non-moving) elements that share the same program/material can be combined into one list. Things such as trees, grass and any other repeatable geometry are good candidates for this.
You can do this for dynamic geometry too, but it gets complex as you have to upload transformation data in a separate buffer; this is one way particle systems can be created. The downside is that each time any of the objects changes, the whole buffer needs to be re-uploaded.
It is also vital that you only try to draw objects that will be seen on screen, so don’t draw anything that is out of view of the camera: frustum culling saves the day.
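A minimal sketch of combining static geometry into one list, written in TypeScript rather than ActionScript so it can run outside Flash Player. The `Mesh` type and `mergeMeshes` function are hypothetical names, and the sketch assumes the vertex positions are already in world space (static objects, so no per-object transform is needed).

```typescript
// Hypothetical mesh type: flat x, y, z positions plus triangle indices.
interface Mesh {
    vertices: number[]; // x, y, z per vertex
    indices: number[];  // three indices per triangle
}

// Merge meshes that share one program/material into a single vertex list and
// a single index list, so they could be drawn with one drawTriangles() call.
function mergeMeshes(meshes: Mesh[]): Mesh {
    const merged: Mesh = { vertices: [], indices: [] };
    for (const mesh of meshes) {
        // Indices of this mesh must be shifted past the vertices already added.
        const vertexOffset = merged.vertices.length / 3;
        for (const v of mesh.vertices) merged.vertices.push(v);
        for (const index of mesh.indices) merged.indices.push(index + vertexOffset);
    }
    return merged;
}

// Two single-triangle meshes become one two-triangle mesh.
const triangleA: Mesh = { vertices: [0, 0, 0, 1, 0, 0, 0, 1, 0], indices: [0, 1, 2] };
const triangleB: Mesh = { vertices: [5, 0, 0, 6, 0, 0, 5, 1, 0], indices: [0, 1, 2] };
const batch = mergeMeshes([triangleA, triangleB]);
// batch.indices is [0, 1, 2, 3, 4, 5] and batch.vertices holds 6 vertices.
```

In a real Stage3D renderer the merged lists would be uploaded once to a vertex buffer and an index buffer, which are subject to the API's buffer size limits.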
side note:
Even high-end games rarely want to issue more than 1000-2000 draw calls, though the likes of Battlefield 3 can get up to the 3000 mark in some environments. Newer consoles, however, can issue over 10,000 draw calls and do it much faster thanks to more direct access to the hardware.
setProgram()
Assuming you have now done everything in your power to reduce the number of draw calls you issue, the next thing to look at is state changes (changing the current program).
Changing state on the GPU might seem like a trivial thing but it is actually something you want to keep to a minimum to be able to squeeze the most out of your graphics card.
There are a few things you can do here to reduce this problem.
1. Group the objects that require drawing by their material/program! For example suppose you had 100 cubes, 50 of them with one material and 50 of them with another. Now if you had a list containing all of those cubes and blindly sent them to be rendered you could end up having to change state a large number of times. If it so happened that each cube in the list had a different material to the object before it, then the program will have to be updated for every draw call. Not good. If that list however was sorted so that
– even if you are only drawing one triangle with this call there
*there is an actual limit of 32,768 drawTriangles() calls per present() call.
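The effect of grouping by material can be sketched with the 100-cube example. This is TypeScript rather than ActionScript so it can run outside Flash Player; `Renderable` and `countProgramChanges` are hypothetical names, and the counter simply stands in for calls to setProgram().

```typescript
// Hypothetical render item: only the material matters for this sketch.
interface Renderable { name: string; materialId: number; }

// Counts how often the program would have to change if the items were
// drawn in list order (each change stands in for a setProgram() call).
function countProgramChanges(items: Renderable[]): number {
    let changes = 0;
    let current = -1; // no program bound yet
    for (const item of items) {
        if (item.materialId !== current) {
            changes++;
            current = item.materialId;
        }
    }
    return changes;
}

// Worst case: 100 cubes whose materials alternate from one cube to the next.
const cubes: Renderable[] = [];
for (let i = 0; i < 100; i++) cubes.push({ name: "cube" + i, materialId: i % 2 });

const unsortedChanges = countProgramChanges(cubes); // 100 program changes
const sortedCubes = cubes.slice().sort((a, b) => a.materialId - b.materialId);
const sortedChanges = countProgramChanges(sortedCubes); // 2 program changes
```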
setProgramConstants…()
This function is what allows us to upload constants to the GPU. It is how we upload our matrices and any float1/2/3/4 values that we want to use in our shaders. While it may not be a huge bottleneck, it still had a noticeable impact on performance in my experiments.
So how to optimise?
Any constant that is likely to be reused by different materials should be uploaded once per frame, not once per object rendered. So what are the likely culprits?
The view projection matrix! This is 16 float values that will not change between objects, so it makes sense to upload it once: 16 numbers instead of 16,000 for 1,000 objects, and 999 fewer calls to setProgramConstants, and that is a good thing!
The same applies to anything else that will not vary between objects, camera and light positions or common numbers used in shaders (0,0.5,1,-1..).
What this shows is that it is important to have some sort of system to manage uploads so you can keep track of what is already uploaded and only upload data that isn’t already there!
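One way such an upload manager could look, as a minimal sketch in TypeScript (not ActionScript, so it can run outside Flash Player). `ConstantCache` is a hypothetical class; it counts uploads instead of talking to a GPU, and the place where a real renderer would call one of the setProgramConstants functions is marked in a comment.

```typescript
// Hypothetical cache that remembers what each constant register holds and
// skips the upload when the same values are requested again.
class ConstantCache {
    uploads = 0;
    private registers: { [register: number]: string } = {};

    // Returns true if an upload was needed, false if the data was already there.
    set(register: number, values: number[]): boolean {
        const key = values.join(",");
        if (this.registers[register] === key) return false; // already uploaded
        this.registers[register] = key;
        this.uploads++; // a real renderer would upload to the GPU here
        return true;
    }
}

const constantCache = new ConstantCache();
const viewProjection = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
for (let i = 0; i < 1000; i++) {
    constantCache.set(0, viewProjection); // shared by every object: uploaded once
    constantCache.set(4, [i, 0, 0, 1]);   // per-object data: changes every time
}
// constantCache.uploads is 1001: one shared upload plus 1000 per-object uploads.
```

Comparing joined strings keeps the sketch short but has its own CPU cost; a real manager would more likely compare references or use a dirty flag.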
side note:
This also translates into how you write shaders. Knowing that each constant requires an upload should sometimes make you rethink how to achieve something using minimal constants. Take unpacking a normal from a texture: usually you would multiply the value from the texture by 2 then subtract 1 (2 floats required), but the same result can be achieved by subtracting 0.5 then dividing by 0.5 (1 float required). Perhaps not the best example, but I am sure you get the idea. REUSE is your friend!
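A quick check that the two ways of unpacking give the same result, sketched in TypeScript on plain numbers rather than in a shader. The function names are made up for illustration.

```typescript
// Conventional unpack: needs two constants (2 and 1).
function unpackMulSub(t: number): number {
    return t * 2 - 1;
}

// Alternative unpack: needs one constant (0.5), used twice.
function unpackSubDiv(t: number): number {
    return (t - 0.5) / 0.5;
}

// Texture values run from 0 to 1; both versions map them onto -1 to 1.
const samples = [0, 0.25, 0.5, 0.75, 1];
const pairs = samples.map(t => [unpackMulSub(t), unpackSubDiv(t)]);
// Both columns give -1, -0.5, 0, 0.5, 1.
```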
—————————————–
While I have only focused on 3 methods of Context3D, almost all of them will incur some penalty; those highlighted are the ones I have found to be the more serious problems.
Quick additions:
Drawing to a BitmapData from the GPU is slow, so if you have to do it, ensure it is at as small a resolution as possible. In theory a 1×1 pixel readback should be big enough for picking!
Resizing the back buffer is slow!
Creating textures is slow, so don’t do it on the fly. Pre-allocate if possible, then pick from a pool (more relevant for post-processing).
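A minimal sketch of the pooling idea in TypeScript (so it can run outside Flash Player). `TexturePool` and `PooledTexture` are hypothetical; in a Stage3D project the pooled objects would be real textures created through the context, and the `created` counter stands in for those expensive creations.

```typescript
// Hypothetical stand-in for a GPU texture.
interface PooledTexture { width: number; height: number; id: number; }

class TexturePool {
    created = 0; // how many textures actually had to be created
    private free: PooledTexture[] = [];

    // Reuse a free texture of the right size if there is one, otherwise create it.
    acquire(width: number, height: number): PooledTexture {
        for (let i = 0; i < this.free.length; i++) {
            const candidate = this.free[i];
            if (candidate.width === width && candidate.height === height) {
                this.free.splice(i, 1);
                return candidate;
            }
        }
        this.created++;
        return { width: width, height: height, id: this.created };
    }

    // Hand a texture back so a later pass or frame can reuse it.
    release(texture: PooledTexture): void {
        this.free.push(texture);
    }
}

const texturePool = new TexturePool();
for (let frame = 0; frame < 60; frame++) {
    const target = texturePool.acquire(512, 512);
    // ... render a post-processing pass into target ...
    texturePool.release(target);
}
// texturePool.created is 1: the same texture served all 60 frames.
```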
At some point in the future, I hope to write some test examples that highlight the cost of the functions mentioned above (have already done quite a few but they have been tied into other things rather than dedicated standalone tests).
I hope all of that makes some form of sense to someone 🙂 If you have any additions, corrections or questions… fire away. Will probably update this from time to time to add in more that I have missed out, it’s a broad area with many possible optimisations!
Super Quick Summary:
REDUCE DRAW CALLS! Group items where possible and use culling to ensure you are only drawing what is necessary.
REUSE MATERIALS and BATCH RENDERING by material if you can.
UPLOAD THE MINIMUM NUMBER OF CONSTANTS you can get away with 🙂
key:
Click to start.
WASD or LEFT/RIGHT/UP/DOWN to fly
Mouse move to look
Shift to fly faster
Space to toggle rotation
+ to add 500 doughnuts
– to remove 500 doughnuts
m to change material (4 available)
Once started double click to toggle fullscreen (NOTE all keys bar arrow keys will be disabled)
link to play standalone version (better experience): here
Let me know how it runs (assuming it does) if you can: frame-rate, number of objects it can handle etcetera.
video 1 (looks like crap so am uploading another one):
video 2 (hopefully looks a bit better… still looks balls, ah well)
Starts to slow down with 2000 – 3000 or more primitives on screen 🙁 There is still room for some efficiency improvements, but not bad for now.
Demonstrates how essential good culling is. Despite what the stats say whilst recording, I am able to cull a full scene of 10,000 objects in under 1 ms on the release player, no problem. Without that it would probably die; with it you can happily navigate a scene with 50,000 objects at 60fps on a good machine, as the majority are culled away, leaving perhaps only 500-1500 visible at any one time when they are spread out as they are in the demo.
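For anyone new to culling, a common approach is to test a bounding sphere for each object against the six planes of the view frustum. The sketch below is TypeScript (so it can run outside Flash Player) and is not necessarily how the demo does it; `Plane`, `Sphere` and `isVisible` are hypothetical names, and a simple box stands in for a real frustum to keep the numbers easy to follow.

```typescript
// Plane in the form nx*x + ny*y + nz*z + d = 0, with the normal pointing inwards.
interface Plane { nx: number; ny: number; nz: number; d: number; }
interface Sphere { x: number; y: number; z: number; radius: number; }

// An object is culled as soon as its sphere lies entirely behind any one plane.
function isVisible(sphere: Sphere, planes: Plane[]): boolean {
    for (const p of planes) {
        const distance = p.nx * sphere.x + p.ny * sphere.y + p.nz * sphere.z + p.d;
        if (distance < -sphere.radius) return false;
    }
    return true;
}

// Stand-in "frustum": a box from -10 to 10 on every axis.
const boxPlanes: Plane[] = [
    { nx: 1, ny: 0, nz: 0, d: 10 }, { nx: -1, ny: 0, nz: 0, d: 10 },
    { nx: 0, ny: 1, nz: 0, d: 10 }, { nx: 0, ny: -1, nz: 0, d: 10 },
    { nx: 0, ny: 0, nz: 1, d: 10 }, { nx: 0, ny: 0, nz: -1, d: 10 }
];

const inside = isVisible({ x: 0, y: 0, z: 0, radius: 1 }, boxPlanes);        // true
const straddling = isVisible({ x: 10.5, y: 0, z: 0, radius: 1 }, boxPlanes); // true
const outside = isVisible({ x: 12, y: 0, z: 0, radius: 1 }, boxPlanes);      // false
```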
Not sure why the projectVector result != the projectVectors result, but it’s close enough… I wasted too much time trying to find out why they are not the same, so if you have any ideas on why that is, let me know!
In the demo above there are 3 buttons.
The first button arranges the content in a hbox style with a padding of 5
The second button arranges the content in a vbox style with a padding of 5
The third button arranges the content in a grid style with fixed number of columns (in brackets) and a padding of 5
So what happens when you click one of those buttons is the above function is called with a parameter “layout” based on which button was clicked.
The sprite “container” holds 5 simple display objects and the functions hbox/vbox/grid all operate automatically on the children of the 1st parameter.
To arrange display objects horizontally, vertically or in a grid only takes one line of code… 🙂
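Roughly what such layout functions do under the hood, as a TypeScript sketch (so it can run outside Flash Player). This is not the code from bwhiting.swc: the `LayoutBox` type is hypothetical, the functions take an array of boxes instead of a container's children, and the grid assumes all children share the size of the first one.

```typescript
// Hypothetical stand-in for a display object.
interface LayoutBox { x: number; y: number; width: number; height: number; }

// Lay the children out left to right with `padding` pixels between them.
function hbox(children: LayoutBox[], padding: number): void {
    let x = 0;
    for (const c of children) { c.x = x; c.y = 0; x += c.width + padding; }
}

// Lay the children out top to bottom with `padding` pixels between them.
function vbox(children: LayoutBox[], padding: number): void {
    let y = 0;
    for (const c of children) { c.x = 0; c.y = y; y += c.height + padding; }
}

// Lay the children out in rows with a fixed number of columns.
function grid(children: LayoutBox[], columns: number, padding: number): void {
    if (children.length === 0) return;
    const cellWidth = children[0].width + padding;
    const cellHeight = children[0].height + padding;
    for (let i = 0; i < children.length; i++) {
        children[i].x = (i % columns) * cellWidth;
        children[i].y = Math.floor(i / columns) * cellHeight;
    }
}

// Five 20x20 boxes, padding of 5, arranged as a 3-column grid.
const items: LayoutBox[] = [];
for (let i = 0; i < 5; i++) items.push({ x: 0, y: 0, width: 20, height: 20 });
grid(items, 3, 5);
// The fourth box starts the second row at x = 0, y = 25.
```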
Event functions
//adds a listener to the stage for Event.RESIZE and calls the onResize function when triggered
addListener(stage, Event.RESIZE, onResize);
//adds a listener to mc1 for MouseEvent.CLICK and calls the handleClick1 function when triggered
addListener(mc1, MouseEvent.CLICK, handleClick1);
//notice how no event parameter is needed in the function
private function handleClick1():void
{
trace("handleClick1");
}
//adds a listener to mc2 for MouseEvent.CLICK and calls the handleClick2 function when triggered
//but also passes mc2 in as a parameter
addListener(mc2, MouseEvent.CLICK, handleClick2, mc2);
private function handleClick2(mc:MovieClip):void
{
trace("handleClick2", mc);
}
//adds a listener to mc3 for MouseEvent.CLICK and calls the handleClick3 function when triggered
//but also passes mc3 as a parameter and tells it to also return the event as a parameter
//thanks to the return event flag (the true after mc3)
addListener(mc3, MouseEvent.CLICK, handleClick3, mc3, true);
private function handleClick3(mc:MovieClip, e:MouseEvent):void
{
trace("handleClick3", mc, e.ctrlKey);
}
//adds a listener to mc4 for MouseEvent.CLICK and calls the handleClick4 function when triggered
//but also passes an array containing mc4 and an alpha value
addListener(mc4, MouseEvent.CLICK, handleClick4, [mc4, 0.5]);
private function handleClick4(array:Array):void
{
array[0].alpha = array[1];
}
//adds a listener to mc5 for MouseEvent.CLICK and calls the handleClick5 function when triggered
//but also passes the parameters mc5 and 0.5 in sequence thanks to the apply parameters flag (the true at the end)!
addListener(mc5, MouseEvent.CLICK, handleClick5, [mc5, 0.5], false, true);
private function handleClick5(mc:MovieClip, alpha:Number):void
{
mc.alpha = alpha;
}
//adds a listener to mc6 for MouseEvent.CLICK and calls the handleClick6 function when triggered
//but also passes the parameters mc6 and 0.5 in sequence and also sends the event through
addListener(mc6, MouseEvent.CLICK, handleClick6, [mc6, 0.5], true, true);
private function handleClick6(mc:MovieClip, alpha:Number, e:MouseEvent):void
{
mc.alpha = alpha;
trace(e.ctrlKey);
}
//notice how no event parameter is needed in the function
private function onResize():void
{
//aligns this centrally on the stage (using the align function included in bwhiting.swc)
align(this, stage);
}
This has real potential to save some typing, and it is also pretty powerful if you use your imagination.
i.e.
//quick dragging! - not the best example but shows what it can do
addListener(box, MouseEvent.MOUSE_DOWN, box.startDrag, [false, new Rectangle(0,0,stage.stageWidth-box.width, stage.stageHeight-box.height)], false, true);
You can also clear a listener or remove all registered listeners with one call (they must have been registered with “addListener”)
i.e.
//remove single listener
removeListener(box, MouseEvent.MOUSE_DOWN);
//remove all listeners attached to box
removeAllListeners(box);
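To show how helpers like these can work, here is a minimal TypeScript sketch (so it can run outside Flash Player). It is not the code from bwhiting.swc: the `Dispatcher` class is a hypothetical stand-in for Flash's EventDispatcher, and the parameter order simply mirrors the calls above (params, then the return event flag, then the apply parameters flag). The trick is to register a wrapper function and remember it in a registry so it can be removed later.

```typescript
type Handler = (...args: any[]) => void;

// Hypothetical minimal dispatcher standing in for Flash's EventDispatcher.
class Dispatcher {
    private listeners: { [type: string]: Handler[] } = {};
    addEventListener(type: string, fn: Handler): void {
        (this.listeners[type] = this.listeners[type] || []).push(fn);
    }
    removeEventListener(type: string, fn: Handler): void {
        this.listeners[type] = (this.listeners[type] || []).filter(f => f !== fn);
    }
    dispatch(type: string, event: object): void {
        for (const fn of (this.listeners[type] || []).slice()) fn(event);
    }
}

// Every wrapper is remembered so it can be removed without keeping a reference.
interface Registration { target: Dispatcher; type: string; wrapper: Handler; }
const registry: Registration[] = [];

function addListener(target: Dispatcher, type: string, handler: Handler,
                     params?: any, returnEvent: boolean = false,
                     applyParams: boolean = false): void {
    const wrapper: Handler = (event: object) => {
        // applyParams spreads an array into separate arguments;
        // otherwise params (if given) is passed as a single argument.
        const args: any[] = params === undefined ? [] : (applyParams ? params.slice() : [params]);
        if (returnEvent) args.push(event); // the event goes last when requested
        handler.apply(null, args);
    };
    registry.push({ target: target, type: type, wrapper: wrapper });
    target.addEventListener(type, wrapper);
}

function removeListener(target: Dispatcher, type: string): void {
    for (let i = registry.length - 1; i >= 0; i--) {
        const r = registry[i];
        if (r.target === target && r.type === type) {
            target.removeEventListener(type, r.wrapper);
            registry.splice(i, 1);
        }
    }
}

function removeAllListeners(target: Dispatcher): void {
    for (let i = registry.length - 1; i >= 0; i--) {
        const r = registry[i];
        if (r.target === target) {
            target.removeEventListener(r.type, r.wrapper);
            registry.splice(i, 1);
        }
    }
}

const box = new Dispatcher();
const log: string[] = [];
addListener(box, "click", () => log.push("no args"));
addListener(box, "click", (label: string) => log.push(label), "one param");
addListener(box, "click", (a: number, b: number) => log.push("sum " + (a + b)), [1, 2], false, true);
addListener(box, "click", (label: string, e: any) => log.push(label + " " + e.type), "with event", true);
box.dispatch("click", { type: "click" });
removeAllListeners(box);
box.dispatch("click", { type: "click" }); // nothing is registered any more
// log is ["no args", "one param", "sum 3", "with event click"]
```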
How many times have you typed something like this to centre align a display object?
child.x = (parent.width - child.width) / 2;
Well no more!
introducing align.as!!!
This is a class I have been using for years and always meant to share but never got round to it.
Here is how it works:
//to centre align a display object with its parent:
align(child);
//to top-right align a display object with its parent:
align(child, null, [1, 0]);
//to bottom-right align a display object with its parent:
align(child, null, [1, 1]);
//to centre align a display object with respect to the stage:
align(child, stage);
//to top-centre align a display object with respect to the stage and offset it by 25 pixels in the y direction:
align(child, stage, [0.5, 0], [0, 25]);
Why it works:
parameter 1 – the display object to align
parameter 2 – the object to align against (if null it will use the child’s parent)
note: this can be a display object or an array of length 2 (element 0 being width and element 1 being height), which comes in handy if you want to align against values that do not relate to a specific display object
parameter 3 – the alignment in array form! Element 0 is the alignment in the x-direction and element 1 is the alignment in the y-direction… [0,0] = top left, [1,1] = bottom right, [0.5,0.5] = middle centre (also the default if you pass in null)
parameter 4 – an optional offset array in pixels i.e. an array here of [50,-100] would offset the display object by +50 in the x direction and -100 in the y direction, after the object has been aligned!
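The maths behind those parameters can be sketched in a few lines. This is TypeScript (so it can run outside Flash Player) and not the actual align.as source: the `Alignable` type is hypothetical, only the array form of the second parameter is supported, and the child is assumed to be positioned from its top-left corner.

```typescript
// Hypothetical stand-in for a top-left registered display object.
interface Alignable { x: number; y: number; width: number; height: number; }

// size:      [width, height] of the area to align against
// alignment: [0, 0] = top left, [1, 1] = bottom right, default is the centre
// offset:    pixels added after the object has been aligned
function align(child: Alignable, size: number[],
               alignment: number[] = [0.5, 0.5], offset: number[] = [0, 0]): void {
    child.x = (size[0] - child.width) * alignment[0] + offset[0];
    child.y = (size[1] - child.height) * alignment[1] + offset[1];
}

const stageSize = [800, 600];
const logo: Alignable = { x: 0, y: 0, width: 100, height: 50 };
align(logo, stageSize);                     // centre: x = 350, y = 275
align(logo, stageSize, [1, 1]);             // bottom right: x = 700, y = 550
align(logo, stageSize, [0.5, 0], [0, 25]);  // top centre, 25px down: x = 350, y = 25
```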
It’s not perfect but I use it all the time!!! Brilliant in resize event handlers:
i.e.
just a small snippet from a resize handler in one of my projects… really really speeds things up when it comes to aligning
please use it and please give feedback, especially if it breaks/dies/explodes
I plan to add support for detecting bounds, as at the moment it is geared towards top-left aligned display objects only (although you can use the offset to overcome this if you wish)
let me know what you think
b
EDIT: the following link has been updated to include the features as described in the next post, ensure you download the latest version number
the download link —-> bwhiting v1.1 <----
This was actually pretty easy to achieve: it basically involves rendering the scene 6 times with a wide-angle lens, each time rendering to one side of a cube map.
This cube map can then be used as a basis for a reflection shader.
Noteworthy observations:
this will be slow for complex scenes as there could potentially be 7 times more draw calls!!
forgetting to convert degrees to radians can be a pain in the butt (doh)
frustum cull with each face render to reduce the number of draw calls
getting it to reflect another reflective object will really start to complicate things, so avoid like hell
could probably get away with rendering to really small textures, as perfect reflections can look less realistic anyway.
there is something wrong with my reflection math as those of you with keen eyes will have noticed the reflected cubes are moving backwards – will fix this soon I hope.
All this was achieved in a very short space of time, about 30 mins of coding and about 2 hours to realise I forgot to convert degrees to radians.
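A small sketch of the six-face setup and the degrees/radians trap, in TypeScript so it can run outside Flash Player. It assumes the usual choice of a 90 degree field of view per face (so the six square faces cover every direction); the names are made up, and the up vector for each face is left out because it depends on the cube map convention in use.

```typescript
const DEG_TO_RAD = Math.PI / 180;

// The six cube map faces and the direction the camera looks for each one.
const cubeFaces = [
    { name: "+X", dir: [1, 0, 0] }, { name: "-X", dir: [-1, 0, 0] },
    { name: "+Y", dir: [0, 1, 0] }, { name: "-Y", dir: [0, -1, 0] },
    { name: "+Z", dir: [0, 0, 1] }, { name: "-Z", dir: [0, 0, -1] }
];

// Perspective matrices scale x and y by 1 / tan(fov / 2), with fov in radians.
const fovDegrees = 90;
const focalScale = 1 / Math.tan((fovDegrees * DEG_TO_RAD) / 2); // about 1, as expected

// The bug: passing degrees straight in treats 45 as 45 radians.
const wrongScale = 1 / Math.tan(fovDegrees / 2); // nowhere near 1
```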
Controls:
Mouse to move camera
Mouse wheel to zoom
Space bar to toggle mouse movement
check box to turn the post processing on or off
number 0-9 on keyboard to try some preloaded filters (emboss, blur, edge detection etc…)
slider to control the sample offset
you can enter your own values into the matrix if you like, but be nice as there isn’t any error checking
let me know if it works for you!
will need Flash Player 11 to view: get Flash Player
video of it running if the demo fails:
So what is a convolution filter and how does it work?
Basically a convolution filter (with regards to images) is where for each pixel to be sampled, a set of surrounding pixels is also sampled. The influence of the surrounding pixels is controlled by a matrix.
i.e.
[0,0,0,
0,1,0,
0,0,0]
The current pixel is represented by the middle value.
The 1st value in the matrix is a sample taken from the pixel North-West of the current pixel
2nd is North
3rd is North-East
4th is West
…
i.e.
[NW, N, NE,
W,  x, E,
SW, S, SE]
x = the current pixel
so the positions in the matrix determine where the samples come from and the values in the matrix determine their influence.
in the matrix
[0,0,0,
0,1,0,
0,0,0]
all the surrounding pixels will have no (0) influence on the output
with the following matrix
[1,1,1,
1,1,1,
1,1,1]
they all have equal weighting effectively blurring the image
various effects, such as edge detection and sharpening, can be achieved by using different weight combinations.
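The same idea on the CPU, as a minimal TypeScript sketch for a small greyscale image. The demo does this in a fragment shader; this version only illustrates how the matrix positions and weights are used. `convolve3x3` is a hypothetical name, and samples that fall outside the image are clamped to the nearest edge pixel, which is an assumption of this sketch.

```typescript
// Applies a 3x3 matrix (row by row: NW, N, NE, W, x, E, SW, S, SE) to a
// greyscale image stored row by row. Samples off the edge are clamped.
function convolve3x3(pixels: number[], width: number, height: number, kernel: number[]): number[] {
    const out: number[] = [];
    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            let sum = 0;
            for (let ky = -1; ky <= 1; ky++) {
                for (let kx = -1; kx <= 1; kx++) {
                    const sx = Math.min(width - 1, Math.max(0, x + kx));
                    const sy = Math.min(height - 1, Math.max(0, y + ky));
                    // Position in the matrix picks the neighbour, value sets its weight.
                    sum += pixels[sy * width + sx] * kernel[(ky + 1) * 3 + (kx + 1)];
                }
            }
            out.push(sum);
        }
    }
    return out;
}

const identity = [0, 0, 0, 0, 1, 0, 0, 0, 0];
const boxBlur = [1, 1, 1, 1, 1, 1, 1, 1, 1].map(v => v / 9); // weights sum to 1

// 3x3 image with a single bright pixel in the middle.
const dot = [0, 0, 0,
             0, 9, 0,
             0, 0, 0];
const unchanged = convolve3x3(dot, 3, 3, identity); // same as the input
const blurred = convolve3x3(dot, 3, 3, boxBlur);    // brightness spread evenly, about 1 everywhere
```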
where output is the destination register
where sample is a temporary register for storing the modified uvs
where texture is the texture to sample from
where uv is the current uv coord of the sample (usually passed from the vertex shader)
where offset* is the value to offset the samples by (you will need to calculate this with something like 1/viewportWidth)
where matrix** is the convolution matrix
*At the moment the same offset is used for x and y, when in reality there should be two (one for each axis). I will add this at a later date, as it hasn’t caused an issue so far.
**
sample code to upload a matrix of 9 values from a vector of numbers called “_filter” into fragment constant 0 – 2:
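One possible way to lay the nine values out for registers 0 to 2, sketched in TypeScript so it can run outside Flash Player. Each constant register holds four floats, so this sketch assumes one matrix row per register with the fourth component padded with 0; the layout a given shader expects may differ. `packKernel` is a hypothetical helper.

```typescript
// Pads each row of a 3x3 matrix to four floats so that each row fills one
// constant register (x, y, z used; w left at 0).
function packKernel(filter: number[]): number[] {
    const packed: number[] = [];
    for (let row = 0; row < 3; row++) {
        packed.push(filter[row * 3], filter[row * 3 + 1], filter[row * 3 + 2], 0);
    }
    return packed;
}

const sharpen = [0, -1, 0, -1, 5, -1, 0, -1, 0];
const registerData = packKernel(sharpen);
// registerData holds 12 values: [0,-1,0,0, -1,5,-1,0, 0,-1,0,0]
```

In Stage3D, data packed like this could then be passed to Context3D.setProgramConstantsFromVector() with the fragment program type, a first register of 0 and a register count of 3.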
for this to work properly you should normalise your matrix so that the sum of the values in your matrix == 1.
here is some code to do that:
var sum:Number = 0;
for(var i:int = 0 ; i < _filter.length; i++)
{
sum += _filter[i];
}
for(i = 0 ; i < _filter.length; i++)
{
if(_filter[i] != 0 && sum != 0)_filter[i] /= sum;
}
Super busy at the moment but would like to smash out a new post.
Will be one of the following:
1. Rendering a Depth Texture with Molehill
2. Applying Convolution filters with Molehill (blur, edge detection, sharpening)
3. Will update the AGAL helper with some examples (diffuse shading, specular shading, rim shading, etc.)
Any preferences? If not, I’ll just do whichever one I have time to do.