Updated Stress Test (4000+ individually transformed normal mapped meshes at 60FPS)

Should run a bit faster across all machines.


update includes:

  • faster matrix composition
  • bare minimum uploads to GPU by combining the various techniques used in the material into one on the fly.
  • added another material into the demo (normal mapped)
  • added other meshes to you can try and identify where the limits are.. geometry bound or draw call bound


link to updated demo:


controls (as before but also):

p – turns of mouse look

n – changes the mesh being rendered


Quick tip, go full screen in your browser to maintain access to all the keyboard short-cuts. (click the title bar and hit F11 is the usual command I think)

As before any feedback on performance would be great, there is still room for gains but fir the time being I am happy. Would love to hear what someone with an i7 and a beast of a graphics card can achieve.



3d speed

Stage3D optimisation

Having been playing with Stage3D for a while now, I though I would write a small piece on optimisation.

With great power comes great responsibility!

Stage3D give you GPU access, which can expose some serious rendering horsepower, but if you don’t treat it with respect your going to find you run into limitations pretty quick!

So what follows is a rough (very) guide on how to squeeze the most out of the new 3D apis.

Rule 1 (of 1)
CPU’s are fast, GPU’s are faster, communication between the two however is probably the biggest bottleneck you will face.
Therefore: Reduce this wherever possible.
This means minimise calls to the following Context3D functions;
(actually most of Context3D’s functions but the above are the real doozies)

GPU’s can draw triangles fast, and lots of them, millions of them every second without breaking into a sweat.
So you might think, “I can call drawTriangles() 50,000* times no sweat as long as I am only drawing a few triangles in each call.”.
WRONG!! This command is a mighty expensive one so use it very wisely!
How: When you call drawTriangles you pass it a vector of ids that, per three, represent one triangle. Given that this call is expensive it then makes sense that you pass it as many triangles as possible in one call. Sadly this doesn’t quite mean you can just group your geometry into big chunks as you cannot change state (alter the material or any parameters) during this call meaning everything that is sent through will be rendered with the same program and set of constants. It does mean however that static (non moving elements) that share the same program/material, can be combined into one list. Things such as trees, grass and any other repeatable geometry are good candidates for this. You can do this for dynamic geometry also but it gets complex as you have to upload transformation data in a separate buffer, this is one way particle systems can be created. The downside is that each time there is a change to any of the objects the whole buffer will need to be re-uploaded. It is also vital that you only try and draw objects that will be seen on screen, so don’t draw that which is out of view of the camera -> frustum culling saves the day.

side note:
Even high end games rarely want to be issuing more than 1000-2000 draw calls, but the likes of battlefield 3 can get up-to the 3000 mark in some of the environments. Newer consoles however, can issue over 10,000 draw calls and do it much faster due to a more direct access to the hardware.

Assuming you have now done everything in your power to reduce the number of draw calls you issue, next thing to look at is state changes (changing the current program).
Changing state on the GPU might seem like a trivial thing but it is actually something you want to keep to a minimum to be able to squeeze the most out of your graphics card.
There are a few things you can do here to reduce this problem.
1. Group the objects that require drawing by their material/program! For example suppose you had 100 cubes, 50 of them with one material and 50 of them with another. Now if you had a list containing all of those cubes and blindly sent them to be rendered you could end up having to change state a large number of times. If it so happened that each cube in the list had a different material to the object before it, then the program will have to be updated for every draw call. Not good. If that list however was sorted so that

– even if you are only drawing one triangle with this call there

*there is an actual limit of 32,768 drawTriangles() calls per present() call.

This function is what allows us to upload constants to the gpu. It is how we upload our matrices and any float1/2/3/4..s that we want to utilise in our shaders. While it may not be a huge bottleneck it still has a noticeable impact on performance in my experiments.
So how to optimise?
Any constant that is likely to be reused by different materials, then upload only once per frame not once per object rendered. So what are the likely culprits?
The view projection matrix! This is 16 float values that will not change between objects so it makes sense to upload it once! 16 numbers vs 16,000 for a 1000 objects and 999 less calls to setProgramConstants, and that is a good thing!
The same applies to anything else that will not vary between objects, camera and light positions or common numbers used in shaders (0,0.5,1,-1..).
What this shows is that it is important to have some sort of system to manage uploads so you can keep track of what is already uploaded and only upload data that isn’t already there!

side note:
This also translates into how you write shaders, knowing that each constant requires an upload should make you rethink sometimes about how to achieve something whilst using minimal constants, take unpacking a normal from a texture. Usually you would multiple the value from the texture by 2 then subtract one (2 floats required) but the same result can also be achieved with a subtract by 0.5 then a divide 0.5 (1 float required). Perhaps not the best example but I am sure you get the idea. REUSE is your friend!


While I only focused on 3 methods of the context3d, almost all of them will incur some penalty but those highlighted are the ones I have found to be a more serious problem.

Quick additions:
Drawing to a bitmapdata from the gpu is slow, so if you have to do it, ensure it is at a small a resolution as possible, in theory a 1×1 pixel readback should be big enough for picking!
Resizing the back buffer is slow!
Creating textures is slow, don’t do it on the fly. Pre allocate if possible then pick from a pool (more relevant for post processing).

At some point in the future, I hope to write some test examples that highlight the cost of the functions mentioned above (have already done quite a few but they have been tied into other things rather than dedicated standalone tests).

I hope all of that makes some form of sense to someone 🙂 If you have any additions, corrections or questions… fire away. Will probably update this from time to time to add in more that I have missed out, it’s a broad area with many possible optimisations!


Super Quick Summary:

REDUCE DRAW CALLS! Group items where possible and use culling to ensure you are only drawing what is neccessary.

REUSE MATERIALS and BATCH RENDERING by material if you can.




Stage3D stress test (b3d engine) 20,000 primitives ~ 15,000,000 triangles (updated with playable demo)

Click to start.
Mouse move to look
Shift to fly faster
Space to toggle rotation
+ to add 500 doughnuts
– to remove 500 doughnuts
m to change material (4 available)
Once started double click to toggle fullscreen (NOTE all keys bar arrow keys will be disabled)

demo (requires flash player 11 download):

link to play standalone version (better experience):

Let me know how it runs (assuming it does) if you can: frame-rate, number of objects it can handle etcetera.

video 1 (looks like crap so am uploading another one):

video 2 (hopfully looks a bit better..still looks balls, ah well)

Starts to slow down with 2000 – 3000 and above primitives on screen 🙁 there is still room for some efficiency improvements but not bad for now

Demonstrates how essential good culling is, despite what the stats say whilst recording I am able to cull a full scene of 10,000 objects in under 1 ms on the release player no problems…without that it would probably die (with it you can happily navigate a scene with 50,000 objects in it at 60fps on a good machine as the majority are being culled away leaving perhaps only 500-1500 visible at any one time when they are spread out like they are in the demo)


Utils3D.projectVectors vs Utils3D.projectVector vs matrix.transformVector… note to self

just a note to self really…

Utils3D.projectVectors(matrix, vertices, projected, uvt);
vector1 = Utils3D.projectVector(vector0);
vector2 = matrix.transformVector(vector0);

projected[0] == vector2.x/vector2.w ≈≈ vector1.x
projected[1] == vector2.y/vector2.w ≈≈ vector1.y
uvt[2] == 1/vector2.w == 1/vector1.w

not sure why projectVector != projectVectors result but its close enough… wasted too much time trying to find out why they are not the same so any ideas on why that is – let me know!


As3 short-cut library :- bwhiting swc

updated: 17.01.2012

Quick follow-on from the previous post.

Added a few more features/functions that I use so feel free to give it a whirl see how you get on, please feedback with any problems or suggestions.


Quick run down of the functions:

  • addListener – adds an event listener to a target with some added magical powers
  • removeListener – removes a single listener assigned by the “addListener” function
  • removeAllListeners – removes all listeners assigned by the “addListener” function
  • buttonify – assigns some button like functionality to an interactive object
  • unbuttonify – removes the functionality added by a call to the “buttonify” function
  • grid – arranges the target’s children in a grid
  • hbox – arranges the target’s children in a hbox
  • vbox – arranges the target’s children in a vbox
  • addChildren – a small short-cut to add a few children to a DisplayObjectContainer at once
  • removeAllChildren – short-cut to remove all the children of a DisplayObjectContainer


more details (with code and samples) to follow but more info can be found in the docs below.

download swc:
bwhiting v1.2

click com.bwhiting.utilities

Code and Samples:

Layout functions

demo here

In the demo above there are 3 buttons.
The first button arranges the content in a hbox style with a padding of 5
The second button arranges the content in a vbox style with a padding of 5
The third button arranges the content in a grid style with fixed number of columns (in brackets) and a padding of 5

The code that drives it is very simple…

private function updateLayout(layout:String):void
	if(layout == "hbox") hbox(container, padding);
	else if(layout == "vbox") vbox(container, padding);
	else if(layout == "grid") grid(container, cols, padding);

So what happens when you click one of those buttons is the above function is called with a parameter “layout” based on which button was clicked.
The sprite “container” holds 5 simple display objects and the functions hbox/vbox/grid all operate automatically on the children of the 1st parameter.
To to arrange display objects horizontally or vertically or in a grid only takes one line of code… 🙂

Event functions

//adds a listener to the stage for Event.RESIZE and calls the onResize function when triggered
addListener(stage, Event.RESIZE, onResize);
//adds a listener to mc1 for MouseEvent.CLICK and calls the handleClick1 function when triggered
addListener(mc1, MouseEvent.CLICK, handleClick1);
//notice how no event parameter is needed in the function
private function handleClick1():void
//adds a listener to mc2 for MouseEvent.CLICK and calls the handleClick2 function when triggered
//but also passes mc2 in as a parameter
addListener(mc2, MouseEvent.CLICK, handleClick2, mc2);
private function handleClick2(mc:Movieclip):void
	trace("handleClick2", mc);
//adds a listener to mc3 for MouseEvent.CLICK and calls the handleClick3 function when triggered
//but also passes mc3 as a parameter and tells it to also return the event as a parameter
//thanks to the return event flag (the true after mc3)
addListener(mc3, MouseEvent.CLICK, handleClick3, mc3, true);
private function handleClick3(mc:Movieclip, e:MouseEvent):void
	trace("handleClick3", mc, e.ctrlKey);
//adds a listener to mc4 for MouseEvent.CLICK and calls the handleClick4 function when triggered
//but also passes an array containing mc4 and an alpha value
addListener(mc4, MouseEvent.CLICK, handleClick4, [mc4, 0.5]);
private function handleClick4(array:Array):void
	array[0].alpha = array[1];
//adds a listener to mc5 for MouseEvent.CLICK and calls the handleClick5 function when triggered
//but also passes the parameters mc5 and 0.5 in sequence thanks to the apply parameters flag (the true at the end)!
addListener(mc5, MouseEvent.CLICK, handleClick5, [mc5, 0.5], false, true);
private function handleClick5(mc:MovieClip, alpha:Number):void
	mc.alpha = alpha;
//adds a listener to mc6 for MouseEvent.CLICK and calls the handleClick6 function when triggered
//but also passes the parameters mc6 and 0.5 in sequence and also sends the event though
addListener(mc5, MouseEvent.CLICK, handleClick5, [mc5, 0.5], true, true);
private function handleClick6(mc:MovieClip, alpha:Number, e:MouseEvent):void
	mc.alpha = alpha;
//notice how no event parameter is needed in the function
private function onResize():void
	//aligns this centrally on the stage (using the align function included in bwhiting.swc)
	align(this, stage);

This has real potential to save some typing it is also pretty powerful to if you use your imagination.

//quick dragging! - not the best example but shows what it can do
addListener(box, MouseEvent.MOUSE_DOWN, box.startDrag, [false, new Rectangle(0,0,stage.stageWidth-box.width, stage.stageHeight-box.height)], false, true);

you can also clear a listener or remove all registered listeners with one call (must have been registered with “addListener”)

//remove single listener
removeListener(box, MouseEvent.MOUSE_DOWN);
//remove all listeners attached to box



Aligning in Actionscript

Ever had to align something in as3 before?

How many times have you typed something like this to centre align a display object?

child.x = (parent.width - child.width) / 2;

Well no more!


This is a class I have been using for years and always meant to share but never got round to it.

Here is how it works:

//to centre align a display object with its parent:
//to top-right align a display object with its parent:
align(child, null, [1, 0]);
//to bottom-right align a display object with its parent:
align(child, null, [1, 1]);
//to centre align a display object with respect to the stage:
align(child, stage);
//to top-centre align a display object with respect to the stage and offset it by 25 pixels in the y direction:
align(child, stage, [0.5,1], [0,25]);

Why it works:

parameter 1 – the display object to align
parameter 2 – the object to align against (if null it will use the child’s parent)
note: this can be a display object or an array of length 2 (element 0 being width and 1 being height) comes in handy if you want to align against some values not relating to a specific display object
parameter 3- the alignment in array form! element 0 is the alignment in x-direction and element 1 is the alignment in the y-direction… [0,0] = top left, [1,1] = bottom right, [0.5,0.5] = middle centre (also the default if you pass in null)
parameter 4 – an optional offset array in pixels i.e. an array here of [50,-100] would offset the display object by +50 in the x direction and -100 in the y direction, after the object has been aligned!

Its not perfect but I use it all the time!!! Brilliant in resize event handlers:

align(_contact, stage, [0.5, 1], [0,-25]);
align(_content, stage);
align(_bg, stage, null, [0,-50]);
align(_spinner, _content, null, [_content.x+_spinner.width/2, _content.y+_spinner.height/2]);

just a small snippet from a resize handler in one of my projects… really really speeds things up when it comes to aligning

please use it and please feedback, especially if it breaks/dies/explodes

I plan to add support for detecting bounds as at the moment it is geared at top left aligned display objects only (although you can use the offset to overcome this if you wish)

let me know what you think


EDIT: the following link has been updated to include the features as described in the next post, ensure you download the latest version number
the download link —-> bwhiting v1.1 <----


Dynamic Reflections with Stage3D

First up the video:

This was actually pretty easy to achieve, it basically involves rendering the scene 6 times with a wide angle lens each time rendering to a side of a cube map.
This cube map can then be used as a basis for a reflection shader.

Noteworthy observations:

this will be slow for complex scenes as there could potentially be 7 times more draw calls!!
forgetting to convert degrees to radians can be a pain in the but (doh)
frustum cull with each face render to reduce the number of draw calls
to get it to reflect another reflective object will really start to complicate avoid like hell
could probably get away with rendering to really small textures, as perfect reflections can look less realistic anyway.
there is something wrong with my reflection math as those of you with keen eyes will have noticed the reflected cubes are moving backwards – will fix this soon I hope.

All this was achieved in a very short space of time, about 30 mins of coding and about 2 hours to realise I forgot to convert degrees to radians.

example code:

public function rendertoCubeMap(cubeTexture:CubeTexture, exclude:Object3D = null):void
	this.exclude = exclude;
	var cameraPositionCache:Vector3D = camera.position.clone();
	var cameraTargetCache:Vector3D =;
	var fovCache:Number = camera.fov;		
	var aspectCache:Number = _aspect;		
	camera.fov = cubeFov;
	_aspect = 1;
	context3D.setRenderToTexture(cubeTexture, true, 0, 0); - 1, camera.position.y, camera.position.z);
	context3D.setRenderToTexture(cubeTexture, true, 0, 1); + 1, camera.position.y, camera.position.z);
	context3D.setRenderToTexture(cubeTexture, true, 0, 2);, camera.position.y + 1, camera.position.z+0.001);	//get some NaNs if z = 0 here
	context3D.setRenderToTexture(cubeTexture, true, 0, 3);, camera.position.y - 1, camera.position.z-0.001);		//get some NaNs if z = 0 here
	context3D.setRenderToTexture(cubeTexture, true, 0, 4);, camera.position.y, camera.position.z + 1);
	context3D.setRenderToTexture(cubeTexture, true, 0, 5);, camera.position.y, camera.position.z - 1);
	_aspect = aspectCache;
	camera.fov = fovCache;, cameraTargetCache.y, cameraTargetCache.z);
	camera.position.setTo(cameraPositionCache.x, cameraPositionCache.y, cameraPositionCache.z);
	this.exclude = null;

and that’s it!



A few screen captures of some stage3d experiments


Molehill Convolution

Updated: 17/11/2011

First up here is the demo:
click here

Mouse to move camera
Mouse wheel to zoom
Space bar to toggle mouse movement
check box to turn the post processing on or off
number 0-9 on keyboard to try some preloaded filters (emboss, blur, edge detection etc…)
slider to control the sample offset
you can enter your own values into the matrix if you like, but be nice as there isn’t any error checking

let me know if it works for you!
will need flash player 11 to view
get flash player

video of it running if the demo fails:

So what is a convolution filter and how does it work?

Basically a convolution filter (with regards to images) is where for each pixel to be sampled, a set of surrounding pixels is also sampled. The influence of the surrounding pixels is controlled by a matrix.


The current pixel is represented by the middle value.

The 1st value in the matrix is a sample taken from the pixel North-West of the current pixel
2nd is North
3rd is North-East
4th is East

[NW, N, NE,
W , x, E
SW , S, SE]

x = the current pixel

so the positions in the matrix determine where the samples come from and the values in the matrix determine their influence.

in the matrix
all the surrounding pixels will have no (0) influence on the output

with the following matrix
they all have equal weighting effectively blurring the image

various effects can be achieved by using difference weight combinations such as edge detection and sharpening.

How to do it with the AGAL helper:

simply call:
AGAL.convolve(output, sample, texture, uv, offset, matrix)+

where output is the destination register
where sample is a temporary register for storing the modified uvs
where texture is the texture to sample from
where uv is the current uv coord of the sample (usually passed from the vertex shader)
where offset* is the value to offset the samples by (you will need to calculate this with something like 1/viewportWidth)
where matrix** is the convolution matrix

*at the moment the same offset is used for x and y when in reality there should be two (one for each axis) will add this in at a later date as it hasn’t caused an issue so far.

sample code to upload a matrix of 9 values from a vector of numbers called “_filter” into fragment constant 0 – 2:

context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 0, Vector.<Number>([_filter[0],_filter[1],_filter[2], 1]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 1, Vector.<Number>([_filter[3],_filter[4],_filter[5], 1]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 2, Vector.<Number>([_filter[6],_filter[7],_filter[8], 1]));

for this to work properly you should normalise you matrix so that the sum of the values in your matrix == 1.
here is some code to do that:

var sum:Number = 0;
for(var i:int = 0 ; i < _filter.length; i++)
	sum += _filter[i];
for(i = 0 ; i < _filter.length; i++)
	if(_filter[i] != 0 && sum != 0)_filter[i] /= sum;

this would turn the matrix




good luck 🙂


Next Post?

Super busy at the moment but would like to smash out a new post.

Will be one of the following:
1. Rendering a Depth Texture with molehill
2. Applying Convolution filters with molehill (blur, edge detection, sharpening)
3. Will update the AGAL helper with some examples (diffuse shading, specular shading, rim shading, etc..)

Any preferences? If not ill just do whichever one I have time to do.