Termite – a vector renderer for stage3d (and possibly other targets)


Just a quick post to introduce Termite. It has been in the making for a long time, and shelved for the best part of it.
That said, I think it is now at a stage where it could be used to create a basic game to test out its capabilities.
For those who don’t know, it is a library that takes content generated in Adobe Flash or Animate and converts it into a GPU-friendly format for rendering with Stage3D (I have also done a test in Unity that worked well). It enables developers to render their vector content from the IDE on desktop and devices with much higher performance than the traditional CPU renderer.

There are still a number of things I have not yet implemented but plan to; as the video below shows, though, it has the basics covered!

Here is a quick demo I put together; all the graphics are resolution independent and can be viewed crisply at any zoom.
(Best viewed in fullscreen at maximum quality, but even that doesn’t do it true justice.)

Download a slightly better quality version here: download

Screenshot of the source fla:

I am thinking I might try to partner up with a game developer or studio to get funding for the last bits of development in exchange for support with the library, but I am still not 100% sure on a course of action there.

It works well and performance is good on modern devices too: the above demo runs at 60 fps on my test devices, and there is still room for improvement.

I will post more information about features and limitations over the coming weeks, and I look forward to seeing where the library goes.



Preview 2 of 3D file parsing tool

Well, sort of! Here is an example I made that uses the files exported from the tool.

First up the demo:

Left/Right arrows to change the model, and mouse wheel to zoom
(sorry, it is super lame, but in my defence it was put together in under an hour)

Here is an external link: click me

The SWF itself is under 5 KB, so the library is small!
In a nutshell, the SWF loads binary assets that were exported from the AIR app I discussed in a previous post.
Once loaded, it simply traverses the scene in the file and renders any meshes it finds using the native Flash drawing API (i.e. crap and slow, but good enough for demo purposes).

I will attach the project zip later so you can have a look at it, and also provide links to the library SWC and the test files featured above so you can have a play if you want to.

I just wanted to show you how easy it is to use the lib and the files it parses: the whole demo above, including all the imports, listeners and other boilerplate, is still only just over 200 lines… not bad!

It should be very easy to integrate with Away3D, Flare3D or anything else really. I will probably also make a JSON export so you can use it with JavaScript projects too (if you should choose to).

So, a little more about the library: it is just a collection of classes that describe a 3D scene. I will outline the main classes here, but these are subject to change, so this is just to give you a feel for how it might work.
– Use this to convert the loaded binary into an AssimpScene with a static method.
– This is the main file you will be working with. It contains information about the scene, such as the number of meshes, materials etc., and also a reference to the rootNode.
– This is a node that represents an object in the scene graph. It may or may not have a mesh, and may or may not contain child nodes. It contains a transform that describes its position in 3D space relative to its parent. If it has a parent, you get a pointer to it, and if it contains a mesh (or multiple meshes), you get a vector of indices into the AssimpScene.meshes vector.
– A property of a node; it contains a matrix and a getter to obtain the local-to-world matrix based on its parents.
– The class that contains the information about the mesh: its vertices, ids, UVs and, optionally, normals, tangents etc.

So in its most basic form, you could load in the binary file or embed it, then convert that ByteArray into an AssimpScene using the Assimp class’s static method loadFromCompressedBinary:

var scene:AssimpScene = Assimp.loadFromCompressedBinary(data);

Once you have the scene you could loop through all the meshes:

for (var i:int = 0; i < scene.numMeshes; i++) {
	var mesh:AssimpMesh = scene.meshes[i];
	renderMesh(mesh.ids, mesh.vertices);
}

Or, to render things more true to the scene graph, you could do something like I did in the demo above; please feel free to download the project file and copy code from it.
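Rendering “true to the scene graph” boils down to walking the node tree and accumulating each node’s local transform onto its parent’s world transform. The TypeScript below is a minimal sketch of that walk, not the library’s API: `SceneNode`, `meshIndices`, `collectMeshes` and the column-major 4x4 layout are all assumptions made for illustration.

```typescript
// Hypothetical node shape loosely modelled on the description above.
type Mat4 = number[]; // 16 numbers, column-major (translation in elements 12..14)

interface SceneNode {
  name: string;
  local: Mat4;           // transform relative to the parent
  meshIndices: number[]; // indices into the scene's mesh list
  children: SceneNode[];
}

// Returns a * b for column-major 4x4 matrices; element (row, col) is at col * 4 + row.
function multiply(a: Mat4, b: Mat4): Mat4 {
  const out = new Array<number>(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

function translation(x: number, y: number, z: number): Mat4 {
  return [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
}

// Depth-first walk: world = parentWorld * local, then visit meshes and children.
function collectMeshes(
  node: SceneNode,
  parentWorld: Mat4,
  visit: (meshIndex: number, world: Mat4, node: SceneNode) => void,
): void {
  const world = multiply(parentWorld, node.local);
  for (const meshIndex of node.meshIndices) visit(meshIndex, world, node);
  for (const child of node.children) collectMeshes(child, world, visit);
}

// A root offset by x = 1 holding a child (with mesh 0) offset by y = 2.
const root: SceneNode = {
  name: "root", local: translation(1, 0, 0), meshIndices: [],
  children: [{ name: "child", local: translation(0, 2, 0), meshIndices: [0], children: [] }],
};
const drawn: { mesh: number; pos: number[] }[] = [];
collectMeshes(root, translation(0, 0, 0), (mesh, world) => {
  drawn.push({ mesh, pos: [world[12], world[13], world[14]] });
});
```

The child mesh ends up at (1, 2, 0) in world space. In a real renderer the `visit` callback would upload `world` (or world-view-projection) and draw the mesh.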

assimp.swc <-- you need this to handle the binaries
assimp project <-- a simple sample FlashDevelop project
A few 3D binaries (these are already in the project):
present scene
heart scene
candle scene
cctv scene

Do note that this is a very early beta version (probably an alpha), so some things will be missing, but they are coming, do not worry!

Next time round expect more (any/some) documentation, more demos and more features – hope to have animations and bones in there too!

Sorry I had to rush through that all so fast – crazy busy at the moment! Getting there though!
Let me know if you run into any problems (hopefully not too many).

Keep on flashing!

heart, candle and present models from, and the cctv model was from a random forum somewhere (cheers mystery person)


3D Procedural Geometry using revolutions

Many of the basic primitives we use in 3D can be constructed procedurally, all the way from simple planes to complex shapes.

Today I thought I would really quickly share something I use a lot: a revolution mesh. The idea has been around for ages and can be found in most 3D packages, e.g. the lathe modifier in 3ds Max, and I think it is in AS3 3D libs like Away3D too. The good thing about this technique is that it can produce a number of different shapes, all with one function (which makes for a smaller code base as well as flexibility).

The idea is simple: pass in a list of 2D points, rotate them about an axis a given number of times, and then construct a solid mesh out of the result.

First up, a demo:

link to standalone version: here

Okay, so not very impressive, but all of those objects were created in the same way: just revolving a series of points about an axis (in this instance the y/up axis).

So the code to generate them looks like this:

numRevolutions = 25;
revolutions = new Vector.<Object3D>();
revolutions.push(generateRevolutionObject(SplineBuilder.generateArc(5, 5, 0, 0.25)));

private function generateRevolutionObject(spline:Vector.<Number>):Object3D {
	//builds the mesh based on the number of revolutions and the points
	var mesh:RevolutionMesh = new RevolutionMesh(numRevolutions, spline);
	var object:Object3D = new Object3D().build(mesh, material);
	//gives the object a parent transform so it can be rotated about the scene origin
	object.transform.parent = transformer;
	return object;
}

The magic happens in the SplineBuilder and the RevolutionMesh. The SplineBuilder is just a really simple class with static methods to generate lists of points:

//a couple of example functions from SplineBuilder
public static function generateCone(height:Number = 10, radius:Number = 5):Vector.<Number> {
	var spline:Vector.<Number> = new Vector.<Number>();
	spline.push(0, height, 0);
	spline.push(radius, 0, 0);
	spline.push(0, 0, 0);
	return spline;
}

public static function generateCylinder(height:Number = 10, radius:Number = 5):Vector.<Number> {
	var spline:Vector.<Number> = new Vector.<Number>();
	spline.push(0, height/2, 0);
	spline.push(radius, height/2, 0);
	spline.push(radius, -height/2, 0);
	spline.push(0, -height/2, 0);
	return spline;
}

public static function generateTube(height:Number = 10, radius:Number = 5):Vector.<Number> {
	var spline:Vector.<Number> = new Vector.<Number>();
	spline.push(radius/2, height/2, 0);
	spline.push(radius, height/2, 0);
	spline.push(radius, -height/2, 0);
	spline.push(radius/2, -height/2, 0);
	spline.push(radius/2, height/2, 0);
	return spline;
}

See, super simple! You can of course come up with these however you like, such as from drawn user input or curve interpolation.

So the final piece is the RevolutionMesh (or lathe, or whatever you want to call it). Its job is to take the input, transform it around an axis and store the vertices, then build the UVs and indices. (You can then go on to generate normals, tangents etc.)
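As one example of the curve-interpolation route, a profile can be sampled from a quadratic Bézier curve. This TypeScript sketch returns the same flat x, y, z layout the SplineBuilder functions use; `generateBezierProfile` is a hypothetical helper, not part of SplineBuilder.

```typescript
// Samples a quadratic Bézier curve in the XY plane and returns a flat
// [x0, y0, z0, x1, y1, z1, ...] list (z is always 0), matching the
// layout of the SplineBuilder functions.
function generateBezierProfile(
  p0: [number, number],
  p1: [number, number], // control point
  p2: [number, number],
  segments: number,
): number[] {
  const spline: number[] = [];
  for (let i = 0; i <= segments; i++) {
    const t = i / segments;
    // Bernstein weights for a quadratic curve.
    const a = (1 - t) * (1 - t);
    const b = 2 * (1 - t) * t;
    const c = t * t;
    spline.push(a * p0[0] + b * p1[0] + c * p2[0], a * p0[1] + b * p1[1] + c * p2[1], 0);
  }
  return spline;
}

// A bowl-like profile: rim at radius 2, bulging outwards, closing at the axis.
const vase = generateBezierProfile([2, 10], [6, 5], [0, 0], 8);
```

More segments give a smoother silhouette once revolved, at the cost of more vertices per ring.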

private function build():void {
	if (vertices)	vertices.length = 0;
	else		vertices = new Vector.<Number>();
	if (uvs)	uvs.length = 0;
	else		uvs = new Vector.<Number>();
	if (ids)	ids.length = 0;
	else		ids = new Vector.<uint>();
	if (normals)	normals.length = 0;
	else		normals = new Vector.<Number>();
	//doesn't have to revolve the whole way round: _start and _end are values from 0-1, so a start of 0 and an end of 0.5 would mean 0 - 180 degrees of revolution
	var totalAngle:Number = (Math.PI*2)*(_end-_start);
	var angle:Number = totalAngle/_divisions;
	var startAngle:Number = (Math.PI*2)*_start;
	for (var i:int = 0; i < _divisions+1; i++) {
		var vout:Vector.<Number> = new Vector.<Number>();
		//appendRotation accumulates, so reset the matrix before rotating each ring
		transformer.identity();
		transformer.appendRotation((startAngle + angle*i)*Math3D.RADIANS_TO_DEGREES, _axix);
		transformer.transformVectors(_spline, vout);
		vertices = vertices.concat(vout);
		//could do the same for normals if supplied ;)
	}
	//uvs: u follows the revolution index, v the position along the spline
	var splineLength:int = _spline.length/3;
	for (i = 0; i < _divisions+1; i++) {
		for (var j:int = 0; j < splineLength; j++) {
			uvs.push(i/_divisions, j/(splineLength-1));
		}
	}
	//two triangles per quad between neighbouring rings
	for (i = 0; i < _divisions; i++) {
		for (j = 0; j < splineLength-1; j++) {
			var id0:uint = ((i+1)*splineLength)+j;
			var id1:uint = (i*splineLength)+j;
			var id2:uint = ((i+1)*splineLength)+j+1;
			var id3:uint = (i*splineLength)+j+1;
			ids.push(id0, id2, id1);
			ids.push(id3, id1, id2);
		}
	}
	//go forth and generate normals etc...
}

It works by first using a Matrix3D to rotate the points and copy them into a vertex list. Next it calculates the UVs from each vertex’s index in the original point list and the revolution index (u based on the revolution index and v based on the position in the point list). Once that is done, it’s a simple case of assigning the indices!

There you have it: some fairly simple code which can be reused to create a crap load of geometry.

Hope that helps someone out; shout if you have any questions or spot any problems.
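To make the vertex and index layout concrete, here is a TypeScript port of the same idea (a sketch, not the author’s RevolutionMesh class). It only revolves about the y axis, so it uses a closed-form rotation instead of a Matrix3D; the rotation direction is an assumption, and the winding may need flipping for your handedness and culling mode.

```typescript
interface RevolvedMesh {
  vertices: number[]; // x, y, z per vertex
  uvs: number[];      // u, v per vertex
  ids: number[];      // three indices per triangle
}

// Revolves a flat [x, y, z, ...] profile about the y axis.
// start/end are fractions of a full turn (0..1), as in the AS3 version.
function revolve(spline: number[], divisions: number, start = 0, end = 1): RevolvedMesh {
  const vertices: number[] = [];
  const uvs: number[] = [];
  const ids: number[] = [];
  const splineLength = spline.length / 3;
  const totalAngle = Math.PI * 2 * (end - start);
  const startAngle = Math.PI * 2 * start;

  // One ring per division, plus one extra ring so the seam gets its own UVs.
  for (let i = 0; i <= divisions; i++) {
    const angle = startAngle + (totalAngle / divisions) * i;
    const cos = Math.cos(angle);
    const sin = Math.sin(angle);
    for (let j = 0; j < splineLength; j++) {
      const x = spline[j * 3];
      const y = spline[j * 3 + 1];
      const z = spline[j * 3 + 2];
      vertices.push(x * cos + z * sin, y, -x * sin + z * cos); // rotation about y
      uvs.push(i / divisions, j / (splineLength - 1)); // u: ring, v: along profile
    }
  }

  // Two triangles per quad between neighbouring rings (same order as the AS3 code).
  for (let i = 0; i < divisions; i++) {
    for (let j = 0; j < splineLength - 1; j++) {
      const id0 = (i + 1) * splineLength + j;
      const id1 = i * splineLength + j;
      const id2 = (i + 1) * splineLength + j + 1;
      const id3 = i * splineLength + j + 1;
      ids.push(id0, id2, id1, id3, id1, id2);
    }
  }
  return { vertices, uvs, ids };
}

// Same points as generateCylinder(10, 5), revolved in 4 steps.
const cylinder = revolve([0, 5, 0, 5, 5, 0, 5, -5, 0, 0, -5, 0], 4);
```

With 4 profile points and 4 divisions this gives 5 rings of 4 vertices (20 vertices) and 4 × 3 quads, i.e. 24 triangles.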


3d speed

Frustum Culling – on steroids!

I am not really going to explain how frustum culling works here or how to set it up; there are already some good articles out there, and I will link to a couple of ActionScript ones at the end. This article explains how to use caching to very easily boost the speed of your frustum culling. As far as my testing goes, I haven’t yet seen anything as fast as this, so I thought I would share it so that anyone else can benefit from it, or maybe some bright spark can find a way to improve it… there is always room for improvement where Flash is concerned 🙂

Just a super quick explanation of why frustum culling is a good thing if done properly:

If the sphere’s position is the same as (i.e. a pointer to) the position of the object it describes, then NO transformations are needed to move the sphere into world space. Consider checking a bounding box instead: while it might be a tighter fit (woop), it would require the frustum to be transformed into local space or the box into world space… either way that is going to be seriously slow for large numbers of objects compared with bounding sphere checks. Frustum checking is easy to implement and can save you a vast amount of time: no rendering of off-screen objects!

This is what the standard frustum culling code looks like:

//loop through your objects or spheres
for (var i:int = 0; i < objects.length; i++) {
	var object:Object3D = objects[i];
	object.cullFlag = 1;		//object.visible = true;
	var sphere:BoundingSphere = object.sphere;
	var position:Vector3D = sphere.position;
	var radius:Number = sphere.radius;
	var plane:Plane3D;
	var distance:Number;
	for (var j:int = 0; j < 6; j++) {	//6 planes in the frustum (planes.length)
		plane = planes[j];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0) {
			object.cullFlag = -1;		//object.visible = false;
			break;				//outside one plane is enough, skip the rest
		}
	}
}

What the above code does is loop through all of your objects or bounding spheres and, for each one, compute the signed distance from its centre to each plane that makes up the frustum. If any of the tests shows the sphere to be outside the frustum, it can be marked as not visible and will not need to be rendered. As soon as it fails on one plane you can move on to test the next object/sphere, but if it passes you need to go on and check it against the next plane until you are sure it passes all of the tests. (You can see why this is inefficient for objects IN the frustum, as 6 checks are required just to confirm that, whereas the best case is when you can discard an object on the first plane check*.)

Now this could be sped up a little by unrolling the inner loop; it will only make a very small difference in speed, but it will help.

Introducing the “cache” variable into the mix! Okay, here’s where the huge speed-up can be gained. In the vast majority of cases, if an object is culled by a frustum plane in one frame and is still culled in the next frame, it is extremely likely that it was culled by the same plane! With that in mind, if we store the plane that the sphere failed the test on and test against that one first, in a large number of cases we can eliminate a number of unnecessary plane tests.

Bring on the updated code:

//loop through your objects or spheres
objectLoop: for (var i:int = 0; i < objects.length; i++) {
	var object:Object3D = objects[i];
	object.cullFlag = 1;		//object.visible = true;
	var sphere:BoundingSphere = object.sphere;
	var position:Vector3D = sphere.position;
	var radius:Number = sphere.radius;
	var plane:Plane3D;
	var distance:Number;
	var cache:int = sphere.cache;		//the id of the last plane failed on (-1 if none)
	if (cache != -1) {
		plane = planes[cache];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0) {
			object.cullFlag = -1;	//object.visible = false;
			continue;		//object still culled so move to next object
		}
	}
	for (var j:int = 0; j < 6; j++) {	//6 planes in the frustum (planes.length)
		if (j == cache) continue;	//already tested above
		plane = planes[j];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0) {
			object.cullFlag = -1;	//object.visible = false;
			sphere.cache = j;	//set the cache to match the current plane
			continue objectLoop;	//culled, so skip the cache reset below
		}
	}
	//if it's got this far then the object is not culled so the cache can be reset
	sphere.cache = -1;
}

This time round the code checks whether a plane has been cached and, if so, tests that one first. This means that in the best case (a stationary camera with a stationary scene), every object that has already been culled once will be culled again with only 1 plane test instead of up to 6! Muchos muchos faster!
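A rough way to see the saving is to count plane tests across two frames. The TypeScript below mirrors the cached version above in a simplified form; the `Plane`/`Sphere` shapes and the test counter are assumptions for illustration, with planes stored as (a, b, c, d) and normals pointing into the frustum.

```typescript
interface Plane { a: number; b: number; c: number; d: number; }
interface Sphere { x: number; y: number; z: number; radius: number; cache: number; }

// Returns true if the sphere is visible; every plane test increments stats.tests.
function cullSphere(sphere: Sphere, planes: Plane[], stats: { tests: number }): boolean {
  const outside = (p: Plane): boolean => {
    stats.tests++;
    const distance = p.a * sphere.x + p.b * sphere.y + p.c * sphere.z + p.d;
    return distance + sphere.radius < 0;
  };
  // Test the plane that culled this sphere last time first.
  if (sphere.cache !== -1 && outside(planes[sphere.cache])) return false;
  for (let j = 0; j < planes.length; j++) {
    if (j === sphere.cache) continue; // already tested above
    if (outside(planes[j])) {
      sphere.cache = j; // remember the culling plane for the next frame
      return false;
    }
  }
  sphere.cache = -1; // visible, nothing to remember
  return true;
}

// A box-shaped "frustum" from -10 to 10 on every axis, normals pointing inwards.
const planes: Plane[] = [
  { a: 1, b: 0, c: 0, d: 10 }, { a: -1, b: 0, c: 0, d: 10 },
  { a: 0, b: 1, c: 0, d: 10 }, { a: 0, b: -1, c: 0, d: 10 },
  { a: 0, b: 0, c: 1, d: 10 }, { a: 0, b: 0, c: -1, d: 10 },
];

// A sphere far along +z is only rejected by the last plane.
const farSphere: Sphere = { x: 0, y: 0, z: 50, radius: 1, cache: -1 };
const frame1 = { tests: 0 };
const frame2 = { tests: 0 };
const visible1 = cullSphere(farSphere, planes, frame1); // 6 tests, caches plane 5
const visible2 = cullSphere(farSphere, planes, frame2); // 1 test, thanks to the cache
```

In this worst-case ordering the first frame needs all 6 tests, and every following frame with an unchanged camera needs only 1.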

There are still things that can be optimised such as not testing the cached plane again in the second section if it passes the check in the first section. You could also unroll all the loops to remove any array/vector lookups for a speed up too.

The method can also easily be extended to check for intersections as well as a 1/0, true/false or hit/not-hit result, allowing for other checks such as a bounding box or something more tightly fitting.

I have found this approach to be the fastest yet: my implementation can chew through 10,000 objects in under 1 ms without a problem, and this is without any bounding volume hierarchies. (Most Flash games will probably not even have that many objects in them anyway, but it’s still good to know.)

Any questions at all, then do ask! Or if I have made any booboos, do let me know.

A couple of links:
<– nice code (lacking optimisations, but I am sure that is just to show the implementation)
<– includes some camera code too, so you can get up and running


*With that in mind, maybe one should first check the plane most likely to cull the most objects: probably the near and far planes, but not necessarily. You could easily count the number of culls against each plane as you navigate around your scene, average them to work out which one is the best culler, and check that one first… it will depend on your needs though, and could be total overkill, but hey, this is Flash, right, and Flash IS slow, so every little helps!
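A minimal TypeScript sketch of that counting idea, assuming one counter per plane that is bumped whenever that plane culls an object; `PlaneStats` and its methods are hypothetical names.

```typescript
// Counts how often each frustum plane rejects an object and returns the
// plane indices ordered from best culler to worst, for use as a test order.
class PlaneStats {
  private counts: number[];

  constructor(planeCount: number) {
    this.counts = new Array<number>(planeCount).fill(0);
  }

  recordCull(planeIndex: number): void {
    this.counts[planeIndex]++;
  }

  // Highest count first; ties keep the original plane order.
  testOrder(): number[] {
    return this.counts
      .map((count, index) => ({ count, index }))
      .sort((p, q) => q.count - p.count || p.index - q.index)
      .map((entry) => entry.index);
  }
}

const stats = new PlaneStats(6);
[5, 5, 5, 4, 4, 0].forEach((plane) => stats.recordCull(plane));
const order = stats.testOrder(); // plane 5 culled most, so test it first
```

The resulting order would replace the fixed 0..5 loop for objects that have no cached plane; whether the bookkeeping pays for itself depends on the scene.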


SSAO in stage 3D (source added)

I had developed my own simple implementation of SSAO (Screen Space Ambient Occlusion) and thought I would try some other implementations to get better results.

I’ll use this post to share videos of progress, and demos will follow as well.


Video 1:

SSAO pre-blur with 8 samples per pixel (hits the AGAL instruction limit real fast). There are still some kinks to iron out for sure. The depth range is currently really short to help with accuracy, but I should be able to improve this now that I can encode depth across 4 channels instead of 1.
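Encoding depth across four 8-bit channels amounts to storing its base-255 digits. The fragment constants further down, [1/(255*255*255), 1/(255*255), 1/255, 1] applied to (r, g, b, a), imply that layout, with `a` as the most significant channel. The TypeScript below is a CPU-side sketch of a matching encode/decode pair, assuming `decodeFloatFromRGBA` is a dot product with those constants (its source is not shown); it is not the shader code itself.

```typescript
const BASE = 255;
const BASE2 = BASE * BASE;
const BASE3 = BASE2 * BASE;
const BASE4 = BASE3 * BASE;

// Encodes depth in [0, 1) as four bytes [r, g, b, a] holding its base-255
// digits (most significant in a), so that
//   depth ≈ a/255 + b/255^2 + g/255^3 + r/255^4
function encodeDepth(depth: number): [number, number, number, number] {
  // Clamp so that depth = 1 does not overflow the top digit.
  const n = Math.min(Math.floor(depth * BASE4), BASE4 - 1);
  const a = Math.floor(n / BASE3);
  const b = Math.floor(n / BASE2) % BASE;
  const g = Math.floor(n / BASE) % BASE;
  const r = n % BASE;
  return [r, g, b, a];
}

// Mirrors a dot product of the sampled colour (bytes / 255, as the GPU sees
// it) with [1/255^3, 1/255^2, 1/255, 1].
function decodeDepth([r, g, b, a]: [number, number, number, number]): number {
  return (r / BASE) / BASE3 + (g / BASE) / BASE2 + (b / BASE) / BASE + a / BASE;
}
```

The round trip is accurate to about 1/255^4. Note that linear texture filtering blends the packed digits of neighbouring texels, which corrupts the decoded value, so nearest sampling is generally the safer choice for packed depth.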

Runs pretty quick, 60fps is still achievable.






related/useful links:


DEPTH ENCODING     <– nice function at the end of this one


Here it is, folks: later than I would have liked, but I have been busier than a badger in spring.
I have done my best to comment what is going on.
It took a lot of playing around to get it working, and muchos muchos trial and error.

PLEASE do take this and try to make it better; I am sure there is room for improvement!!!

Also feel free to ask any questions at all and I will try and answer them.
Cheers to Daniel Holden (orange duck) for the code from which this is ported:
pure depth ssao
(Although I use a normal texture to save on operations)

var flags:String = (smooth) ? "linear" : "nearest";
//varying registers
var uv_in:String = "v0";
var tex_normal:String = "fs1";
var tex_depth:String = "fs2";
var tex_noise:String = "fs3";
//float3 - float4
var colour:String = "ft0";
var normal:String = "ft1";
var position:String = "ft2";
var random:String = "ft3";
var ray:String = "ft4";
var hemi_ray:String = "ft5";
var depth:String = "ft6.x";
var radiusDepth:String = "ft6.y";
var occlusion:String = "ft6.z";
var occ_depth:String = "ft6.w";
var difference:String = "ft7.x";
var temp:String = "ft7.y";
var temp2:String = "ft7.z";
var temp3:String = "ft7.w";			//NOT USED
var fallOff:String = "ft0.x";		//NOT USED
var uv:String = "ft0";
var radius:String = "fc0.z";
var scale:String = "fc0.w";
var decoder:String = "fc2.z";
var zero:String = "fc0.x";
var one:String = "fc0.y";
var two:String = "fc1.z";			//NOT USED
var thresh:String = "fc1.w";
var neg_one:String = "fc2.y";		//NOT USED
var depth_decoder:String = "fc3.xyzw";
var area:String = "fc1.x";			//NOT USED
var falloff:String = "fc1.y";
var total_strength:String = "fc2.x";
var base:String = "fc4.x";
var invSamples:String = "fc2.w";
//sample normal at current fragment, and decode
AGAL.tex(normal, uv_in, tex_normal, "2d", "clamp", flags);//ex ft0, v0, fs0 <2d,wrap,linear> \n"+
AGAL.decode(normal, normal, decoder);
//sample depth at current fragment, and decode
AGAL.tex(colour, uv_in, tex_depth, "2d", "clamp", flags);//ex ft0, v0, fs0 <2d,wrap,linear> \n"+
AGAL.decodeFloatFromRGBA(depth, colour, depth_decoder);
//use this instead if depth is not encoded
//"oc.a", one);
//"oc", depth);//col+".xyz");
//sample random vector, uv_in);
AGAL.mul(uv, uv, scale);					
AGAL.tex(random, uv, tex_noise, "2d", "wrap", flags);
//AGAL.mul(random+".z", random+".z", neg_one);		//not sure if negation needed?
//position".xy", uv_in+".xy");".z", depth);
AGAL.div(radiusDepth, radius, depth);
//occlusion, zero);					
for (var i:int = 0; i < 8; i++) {
	//reflect the random normal against the current normal and size according to depth, further should be larger
	AGAL.reflect(ray, "fc"+(5+(i*2)), random);	//could just add but will look crap?
	AGAL.mul(ray, ray, radiusDepth);
	//dot the ray against normal
	AGAL.dp3(hemi_ray, ray, normal);
	AGAL.sign(hemi_ray, hemi_ray, temp);
	AGAL.mul(hemi_ray, hemi_ray, ray);
	AGAL.add(hemi_ray, hemi_ray, position+".xyz");
	//use position to sample from
	AGAL.sat(hemi_ray+".xy", hemi_ray+".xy");
	AGAL.tex(colour, hemi_ray+".xy", tex_depth, "2d", "clamp", flags);
	AGAL.decodeFloatFromRGBA(occ_depth, colour, depth_decoder);
	//gets the difference in depth between the current depth and sampled depth
	AGAL.sub(difference, depth, occ_depth);
	AGAL.sge(temp, difference, thresh);	// 1 if difference >= the threshold, 0 otherwise
	AGAL.slt(temp2, difference, falloff);	// 1 if difference < the falloff, 0 otherwise
	//set difference to range 0 - 1 (and clamp)
	AGAL.div(difference, difference, falloff);
	AGAL.mul(difference, temp, difference);
	AGAL.mul(difference, temp2, difference);
	//accumulate the occlusion
	AGAL.add(occlusion, occlusion, difference);
}
//bring back into range 0-1
AGAL.mul(occlusion, occlusion, invSamples);
//apply any multiplier
AGAL.mul(occlusion, occlusion, total_strength);
//add it to a base value
AGAL.add(occlusion, occlusion, base);
//invert and boom headshot
AGAL.sub("oc", one, occlusion);
var fragmentShader:String = AGAL.code;

Here are the constants used (some of them didn’t end up being used, so they can be left out; I’m too busy/lazy to do that myself yet):

//uv sample offset
var radius:Number = 0.002;		//you should derive from texture size
//noise uv scale
var scaler : Number = 24;		//much smaller and the noise blocks become more apparent
//unused at the mo
var falloff : Number = 0.05;		//not using this so ignore it / remove it
//unused at the mo
var area : Number = 5;			//not using this so ignore it / remove it
//the depth difference threshold
var depthThresh:Number = 0.0001;
//strength of the effect
var total_strength : Number = 1;
//base value for the effect
var base : Number = 0;
var sample_sphere:Vector.<Number> = new Vector.<Number>();
sample_sphere.push( 0.5381, 0.1856,-0.4319, 0);
sample_sphere.push( 0.1379, 0.2486, 0.4430, 0);
sample_sphere.push( 0.3371, 0.5679,-0.0057, 0); 
sample_sphere.push(-0.6999,-0.0451,-0.0019, 0);
sample_sphere.push( 0.0689,-0.1598,-0.8547, 0); 
sample_sphere.push( 0.0560, 0.0069,-0.1843, 0);
sample_sphere.push(-0.0146, 0.1402, 0.0762, 0); 
sample_sphere.push( 0.0100,-0.1924,-0.0344, 0);
sample_sphere.push(-0.3577,-0.5301,-0.4358, 0); 
sample_sphere.push(-0.3169, 0.1063, 0.0158, 0);
sample_sphere.push( 0.0103,-0.5869, 0.0046, 0); 
sample_sphere.push(-0.0897,-0.4940, 0.3287, 0);
sample_sphere.push( 0.7119,-0.0154,-0.0918, 0); 
sample_sphere.push(-0.0533, 0.0596,-0.5411, 0);
sample_sphere.push( 0.0352,-0.0631, 0.5460, 0); 
sample_sphere.push(-0.4776, 0.2847,-0.0271, 0);
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 0, Vector.<Number>([0, 1, radius, scaler]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 1, Vector.<Number>([area, falloff, 2, depthThresh]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 2, Vector.<Number>([total_strength, -1, 0.5, 1/8]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 3, Vector.<Number>([1/(255*255*255), 1/(255*255), 1/255, 1]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 4, Vector.<Number>([base, 0, 0, 0]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 5, sample_sphere);

Enjoy! An online demo will follow soonish; please do give feedback with any comments or improvements.

The values in this shader are very VERY important: tiny tweaks/mistakes can throw the whole thing off, so do be careful, and don’t be surprised if the world explodes when you play around with it.
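One way to reason about those values is to run the per-sample arithmetic from the shader loop on the CPU. This TypeScript sketch covers only that arithmetic (threshold/falloff test, averaging, strength, base, inversion), not the ray generation or texture sampling; the sample depths are passed in directly, and the occlusion accumulator is assumed to start at zero.

```typescript
// CPU mirror of the per-fragment occlusion arithmetic in the shader loop:
// each sample contributes (depth - sampleDepth) / falloff when the depth
// difference lies in [thresh, falloff), and 0 otherwise.
function ambientTerm(
  depth: number,
  sampleDepths: number[],
  thresh: number,
  falloff: number,
  totalStrength: number,
  base: number,
): number {
  let occlusion = 0;
  for (const sampleDepth of sampleDepths) {
    const difference = depth - sampleDepth;
    const aboveThresh = difference >= thresh ? 1 : 0;  // sge
    const belowFalloff = difference < falloff ? 1 : 0; // slt
    occlusion += (difference / falloff) * aboveThresh * belowFalloff;
  }
  occlusion /= sampleDepths.length;         // invSamples
  occlusion = occlusion * totalStrength + base;
  return 1 - occlusion;                     // 1 = fully lit
}

// Using thresh 0.0001, falloff 0.05, strength 1 and base 0 from the constants above.
const flat = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5];
const unoccluded = ambientTerm(0.5, flat, 0.0001, 0.05, 1, 0);
const halfOccluded = ambientTerm(0.5, [0.475, 0.475, 0.475, 0.475, 0.5, 0.5, 0.5, 0.5], 0.0001, 0.05, 1, 0);
```

A flat surface stays fully lit (1), while four of eight samples sitting 0.025 closer give 0.75. This also shows how sensitive the result is: with a depth range this short, a difference just past `falloff` drops a sample’s contribution to zero.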




3D turntables using b3d (a stage3d engine)

Hooked up b3d to bdj, a little MP3 wrapper for advanced MP3 playback, and this was the result:


Added another video (higher res)…

demo features:

(runs at 60 fps on my machine, just down to 25-26 while recording)


  • crossfade
  • volume
  • pitch adjustment
  • addition bend
  • and all the usual stuff like: play pause stop seek cue…


  • mouse ray generation
  • ray plane intersections (for the mouse dragging)
  • as3 native rendering for the interactive areas
  • screen space anti aliasing (experimental)
  • a fast bloom filter
  • a fast DOF filter

If anyone wants more info, or wants me to upload a better/longer video or a live playable demo, just shout and I will see what I can do.


things to note:

In order to play tracks backwards and pitch them smoothly, I had to cache the whole track as a ByteArray first. Although you can extract on the fly, it takes about twice as long (not an issue on higher-end machines, but a problem all the same… oh for a faster AS3).
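Once the whole track is cached as decoded samples, reverse playback and pitching come down to moving a read head through the buffer at a fractional rate. The TypeScript below sketches that idea with linear interpolation; it works on a plain interleaved stereo array rather than Flash’s ByteArray/SampleDataEvent, and `renderBlock` is a hypothetical name.

```typescript
// Reads frames from an interleaved stereo buffer [L0, R0, L1, R1, ...]
// starting at a fractional frame position, advancing by `rate` frames per
// output frame (1 = normal, 0.5 = half speed and lower pitch, -1 = reverse).
function renderBlock(
  samples: Float32Array,
  position: number,
  rate: number,
  frames: number,
): { output: Float32Array; position: number } {
  const totalFrames = samples.length / 2;
  const output = new Float32Array(frames * 2); // zero-filled, i.e. silence
  for (let i = 0; i < frames; i++) {
    const index = Math.floor(position);
    if (index < 0 || index >= totalFrames) break; // ran off either end
    const frac = position - index;
    const next = Math.min(index + 1, totalFrames - 1);
    for (let ch = 0; ch < 2; ch++) {
      const a = samples[index * 2 + ch];
      const b = samples[next * 2 + ch];
      output[i * 2 + ch] = a + (b - a) * frac; // linear interpolation
    }
    position += rate;
  }
  return { output, position };
}

// Four frames of a ramp, played backwards from the last frame.
const track = new Float32Array([0, 0, 1, -1, 2, -2, 3, -3]);
const reversed = renderBlock(track, 3, -1, 4);
```

The returned `position` is fed into the next call, so the read head carries on smoothly between blocks, which is only cheap when the whole track is already in memory.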




Lee Brimelow (gave me the idea when he first posted this…)

The code for playing an MP3 in reverse came from a response to Lee’s post.

Andre Michelle (some great code)

DanceDreemer for the model on turbosquid

Music: Usher – Yeah (Speedbreaker remix), Scatman 2003

3d speed

Stage3D optimisation

Having been playing with Stage3D for a while now, I thought I would write a small piece on optimisation.

With great power comes great responsibility!

Stage3D gives you GPU access, which can expose some serious rendering horsepower, but if you don’t treat it with respect, you’re going to run into limitations pretty quickly!

So what follows is a (very) rough guide on how to squeeze the most out of the new 3D APIs.

Rule 1 (of 1)
CPUs are fast, GPUs are faster; communication between the two, however, is probably the biggest bottleneck you will face.
Therefore: reduce it wherever possible.
This means minimising calls to the following Context3D functions:
(actually most of Context3D’s functions, but the above are the real doozies)

GPUs can draw triangles fast, and lots of them: millions every second without breaking a sweat.
So you might think, “I can call drawTriangles() 50,000* times, no sweat, as long as I am only drawing a few triangles in each call.”
WRONG!! This command is a mighty expensive one so use it very wisely!
How: when you call drawTriangles, you pass it an index buffer (uploaded from a vector of ids) in which every three ids represent one triangle. Given that this call is expensive, it makes sense to pass it as many triangles as possible in one call. Sadly this doesn’t quite mean you can just group your geometry into big chunks, as you cannot change state (alter the material or any parameters) during this call, so everything that is sent through will be rendered with the same program and set of constants. It does mean, however, that static (non-moving) elements that share the same program/material can be combined into one list. Things such as trees, grass and any other repeatable geometry are good candidates for this. You can do this for dynamic geometry too, but it gets complex, as you have to upload transformation data in a separate buffer; this is one way particle systems can be created. The downside is that each time any of the objects changes, the whole buffer will need to be re-uploaded. It is also vital that you only try to draw objects that will be seen on screen, so don’t draw what is out of view of the camera -> frustum culling saves the day.
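Combining static geometry that shares a material is mostly bookkeeping: bake each mesh’s world transform into its vertices and offset its indices by the number of vertices already merged. The TypeScript below is a sketch under those assumptions (translation only, for brevity; `StaticMesh` is a hypothetical shape). Stage3D vertex buffers are limited to 65,535 vertices, so very large batches still need splitting.

```typescript
interface StaticMesh {
  vertices: number[]; // x, y, z per vertex, in local space
  ids: number[];      // triangle indices into this mesh's own vertices
  offset: [number, number, number]; // world position, baked in at merge time
}

// Merges meshes into one vertex list and one index list for a single draw call.
function mergeStatic(meshes: StaticMesh[]): { vertices: number[]; ids: number[] } {
  const vertices: number[] = [];
  const ids: number[] = [];
  for (const mesh of meshes) {
    const base = vertices.length / 3; // first vertex index of this mesh in the batch
    for (let v = 0; v < mesh.vertices.length; v += 3) {
      vertices.push(
        mesh.vertices[v] + mesh.offset[0],
        mesh.vertices[v + 1] + mesh.offset[1],
        mesh.vertices[v + 2] + mesh.offset[2],
      );
    }
    for (const id of mesh.ids) ids.push(id + base);
  }
  return { vertices, ids };
}

// Two copies of the same triangle, the second placed 5 units along x.
const tri = [0, 0, 0, 1, 0, 0, 0, 1, 0];
const merged = mergeStatic([
  { vertices: tri, ids: [0, 1, 2], offset: [0, 0, 0] },
  { vertices: tri, ids: [0, 1, 2], offset: [5, 0, 0] },
]);
```

The merged lists would be uploaded once and drawn with one drawTriangles call; the trade-off, as noted above, is that moving any one object means re-uploading the whole batch.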

side note:
Even high-end games rarely want to issue more than 1000-2000 draw calls, though the likes of Battlefield 3 can get up to the 3000 mark in some environments. Newer consoles, however, can issue over 10,000 draw calls and do it much faster due to more direct access to the hardware.

Assuming you have now done everything in your power to reduce the number of draw calls you issue, the next thing to look at is state changes (changing the current program).
Changing state on the GPU might seem like a trivial thing but it is actually something you want to keep to a minimum to be able to squeeze the most out of your graphics card.
There are a few things you can do here to reduce this problem.
1. Group the objects that require drawing by their material/program! For example, suppose you had 100 cubes, 50 with one material and 50 with another. If you had a list containing all of those cubes and blindly sent them to be rendered, you could end up changing state a large number of times. If it so happened that each cube in the list had a different material from the object before it, the program would have to be updated for every draw call. Not good. If that list however was sorted so that

– even if you are only drawing one triangle with this call there

*there is an actual limit of 32,768 drawTriangles() calls per present() call.
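Point 1 above (grouping by material) can be sketched as a sort before drawing. This TypeScript counts program switches for the 100-cube example, first interleaved and then sorted by a material id; `DrawItem` and `programId` are hypothetical names, and the counter stands in for calls to set the program.

```typescript
interface DrawItem { name: string; programId: number; }

// Counts how many times the active program would change while drawing in order.
function countProgramChanges(items: DrawItem[]): number {
  let changes = 0;
  let current = -1;
  for (const item of items) {
    if (item.programId !== current) {
      changes++; // would be a program switch on the GPU
      current = item.programId;
    }
  }
  return changes;
}

// 100 cubes alternating between two materials: the worst case.
const cubes: DrawItem[] = [];
for (let i = 0; i < 100; i++) cubes.push({ name: `cube${i}`, programId: i % 2 });

const unsortedChanges = countProgramChanges(cubes);
const sorted = cubes.slice().sort((a, b) => a.programId - b.programId);
const sortedChanges = countProgramChanges(sorted);
```

The interleaved list switches program 100 times; the sorted one switches twice. In practice the sort key can also include texture or depth, but the material/program should come first.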

This function is what allows us to upload constants to the GPU. It is how we upload our matrices and any float1/2/3/4s that we want to use in our shaders. While it may not be a huge bottleneck, it still had a noticeable impact on performance in my experiments.
So how to optimise?
Any constant that is likely to be reused by different materials should be uploaded only once per frame, not once per object rendered. So what are the likely culprits?
The view-projection matrix! This is 16 float values that will not change between objects, so it makes sense to upload it once! 16 numbers vs 16,000 for 1000 objects, and 999 fewer calls to setProgramConstants, and that is a good thing!
The same applies to anything else that will not vary between objects: camera and light positions, or common numbers used in shaders (0, 0.5, 1, -1...).
What this shows is that it is important to have some sort of system to manage uploads, so you can keep track of what is already uploaded and only upload data that isn’t already there!
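A minimal sketch of such a tracking system in TypeScript: remember what was last sent to each starting register and skip identical uploads. The `upload` callback stands in for `Context3D.setProgramConstantsFromVector`; `ConstantCache` is hypothetical and assumes a fixed register layout (the same data always starts at the same register, with no overlapping ranges).

```typescript
// Stand-in for Context3D.setProgramConstantsFromVector(programType, firstRegister, data).
type UploadFn = (firstRegister: number, data: number[]) => void;

class ConstantCache {
  private readonly uploaded = new Map<number, number[]>(); // register -> last data sent
  uploads = 0;

  constructor(private readonly upload: UploadFn) {}

  // Uploads `data` (4 floats per register) only if it differs from what is
  // already resident starting at `firstRegister`.
  set(firstRegister: number, data: number[]): void {
    const previous = this.uploaded.get(firstRegister);
    const same =
      previous !== undefined &&
      previous.length === data.length &&
      previous.every((value, i) => value === data[i]);
    if (same) return; // already on the GPU, nothing to do
    this.upload(firstRegister, data);
    this.uploaded.set(firstRegister, data.slice()); // copy, so later edits to `data` are detected
    this.uploads++;
  }

  // Forget everything, e.g. after the context is lost and recreated.
  invalidate(): void {
    this.uploaded.clear();
  }
}

const sentRegisters: number[] = [];
const cache = new ConstantCache((register) => {
  sentRegisters.push(register);
});
const viewProjection = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
for (let i = 0; i < 1000; i++) {
  cache.set(0, viewProjection); // requested once per object, uploaded only once
}
cache.set(4, [0, 0.5, 1, -1]); // shared shader numbers
```

The comparison costs a few CPU cycles per call, which is the point: CPU work is cheap next to a redundant upload.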

side note:
This also translates into how you write shaders. Knowing that each constant requires an upload should sometimes make you rethink how to achieve something with minimal constants; take unpacking a normal from a texture. Usually you would multiply the value from the texture by 2 and then subtract 1 (2 floats required), but the same result can be achieved by subtracting 0.5 and then dividing by 0.5 (1 float required). Perhaps not the best example, but I am sure you get the idea. REUSE is your friend!
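A quick check that the two unpacking forms agree for a texture value $t \in [0, 1]$, so the single 0.5 constant really is enough:

```latex
\frac{t - 0.5}{0.5} = 2(t - 0.5) = 2t - 1, \qquad t \in [0, 1] \;\Rightarrow\; 2t - 1 \in [-1, 1]
```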


While I only focused on 3 methods of Context3D, almost all of them incur some penalty; those highlighted are the ones I have found to be the more serious problems.

Quick additions:
Drawing to a BitmapData from the GPU is slow, so if you have to do it, keep it at as small a resolution as possible; in theory a 1×1 pixel readback should be big enough for picking!
Resizing the back buffer is slow!
Creating textures is slow, so don’t do it on the fly. Pre-allocate if possible, then pick from a pool (more relevant for post-processing).

At some point in the future, I hope to write some test examples that highlight the cost of the functions mentioned above (have already done quite a few but they have been tied into other things rather than dedicated standalone tests).

I hope all of that makes some form of sense to someone 🙂 If you have any additions, corrections or questions… fire away. Will probably update this from time to time to add in more that I have missed out, it’s a broad area with many possible optimisations!


Super Quick Summary:

REDUCE DRAW CALLS! Group items where possible and use culling to ensure you are only drawing what is necessary.

REUSE MATERIALS and BATCH RENDERING by material if you can.




Stage3D stress test (b3d engine) 20,000 primitives ~ 15,000,000 triangles (updated with playable demo)

Click to start.
Mouse move to look
Shift to fly faster
Space to toggle rotation
+ to add 500 doughnuts
– to remove 500 doughnuts
m to change material (4 available)
Once started double click to toggle fullscreen (NOTE all keys bar arrow keys will be disabled)

demo (requires flash player 11 download):

link to play standalone version (better experience):

Let me know how it runs (assuming it does) if you can: frame-rate, number of objects it can handle etcetera.

video 1 (looks like crap so am uploading another one):

video 2 (hopefully looks a bit better... still looks balls, ah well)

Starts to slow down with 2000-3000 or more primitives on screen 🙁 There is still room for some efficiency improvements, but not bad for now.

Demonstrates how essential good culling is: despite what the stats say while recording, I am able to cull a full scene of 10,000 objects in under 1 ms on the release player with no problems… without that it would probably die. (With it, you can happily navigate a scene with 50,000 objects at 60 fps on a good machine, as the majority are culled away, leaving perhaps only 500-1500 visible at any one time when they are spread out as they are in the demo.)


Utils3D.projectVectors vs Utils3D.projectVector vs matrix.transformVector… note to self

just a note to self really…

Utils3D.projectVectors(matrix, vertices, projected, uvt);
vector1 = Utils3D.projectVector(matrix, vector0);
vector2 = matrix.transformVector(vector0);

projected[0] == vector2.x/vector2.w ≈≈ vector1.x
projected[1] == vector2.y/vector2.w ≈≈ vector1.y
uvt[2] == 1/vector2.w == 1/vector1.w

Not sure why the projectVector result != the projectVectors result, but it’s close enough… I wasted too much time trying to find out why they are not the same, so if you have any ideas on why that is, let me know!
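For comparison, the relationship noted above (x and y divided by w, t = 1/w) can be reproduced by hand. This TypeScript sketch uses a row-major matrix for readability (Flash’s Matrix3D.rawData is column-major) and a toy projection whose w equals the input z; it only shows the perspective divide and does not attempt to explain the projectVector discrepancy.

```typescript
type Row4 = [number, number, number, number];
type Matrix4 = [Row4, Row4, Row4, Row4]; // row-major: m[row][col]

// Transforms (x, y, z, 1) by m and applies the perspective divide.
function project(m: Matrix4, x: number, y: number, z: number): { x: number; y: number; t: number } {
  const out = m.map((row) => row[0] * x + row[1] * y + row[2] * z + row[3]);
  const w = out[3];
  return { x: out[0] / w, y: out[1] / w, t: 1 / w };
}

// Toy projection: keeps x, y, z and copies z into w.
const toyProjection: Matrix4 = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 1, 0],
];
const p = project(toyProjection, 2, 4, 2); // w = 2, so x = 1, y = 2, t = 0.5
```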


Dynamic Reflections with Stage3D

First up the video:

This was actually pretty easy to achieve: it basically involves rendering the scene 6 times with a wide-angle lens, each time rendering to one side of a cube map.
This cube map can then be used as the basis for a reflection shader.

Noteworthy observations:

this will be slow for complex scenes as there could potentially be 7 times more draw calls!!
forgetting to convert degrees to radians can be a pain in the butt (doh)
frustum cull with each face render to reduce the number of draw calls
getting it to reflect another reflective object will really start to complicate things, so avoid it like hell
could probably get away with rendering to really small textures, as perfect reflections can look less realistic anyway.
there is something wrong with my reflection math as those of you with keen eyes will have noticed the reflected cubes are moving backwards – will fix this soon I hope.

All this was achieved in a very short space of time, about 30 mins of coding and about 2 hours to realise I forgot to convert degrees to radians.

example code:

public function rendertoCubeMap(cubeTexture:CubeTexture, exclude:Object3D = null):void
	this.exclude = exclude;
	var cameraPositionCache:Vector3D = camera.position.clone();
	var cameraTargetCache:Vector3D =;
	var fovCache:Number = camera.fov;		
	var aspectCache:Number = _aspect;		
	camera.fov = cubeFov;
	_aspect = 1;
	context3D.setRenderToTexture(cubeTexture, true, 0, 0); - 1, camera.position.y, camera.position.z);
	context3D.setRenderToTexture(cubeTexture, true, 0, 1); + 1, camera.position.y, camera.position.z);
	context3D.setRenderToTexture(cubeTexture, true, 0, 2);, camera.position.y + 1, camera.position.z+0.001);	//get some NaNs if z = 0 here
	context3D.setRenderToTexture(cubeTexture, true, 0, 3);, camera.position.y - 1, camera.position.z-0.001);		//get some NaNs if z = 0 here
	context3D.setRenderToTexture(cubeTexture, true, 0, 4);, camera.position.y, camera.position.z + 1);
	context3D.setRenderToTexture(cubeTexture, true, 0, 5);, camera.position.y, camera.position.z - 1);
	_aspect = aspectCache;
	camera.fov = fovCache;, cameraTargetCache.y, cameraTargetCache.z);
	camera.position.setTo(cameraPositionCache.x, cameraPositionCache.y, cameraPositionCache.z);
	this.exclude = null;

and that’s it!
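The “get some NaNs if z = 0 here” comments come from the look-at maths: when the view direction is parallel to the up vector, their cross product is zero, and normalising it divides by zero. This TypeScript sketch of a standard look-at basis (not b3d’s camera code) shows why a 0.001 nudge avoids it, and includes the 90 degree field of view each cube face needs, converted to radians.

```typescript
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (v: Vec3): Vec3 => {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len]; // NaN components when len is 0
};

// Builds the right/up/forward axes of a camera at `eye` looking at `target`.
function lookAtBasis(eye: Vec3, target: Vec3, up: Vec3 = [0, 1, 0]) {
  const forward = normalize(sub(target, eye));
  const right = normalize(cross(up, forward)); // zero vector if forward is parallel to up
  const trueUp = cross(forward, right);
  return { right, up: trueUp, forward };
}

// Each cube face covers a quarter turn, so the face camera needs a 90 degree
// field of view (with an aspect ratio of 1), in radians.
const CUBE_FACE_FOV = 90 * (Math.PI / 180);

const eye: Vec3 = [0, 0, 0];
const straightUp = lookAtBasis(eye, [0, 1, 0]); // parallel to up: NaNs
const nudged = lookAtBasis(eye, [0, 1, 0.001]); // tiny z offset: finite basis
```

An alternative to the nudge is to pass a different up vector (for example along z) for the two y faces.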