Archive for the ‘3d’ Category

3D Procedural Geometry using revolutions

Thursday, August 9th, 2012

Many of the basic primitives we use in 3D can be constructed procedurally, all the way from simple planes to complex shapes.

Today I thought I would quickly share something I use a lot: a revolution mesh. The idea has been around for ages and can be found in most 3D packages, e.g. the Lathe modifier in 3ds Max. I think AS3 3D libraries such as Away3D have it too. The good thing about this technique is that one function can produce a number of different shapes, which makes for a smaller code base as well as flexibility.

The idea is simple: pass in a list of 2D points, rotate them about an axis a given number of times, and then construct a solid mesh out of the result.

First up, a demo:

link to standalone version: here

Okay, so not very impressive, but all of those objects were created in the same way: by revolving a series of points about an axis (in this instance the y/up axis).

So the code to generate them looks like this:

numRevolutions = 25;
revolutions = new Vector.<Object3D>();
 
revolutions.push(generateRevolutionObject(SplineBuilder.generateTorus(5,2,10)));
revolutions.push(generateRevolutionObject(SplineBuilder.generateCircle(5,15)));
revolutions.push(generateRevolutionObject(SplineBuilder.generateCone()));
revolutions.push(generateRevolutionObject(SplineBuilder.generateTube()));
revolutions.push(generateRevolutionObject(SplineBuilder.generateDisc()));
revolutions.push(generateRevolutionObject(SplineBuilder.generateCylinder()));
revolutions.push(generateRevolutionObject(SplineBuilder.generateArc(5, 5, 0, 0.25)));
revolutions.push(generateRevolutionObject(SplineBuilder.generateWave(10,20,5,0.5)));
revolutions.push(generateRevolutionObject(SplineBuilder.generateWaveAbs(10,20,5,0.5)));
revolutions.push(generateRevolutionObject(SplineBuilder.generateSquareWave(10,20,5,0.5)));
revolutions.push(generateRevolutionObject(SplineBuilder.generateLine(5,5,5,-5)));
 
scene.addObjects(revolutions);
 
private function generateRevolutionObject(spline:Vector.<Number>):Object3D
{
	//builds the mesh based on the number of revolutions and the points
	var mesh:RevolutionMesh = new RevolutionMesh(numRevolutions, spline);	
	var object:Object3D = new Object3D().build(mesh, material);
	//gives the object a parent transform so can be rotated about the scene origin
	object.transform.parent = transformer;					
	return object;
}

The magic happens in the SplineBuilder and the RevolutionMesh. The SplineBuilder is just a really simple class with static methods to generate lists of points:

//couple of example functions from SplineBuilder
public static function generateCone(height:Number = 10, radius:Number = 5):Vector.<Number>
{
	var spline:Vector.<Number> = new Vector.<Number>();
	spline.push(0, height, 0);
	spline.push(radius, 0, 0);
	spline.push(0, 0, 0);
	return spline;
}
public static function generateCylinder(height:Number = 10, radius:Number = 5):Vector.<Number>
{
	var spline:Vector.<Number> = new Vector.<Number>();
	spline.push(0,  height/2, 0);
	spline.push(radius, height/2, 0);
	spline.push(radius, -height/2, 0);
	spline.push(0, -height/2, 0);
	return spline;
}
public static function generateTube(height:Number = 10, radius:Number = 5):Vector.<Number>
{
	var spline:Vector.<Number> = new Vector.<Number>();
	spline.push(radius/2,  height/2, 0);
	spline.push(radius, height/2, 0);
	spline.push(radius, -height/2, 0);
	spline.push(radius/2, -height/2, 0);
	spline.push(radius/2,  height/2, 0);
	return spline;
}

See, super simple! You can of course generate these however you like, such as from user-drawn input or curve interpolation.
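The calls above also use SplineBuilder.generateTorus(5,2,10) and generateCircle(5,15), but their bodies aren't shown. Here is a guess at what a torus profile generator could look like. The parameter meanings (major radius, minor radius, segment count) are my assumption, not necessarily the author's signature. It is a TypeScript sketch (close to AS3 syntax but runnable outside Flash) that returns points in the same flat x, y, z layout. The result is a closed circle in the XY plane, pushed out from the Y axis, which becomes a torus when revolved about Y.

```typescript
// Hypothetical torus profile generator using the same flat [x, y, z, x, y, z, ...]
// layout as SplineBuilder. Parameter names and meanings are assumptions.
function generateTorusProfile(majorRadius: number, minorRadius: number, segments: number): number[] {
  const spline: number[] = [];
  // segments + 1 points: the last point repeats the first, so the profile is closed
  for (let i = 0; i <= segments; i++) {
    const a = (i / segments) * Math.PI * 2;
    // circle of radius minorRadius centred at (majorRadius, 0, 0)
    spline.push(majorRadius + Math.cos(a) * minorRadius, Math.sin(a) * minorRadius, 0);
  }
  return spline;
}
```

Because the first and last points coincide, the swept tube closes up. The seam vertices are duplicated, which is what lets v run cleanly from 0 to 1.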

So the final piece is the RevolutionMesh (or lathe, or whatever you want to call it). Its job is to take the input points, rotate them around an axis and store the resulting vertices, then build the UVs and indices. (You can then go on to generate normals, tangents etc…)

private function build():void
{
	if(vertices)		vertices.length = 0;
	else 			vertices = new Vector.<Number>();			
	if(uvs)		uvs.length = 0;
	else 			uvs = new Vector.<Number>();
	if(ids)		ids.length = 0;
	else 			ids = new Vector.<uint>();
	if(normals)		normals.length = 0;
	else 			normals = new Vector.<Number>();	
 
	//doesn't have to revolve the whole way round, _start and _end are values from 0-1 so a start of 0 and end of 0.5 would mean a 0 - 180 degrees of revolution
	var totalAngle:Number = (Math.PI*2)*(_end-_start);
	var angle:Number = totalAngle/_divisions;
	var startAngle:Number = (Math.PI*2)*_start;
 
	for (var i : int = 0; i < _divisions+1; i++)
	{
		var vout:Vector.<Number> = new Vector.<Number>();
		transformer.identity();
		transformer.appendRotation((startAngle*Math3D.RADIANS_TO_DEGREES) + (angle*i*Math3D.RADIANS_TO_DEGREES), _axix);	
		transformer.transformVectors(_spline, vout);
		vertices = vertices.concat(vout);
		//could do the same for normals if supplied ;)
	}
	var splineLength:int = _spline.length/3;
	for (i = 0; i < _divisions+1; i++)
	{
		for (var j:int = 0; j < splineLength; j++)
		{
			uvs.push(i/(_divisions), j/(splineLength-1));
		}
	}
	for (i = 0; i < _divisions; i++)
	{
		for (j = 0; j < splineLength-1; j++)
		{
			var id0:uint = ((i+1)*splineLength)+j;
			var id1:uint = (i*splineLength)+j;
			var id2:uint = ((i+1)*splineLength)+j+1;
			var id3:uint = (i*splineLength)+j+1;
			ids.push(id0, id2, id1);
			ids.push(id3, id1, id2);
		}				
	}			
	//go forth and generate normals etc...
}

It works by first using a Matrix3D to rotate the points and copy them into a vertex list. Next it calculates the UVs from each vertex's index in the original point list and the revolution index (u is based on the revolution index and v on the position in the point list). Once that is done, it's a simple case of assigning the indices: each pair of neighbouring rings is stitched together with two triangles per quad.
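Here are the same three steps (rotate into rings, assign UVs, stitch quads) as a self-contained TypeScript sketch, close to AS3 syntax but runnable outside Flash. To keep it dependency-free it rotates about the Y axis with an explicit cos/sin rotation instead of Matrix3D. The rotation direction is an assumption and may be opposite to Matrix3D.appendRotation. The names are illustrative, not the author's class.

```typescript
// Minimal revolution (lathe) mesh builder mirroring the steps of build() above.
interface RevolutionMeshData {
  vertices: number[]; // x, y, z per vertex
  uvs: number[];      // u, v per vertex
  indices: number[];  // three per triangle
}

function buildRevolution(spline: readonly number[], divisions: number, start = 0, end = 1): RevolutionMeshData {
  const vertices: number[] = [];
  const uvs: number[] = [];
  const indices: number[] = [];
  const splineLength = spline.length / 3;
  const totalAngle = Math.PI * 2 * (end - start); // start/end are 0-1 fractions of a turn
  const startAngle = Math.PI * 2 * start;

  // One ring per division plus one extra, so the seam gets its own u = 1 vertices.
  for (let i = 0; i <= divisions; i++) {
    const angle = startAngle + (totalAngle * i) / divisions;
    const c = Math.cos(angle);
    const s = Math.sin(angle);
    for (let j = 0; j < splineLength; j++) {
      const x = spline[j * 3];
      const y = spline[j * 3 + 1];
      const z = spline[j * 3 + 2];
      // rotation about Y; y is unchanged
      vertices.push(x * c + z * s, y, -x * s + z * c);
      // u follows the revolution index, v follows the position along the profile
      uvs.push(i / divisions, j / (splineLength - 1));
    }
  }

  // Two triangles per quad between ring i and ring i + 1 (same winding as the AS3 code).
  for (let i = 0; i < divisions; i++) {
    for (let j = 0; j < splineLength - 1; j++) {
      const id0 = (i + 1) * splineLength + j;
      const id1 = i * splineLength + j;
      const id2 = (i + 1) * splineLength + j + 1;
      const id3 = i * splineLength + j + 1;
      indices.push(id0, id2, id1, id3, id1, id2);
    }
  }
  return { vertices, uvs, indices };
}
```

With P profile points and D divisions this gives (D + 1) × P vertices and D × (P − 1) × 2 triangles. Pushing into one growing list also avoids the per-ring allocation that `vertices = vertices.concat(vout)` causes in the loop above, since concat creates a new Vector every iteration.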

There you have it, some fairly simple code which can be reused to create a crap load of geometry.

Hope that helps someone out. Shout if you have any questions or spot any problems etc…

b

Frustum Culling – on steroids!

Monday, June 18th, 2012

I am not really going to explain how frustum culling works here, or how to set it up. There are already some good articles out there, and I will link to a couple of ActionScript ones at the end. This article explains how to use caching to very easily boost the speed of your frustum culling. As far as my testing goes, I haven't yet seen anything as fast as this, so I thought I would share it so anyone else can benefit from it, or maybe some bright spark can find a way to improve it… there is always room for improvement where Flash is concerned :-)

Just a super quick explanation of why frustum culling is a good thing if done properly:

If the sphere's position is the same object as (i.e. a reference to) the position of the object it describes, then NO transformations are needed to move the sphere into world space. Consider checking a bounding box instead. While it might be a tighter fit (woop), it would require either transforming the frustum into the box's local space or the box into world space. Either way, that is going to be seriously slow for large numbers of objects compared to bounding sphere checks. Frustum checking is easy to implement and can save you a vast amount of time: no rendering of off-screen objects!

This is what the standard frustum culling code looks like:

//loop through your objects or spheres
for (var i : int = 0; i < length; i++)
{
	var object:Object3D = objects[i];
	object.cullFlag = 1;		//object.visible = true;
 
	var sphere:BoundingSphere = object.sphere;
	var position:Vector3D = sphere.position;
	var radius:Number = sphere.radius;
 
	var plane:Plane3D;
	var distance:Number;
 
	for (var j : int = 0; j < 6; j++)	//6 planes in the frustum (planes.length)
	{
		plane = planes[j];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0){
			object.cullFlag = -1;		//object.visible = false;
			break;
		}
	}
}

The code above loops through all of your objects or bounding spheres. For each one, it computes the signed distance from the sphere's centre to each plane that makes up the frustum. If any test shows the sphere to be outside the frustum, it can be marked as not visible and will not need to be rendered. As soon as it fails on one plane you can move on to the next object/sphere. If it passes, you have to go on and check the next plane until you are sure it passes all the tests. (You can see why this is inefficient for objects IN the frustum: 6 checks are required just to confirm that, whereas the best case is when you can discard an object on the first plane check*.)

This could be sped up a little by unrolling the inner loop. It will only make a small difference in speed, but it will help.

Introducing the “cache” variable into the mix! Okay, here's where the huge speed-up can be gained. In the vast majority of cases, if an object is culled by a frustum plane in one frame and is still culled in the next frame, the likelihood that it was culled by the same plane is extremely high! With that in mind, if we store the plane that the sphere failed the test on and test against that one first, we can eliminate a number of unnecessary plane tests in a large number of cases.

Bring on the updated code:

//loop through your objects or spheres
for (var i : int = 0; i < length; i++)
{
	var object:Object3D = objects[i];
	object.cullFlag = 1;		//object.visible = true;
 
	var sphere:BoundingSphere = object.sphere;
	var position:Vector3D = sphere.position;
	var radius:Number = sphere.radius;
 
	var plane:Plane3D;
	var distance:Number;
	var cache:int = sphere.cache;		//the id of the last plane failed on (-1 if none)
 
	if(cache != -1)
	{
		plane = planes[cache];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0)
		{
			object.cullFlag = -1;	//object.visible = false;
			continue;			//object still culled so move to next object
		}
	}
	for (var j : int = 0; j < 6; j++)	//6 planes in the frustum (planes.length)
	{
		if(j == cache) continue;
		plane = planes[j];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0){
			object.cullFlag = -1;	//object.visible = false;
			sphere.cache = j;	//set the cache to match the current plane
			break;
		}
	}
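	//only reset the cache if the object passed every plane (i.e. it is visible);
	//resetting it unconditionally would throw away the plane stored in the loop above
	if (object.cullFlag == 1) sphere.cache = -1;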
}

This time round, the code checks whether a plane has been cached and, if so, tests that one first. In the best case (a stationary camera with a stationary scene), every object that has already been culled once will be culled again with only 1 plane test instead of up to 6! Muchos muchos faster!
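To make the saving visible, here is the cached test as a small TypeScript sketch (close to AS3 syntax) with a counter on the plane tests. It assumes planes stored as a, b, c, d with normals pointing into the frustum. The cache is only cleared when the sphere passes every plane. The type and function names are illustrative.

```typescript
// Sketch of plane-caching sphere/frustum culling with a plane-test counter.
interface Plane { a: number; b: number; c: number; d: number; }
interface Sphere { x: number; y: number; z: number; radius: number; cache: number; } // cache: -1 = none

let planeTests = 0;

// True when the sphere lies entirely on the outside of the plane.
function outside(p: Plane, s: Sphere): boolean {
  planeTests++;
  return p.a * s.x + p.b * s.y + p.c * s.z + p.d + s.radius < 0;
}

// Returns true if the sphere is visible; updates s.cache as a side effect.
function cullSphere(planes: readonly Plane[], s: Sphere): boolean {
  const cache = s.cache;
  // Fast path: still culled by the plane that rejected it last time.
  if (cache !== -1 && outside(planes[cache], s)) return false;
  for (let j = 0; j < planes.length; j++) {
    if (j === cache) continue; // already tested above
    if (outside(planes[j], s)) {
      s.cache = j; // remember which plane rejected it
      return false;
    }
  }
  s.cache = -1; // visible: nothing worth remembering
  return true;
}
```

On a static scene, an off-screen sphere costs a handful of plane tests the first frame and exactly one on every frame after that.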

There are still things that can be optimised. The loop above already skips the cached plane in the second section when it passed the check in the first, but you could also unroll all the loops to remove the array/vector lookups for a further speed-up.

The method can also easily be extended to detect intersections, rather than only giving a 1/0, true/false or hit/not-hit result. That allows follow-up checks, such as a bounding box or something more tightly fitting, to be run only on the objects that straddle a plane.
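A minimal TypeScript sketch of that three-way result (outside / intersecting / fully inside). It assumes normalised plane normals, so the plane equation gives a true distance. Only the "intersecting" case would need the tighter follow-up test. The names are illustrative.

```typescript
// Classify a bounding sphere against the frustum planes.
interface Plane { a: number; b: number; c: number; d: number; } // normals point inwards, unit length

type CullResult = -1 | 0 | 1; // -1 outside, 0 intersecting, 1 fully inside

function classifySphere(planes: readonly Plane[], x: number, y: number, z: number, radius: number): CullResult {
  let result: CullResult = 1;
  for (let j = 0; j < planes.length; j++) {
    const p = planes[j];
    const distance = p.a * x + p.b * y + p.c * z + p.d;
    if (distance < -radius) return -1; // completely behind this plane: culled
    if (distance < radius) result = 0; // straddles this plane: keep checking the rest
  }
  return result;
}
```

The early return for "outside" is unchanged, so this costs nothing extra for culled objects. The only change is that visible ones now carry one more bit of information.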

I have found this approach to be the fastest yet and my implementation can chew through 10,000 objects in under 1 ms without a problem and this is without any bounding volume hierarchies. (Most flash games will probably not even have that many objects in them anyway but still good to know).

Any questions at all then do ask! Or if I have made any booboos then do let me know.

couple o links:

http://blogs.aerys.in/jeanmarc-leroux/2009/12/13/frustum-culling-in-flash-10/ <– nice code (lacking optimisations but I am sure it is just to show the implementation)
http://jacksondunstan.com/articles/1811 <– includes some camera code too, so you can get up and running
http://www.flipcode.com/archives/Frustum_Culling.shtml
http://www.crownandcutlass.com/features/technicaldetails/frustum.html

 

*With that in mind, maybe one should first check the plane most likely to fail most objects: probably the near and far planes, but not necessarily. You could easily count the number of culls against each plane as you navigate around your scene, take an average to work out which one is the best culler, and check that one first… it will depend on your needs though and could be total overkill, but hey, this is Flash, right, and Flash IS slow, so every little helps!
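The footnote's idea can be sketched in a few lines of TypeScript. It counts how often each plane rejects something and periodically re-sorts the plane test order so the best culler goes first. The class and method names are hypothetical, and how often to call update() is left to you.

```typescript
// Hypothetical helper: order frustum planes by how often they cull objects.
class PlaneOrder {
  private counts: number[];
  order: number[]; // plane indices in the order they should be tested

  constructor(planeCount: number) {
    this.counts = [];
    this.order = [];
    for (let i = 0; i < planeCount; i++) {
      this.counts.push(0);
      this.order.push(i);
    }
  }

  // Call whenever plane `index` culls an object.
  recordCull(index: number): void {
    this.counts[index]++;
  }

  // Re-sort occasionally (e.g. every few hundred frames), not per object.
  update(): void {
    const counts = this.counts;
    // most culls first; ties keep the original plane order
    this.order.sort((p, q) => counts[q] - counts[p] || p - q);
  }
}
```

The culling loop then walks `order` instead of 0..5. Combined with the per-sphere cache, this only affects objects that have no cached plane yet.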

SSAO in stage 3D (source added)

Friday, June 15th, 2012

I had developed my own simple implementation of SSAO (Screen Space Ambient Occlusion) and thought I would try some other implementations to get better results.

I’ll use this post to share videos of progress, and demos will follow as well.

 

Video 1:

SSAO before blurring, with 8 samples per pixel (this hits the AGAL instruction limit really fast). There are still some kinks to iron out for sure. The depth range is currently really short to help with accuracy, but I should be able to improve this now that I can encode depth across 4 channels rather than 1.

Runs pretty quick, 60fps is still achievable.

 

 

 

 

 

related/useful links:

SSAO

http://www.gamerendering.com/category/lighting/ssao-lighting/
http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/a-simple-and-practical-approach-to-ssao-r2753
http://www.john-chapman.net/content.php?id=8
http://blog.nextrevision.com/?p=76
http://www.gamerendering.com/2008/09/28/linear-depth-texture/
http://www.theorangeduck.com/page/pure-depth-ssao

DEPTH ENCODING

http://www.gamedev.net/topic/516146-how-to-encode-the-depth-value-in-a-32bit-rgba-texture/
http://www.gamedev.net/topic/486847-encoding-16-and-32-bit-floating-point-value-into-rgba-byte-texture/
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
http://aras-p.info/blog/2007/03/03/a-day-well-spent-encoding-floats-to-rgba/
http://asgl.googlecode.com/svn-history/r490/trunk/ASGL_FP11/src/asgl/shaders/agal/AGALCodec.as
http://www.gamerendering.com/2008/09/25/packing-a-float-value-in-rgba/
http://www.gamedev.net/topic/585330-packing-a-generic-float-into-a-rgba-texture/     <– nice function at the end of this one
http://www.gamedev.net/topic/442138-packing-a-float-into-a-a8r8g8b8-texture-shader/
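Encoding depth across 4 channels, as mentioned above, can be checked on the CPU. The TypeScript sketch below follows the frac/shift approach from the Aras P. articles linked above, with the first channel holding the most significant part. The constants in this post (fc3 = 1/255³, 1/255², 1/255, 1) suggest that decodeFloatFromRGBA is a dp4 with the most significant part in alpha, i.e. the reverse channel order. That reading is an assumption, since the helper itself isn't shown.

```typescript
// Pack a value in [0, 1) into four 8-bit channels and unpack it again.
type RGBA = [number, number, number, number];

function frac(x: number): number {
  return x - Math.floor(x);
}

function encodeFloatRGBA(v: number): RGBA {
  const e0 = frac(v);
  const e1 = frac(v * 255);
  const e2 = frac(v * 255 * 255);
  const e3 = frac(v * 255 * 255 * 255);
  // Remove what the next (finer) channel already stores, so each channel is a multiple of 1/255.
  return [e0 - e1 / 255, e1 - e2 / 255, e2 - e3 / 255, e3];
}

// Emulate writing to an 8-bit-per-channel render target.
function quantize(c: RGBA): RGBA {
  return [Math.round(c[0] * 255) / 255, Math.round(c[1] * 255) / 255,
          Math.round(c[2] * 255) / 255, Math.round(c[3] * 255) / 255];
}

// Decoding is one dot product, which is what a dp4 against a constant does in a shader.
function decodeFloatRGBA(c: RGBA): number {
  return c[0] + c[1] / 255 + c[2] / (255 * 255) + c[3] / (255 * 255 * 255);
}
```

Note that v = 1.0 wraps to 0 with this scheme, so the encoded depth has to stay strictly below 1. On the GPU, 32-bit float precision in the shader limits the result well before the fourth channel does.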

SOURCE CODE::

Here it is folks, later than I would have liked, but I have been busier than a badger in spring.
I have done my best to comment what is going on.
It took a lot of playing around with to get it working and muchos muchos trial and error.

PLEASE do take this and try to make it better; I am sure there is room for improvement!!!

Also feel free to ask any questions at all and I will try and answer them.
Cheers to Daniel Holden (Orange Duck), whose code this is ported from:
pure depth ssao
(although I use a normal texture to save on operations)

 
var flags:String = (smooth) ? "linear" : "nearest";
 
//varying registers
var uv_in:String = "v0";
 
//samplers
var tex_normal:String = "fs1";
var tex_depth:String = "fs2";
var tex_noise:String = "fs3";
 
//float3 - float4
var colour:String = "ft0";
var normal:String = "ft1";
var position:String = "ft2";
var random:String = "ft3";
var ray:String = "ft4";
var hemi_ray:String = "ft5";
 
//float1
var depth:String = "ft6.x";
var radiusDepth:String = "ft6.y";
var occlusion:String = "ft6.z";
var occ_depth:String = "ft6.w";
var difference:String = "ft7.x";
var temp:String = "ft7.y";
var temp2:String = "ft7.z";
var temp3:String = "ft7.w";			//NOT USED
var fallOff:String = "ft0.x";		//NOT USED
 
//float2
var uv:String = "ft0";
 
//constants
var radius:String = "fc0.z";
var scale:String = "fc0.w";
var decoder:String = "fc2.z";
var zero:String = "fc0.x";
var one:String = "fc0.y";
var two:String = "fc1.z";			//NOT USED
var thresh:String = "fc1.w";
var neg_one:String = "fc2.y";		//NOT USED
 
var depth_decoder:String = "fc3.xyzw";
 
var area:String = "fc1.x";			//NOT USED
var falloff:String = "fc1.y";
var total_strength:String = "fc2.x";
var base:String = "fc4.x";
 
var invSamples:String = "fc2.w";
 
 
//SHADER OF DOOOOOOOOOM
AGAL.init();
//sample normal at current fragment, and decode
AGAL.tex(normal, uv_in, tex_normal, "2d", "clamp", flags);
AGAL.decode(normal, normal, decoder);
//sample depth at current fragment, and decode
AGAL.tex(colour, uv_in, tex_depth, "2d", "clamp", flags);
AGAL.decodeFloatFromRGBA(depth, colour, depth_decoder);
 
//use this instead if depth is not encoded
//AGAL.mov("oc.a", one);
//AGAL.mov("oc", depth);//col+".xyz");
 
//sample random vector
AGAL.mov(uv, uv_in);
AGAL.mul(uv, uv, scale);					
AGAL.tex(random, uv, tex_noise, "2d", "wrap", flags);
//AGAL.mul(random+".z", random+".z", neg_one);		//not sure if negation needed?
 
//position
AGAL.mov(position+".xy", uv_in+".xy");
AGAL.mov(position+".z", depth);
 
//radiusDepth
AGAL.div(radiusDepth, radius, depth);
 
//occlusion
AGAL.mov(occlusion, zero);					
 
for(var i:int=0; i < 8; i++)
{
	//reflect the random normal against the current normal and scale according to depth (further away should be larger)
	AGAL.reflect(ray,"fc"+(5+(i*2)), random);	//could just add but will look crap?
	AGAL.mul(ray, ray, radiusDepth);
 
	//dot the ray against normal
	AGAL.dp3(hemi_ray, ray, normal);
	AGAL.sign(hemi_ray, hemi_ray, temp);
	AGAL.mul(hemi_ray, hemi_ray, ray);
	AGAL.add(hemi_ray, hemi_ray, position+".xyz");
 
	//use the offset position as the uv to sample the depth texture
	AGAL.sat(hemi_ray+".xy", hemi_ray+".xy");
	AGAL.tex(colour, hemi_ray+".xy", tex_depth, "2d", "clamp", flags);
	AGAL.decodeFloatFromRGBA(occ_depth, colour, depth_decoder);
 
	//gets the difference in depth between the current depth and sampled depth
	AGAL.sub(difference, depth, occ_depth);				
	AGAL.sge(temp, difference, thresh);	// 1 if difference is bigger than the threshold, 0 otherwise
	AGAL.slt(temp2, difference, falloff);	// 1 if difference is less than the falloff, 0 otherwise
 
	//set difference to range 0 - 1 (and clamp)
	AGAL.div(difference, difference, falloff);
	AGAL.mul(difference, temp, difference);						
	AGAL.mul(difference, temp2, difference);
 
	//accumulate the occlusion
	AGAL.add(occlusion, occlusion, difference);
}
//bring back into range 0-1
AGAL.mul(occlusion, occlusion, invSamples);
//apply any multiplier
AGAL.mul(occlusion, occlusion, total_strength);
//add it to a base value
AGAL.add(occlusion, occlusion, base);
//invert and boom headshot
AGAL.sub("oc", one, occlusion);
 
var fragmentShader:String = AGAL.code;

Here are the constants used (some of them didn't end up being used, so they can be left out; I'm too busy/lazy to do that myself yet):

//uv sample offset
var radius:Number = 0.002;		//you should derive from texture size
//noise uv scale
var scaler : Number = 24;		//much smaller and the noise blocks become more apparent
//max depth difference that still counts as occlusion (used by the slt test and to normalise the difference in the shader)
var falloff : Number = 0.05;
//unused at the mo
var area : Number = 5;			//not using this so ignore it / remove it
//the depth difference threshold
var depthThresh:Number = 0.0001;
//strength of the effect
var total_strength : Number = 1;
//base value for the effect
var base : Number = 0;
 
var sample_sphere:Vector.<Number> = new Vector.<Number>();
 
sample_sphere.push( 0.5381, 0.1856,-0.4319, 0);
sample_sphere.push( 0.1379, 0.2486, 0.4430, 0);
sample_sphere.push( 0.3371, 0.5679,-0.0057, 0); 
sample_sphere.push(-0.6999,-0.0451,-0.0019, 0);
sample_sphere.push( 0.0689,-0.1598,-0.8547, 0); 
sample_sphere.push( 0.0560, 0.0069,-0.1843, 0);
sample_sphere.push(-0.0146, 0.1402, 0.0762, 0); 
sample_sphere.push( 0.0100,-0.1924,-0.0344, 0);
sample_sphere.push(-0.3577,-0.5301,-0.4358, 0); 
sample_sphere.push(-0.3169, 0.1063, 0.0158, 0);
sample_sphere.push( 0.0103,-0.5869, 0.0046, 0); 
sample_sphere.push(-0.0897,-0.4940, 0.3287, 0);
sample_sphere.push( 0.7119,-0.0154,-0.0918, 0); 
sample_sphere.push(-0.0533, 0.0596,-0.5411, 0);
sample_sphere.push( 0.0352,-0.0631, 0.5460, 0); 
sample_sphere.push(-0.4776, 0.2847,-0.0271, 0);
 
//...
 
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 0, Vector.<Number>([0, 1, radius, scaler]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 1, Vector.<Number>([area, falloff, 2, depthThresh]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 2, Vector.<Number>([total_strength, -1, 0.5, 1/8]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 3, Vector.<Number>([1/(255*255*255), 1/(255*255), 1/255, 1]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 4, Vector.<Number>([base, 0, 0, 0]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 5, sample_sphere);
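Because the shader values are so sensitive, it can help to check the per-pixel occlusion maths on the CPU first. The TypeScript sketch below mirrors the sge/slt/div/mul sequence in the loop, plus the final scale, bias and invert. It assumes the sampled depths are already decoded and that invSamples is 1 divided by the number of samples. The function name is illustrative.

```typescript
// CPU mirror of the occlusion accumulation in the fragment shader above.
function occlusionFromDepths(
  depth: number,          // decoded depth at the current fragment
  sampleDepths: number[], // decoded depths at each offset sample
  thresh: number,         // fc1.w
  falloff: number,        // fc1.y
  totalStrength: number,  // fc2.x
  base: number,           // fc4.x
): number {
  let occlusion = 0;
  for (let i = 0; i < sampleDepths.length; i++) {
    const difference = depth - sampleDepths[i];
    const aboveThresh = difference >= thresh ? 1 : 0;  // sge
    const belowFalloff = difference < falloff ? 1 : 0; // slt
    occlusion += (difference / falloff) * aboveThresh * belowFalloff;
  }
  occlusion *= 1 / sampleDepths.length; // invSamples
  occlusion *= totalStrength;
  occlusion += base;
  return 1 - occlusion; // value written to oc
}
```

With the constants above (thresh 0.0001, falloff 0.05), a flat surface gives 1 (no occlusion). Samples that are 0.025 closer give 0.5. Occluders more than falloff in front are ignored entirely, which is what stops distant geometry darkening everything.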

Enjoy! On-line demo to follow soonish, and please to feedback with any comments or improvements.

IMPORTANT NOTE:
the values in this shader are very VERY important. Tiny tweaks/mistakes can throw the whole thing off, so do be careful and don’t be surprised if the world explodes when you play around with it.

:)

b

3D turntables using b3d (a stage3d engine)

Wednesday, May 30th, 2012

Hooked up b3d to bdj, a little mp3 wrapper for advanced mp3 playback and this was the result:

 

Added another video (higher res)…


demo features:

(runs at 60 fps on my machine, just down to 25-26 while recording)

bdj

  • crossfade
  • volume
  • pitch adjustment
  • addition bend
  • and all the usual stuff like: play pause stop seek cue…

b3d

  • mouse ray generation
  • ray plane intersections (for the mouse dragging)
  • as3 native rendering for the interactive areas
  • screen space anti aliasing (experimental)
  • a fast bloom filter
  • a fast DOF filter

If anyone wants more info, or for me to upload a better/longer video or a live playable demo, just shout and I will see what I can do.

 

things to note:

In order to play tracks backwards and pitch them smoothly, I had to cache the whole track as a ByteArray first. You can extract on the fly, but it takes about twice as long (not an issue on higher-end machines, but a problem all the same… oh for a faster AS3).

 

 

Acknowledgements:

Lee Brimelow (gave me the idea when he first posted this…)

http://www.leebrimelow.com/?p=1129

The code for playing an MP3 in reverse came from a response to Lee's post.

Andre Michelle (some great code)

http://blog.andre-michelle.com/2009/pitch-mp3/

DanceDreemer for the model on turbosquid

Music: Usher – Yeah (Speedbreaker remix), Scatman 2003

Stage3D optimisation

Monday, March 19th, 2012

Having been playing with Stage3D for a while now, I thought I would write a small piece on optimisation.

With great power comes great responsibility!

Stage3D gives you GPU access, which can expose some serious rendering horsepower, but if you don’t treat it with respect you're going to run into limitations pretty quickly!

So what follows is a rough (very) guide on how to squeeze the most out of the new 3D apis.

Rule 1 (of 1)
CPUs are fast, GPUs are faster; communication between the two, however, is probably the biggest bottleneck you will face.
Therefore: reduce this wherever possible.
This means minimising calls to the following Context3D functions:
Context3D.drawTriangles();
Context3D.setProgram();
Context3D.setProgramConstants…();
(actually most of Context3D’s functions but the above are the real doozies)

drawTriangles()
GPUs can draw triangles fast, and lots of them: millions every second without breaking a sweat.
So you might think, “I can call drawTriangles() 50,000* times, no sweat, as long as I am only drawing a few triangles in each call.”
WRONG!! This command is a mighty expensive one, so use it very wisely!
How: when you call drawTriangles() you pass it an index buffer in which every three indices represent one triangle. Given that the call is expensive, it makes sense to pass as many triangles as possible in one call. Sadly, this doesn't mean you can just group all of your geometry into big chunks. You cannot change state (alter the material or any parameters) during the call, so everything sent through will be rendered with the same program and set of constants. It does mean, however, that static (non-moving) elements that share the same program/material can be combined into one list. Things such as trees, grass and any other repeated geometry are good candidates for this. You can do this for dynamic geometry too, but it gets complex because you have to upload transformation data in a separate buffer; this is one way particle systems can be created. The downside is that each time any of the objects changes, the whole buffer needs to be re-uploaded. It is also vital that you only try to draw objects that will be seen on screen. Don't draw what is out of the camera's view: frustum culling saves the day.
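Here is a minimal TypeScript sketch of combining static geometry that shares one material into a single vertex/index list, so it can go through one drawTriangles() call. The types are illustrative, and each instance is only translated; a real version would bake in a full transform matrix. In Stage3D, keep each batch within the 16-bit index range (65,535 vertices per vertex buffer) and split into several batches beyond that.

```typescript
// Merge static meshes that share a material into one batch.
interface Mesh { positions: number[]; indices: number[]; } // positions: x, y, z per vertex
interface Instance { mesh: Mesh; x: number; y: number; z: number; } // translation only (assumption)

function batchStatic(instances: readonly Instance[]): Mesh {
  const positions: number[] = [];
  const indices: number[] = [];
  for (let n = 0; n < instances.length; n++) {
    const inst = instances[n];
    const base = positions.length / 3; // index of this instance's first vertex in the batch
    const p = inst.mesh.positions;
    for (let i = 0; i < p.length; i += 3) {
      // bake the instance transform into the vertices once, up front
      positions.push(p[i] + inst.x, p[i + 1] + inst.y, p[i + 2] + inst.z);
    }
    const ids = inst.mesh.indices;
    for (let i = 0; i < ids.length; i++) {
      indices.push(ids[i] + base); // re-point indices at this instance's vertices
    }
  }
  return { positions, indices };
}
```

The merged lists are uploaded once, and the whole group then costs one setProgram() and one drawTriangles() per frame instead of one per object.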

side note:
Even high-end games rarely want to issue more than 1000-2000 draw calls, although the likes of Battlefield 3 can get up to the 3000 mark in some environments. Newer consoles, however, can issue over 10,000 draw calls and do it much faster thanks to more direct access to the hardware.

setProgram()
Assuming you have now done everything in your power to reduce the number of draw calls you issue, next thing to look at is state changes (changing the current program).
Changing state on the GPU might seem like a trivial thing but it is actually something you want to keep to a minimum to be able to squeeze the most out of your graphics card.
There are a few things you can do here to reduce this problem.
1. Group the objects that require drawing by their material/program! For example, suppose you had 100 cubes, 50 with one material and 50 with another. If you had a list containing all of those cubes and blindly sent them to be rendered, you could end up changing state a large number of times. If each cube in the list happened to have a different material from the one before it, the program would have to be updated for every draw call. Not good. If that list were instead sorted so that cubes sharing a material sat next to each other, the program would only need to be set twice, once per material.

(Remember that drawTriangles() is expensive even if you are only drawing one triangle with the call.)

*there is an actual limit of 32,768 drawTriangles() calls per present() call.
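Going back to the 100-cube example of grouping by material: a quick TypeScript sketch that counts the program changes a draw list needs before and after sorting it by material. The material is just a numeric id here, and the names are illustrative.

```typescript
// Count setProgram() switches for a draw list, and sort the list by material.
interface DrawItem { material: number; } // material ids assumed to be >= 0

function countProgramChanges(items: readonly DrawItem[]): number {
  let changes = 0;
  let current = -1; // nothing bound yet
  for (let i = 0; i < items.length; i++) {
    if (items[i].material !== current) {
      current = items[i].material;
      changes++;
    }
  }
  return changes;
}

function sortByMaterial(items: readonly DrawItem[]): DrawItem[] {
  return items.slice().sort((p, q) => p.material - q.material);
}
```

For 100 cubes alternating between two materials, the unsorted list needs 100 program changes and the sorted one needs 2. In practice you would sort once when objects are added, not every frame.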

setProgramConstants…()
This function is what allows us to upload constants to the gpu. It is how we upload our matrices and any float1/2/3/4..s that we want to utilise in our shaders. While it may not be a huge bottleneck it still has a noticeable impact on performance in my experiments.
So how to optimise?
Any constant that is likely to be reused by different materials should be uploaded only once per frame, not once per object rendered. So what are the likely culprits?
The view projection matrix! Its 16 float values will not change between objects, so it makes sense to upload them once: 16 numbers instead of 16,000 for 1,000 objects, and 999 fewer calls to setProgramConstants, which is a good thing!
The same applies to anything else that will not vary between objects, camera and light positions or common numbers used in shaders (0,0.5,1,-1..).
What this shows is that it is important to have some sort of system to manage uploads so you can keep track of what is already uploaded and only upload data that isn’t already there!
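Here is one minimal way to build such a system, as a TypeScript sketch. It remembers what was last written to each register and skips uploads that would change nothing. The `upload` callback stands in for Context3D.setProgramConstantsFromVector. The class and method names are hypothetical. The cache is keyed only by the first register, so overlapping ranges aren't handled. You would keep one instance for vertex constants and one for fragment constants.

```typescript
// Hypothetical constant-upload manager that skips redundant uploads.
type UploadFn = (firstRegister: number, data: readonly number[]) => void;

class ConstantCache {
  private lastValues: { [register: number]: number[] } = {};

  constructor(private upload: UploadFn) {}

  // Returns true if an upload actually happened.
  set(firstRegister: number, data: readonly number[]): boolean {
    const previous = this.lastValues[firstRegister];
    if (previous !== undefined && previous.length === data.length) {
      let same = true;
      for (let i = 0; i < data.length; i++) {
        if (previous[i] !== data[i]) { same = false; break; }
      }
      if (same) return false; // already on the GPU
    }
    this.lastValues[firstRegister] = data.slice();
    this.upload(firstRegister, data);
    return true;
  }

  // Call if something else wrote to the registers directly (or the context was lost).
  invalidate(): void {
    this.lastValues = {};
  }
}
```

Comparing 16 numbers on the CPU is far cheaper than the upload it avoids, so setting the view projection matrix for every object through this cache still results in one real upload per frame.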

side note:
This also translates into how you write shaders. Knowing that each constant requires an upload should sometimes make you rethink how to achieve something with minimal constants. Take unpacking a normal from a texture: usually you would multiply the value from the texture by 2 and then subtract 1 (2 floats required), but the same result can also be achieved by subtracting 0.5 and then dividing by 0.5 (1 float required). Perhaps not the best example, but I am sure you get the idea. REUSE is your friend!
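The equivalence is just 2x − 1 = (x − 0.5) / 0.5. Here it is as a tiny TypeScript check, mapping a [0, 1] texture value to [−1, 1] both ways.

```typescript
// Two ways to unpack a [0, 1] texture channel into [-1, 1].
function unpackMulSub(x: number): number {
  return x * 2 - 1; // needs the constants 2 and 1
}

function unpackSubDiv(x: number): number {
  return (x - 0.5) / 0.5; // needs only the constant 0.5
}
```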

—————————————–

While I only focused on 3 methods of Context3D, almost all of them incur some penalty. The ones highlighted are those I have found to be the most serious problems.

Quick additions:
Drawing from the GPU to a BitmapData (Context3D.drawToBitmapData()) is slow, so if you have to do it, keep the resolution as small as possible; in theory a 1×1 pixel readback should be big enough for picking!
Resizing the back buffer is slow!
Creating textures is slow, don’t do it on the fly. Pre allocate if possible then pick from a pool (more relevant for post processing).

At some point in the future, I hope to write some test examples that highlight the cost of the functions mentioned above (have already done quite a few but they have been tied into other things rather than dedicated standalone tests).

I hope all of that makes some form of sense to someone :) If you have any additions, corrections or questions… fire away. Will probably update this from time to time to add in more that I have missed out, it’s a broad area with many possible optimisations!

 

Super Quick Summary:

REDUCE DRAW CALLS! Group items where possible and use culling to ensure you are only drawing what is necessary.

REUSE MATERIALS and BATCH RENDERING by material if you can.

UPLOAD THE MINIMUM NUMBER OF CONSTANTS you can get away with :)

 

Stage3D stress test (b3d engine) 20,000 primitives ~ 15,000,000 triangles (updated with playable demo)

Thursday, February 23rd, 2012

key:
Click to start.
WASD or LEFT/RIGHT/UP/DOWN to fly
Mouse move to look
Shift to fly faster
Space to toggle rotation
+ to add 500 doughnuts
- to remove 500 doughnuts
m to change material (4 available)
Once started double click to toggle fullscreen (NOTE all keys bar arrow keys will be disabled)

demo (requires flash player 11 download):

link to play standalone version (better experience):
here

Let me know how it runs (assuming it does) if you can: frame-rate, number of objects it can handle etcetera.

video 1 (looks like crap so am uploading another one):

video 2 (hopefully looks a bit better... still looks balls, ah well)

Starts to slow down with 2000-3000 or more primitives on screen :( There is still room for some efficiency improvements, but not bad for now.

This demonstrates how essential good culling is. Despite what the stats say while recording, I am able to cull a full scene of 10,000 objects in under 1 ms on the release player with no problems; without culling it would probably die. With it, you can happily navigate a scene with 50,000 objects at 60fps on a good machine, because the majority are culled away, leaving perhaps only 500-1500 visible at any one time when they are spread out as they are in the demo.

Utils3D.projectVectors vs Utils3D.projectVector vs matrix.transformVector… note to self

Thursday, January 5th, 2012

just a note to self really…

given:
Utils3D.projectVectors(matrix, vertices, projected, uvt);
vector1 = Utils3D.projectVector(matrix, vector0);
vector2 = matrix.transformVector(vector0);

then:
projected[0] == vector2.x/vector2.w ≈≈ vector1.x
projected[1] == vector2.y/vector2.w ≈≈ vector1.y
uvt[2] == 1/vector2.w == 1/vector1.w

Not sure why the projectVector result != the projectVectors result, but it's close enough… I wasted too much time trying to find out why they are not the same, so if you have any ideas on why that is, let me know!
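For reference, here is the relationship in the note above written out as a TypeScript sketch. Multiply the point (with w = 1) by the 4×4 matrix, divide x and y by w, and take t = 1/w. The matrix is stored column-major with the translation in elements 12-14. To my understanding that matches Matrix3D.rawData, but treat the layout as an assumption. This does not explain the small mismatch between the two Utils3D calls.

```typescript
// Transform a point by a column-major 4x4 matrix, then apply the perspective divide.
interface Vec4 { x: number; y: number; z: number; w: number; }

function transformPoint(m: readonly number[], x: number, y: number, z: number): Vec4 {
  return {
    x: m[0] * x + m[4] * y + m[8] * z + m[12],
    y: m[1] * x + m[5] * y + m[9] * z + m[13],
    z: m[2] * x + m[6] * y + m[10] * z + m[14],
    w: m[3] * x + m[7] * y + m[11] * z + m[15],
  };
}

// Screen-space x/y plus t = 1/w, matching the relations listed above.
function project(m: readonly number[], x: number, y: number, z: number): { sx: number; sy: number; t: number } {
  const c = transformPoint(m, x, y, z);
  return { sx: c.x / c.w, sy: c.y / c.w, t: 1 / c.w };
}
```

For example, a bare perspective matrix that copies z into w projects (2, 4, 2) to (1, 2) with t = 0.5.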

Dynamic Reflections with Stage3D

Thursday, November 24th, 2011

First up the video:

This was actually pretty easy to achieve. It basically involves rendering the scene 6 times with a 90-degree field of view and a square aspect ratio, each time rendering to one face of a cube map.
This cube map can then be used as a basis for a reflection shader.
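Here is a small TypeScript sketch of the per-face camera setup. It lists the six look directions in the conventional cube-map order (+X, −X, +Y, −Y, +Z, −Z) and a perpendicular up vector for each, which avoids a degenerate lookAt for the top and bottom faces. It also shows why the field of view must be 90 degrees and converted to radians. The up vectors chosen here are an assumption, since the correct face orientation depends on the engine's handedness.

```typescript
// Per-face camera setup for rendering a cube map.
const DEG_TO_RAD = Math.PI / 180;

// Look direction for each face, in +X, -X, +Y, -Y, +Z, -Z order.
const CUBE_FACE_DIRECTIONS: number[][] = [
  [1, 0, 0], [-1, 0, 0],
  [0, 1, 0], [0, -1, 0],
  [0, 0, 1], [0, 0, -1],
];

// An up vector perpendicular to each look direction (orientation is engine-dependent).
const CUBE_FACE_UPS: number[][] = [
  [0, 1, 0], [0, 1, 0],
  [0, 0, -1], [0, 0, 1],
  [0, 1, 0], [0, 1, 0],
];

// Projection scale 1 / tan(fov / 2) for a FOV given in DEGREES.
// At 90 degrees the scale is 1: each face spans -1..1 at distance 1, so six faces tile the cube.
function projectionScale(fovDegrees: number): number {
  return 1 / Math.tan((fovDegrees * DEG_TO_RAD) / 2);
}
```

Passing 90 straight into Math.tan without the conversion gives a scale of about 0.62 instead of 1, which produces the wrong-looking faces that forgetting the conversion causes.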

Noteworthy observations:

this will be slow for complex scenes as there could potentially be 7 times more draw calls!!
forgetting to convert degrees to radians can be a pain in the butt (doh)
frustum cull with each face render to reduce the number of draw calls
getting it to reflect another reflective object will really start to complicate things, so avoid it like hell
could probably get away with rendering to really small textures, as perfect reflections can look less realistic anyway.
there is something wrong with my reflection math as those of you with keen eyes will have noticed the reflected cubes are moving backwards – will fix this soon I hope.

All this was achieved in a very short space of time: about 30 minutes of coding, and about 2 hours to realise I had forgotten to convert degrees to radians.
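Here is a tiny TypeScript illustration of why the missing conversion bites. It assumes the camera builds a standard perspective projection whose scale term is 1/tan(fov/2); the engine's projection code isn't shown, and the helper names are hypothetical. Each cube face needs a 90° field of view, which only gives a scale of 1 once converted to radians.

```typescript
// Math.tan and friends expect radians.
const degToRad = (deg: number): number => (deg * Math.PI) / 180;

// Scale term of a standard perspective projection for a given field of view.
function focalScale(fovDegrees: number): number {
  return 1 / Math.tan(degToRad(fovDegrees) / 2);
}
```

Passing 90 straight to `Math.tan` computes tan(45 radians) instead of tan(45°), so the faces no longer tile the cube.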

example code:

public function rendertoCubeMap(cubeTexture:CubeTexture, exclude:Object3D = null):void
{
	this.exclude = exclude;
	var cameraPositionCache:Vector3D = camera.position.clone();
	var cameraTargetCache:Vector3D = camera.target.clone();
	var fovCache:Number = camera.fov;		
	var aspectCache:Number = _aspect;		
	camera.fov = cubeFov;
	_aspect = 1;
 
	camera.position.setTo(0,0,0);
 
	context3D.setRenderToTexture(cubeTexture, true, 0, 0);
	camera.target.setTo(camera.position.x - 1, camera.position.y, camera.position.z);
	update();
	renderAll();
 
	context3D.setRenderToTexture(cubeTexture, true, 0, 1);
	camera.target.setTo(camera.position.x + 1, camera.position.y, camera.position.z);
	update();
	renderAll();
 
	context3D.setRenderToTexture(cubeTexture, true, 0, 2);
	camera.target.setTo(camera.position.x, camera.position.y + 1, camera.position.z+0.001);	//get some NaNs if z = 0 here (presumably the view direction is then parallel to the camera's up vector)
	update();
	renderAll();
 
	context3D.setRenderToTexture(cubeTexture, true, 0, 3);
	camera.target.setTo(camera.position.x, camera.position.y - 1, camera.position.z-0.001);		//same NaN issue looking straight down
	update();
	renderAll();
 
	context3D.setRenderToTexture(cubeTexture, true, 0, 4);
	camera.target.setTo(camera.position.x, camera.position.y, camera.position.z + 1);
	update();
	renderAll();
 
	context3D.setRenderToTexture(cubeTexture, true, 0, 5);
	camera.target.setTo(camera.position.x, camera.position.y, camera.position.z - 1);
	update();
	renderAll();
 
	context3D.setRenderToBackBuffer();
 
	_aspect = aspectCache;
	camera.fov = fovCache;
	camera.target.setTo(cameraTargetCache.x, cameraTargetCache.y, cameraTargetCache.z);
	camera.position.setTo(cameraPositionCache.x, cameraPositionCache.y, cameraPositionCache.z);
 
	this.exclude = null;
}

and that’s it!

:)
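The `//get some NaNs if z = 0 here` comments in `rendertoCubeMap` are consistent with a standard look-at construction, sketched below in TypeScript. This assumes the camera derives its right axis as cross(up, forward) with up = (0, 1, 0). The engine's camera code isn't shown, so treat that as an assumption. When the camera looks straight up or down, forward is parallel to up, the cross product is the zero vector, and normalising it divides 0 by 0.

```typescript
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const cross = (a: Vec3, b: Vec3): Vec3 => ({
  x: a.y * b.z - a.z * b.y,
  y: a.z * b.x - a.x * b.z,
  z: a.x * b.y - a.y * b.x,
});
// A zero-length vector produces 0/0 = NaN in every component.
const normalize = (v: Vec3): Vec3 => {
  const len = Math.sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  return { x: v.x / len, y: v.y / len, z: v.z / len };
};

// Right axis of a hypothetical look-at camera basis.
function rightAxis(position: Vec3, target: Vec3, up: Vec3): Vec3 {
  const forward = normalize(sub(target, position));
  return normalize(cross(up, forward));
}
```

Nudging z, as the code above does, avoids the degenerate case. Another common option is to use a different up vector, such as (0, 0, 1), for the two Y faces.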

A few screen captures of some stage3d experiments

Monday, November 21st, 2011

Molehill Convolution

Friday, October 21st, 2011

Updated: 17/11/2011

First up, here is the demo:
click here

Controls:
Mouse to move camera
Mouse wheel to zoom
Space bar to toggle mouse movement
checkbox to turn post-processing on or off
number keys 0-9 to try some preloaded filters (emboss, blur, edge detection, etc.)
slider to control the sample offset
you can enter your own values into the matrix if you like, but be nice, as there isn't any error checking

Let me know if it works for you!
You will need Flash Player 11 to view it.

Video of it running, in case the demo fails:

So what is a convolution filter and how does it work?

Basically, a convolution filter (with regard to images) is one where, for each pixel being sampled, a set of surrounding pixels is also sampled. A matrix controls the influence of each surrounding pixel, and the output is the weighted sum of all the samples.

For example:
[0,0,0,
0,1,0,
0,0,0]

The current pixel is represented by the middle value.

The 1st value in the matrix is the sample taken from the pixel North-West of the current pixel
2nd is North
3rd is North-East
4th is West (and so on, row by row)

i.e.
[NW, N, NE,
W, x, E,
SW, S, SE]

x = the current pixel

So the positions in the matrix determine where the samples come from, and the values in the matrix determine their influence.
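The position-to-direction mapping above can be written as a small TypeScript helper. The names are hypothetical, and it assumes row-major order with y growing downwards, so that row 0 is the North row:

```typescript
// Compass names for each row-major kernel index; "x" is the current pixel.
const DIRECTIONS = ["NW", "N", "NE", "W", "x", "E", "SW", "S", "SE"];

// Neighbour offset sampled by kernel index 0..8 (dx: -1 = West, dy: -1 = North).
function kernelOffset(index: number): { dx: number; dy: number } {
  return { dx: (index % 3) - 1, dy: Math.floor(index / 3) - 1 };
}

function kernelDirection(index: number): string {
  return DIRECTIONS[index];
}
```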

With the matrix
[0,0,0,
0,1,0,
0,0,0]
all the surrounding pixels have no (0) influence on the output, so the image passes through unchanged

With the following matrix
[1,1,1,
1,1,1,
1,1,1]
they all have equal weighting, effectively blurring the image (a box blur)

Various effects, such as edge detection and sharpening, can be achieved by using different weight combinations.
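To make the weighting concrete, here is a CPU-side TypeScript sketch of a 3×3 convolution over a grayscale image. This is not the author's AGAL implementation; the function name and the clamp-at-the-border behaviour are assumptions chosen for this example. Each output pixel is the sum of its nine neighbours, each multiplied by the matching matrix value.

```typescript
// Apply a row-major 3x3 kernel to a grayscale image stored row by row.
// Samples that fall outside the image are clamped to the nearest edge pixel.
function convolve3x3(pixels: number[], width: number, height: number, kernel: number[]): number[] {
  const out = new Array<number>(pixels.length).fill(0);
  const clamp = (v: number, lo: number, hi: number): number => Math.min(Math.max(v, lo), hi);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      for (let i = 0; i < 9; i++) {
        // Kernel index -> neighbour position (row 0 = North, column 0 = West).
        const sx = clamp(x + (i % 3) - 1, 0, width - 1);
        const sy = clamp(y + Math.floor(i / 3) - 1, 0, height - 1);
        sum += kernel[i] * pixels[sy * width + sx];
      }
      out[y * width + x] = sum;
    }
  }
  return out;
}
```

On the GPU, clamping at the border corresponds roughly to sampling a texture with clamp wrapping rather than repeat.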

How to do it with the AGAL helper:

simply call:
AGAL.convolve(output, sample, texture, uv, offset, matrix)+

where:
output is the destination register
sample is a temporary register for storing the modified UVs
texture is the texture to sample from
uv is the current UV coordinate of the sample (usually passed in from the vertex shader)
offset* is the value to offset the samples by (you will need to calculate this with something like 1/viewportWidth)
matrix** is the convolution matrix

*At the moment the same offset is used for x and y, when in reality there should be two (one for each axis). I will add this at a later date, as it hasn't caused an issue so far.
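Here is a sketch of the two-offset version the footnote describes, in TypeScript with hypothetical names. It uses one texel step per axis (1/width and 1/height), applied to the neighbour position taken from the kernel index. It assumes v grows downwards, so that the top row of the matrix samples North.

```typescript
// One texel step per axis in UV space.
function texelOffsets(viewportWidth: number, viewportHeight: number): { ox: number; oy: number } {
  return { ox: 1 / viewportWidth, oy: 1 / viewportHeight };
}

// UV of the neighbour sampled by row-major kernel index i (0..8).
function sampleUV(u: number, v: number, i: number, ox: number, oy: number): { u: number; v: number } {
  return { u: u + ((i % 3) - 1) * ox, v: v + (Math.floor(i / 3) - 1) * oy };
}
```

On a non-square viewport, a single 1/viewportWidth offset makes the vertical step slightly too small, which is why separate offsets are more accurate.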

**
Sample code to upload a matrix of 9 values from a vector of numbers called “_filter” into fragment constants 0 to 2 (one matrix row per register, with the unused w component set to 1):

context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 0, Vector.<Number>([_filter[0],_filter[1],_filter[2], 1]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 1, Vector.<Number>([_filter[3],_filter[4],_filter[5], 1]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 2, Vector.<Number>([_filter[6],_filter[7],_filter[8], 1]));
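The same packing (three kernel rows into three 4-component registers, with w padded to 1) can be written as a TypeScript sketch. The function names are hypothetical:

```typescript
// Split 9 kernel values into three [a, b, c, 1] rows, one per constant register.
function packKernelRows(filter: number[]): number[][] {
  if (filter.length !== 9) throw new Error("expected 9 kernel values");
  const rows: number[][] = [];
  for (let r = 0; r < 3; r++) {
    rows.push([filter[r * 3], filter[r * 3 + 1], filter[r * 3 + 2], 1]);
  }
  return rows;
}

// Flatten the rows into the 12 numbers covering registers 0 to 2.
function flattenRows(rows: number[][]): number[] {
  return ([] as number[]).concat(...rows);
}
```

In AS3, the flattened 12 values could also be uploaded with a single `setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 0, data, 3)` call, using its optional `numRegisters` argument, rather than three separate calls.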

For this to work properly, you should normalise your matrix so that its values sum to 1 (the code below leaves matrices whose values sum to 0, such as edge-detection kernels, unchanged).
Here is some code to do that:

// sum the weights
var sum:Number = 0;
for(var i:int = 0 ; i < _filter.length; i++)
{
	sum += _filter[i];
}
// divide each weight by the sum (guarding against division by zero)
for(i = 0 ; i < _filter.length; i++)
{
	if(_filter[i] != 0 && sum != 0)_filter[i] /= sum;
}

this would turn the matrix

[1,1,1,
1,1,1,
1,1,1]

into

[1/9,1/9,1/9,
1/9,1/9,1/9,
1/9,1/9,1/9]

good luck :)