Categories
3d speed

Frustum Culling – on steroids!

Am not going to really explain how frustum culling works here or how to set it up, there are already some good articles out there will link to a couple of Actionscript ones at the end. This article will explain how to use caching to very easily boost the speed of your frustum culling. As far as testing goes I haven’t yet seen anything as fast as this so I thought I would share it so anyone else can benefit from it or maybe some bright spark can find a way to improve it… there is always room for improvement where flash is concerned 🙂

Just a super quick reasoning why frustum culling is a good thing if done properly:

If the sphere’s position is the same (i.e. a pointer to) the object that it describes position then NO transformations are needed to move the sphere into world space. Consider trying to check a bounding box, while it might be a tighter fit (woop), it would require that the frustum be transformed into local space or the box into world space… either way that is gonna be seriously slow for large numbers of objects when compared to the bounding sphere checks. Frustum checking is easy to implement and can save you a vast amount of time, no rendering of off screen objects!

This is what the standard frustum culling code looks like:

//loop thorugh your objects or spheres
for (var i : int = 0; i < length; i++)
{
	var object:Object3D = objects[i];
	object.cullFlag = 1;		//object.visible = true;
 
	var sphere:BoundingSphere = object.sphere;
	var position:Vector3D = sphere.position;
	var radius:Number = sphere.radius;
 
	var plane:Plane3D;
	var distance:Number;
 
	for (var j : int = 0; j < 6; j++)	//6 planes in the frustum (planes.length)
	{
		plane = planes[j];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0){
			object.cullFlag = -1;		//object.visible = false;
			break;
		}
	}
}

What the above code is doing is looping through all of you objects or bounding spheres and for each one it projects its position onto each plane that makes up the frustum. If any of the tests show the sphere to be outside of the frustum then it can be marked as not visible and then it will not need to be rendered. As soon as it fails on one plane you can move on to test the next object/sphere but if it passes you will need to go on to check against the next plane until you are sure that it passes all test. (You can see why this is inefficient for objects IN the frustum as 6 checks are required just to confirm that where as the best case is when you can discard an object on the first plane check*)

Now this could be sped up a little bit by unrolling the inner loop but it will only make a very small difference in speed, but it will help.

Introducing the “cache” variable into the mix! Okay here’s where the huge speed up can be gained. In the vast majority of cases, if an object is culled by a frustum plane in one frame and is still culled in the next frame the likelihood that it was culled by the same plane is extremely high! With that in mind if we store the plane that the sphere failed the test on and then test against that one first, for a large number of cases we can eradicate a number of unnecessary plane tests.

Bring on the updated code:

//loop thorugh your objects or spheres
for (var i : int = 0; i < length; i++)
{
	var object:Object3D = objects[i];
	object.cullFlag = 1;		//object.visible = true;
 
	var sphere:BoundingSphere = object.sphere;
	var position:Vector3D = sphere.position;
	var radius:Number = sphere.radius;
 
	var plane:Plane3D;
	var distance:Number;
	var cache:int = sphere.cache;		//the id of the last plane failed on (-1 if none)
 
	if(cache != -1)
	{
		plane = planes[cache];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0)
		{
			object.cullFlag = -1;	//object.visible = false;
			continue;			//objtect still culled so move to next object
		}
	}
	for (var j : int = 0; j < 6; j++)	//6 planes in the frustum (planes.length)
	{
		if(j == cache) continue;
		plane = planes[j];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0){
			object.cullFlag = -1;	//object.visible = false;
			sphere.cache = j;	//set the cache to match the current plane
			break;
		}
	}
	//if it's got this far then the object is not culled so the cache can be reset
	sphere.cache = -1;
}

This time round the code checks to see if a plane has been cached and if so then it checks that one first. This means in the best case (a stationary camera with a stationary scene) all the objects, once they have been culled once already will be culled again with only 1 plane test instead of a maximum of 6! Muchos muchos faster!

There are still things that can be optimised such as not testing the cached plane again in the second section if it passes the check in the first section. You could also unroll all the loops to remove any array/vector lookups for a speed up too.

The method can also easily be extend to check for intersections as well a 1/0, true/false or hit/not hit result. Allowing for other checks such as a bounding box or something more tightly fitting.

I have found this approach to be the fastest yet and my implementation can chew through 10,000 objects in under 1 ms without a problem and this is without any bounding volume hierarchies. (Most flash games will probably not even have that many objects in them anyway but still good to know).

Any questions at all then do ask! Or if I have made any booboos then do let me know.

couple o links:

http://blogs.aerys.in/jeanmarc-leroux/2009/12/13/frustum-culling-in-flash-10/ <– nice code (lacking optimisations but I am sure it is just to show the implementation)
http://jacksondunstan.com/articles/1811 <– includes some camera code to so you can get up and running
http://www.flipcode.com/archives/Frustum_Culling.shtml
http://www.crownandcutlass.com/features/technicaldetails/frustum.html

 

*With that in mind maybe one should check the plane most likely to fail most objects, probably the near and far planes, but not necessarily. You could easy count the the number of culls against each plane as you navigate around your scene and get an average to work out which one is the best culler and check that one first… it will depend on your needs though and could be total overkill, but hey this is flash right and flash IS slow, so every little helps!

Categories
3d

SSAO in stage 3D (source added)

I had deveoped my own, simple implementation of SSAO (Screen Space Ambient Occlusion)  and thought I would try some other implementations to get better resukts.

I’ll use this post to share video’s of progress and demos to follow as well.

 

Video 1:

SSAO pre blur with 8 samples per pixel (hits the agal instruction limit real fast). Still some kinks to iron out for sure, depth buffer is currently really short to help with the accuracy but should be able to improve this I have been able to encode it across 4 channels now not 1.

Runs pretty quick, 60fps is still achievable.

 

 

 

 

 

related/useful links:

SSAO

http://www.gamerendering.com/category/lighting/ssao-lighting/
http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/a-simple-and-practical-approach-to-ssao-r2753
http://www.john-chapman.net/content.php?id=8
http://blog.nextrevision.com/?p=76
http://www.gamerendering.com/2008/09/28/linear-depth-texture/
http://www.theorangeduck.com/page/pure-depth-ssao

DEPTH ENCODING

http://www.gamedev.net/topic/516146-how-to-encode-the-depth-value-in-a-32bit-rgba-texture/
http://www.gamedev.net/topic/486847-encoding-16-and-32-bit-floating-point-value-into-rgba-byte-texture/
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
http://aras-p.info/blog/2007/03/03/a-day-well-spent-encoding-floats-to-rgba/
http://asgl.googlecode.com/svn-history/r490/trunk/ASGL_FP11/src/asgl/shaders/agal/AGALCodec.as
http://www.gamerendering.com/2008/09/25/packing-a-float-value-in-rgba/
http://www.gamedev.net/topic/585330-packing-a-generic-float-into-a-rgba-texture/     <– nice function at the end of this one
http://www.gamedev.net/topic/442138-packing-a-float-into-a-a8r8g8b8-texture-shader/

SOURCE CODE::

Here it is folks, later than I would have liked but I have been busier than badger in Spring.
I have done my best to comment what is going on.
It took a lot of playing around with to get it working and muchos muchos trial and error.

PLEASE do take this and try and make it better I am sure there is room for improvement!!!

Also feel free to ask any questions at all and I will try and answer them.
Cheers Daniel Holden (orange duck for the code from which this is ported)
pure depth ssao
(Although I use a normal texture to save on operations)

 
var flags:String = (smooth) ? "linear" : "nearest";
 
//varying registers
var uv_in:String = "v0";
 
//samplers
var tex_normal:String = "fs1";
var tex_depth:String = "fs2";
var tex_noise:String = "fs3";
 
//float3 - float4
var colour:String = "ft0";
var normal:String = "ft1";
var position:String = "ft2";
var random:String = "ft3";
var ray:String = "ft4";
var hemi_ray:String = "ft5";
 
//float1
var depth:String = "ft6.x";
var radiusDepth:String = "ft6.y";
var occlusion:String = "ft6.z";
var occ_depth:String = "ft6.w";
var difference:String = "ft7.x";
var temp:String = "ft7.y";
var temp2:String = "ft7.z";
var temp3:String = "ft7.w";			//NOT USED
var fallOff:String = "ft0.x";		//NOT USED
 
//float2
var uv:String = "ft0";
 
//constants
var radius:String = "fc0.z";
var scale:String = "fc0.w";
var decoder:String = "fc2.z";
var zero:String = "fc0.x";
var one:String = "fc0.y";
var two:String = "fc1.z";			//NOT USED
var thresh:String = "fc1.w";
var neg_one:String = "fc2.y";		//NOT USED
 
var depth_decoder:String = "fc3.xyzw";
 
var area:String = "fc1.x";			//NOT USED
var falloff:String = "fc1.y";
var total_strength:String = "fc2.x";
var base:String = "fc4.x";
 
var invSamples:String = "fc2.w";
 
 
//SHADER OF DOOOOOOOOOM
AGAL.init();
//sample normal at current fragment, and decode
AGAL.tex(normal, uv_in, tex_normal, "2d", "clamp", flags);//ex ft0, v0, fs0 <2d,wrap,linear> \n"+
AGAL.decode(normal, normal, decoder);
//sample deopth at current fragment, and decode
AGAL.tex(colour, uv_in, tex_depth, "2d", "clamp", flags);//ex ft0, v0, fs0 <2d,wrap,linear> \n"+
AGAL.decodeFloatFromRGBA(depth, colour, depth_decoder);
 
//use this instead if depth is not encoded
//AGAL.mov("oc.a", one);
//AGAL.mov("oc", depth);//col+".xyz");
 
//sample random vector
AGAL.mov(uv, uv_in);
AGAL.mul(uv, uv, scale);					
AGAL.tex(random, uv, tex_noise, "2d", "wrap", flags);
//AGAL.mul(random+".z", random+".z", neg_one);		//not sure if negation needed?
 
//position
AGAL.mov(position+".xy", uv_in+".xy");
AGAL.mov(position+".z", depth);
 
//radiusDepth
AGAL.div(radiusDepth, radius, depth);
 
//occlusion
AGAL.mov(occlusion, zero);					
 
for(var i:int=0; i < 8; i++)
{
	//reflect the random normal against the current normal and size accoring to depth, further should be larger
	AGAL.reflect(ray,"fc"+(5+(i*2)), random);	//could just add but will look crap?
	AGAL.mul(ray, ray, radiusDepth);
 
	//dot the ray against normal
	AGAL.dp3(hemi_ray, ray, normal);
	AGAL.sign(hemi_ray, hemi_ray, temp);
	AGAL.mul(hemi_ray, hemi_ray, ray);
	AGAL.add(hemi_ray, hemi_ray, position+".xyz");
 
	//use position to sample from 
	AGAL.sat(hemi_ray+".xy", hemi_ray+".xy");
	AGAL.tex(colour, hemi_ray+".xy", tex_depth, "2d", "clamp", flags);
	AGAL.decodeFloatFromRGBA(occ_depth, colour, depth_decoder);
 
	//gets the difference in depth between the current depth and sampled depth
	AGAL.sub(difference, depth, occ_depth);				
	AGAL.sge(temp, difference, thresh);	// 1 if difference is bigger than the threshold, 0 otherwise
	AGAL.slt(temp2, difference, falloff);	// 1 if difference is less than the falloff, 0 otherwise
 
	//set difference to range 0 - 1 (and clamp)
	AGAL.div(difference, difference, falloff);
	AGAL.mul(difference, temp, difference);						
	AGAL.mul(difference, temp2, difference);
 
	//accumulate the occusion
	AGAL.add(occlusion, occlusion, difference);
}
//bring back into range 0-1
AGAL.mul(occlusion, occlusion, invSamples);
//apply any multiplier
AGAL.mul(occlusion, occlusion, total_strength);
//add it to a base value
AGAL.add(occlusion, occlusion, base);
//invert and boom headshot
AGAL.sub("oc", one, occlusion);
 
var fragmentShader:String = AGAL.code;

here are the constants used:: (some of them didn’t end up being used so they can be left out – too busy/lazy to do that myself yet)

//uv sample offset
var radius:Number = 0.002;		//you should derive from texture size
//noise uv scale
var scaler : Number = 24;		//much smaller and the noise blocks become more apparent
//unused at the mo
var falloff : Number = 0.05;		//not using this so ignore it / remove it
//unused at the mo
var area : Number = 5;			//not using this so ignore it / remove it
//the depth difference threshold
var depthThresh:Number = 0.0001;
//strength of the effect
var total_strength : Number = 1;
//base value for the effect
var base : Number = 0;
 
var sample_sphere:Vector.<Number> = new Vector.<Number>();
 
sample_sphere.push( 0.5381, 0.1856,-0.4319, 0);
sample_sphere.push( 0.1379, 0.2486, 0.4430, 0);
sample_sphere.push( 0.3371, 0.5679,-0.0057, 0); 
sample_sphere.push(-0.6999,-0.0451,-0.0019, 0);
sample_sphere.push( 0.0689,-0.1598,-0.8547, 0); 
sample_sphere.push( 0.0560, 0.0069,-0.1843, 0);
sample_sphere.push(-0.0146, 0.1402, 0.0762, 0); 
sample_sphere.push( 0.0100,-0.1924,-0.0344, 0);
sample_sphere.push(-0.3577,-0.5301,-0.4358, 0); 
sample_sphere.push(-0.3169, 0.1063, 0.0158, 0);
sample_sphere.push( 0.0103,-0.5869, 0.0046, 0); 
sample_sphere.push(-0.0897,-0.4940, 0.3287, 0);
sample_sphere.push( 0.7119,-0.0154,-0.0918, 0); 
sample_sphere.push(-0.0533, 0.0596,-0.5411, 0);
sample_sphere.push( 0.0352,-0.0631, 0.5460, 0); 
sample_sphere.push(-0.4776, 0.2847,-0.0271, 0);
 
//...
 
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 0, Vector.<Number>([0, 1, radius, scaler]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 1, Vector.<Number>([area, falloff, 2, depthThresh]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 2, Vector.<Number>([total_strength, -1, 0.5, 1/8]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 3, Vector <Number>[1/(255*255*255), 1/(255*255), 1/255, 1]);
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 4, Vector.<Number>([base, 0, 0, 0]));
context3D.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 5, sample_sphere);

Enjoy! On-line demo to follow soonish, and please to feedback with any comments or improvements.

IMPORTANT NOTE:
the values in this shader are very VERY important, tiny tweaks/mistakes can throw the whole thing off so do be carefull and don’t be supprised if the world explodes when you play around with it.

🙂

b