Frustum Culling – on steroids!

Am not going to really explain how frustum culling works here or how to set it up, there are already some good articles out there will link to a couple of Actionscript ones at the end. This article will explain how to use caching to very easily boost the speed of your frustum culling. As far as testing goes I haven’t yet seen anything as fast as this so I thought I would share it so anyone else can benefit from it or maybe some bright spark can find a way to improve it… there is always room for improvement where flash is concerned :-)

Just a super quick reasoning why frustum culling is a good thing if done properly:

If the sphere’s position is the same (i.e. a pointer to) the object that it describes position then NO transformations are needed to move the sphere into world space. Consider trying to check a bounding box, while it might be a tighter fit (woop), it would require that the frustum be transformed into local space or the box into world space… either way that is gonna be seriously slow for large numbers of objects when compared to the bounding sphere checks. Frustum checking is easy to implement and can save you a vast amount of time, no rendering of off screen objects!

This is what the standard frustum culling code looks like:

//loop thorugh your objects or spheres
for (var i : int = 0; i < length; i++)
{
	var object:Object3D = objects[i];
	object.cullFlag = 1;		//object.visible = true;
 
	var sphere:BoundingSphere = object.sphere;
	var position:Vector3D = sphere.position;
	var radius:Number = sphere.radius;
 
	var plane:Plane3D;
	var distance:Number;
 
	for (var j : int = 0; j < 6; j++)	//6 planes in the frustum (planes.length)
	{
		plane = planes[j];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0){
			object.cullFlag = -1;		//object.visible = false;
			break;
		}
	}
}

What the above code is doing is looping through all of you objects or bounding spheres and for each one it projects its position onto each plane that makes up the frustum. If any of the tests show the sphere to be outside of the frustum then it can be marked as not visible and then it will not need to be rendered. As soon as it fails on one plane you can move on to test the next object/sphere but if it passes you will need to go on to check against the next plane until you are sure that it passes all test. (You can see why this is inefficient for objects IN the frustum as 6 checks are required just to confirm that where as the best case is when you can discard an object on the first plane check*)

Now this could be sped up a little bit by unrolling the inner loop but it will only make a very small difference in speed, but it will help.

Introducing the “cache” variable into the mix! Okay here’s where the huge speed up can be gained. In the vast majority of cases, if an object is culled by a frustum plane in one frame and is still culled in the next frame the likelihood that it was culled by the same plane is extremely high! With that in mind if we store the plane that the sphere failed the test on and then test against that one first, for a large number of cases we can eradicate a number of unnecessary plane tests.

Bring on the updated code:

//loop thorugh your objects or spheres
for (var i : int = 0; i < length; i++)
{
	var object:Object3D = objects[i];
	object.cullFlag = 1;		//object.visible = true;
 
	var sphere:BoundingSphere = object.sphere;
	var position:Vector3D = sphere.position;
	var radius:Number = sphere.radius;
 
	var plane:Plane3D;
	var distance:Number;
	var cache:int = sphere.cache;		//the id of the last plane failed on (-1 if none)
 
	if(cache != -1)
	{
		plane = planes[cache];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0)
		{
			object.cullFlag = -1;	//object.visible = false;
			continue;			//objtect still culled so move to next object
		}
	}
	for (var j : int = 0; j < 6; j++)	//6 planes in the frustum (planes.length)
	{
		if(j == cache) continue;
		plane = planes[j];
		distance = plane.a * position.x + plane.b * position.y + plane.c * position.z + plane.d;
		if (distance + radius < 0.0){
			object.cullFlag = -1;	//object.visible = false;
			sphere.cache = j;	//set the cache to match the current plane
			break;
		}
	}
	//if it's got this far then the object is not culled so the cache can be reset
	sphere.cache = -1;
}

This time round the code checks to see if a plane has been cached and if so then it checks that one first. This means in the best case (a stationary camera with a stationary scene) all the objects, once they have been culled once already will be culled again with only 1 plane test instead of a maximum of 6! Muchos muchos faster!

There are still things that can be optimised such as not testing the cached plane again in the second section if it passes the check in the first section. You could also unroll all the loops to remove any array/vector lookups for a speed up too.

The method can also easily be extend to check for intersections as well a 1/0, true/false or hit/not hit result. Allowing for other checks such as a bounding box or something more tightly fitting.

I have found this approach to be the fastest yet and my implementation can chew through 10,000 objects in under 1 ms without a problem and this is without any bounding volume hierarchies. (Most flash games will probably not even have that many objects in them anyway but still good to know).

Any questions at all then do ask! Or if I have made any booboos then do let me know.

couple o links:

http://blogs.aerys.in/jeanmarc-leroux/2009/12/13/frustum-culling-in-flash-10/ <– nice code (lacking optimisations but I am sure it is just to show the implementation)
http://jacksondunstan.com/articles/1811 <– includes some camera code to so you can get up and running
http://www.flipcode.com/archives/Frustum_Culling.shtml
http://www.crownandcutlass.com/features/technicaldetails/frustum.html

 

*With that in mind maybe one should check the plane most likely to fail most objects, probably the near and far planes, but not necessarily. You could easy count the the number of culls against each plane as you navigate around your scene and get an average to work out which one is the best culler and check that one first… it will depend on your needs though and could be total overkill, but hey this is flash right and flash IS slow, so every little helps!

5 Responses to “Frustum Culling – on steroids!”

  1. Nice trick. Thanks for sharing. Small question – when you cull objects what is the next step you do before rendering? You create new geometry for batch without culled objects? Or you call drawTriangles on every object inside frustrum (yeah, it’s slowly, but who knows…)? Or maybe something else?

  2. bwhiting says:

    After I have worked out what objects are visible to the camera the next step is to simply, as you mentioned, just loop through them and render them with draw triangles.

    You could get more advanced sort the visible objects into groups to minimise state changes such as changing the current program on the GPU or what buffers are being used etc… Or you could even go hardcore and try to perform occlusion culling as another step before rendering.

    Generally though frustum culling will be enough for most flash use cases. If, for instance, you started with say 2000 objects and 1500 got culled, rendering the 500 remaining objects shouldn’t really be a problem for most machines.

  3. [...] Frustum Culling – on steroids! [...]

  4. Gary says:

    Tried implementing this in XNA 4.0. For a few thousand models, I get no difference in frame rate between this and the inbuilt BoundingFrustum check with XNA. Maybe they’re already using this?

  5. bwhiting says:

    Not sure about XNA, they may well be doing it already. It is more geared for folks who are rolling their own culling.

    If you compared your own version with itself (both with and without the caching) you should then see the difference. Also might be worth getting the actual timings from the function, the speed gain/loss may not be enough to see in frame rate but may show in milliseconds.

    Also when testing this I keep a count of the number of frustum checks in a variables then toggled the caching and the difference was very big! Scene dependant of course but if your objects are well spread out in a scene then benefits are biggest. Also note that with every object contained in the frustum this method will actually be a tiny bit slower… but in most situations not everything is in view at one time.

Leave a Reply