Speeding up Swift code using an enum

The most significant limitation of SwiftRay is its slowness. A few weeks ago, I had already sped it up by exploiting more than one core. Since my Mac has 4 cores, it ran about 3,5 times faster.

The algorithm

The first version of SwiftRay was built after a book, which I bought the second volume, Ray Tracing: the Next Week. It exposes an algorithm, the Bounding Volume Hierarchy, which consists in sorting primitives — only spheres at the moment — in a binary tree. The principle is that instead of testing if a ray hits any object in the scene, we first test if it hits the volume of the whole hierarchy at all. If it does, then we test for the two halves, then their two halves, and so on, until we get to a leaf of the tree — that is a primitive.

The speed-up comes from the facts that:

  1. a lot of rays won’t hit any object at all
  2. we can get quickly to the object that does receive the ray

Straight from C++

The original C++ code is the following:

A bvh_node inherits the hitable “protocol” (thought there’s no such thing in C++). Its constructor method takes a list of hitables, which are actually the primitives in the scene.

I will spare you the implementation of the constructor, which actually produces the binary tree. When the tree has been built, each bvh_node has a left hitable and a right hitable as children, which either have two children as well, or point on the same hitable.

The following method determines whether the volume (bounding box) of the BVH node is hit. If it is, it calls itself recursively for its left and right children to see which one is closer, or it hit at all:

 

Let us take a look at my implementation. It is nearly a straight port, with the exception that my Swift code makes use of optionals instead of booleans to tell if there is a hit:

 

When the optimization slows down computations

I had a problem: images still rendered, but computations were actually much slower ! By a factor of 2 to 40 depending on the scene.

Since I’m only human, my premise was that my code had an error. And yes it had: the code to determine if a Bounding Box was hit was wrong. This would make the process slower. But once fixed, it still was very slow.

My second thought was that maybe building the tree was buggy. This would also slow down by big amounts the code since it would not tell quickly if a volume was hit. It was very hard to see in the debugger, so I eventually introduced Rectangle primitives, so I could draw Bounding Boxes. And to my surprise, my code was correct!

That was the time I thought that profiling my code could provide me an insight, so I fired up Instruments > Time Profiler, to state that the code spent a lot of time into

Searching the web, I learned that “protocol witness” is a virtualization table to solve the conformance to protocol.  That was odd to me. BoundingNode being a struct, I expected to have a straight call to the hit() method, with no indirection.

I watched a video of the WWDC ’16 which proved me I was wrong. Yes, Swift does Dynamic Dispatch on structs when they conform to a protocol.

Once, I learned a little Haskell

A few months ago, I took some time off to learn the basics of Haskell. A lot of exercises in the books have to do with… Trees.

A binary tree in Haskell may be declared as:

In English: A Tree is either a Leaf (with a as data) or a Node with two child trees.

I have watched a presentation which was boring for the most part, but had an interesting point: Haskell’s Algebraic Data types can be implemented as Enums in Swift:

The indirect keyword is a recent addition to Swift (version 3?) and informs the compiler that the recursive definition of the enum is voluntary.

In case you have not yet understood: I am going to replace the struct calling its own method with an enum, to get rid of Dynamic Dispatch.

The new code

I don’t find the new implementation very elegant, since I need the box() method to extract the Bounding Box from the Tree, but I don’t know Swift enough yet to envision a better solution.

Results

I won’t make you wait: yes, the implementation is efficient now. I’m very happy for the improvement, since I was hopping for a speed improvement of 2 times.

Simple Scene

200 x 200 px @ 100 rays/px
without BVH: 10.76 s
with BVH: 11.01 s
speed-up: 0.977 times

In this Scene, there is a slight slow down, which is not surprising since there are only 4 spheres, and hierarchizing them does not provide any benefit.

Sphere Array Scene

200 x 200 px @ 100 rays/px
without BVH: 42.25 s
with BVH:  11.42 s
speed-up: 3.7 times

I designed this scene knowing that the algorithm would be efficient. Removing the rectangles makes it more efficient.

Big and Small Spheres

400 x 266 px @ 100 rays/px
without BVH: 523 s
with BVH: 48s
speed-up: 10.9 times

The algorithm becomes really efficient when there are a lot of objects, which only occupy small portions of the space. The speed up is impressive in this reference scene.