# Efficient Bounding Rhombi

A *bounding box* generally refers to an axis-aligned rectangular region of
space used as a first, coarse step of collision detection. Every object is
given a bounding box which covers all space the object could possibly occupy.
If the bounding boxes of two objects overlap, the simulation needs to do a more
precise but expensive collision check; but if they do not overlap, they
certainly do not collide and the pair can be skipped.

Since *Hexeline* uses an oblique coordinate
system, the same approach results in
bounding rhombi instead of boxes, though that distinction is not particularly
interesting here. What is interesting is how bounding boxes/rhombi can be
handled extremely efficiently with SIMD. I fully expect that this technique has
been discovered by someone else previously, but it’s still worth discussing.

The rest of this post will just use “bounding box” and normal (X,Y) coordinates so familiarity with “hexagonal coordinates” is not necessary.

There are a couple ways a bounding box can be represented. The one we’re interested in here is the 2D interval representation, wherein we have two points $(x_0,y_0)$ and $(x_1,y_1)$ such that $x_0 ≤ x_1$ and $y_0 ≤ y_1$. We then say that any point $(x,y)$ is within the box if $x_0≤x≤x_1$ and $y_0≤y≤y_1$.

Notice there are exactly four coordinate values. Four is a magic number here:
it means we can make full use of a SIMD word! In other words, we can pack
$(x_0,y_0,x_1,y_1)$ in a single `i32x4`

(i.e., a SIMD vector of 4 32-bit
signed integers):

```
1 // A "newtype" which gets us type-safety from accidentally mixing with other
2 // uses of `i32x4`.
3 #[derive(Clone, Copy, Debug)]
4 struct BoundingBox(i32x4);
5 impl BoundingBox {
6 fn new(x0: i32, y0: i32, x1: i32, y1: i32) -> Self {
7 BoundingBox(i32x4::new(x0, y0, x1, y1))
8 }
9 }
```

Before getting into how we check these for overlap, let’s think about another
operation: union. In order to efficiently find candidate objects for collision
checks, a common approach is to build some form of tree of bounding boxes. This
requires each node to have a bounding box that encompasses at least all space
of its children, i.e., to be a *union* of its children.

We can certainly implement union with the current representation.

```
1 impl BoundingBox {
2 fn union(self, other: Self) -> Self {
3 // x0 and y0 are chosen from the minima of the inputs
4 let min = self.0.min(other.0);
5 // x1 and y1 are chosen from the maxima of the inputs
6 let max = self.0.max(other.0);
7 // Take x0 and y0 from `min`, x1 and y1 from `max`
8 BoundingBox(min.blend(max, 00, 01, 12, 13))
9 }
10 }
```

Unfortunately, this isn’t great. With SSE4.1, it’s just a `pminsd`

, `pmaxsd`

,
`pblendw`

sequence. But for systems without `min`

and `max`

instructions, these
need to be emulated, and the fact that there are two of them causes a large
portion of the register file to be used just for this operation. It could be
improved by inlining the `min`

and `max`

emulation to remove a lot of the
redundant work, but we can still make something better for both cases.

The only reason we needed the extra instructions here is that the condition for
choosing the lower-bound coordinate is different from that of choosing the
upper-bound coordinate. With a simple tweak to the representation, we can make
that condition the same. We simply *negate* $x_0$ and $y_0$, which allows
finding the minimum $x_0$ and $y_0$ to be done by finding the maximum of their
negated values. Thus, all lanes are selected by the same criterion.

```
1 impl BoundingBox {
2 fn new(x0: i32, y0: i32, x1: i32, y1: i32) -> Self {
3 BoundingBox(i32x4::new(-x0, -y0, x1, y1)) // Note negative signs
4 }
5
6 fn union(self, other: Self) -> Self {
7 BoundingBox(self.0.max(other.0))
8 }
9 }
```

With SSE4.1, this produces a single `pmaxsd`

instruction. ARM with Neon is also
a single instruction. `max`

still needs to be emulated elsewhere, but there’s
now a lot less to do.

Checking whether two interval-based bounding boxes overlap is simply a matter of checking whether both one-dimensional intervals they comprise overlap.

Suppose we have a pair of one-dimensional intervals, $(l_0,l_1)$ and
$(r_0,r_1)$. It may be non-obvious at first, but they overlap if and only if
$l_0 ≤ r_1$ and $r_0 ≤ l_1$. However, in our bounding box representation, we
don’t have the lower bounds directly; we have the *negated* lower bounds, so we
would need to check $–(–l_0) ≤ r_1$ and $–(–r_0) ≤ l_1$.

This looks pretty inconvenient from a SIMD perspective: we’re doing a different operation on each lane. But it can actually be simplified quite a bit. We can start by putting all the $l$ terms on the left-hand side.

$–(–l_0) ≤ r_1$ and $l_1 ≥ –(–r_0)$

One of the $r$ terms is negated here; the other is not, so multiply both sides of that equation by $–1$ (which also changes ≤ to ≥).

$(–l_0) ≥ –r_1$ and $l_1 ≥ –(–r_0)$

We now have something SIMD-friendly. Every lane of the right-hand side is negated, and we then perform the same comparison on every lane after a shuffle. We can view this as a two-step process: “invert” the right-hand side by negating all the terms and putting the upper bounds in the lower bounds’ lanes and vice-versa, then perform the comparison.

The first is fairly easy to implement:

```
1 // Another "newtype" around i32x4 so that we can't accidentally forget the
2 // invert step
3 #[derive(Clone, Copy, Debug)]
4 struct InvertedBoundingBox(i32x4);
5 impl BoundingBox {
6 fn invert(self) -> InvertedBoundingBox {
7 // Lanes: 0 1 2 3
8 // Input: (-x0, -y0, x1, y1)
9 // Output: (-x1, -y1, x0, y0)
10 InvertedBoundingBox((-self.0).shuf(2, 3, 0, 1))
11 }
12 }
```

For the comparison step, we want to check that every lane of the left-hand side
is greater than or equal to the corresponding lane of the right-hand side.
Assuming space is not so large that overflow is a concern, we know that if
$a≥b$, the result of $a-b$ will never have its sign bit set. The “move mask”
primitive (`movmskps`

in SSE speak) will give us the sign bit of all lanes in a
single integer. Putting these together, we can simply subtract the two vectors
and test that all the sign bits are zero.

```
1 impl BoundingBox {
2 fn overlaps(self, other: InvertedBoundingBox) -> bool {
3 let diff = self.0 - other.0;
4 0 == diff.movemask()
5 }
6 }
```

That’s about it for bounding boxes. It may seem like all this just makes already fast operations even faster, but both unions and overlap tests must be performed multiple times per update for every object and so count against Hexeline’s 100 nanosecond budget, so making them as fast as possible is extremely important.