# Hexagonal coordinate representations in Hexeline

One of the central design principles in my still extremely early space shooter
*Hexeline* (read: has neither space nor shooting) is that spacecraft will be
defined in terms of a hexagonal grid, rather than cells of varying shape, which
simplifies things for both the software and the player while still allowing
more interesting shapes than a square grid.

A useful property of regular grids is that they are *directly addressable*;
i.e., you can put the elements in an array and immediately access an element
just by knowing its coordinate. Ideally, it’s also efficient to go from
continuous spatial coordinates to a grid coordinate. This is trivial in a
square grid: depending on how you define your coordinates, you can usually just
divide spatial coordinates by the cell size to get the coordinate of the cell a
point falls within.
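As a quick sketch of the square-grid case (in Rust, since that's what Hexeline is written in; `CELL_SHIFT` is a hypothetical constant, and I'm assuming a power-of-two cell size), the lookup is just a pair of arithmetic shifts:

```rust
// A minimal sketch of direct addressing in a square grid. With a
// power-of-two cell size, dividing by the cell size is an arithmetic
// shift, which conveniently also rounds towards negative infinity.
const CELL_SHIFT: u32 = 9; // hypothetical: cells are 512 units across

fn square_cell(x: i32, y: i32) -> (i32, i32) {
    (x >> CELL_SHIFT, y >> CELL_SHIFT)
}

fn main() {
    assert_eq!(square_cell(1000, 511), (1, 0));
    assert_eq!(square_cell(-1, 0), (-1, 0)); // shifts floor, not truncate
}
```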

In an effort to get a similar property for hexagonal grids, I’ve taken the unusual step of basing a Newtonian physics engine on non-cartesian coordinates. It turns out this is not as bad as it sounds, and quite a few useful properties fall out of it.

Note: This post is going to be comparatively math-heavy and assumes some familiarity with linear algebra.

## Performance Concerns

Before going into doing crazy things in the name of performance, I think it’s necessary to go over why the performance limits are so tight.

First, some simple target numbers: a minimum-requirements system should be able to handle a simulation of 10’000 objects at 10 ms resolution, i.e., 100 frames per second. At first glance, that means one million object updates per second, or 1 μs each on average.

The reality is more delicate due to the plans for handling networked play.
There are two general approaches to networking: *state replication* and *event
replication*. State replication is conceptually simpler: each participant
occasionally transmits its current state. This is what my
prior space shooter, *Abendstern*, did, as well as quite a few other games. The
main downsides are high bandwidth use if the state is large (Abendstern had an
exceedingly complicated protocol to alleviate this) and that it is hard or
impossible to prevent cheating by simply modifying the client.

On the other hand, with event replication, only events from outside the game (i.e., player controls in our case) are transmitted, and all participants derive the state by running the simulation with those particular events. Cheating by modifying the client is impossible because other clients won’t have that modification; e.g., if you make a “speed hack”, you may well see yourself zooming around, but other players will see you slowly flying about like an idiot.

Supporting event replication generally requires building it in from the ground
up. Calls to things like the C standard `rand()` are off-limits because
different clients would derive different states. If a participant is not fast
enough to run the simulation at the required speed, they *cannot participate*,
because there is no way to reduce the simulation granularity the way most games
do as a by-product of reduced frame rate.
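Determinism instead calls for a seeded generator that is part of the replicated state. As a minimal sketch (using xorshift64, a well-known generator; the names are hypothetical and this is not Hexeline's actual RNG), every client seeds it identically from the match setup and so draws the same sequence:

```rust
// Hypothetical sketch of a deterministic, replicable PRNG (xorshift64).
// All clients construct it with the same seed, so all clients derive
// exactly the same sequence of "random" numbers.
struct DeterministicRng {
    state: u64,
}

impl DeterministicRng {
    fn new(seed: u64) -> Self {
        // xorshift state must be non-zero.
        DeterministicRng { state: seed.max(1) }
    }

    fn next(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

fn main() {
    let mut a = DeterministicRng::new(42);
    let mut b = DeterministicRng::new(42);
    for _ in 0..8 {
        assert_eq!(a.next(), b.next()); // identical seeds, identical stream
    }
}
```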

Real-time games using event replication also have the awkward problem that they
can learn about an event after that event’s time has passed. This necessitates
being able to “rewind” time to apply the event, then “fast forward” back to the
current time. With a real network in play, events will *always* be received
after their nominal time. If we receive, say, 50 event batches per second, we
need to rewind/fast-forward time 50 times per second. The amount of time is
equal to the network latency; assuming that averages 200ms, we end up playing
200ms of time forward 50 times per second, or 10 seconds of simulation time per
second of wall time.

In other words, the game needs to be able to run at ten times its natural
speed, so rather than one microsecond average per object update, we have one
hundred *nanoseconds*. On a 2 GHz processor, that’s two hundred *clock
cycles*, or around four hundred instructions if the code manages to saturate a
two-unit superscalar processor. That’s about the same amount of time that a
single integer division takes.
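Putting the arithmetic together in one place:

$$\frac{10\,\mathrm{ms}}{10\,000\ \text{objects}} = 1\,\mu\mathrm{s};\qquad \frac{1\,\mu\mathrm{s}}{10\ \text{(fast-forward factor)}} = 100\,\mathrm{ns} \approx 200\ \text{cycles at } 2\,\mathrm{GHz}$$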

Also, keep in mind that updating an object is not just a question of updating
position/velocity/acceleration; the engine needs to also do collision detection
at the same time. Most objects need to be significantly *faster* than the
100 ns limit to make room for the occasional update that takes longer.

Having established “counting clock cycles” territory, let’s move on to coordinate systems.

## Hexagonal Coordinates

The original idea here came from *Hexagon grids: coordinate systems and distance calculations*. I highly recommend reading that article, but a short summary of the relevant bits follows anyway.

There are a few ways to assign axes to a hexagonal grid. To avoid confusion with normal cartesian coordinates, we’ll be using A, B, … to reference these axes instead of the usual X, Y, … convention.

A common method is to “zig-zag” one of the axes, as shown below. This results in a grid that can be thought of as a square grid with every odd column offset by half a square, since the zig-zagged axis is still perpendicular to the other axis on average.

The downside of having a zig-zag axis is that the resulting space is
*non-linear*. Any code which considers the area around such a hexagonal
coordinate needs separate cases for even/odd columns. As described in the
article, distance calculations are pretty tricky. There is also no obvious
interpretation to continuous coordinates in this space — where would (0.5,0.5)
fall on the above grid?

In order to make the space linear, we need to make the axes non-perpendicular.
As described in the article, we can make distances work out by adding a *third
axis* with the constraint that the sum of the coordinates is zero (i.e.,
$A+B+C=0$). This results in a grid that looks like this:

This coordinate system is a bit counter-intuitive at first, especially given
the way *all* the coordinates change as one moves horizontally.

There are two equivalent ways to interpret $(a,b,c)$ coordinates:

1. They represent $(x’,y’,z’)$ cartesian coordinates on the 3-dimensional plane $x’+y’+z’=0$. A point on this plane corresponds to the 2-dimensional point $(x,y)$, where $(x,y)$ is the orthographic projection of $(x’,y’,z’)$ such that the projected axes are at 120° angles to each other.

2. $(a,b,c)$ are coordinates in a vector space related by a linear transform to $(x,y,z)$ cartesian coordinates with $z=0$.

Given these interpretations, a number of extremely useful properties reveal themselves:

- There is a sensible interpretation of continuous coordinates.

- Given the correct scaling factor, the L2 length of vector $(a,b,c)$ can be made to equal the L2 length of the corresponding $(x,y)$ vector.

- An affine transform can be used to convert between cartesian and hexagonal coordinates; in other words, translating between the two is a single matrix-vector multiplication. Additionally, since each axis is constrained entirely by the other two, the $(a,b)$ space by itself is also an affine transform of $(x,y)$:
  $$\begin{bmatrix}a\\b\\c\end{bmatrix} =
  \begin{bmatrix}\sqrt{2}/\sqrt{3} & 0 & 0\\
  -1/\sqrt{6} & 1/\sqrt{2} & 0\\
  -1/\sqrt{6} & -1/\sqrt{2} & 0\end{bmatrix}
  \begin{bmatrix}x\\y\\0\end{bmatrix};\qquad
  \begin{bmatrix}x\\y\end{bmatrix} =
  \begin{bmatrix}\sqrt{3}/\sqrt{2} & 0\\
  1/\sqrt{2} & \sqrt{2}\end{bmatrix}
  \begin{bmatrix}a\\b\end{bmatrix}$$

- Any affine transform against cartesian space can itself be transformed to have the same effect against hexagonal coordinates.

- Because length is preserved and the mapping from cartesian space is affine, angles in hexagonal space measure the same as they do in cartesian space.

- Since the $c$ coordinate can always be calculated as $-a-b$, hexagonal coordinates can generally be represented simply as $(a,b)$, an oblique two-dimensional space. This is Hexeline’s primary coordinate system. This has some interesting fallout, such as turning the common “bounding box” into a bounding rhombus.
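To make the length-preservation property concrete, here is a floating-point sketch (not Hexeline's fixed-point representation; the function name is hypothetical) of the $(a,b)\to(x,y)$ conversion given by the matrix above, together with a check that lengths agree:

```rust
// Floating-point sketch of the hexagonal-to-cartesian conversion:
// x = sqrt(3/2) * a, y = a / sqrt(2) + sqrt(2) * b, per the matrix above.
fn hex_to_cartesian(a: f64, b: f64) -> (f64, f64) {
    let x = (1.5f64).sqrt() * a;
    let y = a / 2f64.sqrt() + 2f64.sqrt() * b;
    (x, y)
}

fn main() {
    let (a, b) = (3.0, -1.0);
    let c = -a - b;
    let (x, y) = hex_to_cartesian(a, b);
    // The L2 length of (a, b, c) equals the L2 length of (x, y).
    let hex_len = (a * a + b * b + c * c).sqrt();
    let xy_len = (x * x + y * y).sqrt();
    assert!((hex_len - xy_len).abs() < 1e-9);
}
```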

Of course, the decision to use hexagonal coordinates pervasively was made for reasons beyond them being similar enough to cartesian coordinates to be workable; the properties above simply make them minimally practical. What makes them desirable are the operations that can be done efficiently on hexagonal coordinates but not on cartesian coordinates.

## Translating to cell coordinates

As previously mentioned, one of the main motivations for using a hexagonal grid was to make cells directly addressable. For this to be useful, we need a way to efficiently determine a cell coordinate from an arbitrary point in space. This turns out to be quite cheap in hexagonal space.

The first step is to approximate the cell coordinate by simply rounding the $(a,b)$ coordinates towards negative infinity. For example, in the graph to the right, all points in the shaded area are rounded to the top left corner of that area.

This might look like a strange approximation at first: only a minority of the rhombus is placed into the correct cell, and only a minority of that cell is covered by the rhombus. However, it’s actually ideal for the next step, as every rhombus covers exactly four cells in exactly the same way, and every cell is covered by four rhombi in exactly the same way. As we’ll shortly see, four happens to be a very convenient number of candidates to have.

Once we’ve obtained the approximation, we know the point falls within one of four cells, offset from the approximation by $(0,0)$, $(0,1)$, $(1,0)$, and $(1,1)$, respectively. To determine the final answer, we simply determine which of those candidates actually contains the point in question. To do this, we find the displacement $(\Delta a,\Delta b,\Delta c)$ from the point to the centre of each candidate hexagon. The point is within a hexagon if $\max(\Delta a,\Delta b,\Delta c) - \min(\Delta a,\Delta b,\Delta c) \le 1$.

There isn’t an obvious way to illustrate the geometry behind what makes this condition work. To make it marginally more intuitive, note that there are exactly 6 combinations of displacements that this formula can output — one for each edge of the hexagon. The correctness can be verified by plotting coordinates along the edges of the unit hexagon and observing that it holds.

Along the edges of the hexagons, this test finds that a point is in two hexagons; similarly, at the vertices, it finds that the point is in all three neighbours. In Hexeline, handling these edge cases precisely doesn’t matter, so one hit is chosen arbitrarily.

That sounds like a lot to do, but with a few tricks we can reduce it down to fairly few operations.

In Hexeline, coordinates are represented as signed 32-bit integers, or `i32` in
Rust parlance. The cell size (i.e., “1.0” in the abstract math above) is
selected to be a power of two so that the modulus and division operations
reduce to bit masks and shifts. Finally, an `i32x4` is a vector of 4 `i32`s,
essentially an SSE XMM register. With that said, we can start translating all
this to real code.

We’ll start with an $(a,b)$ coordinate in one register; i.e., $a$ is in lane 0 and $b$ in lane 1.

```
let coord: i32x4 = /* some point */;
```

Now, recall how we have exactly four candidate hexagons? We also have exactly
four SIMD lanes, so we can test all four *in parallel*. To prepare, we set up
two SSE registers containing just the $a$ coordinate repeated four times, and
just the $b$ coordinate repeated four times.

```
let base_a = i32x4::splat(coord.extract(0));
let base_b = i32x4::splat(coord.extract(1));
```

From these, we can calculate the coordinates of the four candidate hexagons as well as their $Δa$ and $Δb$.

```
// Offset both coordinates by 0 or 1 cells.
let off_a = base_a + i32x4::new(0, 0, CELL_SIZE, CELL_SIZE);
let off_b = base_b + i32x4::new(0, CELL_SIZE, 0, CELL_SIZE);
// Round down to the "whole integer" coordinates. We can do this by just
// zeroing out the lower few bits of each lane.
let candidate_a = off_a & i32x4::splat(MASK);
let candidate_b = off_b & i32x4::splat(MASK);

let disp_a = candidate_a - base_a;
let disp_b = candidate_b - base_b;
```

What about $Δc$? Well, since we know that $a + b + c = 0$, i.e. $c = -a-b$, we know that $Δc = -Δa-Δb$, so we can simply calculate it.

```
let disp_c = -disp_a - disp_b;
```

Calculating the distance function is straightforward.

```
// .max() and .min() expand to the `pmaxsd` and `pminsd` SSE4.1
// instructions when available and are emulated otherwise.
let max_disp = disp_a.max(disp_b).max(disp_c);
let min_disp = disp_a.min(disp_b).min(disp_c);
let distance = max_disp - min_disp;
```

Now that we know the distance for all four candidates, we simply must select a
candidate whose lane in `distance` is less than or equal to `CELL_SIZE`. First,
we can subtract `CELL_SIZE + 1` from every lane to cause the sign bit to be set
on all lanes where the membership function is satisfied (assuming no overflow).
Then, we can use the `movmskps` (“move mask”) SSE instruction to find out which
lanes have the sign bit set.

```
let valid_lanes = (distance - i32x4::splat(CELL_SIZE + 1)).movemask();
// valid_lanes is a 4-bit value containing a 1 bit for each candidate
// lane which is within `CELL_SIZE` distance, or a 0 bit otherwise.
```

An obvious way to go from here would be to select a lane, then extract that
lane from `candidate_a` and `candidate_b`. This works but is very slow, since
AMD64 has no way to access a runtime-determined lane of an SSE register. If
you do this, the compiler implements it by writing the SSE register to memory,
then treating it like an array and reading the desired element back.

Instead, we should compute the top-left cell coordinate and then derive the $(a,b)$ offset of the candidate we want. The offset for each axis is either 0 or 1, so we can simply use an integer as a bitset and select a value out of it. I’m going to gloss over the exact contents of these lookup tables for this post since their derivation is uninteresting.

```
// Divide the input coordinate by `CELL_SIZE`.
let base_cell = coord >> CELL_SHIFT;
// Compute A and B offsets directly from the movemask output.
// The binary constants control how we select from among the candidates.
let a_off = (0b11111100 >> (valid_lanes >> 1)) & 1;
let b_off = (0b11110010 >> (valid_lanes >> 1)) & 1;
// Finally, we can return the cell coordinate.
(base_cell.extract(0) + a_off, base_cell.extract(1) + b_off)
```

This in fact compiles to extremely efficient code:

```
; Put the A and B coordinates into their own registers
pshufd xmm1,xmm0,0x0
pshufd xmm2,xmm0,0x55
; Calculate candidate coordinates
movdqa xmm3,XMMWORD PTR [rip+0x9fbee]
paddd xmm3,xmm1
movdqa xmm4,XMMWORD PTR [rip+0x9fbf2]
pand xmm3,xmm4
movdqa xmm5,XMMWORD PTR [rip+0x9fbf6]
paddd xmm5,xmm2
pand xmm5,xmm4
; Calculate A, B, C displacements
psubd xmm3,xmm1
psubd xmm5,xmm2
movdqa xmm1,xmm3
paddd xmm1,xmm5
pxor xmm2,xmm2
psubd xmm2,xmm1
; Calculate overall distance
movdqa xmm1,xmm3
pmaxsd xmm1,xmm5
pmaxsd xmm1,xmm2
pminsd xmm5,xmm3
pminsd xmm5,xmm2
; Subtract CELL_SIZE+1
movdqa xmm2,XMMWORD PTR [rip+0x9fcc6]
psubd xmm2,xmm5
; Finish up the distance calculation
paddd xmm2,xmm1
; The move mask instruction to find out which candidate(s)
; are within the right distance
movmskps ecx,xmm2
; Select bits from the bitsets for the A and B offsets
shr ecx,1
mov edx,0xfc
shr edx,cl
and edx,0x1
mov esi,0xf2
shr esi,cl
and esi,0x1
; Divide the input by CELL_SIZE
psrad xmm0,0x9
; Extract the A and B coordinates and apply their offsets
movd eax,xmm0
add eax,edx
pextrd ecx,xmm0,0x1
add ecx,esi
; Combine the result tuple and return
shl rcx,0x20
or rax,rcx
ret
```

That’s 36 instructions (excluding the return sequence), which my K10 processor can chew through in around 3 to 4 nanoseconds (12 to 16 clock cycles).

What’s this useful for? Collision detection, specifically testing whether a point particle collides with, e.g., a ship, after the coarse bounding-rhombus collision detection finds a candidate pair. With this capability, the test is simply a matter of finding the cell coordinate, then checking whether the ship has something there.

Collision detection between ships uses a very similar algorithm to find pairs of overlapping hexagons.
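As a sketch of that narrow-phase test (all names here are hypothetical, not Hexeline's actual types; the cell coordinate is assumed to come from the translation routine above):

```rust
use std::collections::HashSet;

// Hypothetical sketch of the point-vs-ship test: once the point's cell
// coordinate is known, the hit test is a single occupancy lookup.
struct Ship {
    occupied: HashSet<(i32, i32)>, // cells that contain something
}

impl Ship {
    fn hit_by(&self, cell: (i32, i32)) -> bool {
        self.occupied.contains(&cell)
    }
}

fn main() {
    let ship = Ship {
        occupied: [(0, 0), (0, 1), (1, 0)].iter().cloned().collect(),
    };
    assert!(ship.hit_by((0, 1)));
    assert!(!ship.hit_by((5, 5)));
}
```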

## Conclusion

That about wraps it up for today, though there almost certainly will be more posts on this topic in the future.

If you want to read through more details, the source code is available online.