Efficiently handling endian differences using negative memory addressing

Posted in emulators

A common issue in emulation and virtualization is endian difference between the host and guest architectures. Common cases of that are N64, Wii, GC, PS3, and XBOX360 that are big endian emulated on x86/x64 (PC) that is little endian.

What is endian order?

Endian order is the order in which bytes are stored in memory. For small numbers all is good and great as they fit in a single byte, but when we have to split number into multiple bytes there are many options on how to arrange the information.

First, there is the little endian order, which the only option in x86, and supported by most other architectures. In little endian order the “least” important bytes come first. So when you have to write “1024” in 4 bytes you write 1024 = 2 * 256 + 0 -> 0, 2, 0, 0.

Then, there is the big endian order. PowerPC and motorolla (and Apple) have traditionally used this format. This order is more natural to how we write numbers on paper and the most important digits come first. So, “1024” is stored as 0, 0, 2, 0.

A neat solution - “The XOR trick”

A typical niceish way to handle the endian issue is to have the memory “pre swapped”. This means that you pick the most common access size (usually 32 or 64 bits) and the bytes in memory are arranged such that operations using that access size don’t need any conversion.

Xbox 360 memory layout:

[ #0, #1, #2, #3, #4, #5, #6, #7, #8, #9, #10, #11, #12, ... ]

Emulator memory layout with “32 bit pre-swapping”:

[ #3, #2, #1, #0, #7, #6, #5, #4, #11, #10, #9, #8, ... ]

This makes 32 bit access fast, but smaller accesses need XOR tricks:

mem_u8[address ^ 3]  -> correct byte access
mem_u16[address ^ 2] -> correct 16-bit access

But on modern cpus bswap is almost as fast as xor. And we still have to swap anything more than 32 bits.

A neater solution - “Negative memory addressing”

A long while ago I was pondering about getting nullDCe to run on the wii. During that period I had an idea that would solve the endian issue more elegantly.

Memory addressing usually starts from the beginning, so [0] is the first byte, [1] is the second byte. But if you access the data in the “other way around” — pointing to the end of the memory, with [-1] being the last byte — you get a mirrored, swapped view.

Emulator memory layout with “Negative Addressing”:

[ #11, #10, #9, #8, #7, #6, #5, #4, #3, #2, #1, #0 ]

The generic formula is mem[length - access_size_in_bytes - address] and works for arbitrary long accesses, including vectors and everything.

mem_8e[-0]  = mem_8[12-1-0]  = mem_8[11]  = #0   // Yay!
mem_16e[-0] = mem_16[12-2-0] = mem_16[10]  = [#1, #0]  // Yay!
mem_32e[-0] = mem_32[12-4-0] = mem_32[8]   = [#3, #2, #1, #0]  // Yay!
mem_64e[-0] = mem_64[12-8-0] = mem_64[4]   = [#7, #6, #5, #4, #3, #2, #1, #0]  // Amazing!

It also happens to work for misaligned access. Since the operations are linear, you can also keep side-counting pointers or even negate the value of registers that are used as pointers inside a basic block. This can be really beneficial when you have many memory accesses on the same block.

Lots of thanks to benvanik for starting the xenia project, that’s how all of this resurfaced from some dusty corner of my mind.