RISC-V HCF instructions

Its authors have recently updated the GhostWrite academic research paper. The MMU bypass vulnerability in T-Head's C910 processor core is the highlight of the paper, and gave it its nickname.

But the updated paper also reveals a so-called Halt and Catch Fire (HCF) instruction in the newer SpacemiT X60 processor core. This comes on top of another HCF in T-Head's C908, which was already reported in the first public preprint version of the article. This means that both the first and second commercially available processor designs featuring the RISC-V Vector extension (RVV) are susceptible to an HCF.


Not really but sort of advertising

Open-source multimedia RISC-V development, code review and maintenance in the FFmpeg multimedia framework and the VLC media player are currently almost at a standstill as funding dried up. I cannot realistically keep up entirely in my free time.

To put it bluntly, I need a RISC-V Vector expert to help with FFmpeg code reviews and/or a new sponsor to resume FFmpeg RISC-V work.


Decoding RISC-V instructions

The HCF instructions can be found within the GhostWrite artifacts on GitHub, written as RISC-V inline assembler:

T-Head C908
.fill 1, 4, 0x20b00087
SpacemiT X60
xor s0, s0, s0
.fill 1, 4, 0xe0815407

Even by the low standards of assembler, this is rather esoteric. So let's unpack this step-by-step.

SPOILER ALERT: if you did not read the GhostWrite article, those instructions are invalid/reserved, but we can still try to decode them.

.fill

.fill is a pseudo-operation of the GNU assembler which repeats a pattern of a given byte size a given number of times. Here the pattern is emitted only once and is 4 bytes long. On RISC-V, instructions are 4 bytes by default, and since there is no actual repetition, the .insn pseudo-operation would have been simpler, e.g.:

.insn 0x20b00087
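To make the effect of the directive concrete, here is a minimal Python sketch of the bytes that `.fill 1, 4, value` would emit. It assumes a little-endian target and a size of at most 8 bytes; the `fill` helper is a hypothetical name for illustration, not part of any assembler API.

```python
def fill(repeat: int, size: int, value: int) -> bytes:
    """Approximate GNU as `.fill repeat, size, value` (little-endian, size <= 8)."""
    pattern = value.to_bytes(size, "little")  # one copy of the pattern
    return pattern * repeat                   # emitted `repeat` times


c908 = fill(1, 4, 0x20B00087)
x60 = fill(1, 4, 0xE0815407)
print(c908.hex(), x60.hex())
```

With a repeat count of 1, each call yields exactly the four bytes of one instruction, with the byte holding the opcode first in memory.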

Thus now we have to actually make sense of those two hexadecimal values 0x20b00087 and 0xe0815407, which are the actual machine instructions.

Opcode 0x07

According to the specifications, RISC-V instructions carry a 7-bit opcode in their least significant bits. More precisely, the bottom 2 bits are always ones, with the other values reserved for compressed instructions, and the 5 bits above constitute the true opcode.

Either way, the bottom 7 bits are 0b0000111 (7) in both cases. If you look it up in the main RISC-V Unprivileged ISA specification, you will find that that is the opcode for "LOAD-FP" for floating point register load instructions.
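The opcode extraction is a matter of masking and shifting. A small Python sketch, where `split_opcode` is a hypothetical helper name:

```python
def split_opcode(insn: int) -> tuple[int, int, int]:
    """Return (7-bit opcode, bottom 2 bits, 5-bit major opcode)."""
    opcode = insn & 0x7F   # bits 6-0
    low2 = opcode & 0b11   # 0b11 for a 32-bit (non-compressed) instruction
    major = opcode >> 2    # bits 6-2
    return opcode, low2, major


for insn in (0x20B00087, 0xE0815407):
    opcode, low2, major = split_opcode(insn)
    print(f"{insn:#010x}: opcode={opcode:#09b} low2={low2:#04b} major={major:#07b}")
```

Both values yield the same 7-bit opcode, so whatever distinguishes the two instructions lies in the upper 25 bits.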

LOAD-FP instruction format

If you already read the research paper, you can probably guess that the instruction is in fact not an FP load. But if we pretend for a minute that we do not know that yet, then we will need to check the floating point instruction format in the same ISA specification.

Bit fields    31-20          19-15   14-12    11-7    6-0
I-type        imm[11:0]      rs1     funct3   rd      opcode
F extension   offset[11:0]   base    width    dest    LOAD-FP
C908 HCF      001000001011   00000   000      00001   0000111
X60 HCF       111000001000   00010   101      01000   0000111
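The same split can be reproduced programmatically. The following Python sketch decodes the I-type fields of both values; the function name `decode_i_type` is made up for this example, and the immediate is deliberately left unsigned since we only care about its bit pattern.

```python
def decode_i_type(insn: int) -> dict[str, int]:
    """Split a 32-bit instruction word into its I-type bit fields."""
    return {
        "imm": (insn >> 20) & 0xFFF,   # bits 31-20, not sign-extended here
        "rs1": (insn >> 15) & 0x1F,    # bits 19-15
        "funct3": (insn >> 12) & 0x7,  # bits 14-12 ("width" for LOAD-FP)
        "rd": (insn >> 7) & 0x1F,      # bits 11-7
        "opcode": insn & 0x7F,         # bits 6-0
    }


for name, insn in (("C908", 0x20B00087), ("X60", 0xE0815407)):
    fields = decode_i_type(insn)
    print(name, {key: bin(value) for key, value in fields.items()})
```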

We already identified the opcode, so the next step is to decode the 3-bit funct3 field. In these cases, it is called the width field: it indicates the floating point data type coded as the binary logarithm of the type's byte size:

0b001 (1)
half precision (16-bit), requiring the Zfh extension
0b010 (2)
single precision (32-bit), requiring the F extension
0b011 (3)
double precision (64-bit), requiring the D extension
0b100 (4)
quadruple precision (128-bit), requiring the Q extension

It was a foregone conclusion, but width values 0b000 and 0b101 are not actually defined in the floating point extensions but in the Vector (V) extension instead. They nominally indicate the vector element size:

0b000 (0)
vector of 8-bit elements
0b101 (5)
vector of 16-bit elements
0b110 (6)
vector of 32-bit elements
0b111 (7)
vector of 64-bit elements

Note that these four values are only assigned if bit 28 is zero. Otherwise they are reserved for future extensions (which could be either scalar or vector).
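Put together, the width field and bit 28 are enough to tell scalar floating point loads from vector loads. Here is a rough Python sketch of that classification; the table and function names are invented for the example, and the returned strings are only descriptive labels.

```python
FP_WIDTHS = {0b001: "half", 0b010: "single", 0b011: "double", 0b100: "quadruple"}
VECTOR_EEW = {0b000: 8, 0b101: 16, 0b110: 32, 0b111: 64}


def classify_load_fp(insn: int) -> str:
    """Classify an instruction with the LOAD-FP opcode by its width field."""
    if insn & 0x7F != 0b0000111:
        raise ValueError("not a LOAD-FP opcode")
    width = (insn >> 12) & 0b111
    if width in FP_WIDTHS:
        # For scalar loads, bit 28 is just part of the 12-bit offset.
        return f"scalar {FP_WIDTHS[width]} precision load"
    if (insn >> 28) & 1:
        return "reserved"
    return f"vector load, {VECTOR_EEW[width]}-bit elements"


print(classify_load_fp(0x20B00087))  # C908 value
print(classify_load_fp(0xE0815407))  # X60 value
```

Both values land in the vector branch, which is why the floating point instruction format is not sufficient to decode them.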

Vector load instruction formats

The vector load instruction format breaks the 12-bit immediate bit-field into many small parts:

Bit fields    31-29   28    27-26   25   24-20           19-15   14-12   11-7    6-0
Vector load   nf      mew   mop     vm   lumop/rs2/vs2   rs1     width   vd      opcode
C908 HCF      001     0     00      0    01011           00000   000     00001   0000111
X60 HCF       111     0     00      0    01000           00010   101     01000   0000111

Depending on the mop field value, bits 24-20 can indicate either an additional code lumop, a second source general-purpose register rs2, or a source vector register vs2 operand. In the latter two cases, the instruction format is technically the R-type rather than the I-type. However today we are concerned with mop value 0b00 for unit-stride loads, so we have a lumop code rather than a second source operand.
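For readers who prefer code to bit diagrams, here is a hedged Python sketch that splits an instruction word into the vector load fields. The names `decode_vector_load` and `LUMOP_NAMES` are hypothetical; the lumop table lists the unit-stride variants as I understand them from the Vector extension specification, and anything else is treated as reserved.

```python
LUMOP_NAMES = {
    0b00000: "unit-stride load",
    0b01000: "whole register load",
    0b01011: "mask load",
    0b10000: "fault-only-first load",
}


def decode_vector_load(insn: int) -> dict[str, int]:
    """Split a 32-bit instruction word into the vector load bit fields."""
    return {
        "nf": (insn >> 29) & 0b111,       # bits 31-29
        "mew": (insn >> 28) & 1,          # bit 28
        "mop": (insn >> 26) & 0b11,       # bits 27-26
        "vm": (insn >> 25) & 1,           # bit 25 (0 = masked by v0)
        "lumop": (insn >> 20) & 0b11111,  # bits 24-20 (rs2/vs2 if mop != 0)
        "rs1": (insn >> 15) & 0b11111,    # bits 19-15
        "width": (insn >> 12) & 0b111,    # bits 14-12
        "vd": (insn >> 7) & 0b11111,      # bits 11-7
        "opcode": insn & 0x7F,            # bits 6-0
    }


for name, insn in (("C908", 0x20B00087), ("X60", 0xE0815407)):
    fields = decode_vector_load(insn)
    if fields["mop"] == 0b00:
        kind = LUMOP_NAMES.get(fields["lumop"], "reserved")
    else:
        kind = "strided or indexed load"
    print(name, kind, fields)
```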

More specifically, we now have two different cases, one for each core model:

X60: vector whole register load

In the X60, we have a whole register load instruction: vlNreW.v. This is a somewhat exotic instruction category intended to restore vector registers in context switching code or with custom ABIs.

In fact, it is so exotic that the Linux kernel even fails to use it where it should in its context switching code, preferring the usual unit-stride instructions for that purpose.

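Accounting for all the remaining fields, we have:

nf = 0b111 (7)
group of 8 vector registers
mew = 0 and mop = 0b00
unit-stride access
vm = 0
masked by v0
lumop = 0b01000 (8)
whole register load
rs1 = 0b00010 (2)
base address in x2, i.e. sp
width = 0b101 (5)
16-bit elements
vd = 0b01000 (8)
destination register v8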

Conclusion: the instruction sequence is:

xor s0, s0, s0
vl8re16.v v8, (sp), v0.t

...or at least it would be if that were a legal sequence, but whole register loads cannot be masked. That is to say VM must equal 1.

Indeed the closest valid instruction sequence is:

xor s0, s0, s0
vl8re16.v v8, (sp)

...where the second instruction assembles as 0xE2815407 rather than 0xE0815407.
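The one-bit difference between the two encodings can be checked by assembling the fields by hand. The sketch below is illustrative only: `encode_vector_load` is a hypothetical helper that performs no validity checks, which is precisely what allows it to produce the invalid masked variant.

```python
def encode_vector_load(nf, mew, mop, vm, lumop, rs1, width, vd):
    """Pack vector load bit fields into a 32-bit word (no validity checks)."""
    return (
        (nf << 29) | (mew << 28) | (mop << 26) | (vm << 25)
        | (lumop << 20) | (rs1 << 15) | (width << 12) | (vd << 7)
        | 0b0000111  # LOAD-FP opcode
    )


# Fields shared by both variants: 8 registers, whole register load,
# base address in sp (x2), 16-bit elements, destination v8.
common = dict(nf=7, mew=0, mop=0, lumop=0b01000, rs1=2, width=0b101, vd=8)

valid = encode_vector_load(vm=1, **common)  # vl8re16.v v8, (sp)
hcf = encode_vector_load(vm=0, **common)    # invalid masked variant
print(f"{valid:#010x} {hcf:#010x} {valid ^ hcf:#010x}")
```

The XOR of the two words isolates the vm bit (bit 25).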

C908: vector mask load

On the slightly older C908 processor, we have a mask load instruction: vlm.v. In this case, the instruction is completely nonsensical even before considering its validity, as RS1=0 means that the source address comes from the zero register. In other words, the instruction loads data from a NULL pointer.

Leaving that aside, we again see VM=0 when the instruction requires VM=1, making the instruction not only nonsensical but plainly invalid. Similarly, NF=1 is invalid, as vector load mask instructions cannot be segmented (NF must equal 0). If we ignore all of those considerations, the imaginary instruction could be something like:

vlseg2m.v v1, (zero), v0.t

The closest valid instruction that I could find is:

vlm.v v1, (zero)

...which assembles as 0x02b00087 instead of 0x20b00087, though even that nominally valid instruction makes no practical sense on account of using zero as source address.
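As a sanity check, the two encodings differ only in the nf and vm fields. A minimal Python sketch, with constant names chosen for this example:

```python
NF_MASK = 0b111 << 29  # bits 31-29: nf field
VM_BIT = 1 << 25       # bit 25: 1 means unmasked

hcf = 0x20B00087
# Clear nf (no segments) and set vm (unmasked) to get the closest valid form.
closest_valid = (hcf & ~NF_MASK) | VM_BIT

print(f"{closest_valid:#010x}")        # encoding of vlm.v v1, (zero)
print(f"{hcf ^ closest_valid:#010x}")  # bits that differ
```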

Notes

The exact 32-bit instructions are straight from the researchers' artefacts. It is likely that other similar 32-bit instructions would also halt, but I did not test this hypothesis.

Also this should be obvious, but I must emphasise that these instructions constitute bugs in two specific proprietary processor core designs.