The authors of the GhostWrite academic research paper have recently updated it. The MMU bypass vulnerability in T-Head's C910 processor core is the highlight of the paper, and gave it its nickname.
But the updated paper also reveals a so-called HFC ("Halt and Catch Fire") instruction in the newer SpacemiT X60 processor core. This comes on top of another HFC in T-Head's C908, which was already reported in the first public preprint version of the article. This means that both the first and second commercially available processor designs featuring the RISC-V Vector extension (RVV) are susceptible to an HFC.
Open-source multimedia RISC-V development, code review and maintenance in the FFmpeg multimedia framework and the VLC media player are currently almost at a standstill as funding dried up. I cannot realistically keep up entirely in my free time.
To put it bluntly, I need a RISC-V Vector expert to help with FFmpeg code reviews and/or a new sponsor to resume FFmpeg RISC-V work.
The HFC instructions can be found within GhostWrite artifacts on GitHub in RISC-V inline assembler:
.fill 1, 4, 0x20b00087
xor s0, s0, s0
.fill 1, 4, 0xe0815407
Even by the low standards of assembler, this is rather esoteric. So let's unpack this step-by-step.
SPOILER ALERT: if you did not read the GhostWrite article, those instructions are invalid/reserved, but we can still try to decode them.
.fill

.fill is a pseudo-operation of the GNU assembler
which repeats a pattern of a given byte size a given number of times.
Here the pattern is repeated only once
and adds up to 4 bytes.
On RISC-V, instructions are 4 bytes by default,
and since there is no actual repetition, the .insn
pseudo-operation would have been simpler, e.g.:
.insn 0x20b00087
Thus now we have to actually make sense of those two hexadecimal values
0x20b00087 and 0xe0815407,
which are the actual machine instructions.
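As a rough illustration of what the assembler emits for those directives, here is a simplified Python model of `.fill repeat, size, value`. It assumes a little-endian target (as is the case for RISC-V here), and the `fill_bytes` helper is a hypothetical name, not part of any toolchain.

```python
def fill_bytes(repeat: int, size: int, value: int) -> bytes:
    """Simplified model of GNU as `.fill repeat, size, value` (little-endian target)."""
    size = min(size, 8)  # sizes above 8 are treated as 8
    # The pattern is an 8-byte number: low 4 bytes from `value`, high 4 bytes zero.
    pattern = (value & 0xFFFFFFFF).to_bytes(8, "little")[:size]
    return pattern * repeat


c908 = fill_bytes(1, 4, 0x20B00087)
x60 = fill_bytes(1, 4, 0xE0815407)
print(c908.hex(" "))
print(x60.hex(" "))
```

With a repeat count of 1 and a size of 4, each directive boils down to one 32-bit word stored least significant byte first, which is why a single `.insn` would do the same job.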
0x07

According to the specifications, RISC-V instructions carry a 7-bit opcode in their lowest bits. More precisely, the bottom 2 bits are always ones, with other values reserved for compressed instructions, and the 5 bits above constitute the true opcode.
Either way, the bottom 7 bits are 0b0000111 (7) in both cases.
If you look it up in the main RISC-V Unprivileged ISA specification,
you will find that that is the opcode for "LOAD-FP" for floating point
register load instructions.
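The opcode check described above can be sketched in a few lines of Python. This is only an illustration: `split_opcode` and `LOAD_FP` are names made up for this example.

```python
LOAD_FP = 0b0000111  # 7-bit opcode shared by scalar FP loads and vector loads


def split_opcode(insn: int) -> dict:
    """Split the low 7 bits of a RISC-V instruction word."""
    return {
        "compressed": (insn & 0b11) != 0b11,  # low 2 bits other than 11: 16-bit encoding
        "major": (insn >> 2) & 0b11111,       # the 5 bits above the low 2 bits
        "opcode": insn & 0b1111111,           # all 7 bits together
    }


for name, insn in (("C908", 0x20B00087), ("X60", 0xE0815407)):
    parts = split_opcode(insn)
    print(name, parts, parts["opcode"] == LOAD_FP)
```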
LOAD-FP instruction format

If you already read the research paper, you can probably guess that the instruction is in fact not an FP load. But if we pretend for a minute that we do not know that yet, then we will need to check the floating point instruction format in the same ISA specification.
| Bit fields | 31-20 | 19-15 | 14-12 | 11-7 | 6-0 |
|---|---|---|---|---|---|
| I-type | imm[11:0] | rs1 | funct3 | rd | opcode |
| F extension | offset[11:0] | base | width | dest | LOAD-FP |
| C908 HFC | 001000001011 | 00000 | 000 | 00001 | 0000111 |
| X60 HFC | 111000001000 | 00010 | 101 | 01000 | 0000111 |
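The two bottom rows of the table can be reproduced with plain shifts and masks. The following Python sketch assumes the I-type layout shown above; `decode_itype` is a hypothetical helper, and the immediate is deliberately left as a raw unsigned 12-bit value rather than sign-extended.

```python
def decode_itype(insn: int) -> dict:
    """Split a 32-bit word along the I-type field boundaries."""
    return {
        "imm": (insn >> 20) & 0xFFF,    # bits 31-20 (raw, not sign-extended)
        "rs1": (insn >> 15) & 0x1F,     # bits 19-15
        "funct3": (insn >> 12) & 0x7,   # bits 14-12 (the "width" field here)
        "rd": (insn >> 7) & 0x1F,       # bits 11-7
        "opcode": insn & 0x7F,          # bits 6-0
    }


for name, insn in (("C908", 0x20B00087), ("X60", 0xE0815407)):
    f = decode_itype(insn)
    print(f"{name}: imm={f['imm']:012b} rs1={f['rs1']:05b} "
          f"funct3={f['funct3']:03b} rd={f['rd']:05b} opcode={f['opcode']:07b}")
```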
We already identified the opcode,
so the next step is to decode the 3-bit funct3 field.
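In the floating point extensions, it is called the width field:
it indicates the floating point data type,
coded as the binary logarithm of the type's byte size:

- 0b001: 16-bit half precision,
- 0b010: 32-bit single precision,
- 0b011: 64-bit double precision,
- 0b100: 128-bit quad precision.
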
It was a foregone conclusion, but width values 0b000 and
0b101 are not actually defined in the floating point extensions
but in the Vector (V) extension instead.
They nominally indicate the vector element size:

- 0b000: 8 bits,
- 0b101: 16 bits,
- 0b110: 32 bits,
- 0b111: 64 bits.

Note that these four values are only assigned if bit 28 is zero. Otherwise they are reserved for future extensions (which could be either scalar or vector).
The vector load instruction format breaks the 12-bit immediate bit-field into many small parts:
| Bit fields | 31-29 | 28 | 27-26 | 25 | 24-20 | 19-15 | 14-12 | 11-7 | 6-0 |
|---|---|---|---|---|---|---|---|---|---|
| Vector Load | nf | mew | mop | vm | lumop | rs1 | width | vd | opcode |
| C908 HFC | 001 | 0 | 00 | 0 | 01011 | 00000 | 000 | 00001 | 0000111 |
| X60 HFC | 111 | 0 | 00 | 0 | 01000 | 00010 | 101 | 01000 | 0000111 |
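To double-check the table, here is a small Python sketch that splits both words along the vector load field boundaries. The `decode_vector_load` helper and the two lookup tables are illustrative names only, and the tables list just the values discussed in this article rather than the complete set from the specification.

```python
# Element width in bits for the vector `width` encodings (valid when mew = 0).
VECTOR_EEW = {0b000: 8, 0b101: 16, 0b110: 32, 0b111: 64}
# Only the two lumop codes relevant here; the specification defines others.
LUMOP_NAMES = {0b01000: "whole register load", 0b01011: "mask load"}


def decode_vector_load(insn: int) -> dict:
    """Split a 32-bit word along the vector load field boundaries."""
    return {
        "nf": (insn >> 29) & 0x7,      # bits 31-29
        "mew": (insn >> 28) & 0x1,     # bit 28
        "mop": (insn >> 26) & 0x3,     # bits 27-26
        "vm": (insn >> 25) & 0x1,      # bit 25
        "lumop": (insn >> 20) & 0x1F,  # bits 24-20
        "rs1": (insn >> 15) & 0x1F,    # bits 19-15
        "width": (insn >> 12) & 0x7,   # bits 14-12
        "vd": (insn >> 7) & 0x1F,      # bits 11-7
        "opcode": insn & 0x7F,         # bits 6-0
    }


for name, insn in (("C908", 0x20B00087), ("X60", 0xE0815407)):
    f = decode_vector_load(insn)
    print(name, f, LUMOP_NAMES.get(f["lumop"]), VECTOR_EEW.get(f["width"]))
```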
Depending on the mop field value, bits 24-20 can indicate
either an additional code lumop,
a second source general-purpose register rs2,
or a source vector register vs2 operand.
In the latter two cases, the instruction format is technically the S-type
rather than the I-type.
However today we are concerned with mop value 0b00
for unit-stride loads, so we have lumop codes
rather than second source operands.
More specifically, we now have two different cases, one for each core model:

- 0b01000 means whole register load,
- 0b01011 means mask load.

In the X60, we have a whole register load instruction:
vlNreW.v.
This is a somewhat exotic instruction category intended to restore vector
registers in context switching code or with custom ABIs.
In fact, it is so exotic that the Linux kernel even fails to use it where it should in its context switching code, preferring the usual unit-stride instructions for that purpose.
Accounting for all the remaining fields, we have:
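
- nf is 0b111, which encodes a group of 8 registers (N = 8),
- width is 0b101, which encodes 16-bit elements (W = 16),
- the source address register (rs1) is x2, a.k.a. the stack pointer sp,
- the destination vector register (vd) is v8.

Conclusion: the instruction sequence is: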
xor s0, s0, s0
vl8re16.v v8, (sp), v0.t
...or at least it would be if that were a legal sequence, but whole register loads cannot be masked. That is to say, VM must equal 1.
Indeed the closest valid instruction sequence is:
xor s0, s0, s0
vl8re16.v v8, (sp)
...where the second instruction assembles as 0xE2815407 rather than 0xE0815407.
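The difference between the two encodings can be checked by assembling the fields by hand. The Python sketch below assumes the vector load layout from the table earlier (with mew = 0 and mop = 0b00); `encode_unit_stride_load` is a made-up helper, not an assembler API.

```python
def encode_unit_stride_load(nf: int, vm: int, lumop: int,
                            rs1: int, width: int, vd: int) -> int:
    """Pack unit-stride vector load fields (mew = 0, mop = 0b00) into a 32-bit word."""
    return ((nf << 29) | (vm << 25) | (lumop << 20) | (rs1 << 15)
            | (width << 12) | (vd << 7) | 0b0000111)


# nf=7 (8 registers), lumop=0b01000 (whole register), rs1=x2 (sp), width=0b101, vd=v8
hfc = encode_unit_stride_load(nf=7, vm=0, lumop=0b01000, rs1=2, width=0b101, vd=8)
valid = encode_unit_stride_load(nf=7, vm=1, lumop=0b01000, rs1=2, width=0b101, vd=8)

print(hex(hfc), hex(valid), hex(hfc ^ valid))
```

The two words differ only in bit 25, the vm bit.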
On the slightly older C908 processor, we have a mask load instruction:
vlm.v.
In this case, the instruction is completely nonsensical to begin with,
as RS1=0 means that the source address is taken from the zero register.
In other words, the instruction loads data from a NULL pointer.
Leaving that aside, we again see VM=0 when the instruction requires VM=1, making the instruction not only nonsensical but plainly invalid. Similarly, NF=1 is invalid, as vector load mask instructions cannot be segmented (NF must equal 0). If we ignore all of those considerations, the imaginary instruction could be something like:
vlseg2m.v v1, (zero), v0.t
The closest valid instruction that I could find is:
vlm.v v1, (zero)
...which assembles as 0x02b00087
instead of 0x20b00087,
though even that nominally valid instruction makes no practical sense
on account of using zero as source address.
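To see exactly which fields separate the C908 HFC from that nominally valid instruction, the two words can be compared field by field. This is a minimal Python sketch; `FIELDS` and `field_diff` are hypothetical names, and the field boundaries are those of the vector load format discussed earlier.

```python
# (shift, bit count) for each field of the vector load format
FIELDS = {
    "nf": (29, 3), "mew": (28, 1), "mop": (26, 2), "vm": (25, 1),
    "lumop": (20, 5), "rs1": (15, 5), "width": (12, 3), "vd": (7, 5),
    "opcode": (0, 7),
}


def field_diff(a: int, b: int) -> dict:
    """Return {field: (value in a, value in b)} for every field that differs."""
    diff = {}
    for name, (shift, bits) in FIELDS.items():
        mask = (1 << bits) - 1
        va, vb = (a >> shift) & mask, (b >> shift) & mask
        if va != vb:
            diff[name] = (va, vb)
    return diff


print(field_diff(0x20B00087, 0x02B00087))
```

Only nf and vm differ, which matches the two problems identified above.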
The exact 32-bit instructions are straight from the researchers' artefacts. It is likely that other similar 32-bit instructions would also halt, but I did not test this hypothesis.
Also this should be obvious, but I must emphasise that these instructions constitute bugs in two specific proprietary processor core designs.