RISC-V Vector extension draft

My hobby since last year of writing optimised FFmpeg (and VLC) functions for the RISC-V Vector extension has been hampered by two factors. The first is, well, that it is just a hobby, and one of many at that. The other is the lack of available hardware on which to run tests and benchmarks.


In all fairness, the RISC-V V specification was ratified less than 2 years prior to my writing this piece, and just about 6 months prior to starting the aforementioned RISC-V porting activities. Releasing hardware is much slower than releasing software, and nobody really expected any functioning commercial hardware less than 18 months after the specification, at the very best. For comparison, more than 5 years passed from the publication of the final specification for ARM's Scalable Vector Extension to general availability of ARMv9-A processors in early 2022 (in the form of very expensive high-end mobile phones).

Nevertheless, Alibaba's subsidiary T-Head did release a processor design within months of the ratification of RISC-V V, and it has been followed by several vastly improved designs since then. But there was and, at the time of writing, still is a big catch: the processors implement a pre-ratification draft, version 0.7.1, of the vector extension, which is binary incompatible with the ratified version 1.0.

This got several people wondering exactly how incompatible the two versions are. However, I have not been able to locate a summary of the incompatibilities. So here is a report from my modest attempt at figuring those out.


But first, a big disclaimer is necessary: I do not currently have any hardware with Vector support, whether per a draft of the specification or per the ratified specification. This document is in no way official or authoritative, and comes with no warranty whatsoever. The rest of this article is based exclusively on reading and comparing different versions of the specification, and I did not experimentally confirm or refute any of the information in it.

Moreover, there are over 700 tracked changes between versions 0.7.1 and 1.0 of the specification. I did not review every single one of them in precise detail, nor will I do so in my free time. (If you want/need such a task performed, hire an engineer.)

Note for the future

This document was prepared and edited in spring 2023. Already by then, more than a year had passed since the beginning of the so-called RISC-V V draft implementation controversy. Eventually, hardware conforming to the ratified specification should become broadly available.

Changed instructions

With that exceedingly long foreword out of the way, let's start with the longest set of changes, which is to say changes to instruction opcodes.

Added instructions

These opcodes, which were missing from the draft, were added in the final standard:

Integer (sign/zero) extension
vsext.vf2 vsext.vf4 vsext.vf8
vzext.vf2 vzext.vf4 vzext.vf8
Integer averaging add/subtract
vaaddu.vv vaaddu.vx
vasubu.vv vasubu.vx
Float reciprocal estimate
vfrsqrt7.v vfrec7.v
Type conversion
vfcvt.rtz.xu.f.v vfcvt.rtz.x.f.v
vfncvt.rod.f.f.w vfncvt.rtz.xu.f.w vfncvt.rtz.x.f.w
vfwcvt.rtz.xu.f.v vfwcvt.rtz.x.f.v
Vector permutation
vmv1r.v vmv2r.v vmv4r.v vmv8r.v
vfslide1down.vf vfslide1up.vf

In addition to those vector computational instructions, a third vector configuration instruction, vsetivli, was also added separately.

Those instructions would obviously not work properly on affected processors. In most if not all cases, simple substitution sequences exist. But making those substitutions would invalidate any benchmark of code that benefits from any of the instructions listed here.
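As one illustration of such a substitution, zero-extension by a factor of two can be expressed as a widening unsigned add of the zero register (the ratified specification itself defines vwcvtu.x.x.v as vwaddu.vx with x0). The Python model below is only a sketch of the arithmetic, with hypothetical function names; it is not tested against any hardware or simulator. Note that the two instructions interpret SEW differently: vzext.vf2 takes SEW as the destination width, whereas vwaddu.vx takes SEW as the source width, so the surrounding vector configuration has to change as well.

```python
def vzext_vf2(src, sew):
    """Model of vzext.vf2: `sew` is the destination element width in bits;
    source elements are sew // 2 bits wide and are zero-extended."""
    half_mask = (1 << (sew // 2)) - 1
    return [x & half_mask for x in src]


def vwaddu_vx(src, scalar, sew):
    """Model of vwaddu.vx: `sew` is the source element width in bits;
    the result elements are 2 * sew bits wide."""
    src_mask = (1 << sew) - 1
    dst_mask = (1 << (2 * sew)) - 1
    # Both operands are treated as unsigned SEW-bit values, then added
    # at double width.
    return [((x & src_mask) + (scalar & src_mask)) & dst_mask for x in src]


if __name__ == "__main__":
    data = [0, 1, 127, 128, 255]
    # Zero-extending bytes to 16 bits versus widening-add of zero at SEW=8.
    print(vzext_vf2(data, 16) == vwaddu_vx(data, 0, 8))
```

A substitution of this kind costs nothing in correctness, but it is no longer the same instruction sequence, which is exactly why benchmark results would not carry over.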

Modified instructions

The following instructions have changed encoding:

Integer multiply-add
vwmaccsu.vv vwmaccsu.vx
vwmaccus.vv vwmaccus.vx
Integer average add/subtract
vaadd.vv vaadd.vx
vasub.vv vasub.vx
Vector-scalar move
vfmv.s.f vfmv.f.s
vmv.s.x vmv.x.s
Vector mask
vfirst.m vcpop.m

These instructions can be used with version 0.7.1, but they cannot be assembled for it, at least not with a conforming assembler. Trickery such as assembler macros may be required.

Floating-point unaries

Furthermore, two encoding groups were modified, changing all instructions inside each of those groups: VFUNARY0 and VFUNARY1. This affects the following instructions:

Float square root
vfsqrt.v
Float classify
vfclass.v
Type conversion
vfcvt.xu.f.v vfcvt.x.f.v vfcvt.f.xu.v vfcvt.f.x.v
vfncvt.xu.f.w vfncvt.x.f.w vfncvt.f.xu.w vfncvt.f.x.w vfncvt.f.f.w
vfwcvt.xu.f.v vfwcvt.x.f.v vfwcvt.f.xu.v vfwcvt.f.x.v vfwcvt.f.f.v

Removed instructions

In my opinion, nobody should really care about removed instructions that did not make the final cut in the ratified standard. If you used those instructions, your code would not be forward-compatible, and thus end up mostly useless sooner rather than later. But for the sake of completeness, here they are:

vaadd.vi vasub.vi vdot.vv vdotu.vv vext.x.v vfdot.vv vmford.vf vmford.vv vwsmaccsu.vv vwsmaccsu.vx vwsmaccus.vv vwsmaccus.vx

Not to mention

Any instruction that was renamed without an actual change to its encoding is excluded for simplicity, since a rename is mostly irrelevant here. Also, two sets of instructions were added and promptly removed after 0.7.1 and before 1.0: vwsll and vqmacc (and friends).
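The listings above can be folded into a small lookup table, for instance to scan assembler sources for mnemonics that need attention on a 0.7.1 target. The following Python sketch is only an illustration: the function name classify and the category labels are hypothetical, and the sets merely transcribe the lists in this article using the ratified spellings (vmv1r.v and friends for the whole-register moves, vfsqrt.v and vfclass.v for square root and classify).

```python
# Mnemonics present in 1.0 but absent from draft 0.7.1.
ADDED = set("""
vsext.vf2 vsext.vf4 vsext.vf8 vzext.vf2 vzext.vf4 vzext.vf8
vaaddu.vv vaaddu.vx vasubu.vv vasubu.vx
vfrsqrt7.v vfrec7.v
vfcvt.rtz.xu.f.v vfcvt.rtz.x.f.v
vfncvt.rod.f.f.w vfncvt.rtz.xu.f.w vfncvt.rtz.x.f.w
vfwcvt.rtz.xu.f.v vfwcvt.rtz.x.f.v
vmv1r.v vmv2r.v vmv4r.v vmv8r.v
vfslide1down.vf vfslide1up.vf
vsetivli
""".split())

# Mnemonics present in both versions but with a different encoding,
# including the VFUNARY0 and VFUNARY1 groups.
REENCODED = set("""
vwmaccsu.vv vwmaccsu.vx vwmaccus.vv vwmaccus.vx
vaadd.vv vaadd.vx vasub.vv vasub.vx
vfmv.s.f vfmv.f.s vmv.s.x vmv.x.s
vfirst.m vcpop.m
vfsqrt.v vfclass.v
vfcvt.xu.f.v vfcvt.x.f.v vfcvt.f.xu.v vfcvt.f.x.v
vfncvt.xu.f.w vfncvt.x.f.w vfncvt.f.xu.w vfncvt.f.x.w vfncvt.f.f.w
vfwcvt.xu.f.v vfwcvt.x.f.v vfwcvt.f.xu.v vfwcvt.f.x.v vfwcvt.f.f.v
""".split())

# Mnemonics present in draft 0.7.1 but dropped before ratification.
REMOVED = set("""
vaadd.vi vasub.vi vdot.vv vdotu.vv vext.x.v vfdot.vv
vmford.vf vmford.vv
vwsmaccsu.vv vwsmaccsu.vx vwsmaccus.vv vwsmaccus.vx
""".split())


def classify(mnemonic):
    """Return 'added', 're-encoded', 'removed' or 'unlisted'.

    'unlisted' only means the mnemonic is not in the lists above; it does
    not prove that the instruction behaves identically in both versions.
    """
    name = mnemonic.strip().lower()
    if name in ADDED:
        return "added"
    if name in REENCODED:
        return "re-encoded"
    if name in REMOVED:
        return "removed"
    return "unlisted"


if __name__ == "__main__":
    for m in ("vzext.vf2", "vmv.x.s", "vfsqrt.v", "vdot.vv", "vadd.vv"):
        print(m, "->", classify(m))
```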

Vector configuration

While the listings above may seem daunting, the affected instructions only add up to a small chunk of the entire Vector extension. Most optimisations would not need them. And those optimisations that do need them can typically find decent substitutions.

The much more severe binary compatibility problem lies with changes in vector type (vtype) encoding.
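To give an idea of the kind of change involved, here is a minimal Python sketch that packs the vtype fields both ways. The bit layouts are assumptions drawn from reading the two specification versions, not something verified on hardware: 1.0 is taken to place vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6 and vma in bit 7, whereas 0.7.1 is taken to place vlmul in bits 1:0, vsew in bits 4:2 and vediv in bits 6:5. The function names are hypothetical, and fractional LMUL (1.0 only) is left out.

```python
# Field encodings shared by both layouts in this sketch (integer LMUL only).
_VSEW = {8: 0, 16: 1, 32: 2, 64: 3}
_VLMUL = {1: 0, 2: 1, 4: 2, 8: 3}


def vtype_v1_0(sew, lmul, ta=False, ma=False):
    """Pack vtype per the assumed ratified 1.0 layout:
    vlmul[2:0] | vsew[5:3] | vta[6] | vma[7]."""
    return (int(ma) << 7) | (int(ta) << 6) | (_VSEW[sew] << 3) | _VLMUL[lmul]


def vtype_v0_7_1(sew, lmul, ediv=1):
    """Pack vtype per the assumed draft 0.7.1 layout:
    vlmul[1:0] | vsew[4:2] | vediv[6:5]."""
    vediv = {1: 0, 2: 1, 4: 2, 8: 3}[ediv]
    return (vediv << 5) | (_VSEW[sew] << 2) | _VLMUL[lmul]


if __name__ == "__main__":
    for sew in (8, 16, 32, 64):
        for lmul in (1, 2, 4, 8):
            new = vtype_v1_0(sew, lmul)
            old = vtype_v0_7_1(sew, lmul)
            same = "same" if new == old else "DIFFERENT"
            print(f"e{sew},m{lmul}: 1.0={new:#04x} 0.7.1={old:#04x} {same}")
```

Under those assumed layouts, the two encodings of the same SEW and LMUL only coincide when SEW is 8 bits and the tail-agnostic and mask-agnostic bits are clear. Any wider element size yields a value that the other version interprets as a different configuration.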


Misc (warning)

Beware that there are other more subtle changes. For example, some immediate values have changed signedness. There are probably other issues that I don't even know of.

Final words

With tedious macros, it should be possible to recompile some, but not all, standard-targeting vector code for the draft version. For what it is worth, I would not spend my money on draft-implementing hardware only for testing and benchmarking purposes, and only for a year (give or take) until conformant hardware can probably be procured from open markets at reasonable price points. Then again, this is just my opinion as a time-constrained hobbyist; your mileage may vary, and I do not actually know of upcoming hardware release dates.

And I sure would not mind if I got my hands on a (draft or not) vector-capable Linux RISC-V chip without paying for it. Cough cough.

As for writing run-time interoperable code that would run regardless of the implemented version of the vector extension, it would be rather difficult for anything but simple byte-wise algorithms such as memcpy() and memset().