`dav1d` is an open-source AV1 video decoder that aims for the highest possible performance. The primary goal of this GSoC project was to optimize `dav1d`'s [1] core video processing functions by implementing hand-written assembly using the RISC-V Vector (RVV) Extension. The objective was to maximize performance to enable smooth playback of high-definition AV1 video on low-power RISC-V devices, thereby demonstrating and enhancing the multimedia capabilities of the RISC-V ecosystem.

During the project, I performed the following key activities:
- Using the `perf` tool, I analyzed `dav1d`'s performance to identify bottlenecks. The analysis confirmed that functions like `prep_8tap` and `put_8tap` were responsible for the most significant computational load.
- `w_mask` functions (`444`, `422`, `420`): By applying various techniques such as loop unrolling and dynamic LMUL selection based on VLEN, I achieved a performance increase of up to 16x on Spacemit K1 and up to 9x on K230.
- `emu_edge`: Implementing this function with RVV resulted in a performance increase of up to 5x, depending on the input values.

The submitted Merge Requests have successfully accelerated key `dav1d` functions using RVV. The optimized code has passed all `checkasm` and `argon` conformance tests, giving confidence in its correctness and stability. These changes show significant performance gains across various block widths and on different hardware with VLEN=128 and VLEN=256.
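
To make the idea of dynamic LMUL selection based on VLEN more concrete, here is a minimal C sketch of the arithmetic involved. It is not the code from the Merge Requests, which is hand-written assembly. The helper names `vlmax` and `pick_lmul` are hypothetical, and the sketch assumes only integer LMUL values (1, 2, 4, 8), ignoring the fractional LMUL settings that RVV also provides. It relies on the RVV relation VLMAX = VLEN × LMUL / SEW, where SEW is the element width in bits.

```c
#include <assert.h>

/* Hypothetical helper: number of elements that fit in one vector
 * register group, following VLMAX = VLEN * LMUL / SEW. */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(sew_bits != 0);
    return vlen_bits * lmul / sew_bits;
}

/* Hypothetical helper: pick the smallest integer LMUL in {1, 2, 4, 8}
 * whose register group holds a whole row of `width` elements.
 * If even LMUL=8 is too small, 8 is returned and the caller is
 * expected to loop over the row in several passes. */
static unsigned pick_lmul(unsigned vlen_bits, unsigned sew_bits, unsigned width)
{
    for (unsigned lmul = 1; lmul < 8; lmul *= 2) {
        if (vlmax(vlen_bits, sew_bits, lmul) >= width)
            return lmul;
    }
    return 8;
}
```

For a row of 16 elements of 16 bits each, `pick_lmul(128, 16, 16)` yields 2, while `pick_lmul(256, 16, 16)` yields 1. A smaller LMUL leaves more vector registers free on the wider machine, which is one plausible reason to choose it per VLEN rather than fixing it at build time.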
The following are the main Merge Requests I worked on and submitted during this GSoC period. You can find detailed code changes, benchmark results, and the review process at each link.
While significant progress was made during GSoC, the RISC-V optimization for `dav1d` is not yet complete. I plan to remain active in the community after GSoC and will continue contributing by addressing the following tasks:
- `prep_8tap` and `put_8tap`: The next goal is to implement RVV assembly for these two functions, which were identified as the biggest bottlenecks by `perf`.
- `w_mask`: Handle cases where `w*h < 64` so that the block is processed in a single pass.

Through this project, I gained a deep understanding of RISC-V vector assembly and hands-on experience solving complex performance issues on real hardware. Understanding RVV was very challenging at first, but I finally had a breakthrough during a 15-hour flight to Bulgaria, where I could focus intensely on the documentation.
This entire journey would have been impossible without my excellent mentors, Nathan and Luca, and the helpful members of the community. I would like to express my sincere gratitude to everyone who helped me.
The following resources were extremely helpful throughout the project:
[1] dav1d
[2] crossdev-stages