GSoC 2025 Final Report: RVV Optimization for the dav1d AV1 Decoder
Project Overview and Goals
dav1d is an open-source AV1 video decoder that aims for the highest possible performance. The primary goal of this GSoC project was to optimize dav1d's core video processing functions by implementing hand-written assembly using the RISC-V Vector (RVV) Extension. The objective was to maximize performance to enable smooth playback of high-definition AV1 video on low-power RISC-V devices, thereby demonstrating and enhancing the multimedia capabilities of the RISC-V ecosystem.
Key Activities and Achievements
During the project, I performed the following key activities:
- Building a RISC-V Gentoo Development Environment: To facilitate this project, I first built and stabilized a cutting-edge Gentoo Linux development image. This involved using the crossdev-stages2 scripts to cross-compile the entire system for RISC-V with full RVV support. Along the way, I identified, debugged, and contributed fixes for numerous upstream bugs in core packages such as GCC, crossdev, and Perl.
- Performance Analysis and Bottleneck Identification: Using the perf tool, I analyzed dav1d's performance to identify bottlenecks. The analysis confirmed that functions like prep_8tap and put_8tap were responsible for the most significant computational load.
- C-based Code Optimization:
  - w_mask C Code Improvement (MR !1804): While implementing the RVV version, I identified an area in the existing C code that could be optimized. By simply pre-calculating and storing frequently used values in variables, I achieved a meaningful ~7% performance improvement on an x86_64 CPU.
- RVV Assembly Optimization: Based on the analysis, I implemented and contributed RVV assembly optimizations for the following core functions. All code was thoroughly tested on Spacemit K1 (VLEN=256) and K230 (VLEN=128) hardware.
  - w_mask RVV Implementation (MR !1797): I implemented RVV assembly for three w_mask functions (444, 422, 420). By applying various techniques such as loop unrolling and dynamic LMUL selection based on VLEN, I achieved a performance increase of up to 16x on Spacemit K1 and up to 9x on K230.
  - emu_edge (MR !1808): I optimized the emu_edge function with RVV, resulting in a performance increase of up to 5x, depending on the input values.
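To illustrate the kind of change described for the w_mask C code, here is a minimal sketch of storing reused values in local variables inside a masked-blend loop. It is not the code from MR !1804: the function names `blend_baseline` and `blend_hoisted` are hypothetical, and the constants are illustrative 8-bit values chosen for the example rather than values taken from dav1d.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative 8-bit constants; assumptions for this sketch, not dav1d's. */
enum { MASK_SH = 8, MASK_RND = 8, BLEND_SH = 10, BLEND_RND = 512 };

static int clip_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Baseline: the mask-weight expression is written out twice and the
 * input arrays are indexed repeatedly. */
static void blend_baseline(uint8_t *dst, const int16_t *tmp1,
                           const int16_t *tmp2, int n)
{
    assert(n >= 0);
    for (int x = 0; x < n; x++) {
        int w1 = 38 + ((abs(tmp1[x] - tmp2[x]) + MASK_RND) >> MASK_SH);
        if (w1 > 64) w1 = 64;
        int w2 = 64 - (38 + ((abs(tmp1[x] - tmp2[x]) + MASK_RND) >> MASK_SH));
        if (w2 < 0) w2 = 0;
        dst[x] = (uint8_t)clip_u8((tmp1[x] * w1 + tmp2[x] * w2 + BLEND_RND)
                                  >> BLEND_SH);
    }
}

/* Same arithmetic, but each input is loaded once and the weight m is
 * computed once, then reused. */
static void blend_hoisted(uint8_t *dst, const int16_t *tmp1,
                          const int16_t *tmp2, int n)
{
    assert(n >= 0);
    for (int x = 0; x < n; x++) {
        const int a = tmp1[x];
        const int b = tmp2[x];
        int m = 38 + ((abs(a - b) + MASK_RND) >> MASK_SH);
        if (m > 64) m = 64;
        dst[x] = (uint8_t)clip_u8((a * m + b * (64 - m) + BLEND_RND)
                                  >> BLEND_SH);
    }
}
```

Both functions produce identical output. Whether the second form is faster depends on the compiler and target, so any gain should be confirmed by benchmarking.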
Current Project Status
The submitted Merge Requests have successfully accelerated key dav1d functions using RVV. The optimized code has passed all checkasm and argon conformance tests, ensuring its stability. These changes show significant performance gains across various block widths and on different hardware with VLEN=128 and VLEN=256.
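One way to picture the dynamic LMUL selection mentioned earlier is through the RVV relation VLMAX = LMUL * VLEN / SEW: for a given block width, a wider VLEN allows a smaller register group. The C sketch below models that choice. The helper names `vlmax_elems` and `pick_lmul` are hypothetical, only integer LMUL values are considered, and the actual assembly in the Merge Requests may select LMUL differently.

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW, as defined by the RVV specification. */
static int vlmax_elems(int vlen_bits, int sew_bits, int lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    assert(sew_bits == 8 || sew_bits == 16 || sew_bits == 32 || sew_bits == 64);
    return vlen_bits * lmul / sew_bits;
}

/* Hypothetical policy: pick the smallest integer LMUL whose register
 * group holds one full row of w elements; fall back to LMUL=8 (and a
 * strip-mined loop) when even that is not enough. */
static int pick_lmul(int vlen_bits, int sew_bits, int w)
{
    assert(w > 0);
    for (int lmul = 1; lmul < 8; lmul *= 2) {
        if (vlmax_elems(vlen_bits, sew_bits, lmul) >= w)
            return lmul;
    }
    return 8;
}
```

For example, with 16-bit elements and a row of 16 values, this policy yields LMUL=2 at VLEN=128 and LMUL=1 at VLEN=256. In assembly, VLEN can be derived at run time from the `vlenb` CSR, which holds VLEN/8.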
Code Contributions & Merge Requests
The following are the main Merge Requests I worked on and submitted during this GSoC period. You can find detailed code changes, benchmark results, and the review process at each link.
- mc: 8bpc rvv w_mask (v1) (!1797): https://code.videolan.org/videolan/dav1d/-/merge_requests/1797
- mc: 8bpc c w_mask (!1804): https://code.videolan.org/videolan/dav1d/-/merge_requests/1804
- mc: 8bpc rvv emu_edge (!1808): https://code.videolan.org/videolan/dav1d/-/merge_requests/1808
Future Work
While significant progress was made during GSoC, the RISC-V optimization for dav1d is not yet complete. I plan to remain active in the community after GSoC and will continue contributing by addressing the following tasks:
- Optimize prep_8tap and put_8tap: The next goal is to implement RVV assembly for these two functions, which were identified as the biggest bottlenecks by perf.
- Further Optimization: I plan to apply further optimizations, such as eliminating the height-based loop in w_mask for cases where w*h < 64 to process it in a single pass.
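The single-pass idea can be reasoned about with simple arithmetic: if a whole w x h block fits in one vector register group, no per-row loop is needed. The sketch below counts the vector passes required. It assumes the inputs are stored contiguously, which a strided destination would not satisfy without per-row stores. The names `passes_needed` and `fits_single_pass` are hypothetical, and the w*h < 64 threshold above may rest on considerations this model does not capture.

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW, as defined by the RVV specification. */
static int vlmax_elems(int vlen_bits, int sew_bits, int lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    assert(sew_bits > 0 && vlen_bits > 0);
    return vlen_bits * lmul / sew_bits;
}

/* Number of vector passes to cover w*h contiguous elements:
 * ceil(w * h / VLMAX). */
static int passes_needed(int vlen_bits, int sew_bits, int lmul, int w, int h)
{
    const int vlmax = vlmax_elems(vlen_bits, sew_bits, lmul);
    assert(vlmax > 0 && w > 0 && h > 0);
    return (w * h + vlmax - 1) / vlmax;
}

/* Nonzero when the whole block fits in a single register group. */
static int fits_single_pass(int vlen_bits, int sew_bits, int lmul, int w, int h)
{
    return passes_needed(vlen_bits, sew_bits, lmul, w, h) == 1;
}
```

Under these assumptions, 16-bit elements at VLEN=128 and LMUL=8 give VLMAX=64, so a 4x8 block is covered in one pass while a 16x16 block needs four.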
Challenges and Key Learnings
Through this project, I gained a deep understanding of RISC-V vector assembly and experienced solving complex performance issues on real hardware. The initial process of understanding RVV was very challenging, but I finally had a breakthrough during a 15-hour flight to Bulgaria, where I could focus intensely on the documentation.
This entire journey would have been impossible without my excellent mentors, Nathan and Luca, and the helpful members of the community. I would like to express my sincere gratitude to everyone who helped me.
Resources
The following resources were extremely helpful throughout the project:
- RISC-V Specifications:
- RISE Optimization Guide:
- Especially helpful for understanding RVV: