“To let anyone upload a video to show anyone else in the world, for free, takes a lot of processing power.”
The pandemic turned out to be a blessing in disguise for video streaming services. In the first quarter of last year, YouTube alone witnessed a 25 percent increase in watch-time. And for the first half of last year, total daily live streams grew by 45 percent.
Running a global platform like YouTube with massive amounts of video being uploaded, stored, and distributed every second for its millions of creators and billions of viewers is a complex and demanding task.
YouTube videos are created and uploaded in a single format but consumed on different devices at different resolutions. The infrastructure team’s job is to get those videos ready for you to watch through a process called transcoding.
Transcoding is the process of compressing videos so that each device receives the best possible quality with the smallest amount of data. “But it’s costly and slow, and doing that processing using regular CPUs is pretty inefficient, especially as you add more and more videos,” said Jeff Calow, a software engineer at YouTube.
Transcoding at this scale involves several challenges:
- Handling and scaling different output resolutions and formats.
- Handling complex algorithmic trade-offs and quality/compression/computing compromises.
- Providing inter- and intra-task parallelism.
- Providing high performance at low cost.
- Enabling ease of deployment when operating at scale.
Image credits: YouTube blog
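To make the first challenge in the list above concrete, here is a minimal sketch of how one upload fans out into many outputs. The ladder of heights, the codec names, and the `plan_outputs` helper are illustrative assumptions, not YouTube's actual configuration.

```python
from dataclasses import dataclass

# Hypothetical output ladder: these heights and codec names are assumptions
# chosen for illustration, not values taken from YouTube.
LADDER_HEIGHTS = [144, 240, 360, 480, 720, 1080, 1440, 2160]
CODECS = ["h264", "vp9"]


@dataclass(frozen=True)
class OutputVariant:
    width: int
    height: int
    codec: str


def plan_outputs(src_width, src_height, heights=LADDER_HEIGHTS, codecs=CODECS):
    """Return every (resolution, codec) variant at or below the source height."""
    variants = []
    for h in heights:
        if h > src_height:
            continue  # never upscale beyond the uploaded resolution
        # Preserve the aspect ratio and round the width to an even number,
        # which many encoders expect.
        w = round(src_width * h / src_height / 2) * 2
        for codec in codecs:
            variants.append(OutputVariant(w, h, codec))
    return variants


variants = plan_outputs(1920, 1080)
print(len(variants), "outputs for one 1080p upload")
```

Even with this small assumed ladder, a single 1080p upload turns into a dozen separate encodes, which gives a sense of why the work multiplies so quickly as uploads grow.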
Looking beyond GPUs and TPUs
Video Coding Unit (VCU)
YouTube’s video processing platform currently supports multiple video-centric workloads at Google. Besides YouTube itself, which receives hundreds of hours of video uploads every minute, the platform serves Google Photos and Google Drive, which demand similar bandwidth.
The team at YouTube created a new system for transcoding video more efficiently at data centres. They developed a custom chip to transcode video, and the software to run it: the Video (trans)Coding Unit (VCU). “We’ve seen up to 20-33x improvements in compute efficiency compared to our previous optimized system, which was running software on traditional servers,” said Calow.
“VCUs resulted in up to 20-33x improvements in compute efficiency.”
VCUs are built from the ground up for data-centre-scale video workloads. At this scale, deployment becomes a challenge. So the engineers at YouTube designed the accelerators to be controlled by userspace software and handled hardware failures through redundancy and fallback at higher-level software layers. The engineers also have to account for the constant updating of applications.
Viewers rarely watch pixelated 360p videos anymore. The internet is now faster and cheaper, and YouTube users no longer hesitate to click on 4K videos. For YouTube, affordable internet is a great business opportunity but also an engineering nightmare.
VCUs enable programmability and interoperability while closely monitoring the computationally expensive, infrequently changing aspects of the system.
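The idea of handling hardware failures through redundancy and fallback in higher-level software can be sketched roughly as follows. This is only an illustration of the general pattern: `AcceleratorError`, `transcode_with_fallback`, and the stand-in encoder functions are all hypothetical names, and the real system's interfaces are not described in the article.

```python
class AcceleratorError(Exception):
    """Raised by a (hypothetical) hardware encode path when a device fails."""


def transcode_with_fallback(chunk, hardware_encoders, software_encoder):
    """Try each hardware encoder in turn; use software if all of them fail."""
    for encode in hardware_encoders:
        try:
            return encode(chunk), "hardware"
        except AcceleratorError:
            continue  # redundancy: move on to the next device
    # Fallback: the slower CPU path still produces a valid result.
    return software_encoder(chunk), "software"


# Stand-in encoders for demonstration only; they just tag the bytes.
def failing_device(chunk):
    raise AcceleratorError("device fault")


def healthy_device(chunk):
    return b"hw:" + chunk


def cpu_encoder(chunk):
    return b"sw:" + chunk


print(transcode_with_fallback(b"frame", [failing_device, healthy_device], cpu_encoder))
print(transcode_with_fallback(b"frame", [failing_device], cpu_encoder))
```

Keeping this logic in ordinary software, rather than in the chip, means a failed device costs some throughput but does not stop a video from being processed.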
VCU System Design (Source: Ranganathan et al.)
According to the team behind VCUs, the software and hardware were loosely coupled to facilitate parallel development pre-silicon and continuous iteration post-silicon. A VCU has 3,000 millidecode cores and 10,000 milliencode cores available. The codec cores in the VCU are programmed as opaque memories by the on-chip management firmware. The loose coupling allows userspace software to adjust the flow of frames through the codecs and to change codec modes without requiring other system changes.
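One way to read the millidecode and milliencode figures is as a per-VCU resource budget that software draws down as it places work. The sketch below assumes that interpretation. The capacities come from the article, but the per-task costs, the `VcuCapacity` class, and the first-fit policy are hypothetical and are not taken from the VCU scheduler.

```python
from dataclasses import dataclass


@dataclass
class VcuCapacity:
    millidecode: int = 3_000   # capacity figure quoted in the article
    milliencode: int = 10_000  # capacity figure quoted in the article

    def try_reserve(self, decode_cost, encode_cost):
        """Reserve capacity if both budgets can cover the task."""
        if decode_cost <= self.millidecode and encode_cost <= self.milliencode:
            self.millidecode -= decode_cost
            self.milliencode -= encode_cost
            return True
        return False


def first_fit(tasks, vcus):
    """Place each (name, decode_cost, encode_cost) on the first VCU with room."""
    placement = {}
    for name, decode_cost, encode_cost in tasks:
        placement[name] = None  # stays None if no VCU can take the task
        for index, vcu in enumerate(vcus):
            if vcu.try_reserve(decode_cost, encode_cost):
                placement[name] = index
                break
    return placement


# Hypothetical task costs, expressed in millicores.
tasks = [("a", 2_000, 6_000), ("b", 1_500, 3_000), ("c", 1_000, 4_000)]
vcus = [VcuCapacity(), VcuCapacity()]
placement = first_fit(tasks, vcus)
print(placement)
```

Expressing capacity in thousandths of a core lets a scheduler account for tasks that need only a fraction of a decoder or encoder, so several small jobs can share one device.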
The VCU hardware was designed using a combination of Mentor Graphics’ Catapult tool and an in-house integration tool called Taffel. The encoder core design is implemented using a C++-based high-level synthesis (HLS) design flow for faster development and design iteration.
VCU vs CPU, GPU (Source: Ranganathan et al.)
Video processing has quickly become a data centre workload headache, so VCUs or their variants are inevitable. Though some GPUs support transcoding, they fall short when it comes to video-sharing workloads. The work on VCUs is the first of its kind on broadcast-quality video acceleration. The engineers have also explored the design trade-offs for commercial production workloads serving hundreds of hours of uploads per minute, and discussed co-design trade-offs with a production video processing software stack and deployment at scale.
As the world slowly drifts towards a predominantly virtual setup, live streaming, virtual conferencing, cloud gaming, vlogging, and AR/VR footage will become more prominent. According to the YouTube engineers, the Video Coding Unit is just the beginning. They believe that rich opportunities for future innovation await in combining transcoding with other machine-learning tasks on video, such as automatic caption generation.