
Internet giants are eyeing this chip

Date: 2022-08-09 10:31:26

Over the past few years, driven by demand, chip-making by Internet companies has become a familiar story. Especially with the boom in cloud computing, data centers and artificial intelligence, the world's leading Internet companies seem to be heading the same way: developing their own chips, such as AI chips, CPUs and DPUs. At the same time, each is building a chip portfolio targeted at its own businesses.

While we are still marveling at how quickly Internet companies' chip efforts have developed in recent years, companies such as Google, Meta, ByteDance and Tencent have all turned their attention to one chip in particular: the video processing unit (VPU).

Google, Tencent, ByteDance and Facebook have started their own development

In April 2021, Google released its self-developed Argos VCU (VCU is Google's own name for this kind of chip). Each Argos chip has 10 video processing cores, and each board carries two chips under a rather large heat sink. Google claims it can increase computing efficiency by 20 to 33 times, so that processing 4K video that used to take days now takes only hours. Argos has reportedly replaced up to tens of millions of Intel CPUs, saving more than 20 billion RMB in capital expenditure on CPUs alone. While building the chip, Google even created its own EDA tool, called Taffel.


Google Argos VCU
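
To make the "days to hours" claim above concrete, here is a minimal arithmetic sketch. The 48-hour baseline is a hypothetical job length chosen for illustration, not a figure from Google; only the 20x and 33x speedups come from the text.

```python
def accelerated_hours(baseline_hours: float, speedup: float) -> float:
    """Return the processing time after applying a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


# Hypothetical baseline: a transcoding job that took two days on CPUs.
BASELINE_HOURS = 48.0

for speedup in (20, 33):  # the range Google is quoted as claiming
    hours = accelerated_hours(BASELINE_HOURS, speedup)
    print(f"{speedup}x speedup: {hours:.1f} hours")
```

Under that assumed baseline, a two-day job would finish in roughly one and a half to two and a half hours, which is consistent with the claim.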

We are entering an era in which audio and video are booming, as evidenced by the growing number of video users, the huge volume of video generated, and the increasing difficulty of compressing and processing it. Many video standards and codecs have emerged since 2003 (as shown below). The more efficiently a codec compresses video, the smaller the final file and the lower the bitrate of the stream.


Image source: Google's presentation at Hot Chips 33

Google's Argos chip accelerates video encoding with VP9, which compresses video about 40% more efficiently than the previous-generation H.264. VP9 is a more sophisticated codec: it can make a video file smaller at the same picture quality, or deliver higher quality at the same file size. VP9 allows Google to save significant bandwidth. AV1 is a more advanced codec still, improving on VP9 by a further 30-40%. Higher levels of compression typically require more computation.
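
The way these codec gains compound can be sketched with a few lines of arithmetic. This is only an illustration: it assumes that "40% more efficient" means a file 40% smaller at the same quality, and that AV1's 30-40% gain is measured relative to VP9. The function name and the normalized H.264 size of 1.0 are introduced here for the example.

```python
def relative_size(baseline: float, savings: float) -> float:
    """Return the size that remains after cutting `baseline` by `savings`."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in the range [0, 1)")
    return baseline * (1.0 - savings)


H264_SIZE = 1.0  # normalized size of an H.264 encode (assumption)

# Assumed reading of the text: VP9 is 40% smaller than H.264.
vp9 = relative_size(H264_SIZE, 0.40)

# AV1 is quoted as 30-40% better than VP9; compute both ends of the range.
av1_at_30 = relative_size(vp9, 0.30)
av1_at_40 = relative_size(vp9, 0.40)

print(f"VP9: {vp9:.2f} of H.264 size")
print(f"AV1: {av1_at_40:.2f} to {av1_at_30:.2f} of H.264 size")
```

Under these assumptions, an AV1 encode would be roughly 36% to 42% of the size of the equivalent H.264 file, which is why each codec generation matters so much for storage and bandwidth costs.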

According to SemiAnalysis' sources, the next generation of Argos is already in development. It will support the AV1 format, which is difficult to handle on CPUs or GPUs, enabling further storage and bandwidth savings. Google also plans to add networking to the add-on card itself to increase efficiency and reduce communication with the host CPU. Finally, it plans to add machine learning inference hardware to the new chip, which would allow it to automatically generate video captions, check for terms of service violations, and even enable video search on YouTube and Google Photos.

In June this year, Tencent Cloud published "Tencent's Core Matters", from which we learned that Tencent's self-developed video transcoding chip, "Canghai", came back from tape-out and was successfully powered on on March 5, 2022. This is Tencent's third chip, and the first it has developed completely independently. The goal of Tencent's Canghai team is to build one of the strongest video transcoding chips in the industry and push the compression rate to the limit. Built on a 12nm process, the Canghai chip delivers the same video quality with less data and less bandwidth, with a compression rate more than 30% higher than the best performance in the industry.


Tencent's Canghai chip powered on

ByteDance's chip-making has recently drawn another wave of attention. On July 20, Yang Zhenyuan, vice president of ByteDance, confirmed in a media interview at the "2022 Volcano Engine Original Power Conference" that ByteDance is developing its own chips, mainly for its own video recommendation business. The R&D team will customize and optimize hardware for the dedicated scenarios of ByteDance's large-scale video recommendation services, such as video encoding and decoding and cloud inference acceleration, in order to improve performance and reduce costs.

In addition to ByteDance, another Chinese video giant, Kuaishou, is also working on video chip products. As far as the author understands, its chips have already been taped out, and more information may be disclosed soon.

In addition, Facebook's parent company Meta is also seeking to "control key technologies and reduce reliance on existing chip suppliers". It is reportedly developing custom server chips as well: one is an AI inference chip used mainly for recommendation algorithms, and the other is aimed at video transcoding, to improve the quality of recorded and live video for Facebook users. Facebook has also hired Jon Dama, a senior network chip engineer from Intel, to lead chip design in the company's infrastructure hardware engineering group.

CPUs and GPUs are no longer economical; the VPU may shine

Today, as Internet content keeps evolving, streaming video has begun to replace text and pictures. Live, on-demand and short-form video applications are taking up the time of people of every age, and streaming video accounts for about 80% of Internet traffic, through services such as YouTube overseas and short-video platforms such as Douyin and Kuaishou in China. Content on the web has become decentralized: users upload more than 700 hours of video to YouTube every minute, and the same is happening on Douyin, Kuaishou and Tencent's WeChat. Consumers are spending more and more of their time on user-generated content.

The work involved in this process is increasingly complex, and the resolution, quality and bandwidth consumption of videos directly determine user stickiness. A big reason why Douyin has been one of the winners in short-form video for several years is its ability to tailor what it pushes to each individual, backed by a powerful recommendation mechanism. Users increasingly expect ultra-high-definition video (4K/8K), which in turn raises the demand for codec computing power and the cost of CDN bandwidth.

For years, Intel's CPU-plus-software video encoding and decoding solutions have dominated the streaming market. But as demand for high-quality video streaming continues to grow, CPUs no longer make economic sense: they consume too much power and space. GPUs have a slightly better TCO (total cost of ownership), but suffer from lower utilization and less workload flexibility. Running applications on GPUs can also be complex and confusing, because the driver stack may not work properly across the various versions of Linux or Windows. Such software issues have hindered GPU solutions from Intel, Nvidia and others, one example being Intel's cancelled Xe-HP tile-based GPU architecture. The Xe-HP computing GPU was the first high-performance discrete GPU the company had developed in years, and the first discrete Xe GPU Intel showed to the public.


Intel's Xe-HP computing GPU

It became clear that neither CPUs nor GPUs were well suited to handling huge volumes of video workloads, and so dedicated video processing chips such as the VPU were born. In a sense, the VPU is more flexible than other encoding approaches.


Image source: SemiAnalysis

A VPU is a video accelerator designed specifically for video scenarios, in combination with AI technology. It has built-in dedicated modules for accelerating video encoding, offering high performance, low power consumption and low latency, and it brings high-performance accelerated computing to video industry applications.


Image source: SemiAnalysis

Generally speaking, an ASIC needs to be an order of magnitude better at its target workload to be accepted by the industry. According to SemiAnalysis' analysis of Rongming Microelectronics (NETINT), a Chinese VPU chip startup, the density and power consumption of the VPU cannot be matched by CPUs or GPUs. The chart below shows that, using the HEVC codec, Rongming's VPU far outperforms Nvidia's previous-generation T4 (since succeeded by newer Ampere-based GPUs) and Intel's Skylake/Cascade Lake servers. The company's Codensity series VPU chips have been deployed at scale by more than 90% of China's top-tier Internet and video content customers and are widely used by many overseas customers such as Microsoft and IBM. The company is also launching what it describes as the world's first chip-level solution with AV1 encoding support.


Rongming Microelectronics VPU products
(Source: Rongming Microelectronics)

In addition, according to related reports, a company called Surge Technology is also active in this area. The company says it provides the Seirios video codec acceleration solution, whose core ASIC video codec chip is an advanced-process chip developed by Surge Technology's own R&D team. Installed in a video processing server that performs encoding and transcoding, it improves processing performance without changing the server configuration, lightening the multimedia processing burden on data center servers and reducing overall power consumption and cost.

The benefits Google has gained from its self-developed VPU show why Internet vendors are pushing for this kind of chip. On the one hand, Internet companies care a great deal about TCO (total cost of ownership), and using VPUs greatly reduces the number of CPUs needed. On the other hand, being able to build lower-power, faster chips tailored to their own needs strengthens their strategic advantage. Another favorable condition is that the Internet vendors have their own video products, rich multimedia application scenarios, and many leading live-streaming and interactive customers on their clouds, which gives them unique conditions for analysis and verification during R&D. The fact that the Internet giants are bullish on this track is itself a sign of the broad prospects of the VPU market.

In closing

Write in the end

Since the VPU is a product with demanding requirements for scenario-specific processing technology, not many chip suppliers currently specialize in ASIC VPUs. On the whole, only a few vendors have achieved large-scale practical deployment, and it will still take 2-3 years before the Internet vendors' in-house products are really put into practical use.

China's video applications have been at the forefront of the times and have a huge user base. Beyond that, the VPU has a great many application scenarios. With the rapid expansion of 5G, mobile video, cloud gaming, cloud desktops, VR/AR, the metaverse and other industries, market demand for dedicated video processing chips is showing explosive growth, and ASIC chips dedicated to video processing may usher in a long-cycle blue ocean market.

Some studies estimate that the VPU market may reach 100 billion dollars in the next few years. From CPU to GPU and then to DPU, an era of the VPU now seems to be arriving quietly, and more players are expected to enter this market in the future.

