A $3 trillion chip company is also struggling to survive?


NVIDIA is once again preparing a special AI chip for the Chinese market.

According to the latest Reuters report, informed sources say that NVIDIA is developing a new flagship AI chip for the Chinese market that meets the current requirements of U.S. export controls, adding another member to its lineup of China-specific chips.

It is worth noting that NVIDIA unveiled its "Blackwell" series in March this year, with mass production expected later this year. According to NVIDIA, the B200 is 30 times faster than its predecessor on some tasks, making it one of the most powerful AI chips currently available.

This new flagship chip is related to the B200. Sources say NVIDIA will work with Inspur Group, one of its main distribution partners in China, to launch and distribute the chip, tentatively named "B20". Judging by the name, it is likely derived from the B200.


Including this B20, in just over a year, NVIDIA has introduced seven or eight special supply chips to the Chinese market.

A800 and H800

On October 7, 2022, the US government announced a series of export control measures, including cutting off the supply of some semiconductor chips and chip manufacturing equipment to China.

Beyond production equipment such as lithography machines, the measures also restrict China's access to high-performance AI chips made on advanced processes. This includes prohibiting U.S. companies such as NVIDIA and AMD from selling such chips to China, and restricting Chinese AI chip companies from having their designs fabricated at overseas foundries that use U.S. technology.

Both NVIDIA and AMD were hit by these restrictions. NVIDIA said the ban affected its A100 and H100 chips, which are designed to accelerate machine learning tasks, and could hinder the completion of its flagship H100, scheduled for release in 2022. It added that sales of the affected chips in China that quarter amounted to about $400 million, revenue it stood to lose if Chinese customers chose not to buy NVIDIA's alternative products.

So, how does the U.S. export restriction specifically limit NVIDIA's chips?

According to the U.S. Department of Commerce's export control rules for advanced computing integrated circuits (ECCN 3A090 and 4A090), dated October 7, 2022, an item is controlled if it meets the following conditions:

a. Integrated circuits with an aggregate bidirectional transfer rate of 600 GB/s or more (or programmable to reach 600 GB/s or more) to or from integrated circuits other than volatile memory, that also have any of the following:

a.1. One or more digital processing units that execute machine instructions, where the bit length per operation multiplied by processing performance in TOPS, summed over all processing units, is 4800 or more;

a.2. One or more digital "raw computing units" (excluding units that contribute to executing machine instructions counted under 3A090.a.1), where the bit length per operation multiplied by processing performance in TOPS, summed over all computing units, is 4800 or more;

a.3. One or more analog, multi-valued, or multi-level "raw computing units", where processing performance in TOPS multiplied by 8, summed over all computing units, is 4800 or more;

a.4. Any combination of digital processing units and "raw computing units" whose combined total, calculated according to 3A090.a.1, 3A090.a.2, and 3A090.a.3, is 4800 or more.

The integrated circuits covered by 3A090.a include graphics processing units (GPUs), tensor processing units (TPUs), neural processors, in-memory processors, vision processors, text processors, coprocessors/accelerators, adaptive processors, field-programmable logic devices (FPLDs), and application-specific integrated circuits (ASICs).

It is not hard to see that the decisive factor is the limit on chip-to-chip interconnect speed. Nvidia's then best-selling A100, with a chip-to-chip transfer rate of exactly 600 GB/s, fell squarely within the restricted range. It is possible that the Commerce Department drew this line with the A100 in mind.
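To make the two-part test concrete, here is a minimal Python sketch of the October 2022 check as paraphrased above; it follows the article's summary, not the regulation's exact text. The function names are hypothetical, and scoring A100-class compute as 312 dense BF16 TFLOPS × 16 bits (a TPP of 4992) is an assumption for illustration.

```python
# Sketch of the October 2022 ECCN 3A090 test as summarized in the article.
# Hypothetical helper names; thresholds taken from the rule text quoted above.

TPP_THRESHOLD = 4800       # bit length x TOPS, summed over processing units
IO_THRESHOLD_GBPS = 600    # aggregate bidirectional transfer rate in GB/s


def total_processing_performance(digital_units, analog_tops=0):
    """digital_units: (tops, bit_length) pairs, one per unit, using each
    unit's best-scoring format. Analog / multi-level units count as
    TOPS x 8 (a.3); all contributions are summed (a.4)."""
    digital = sum(tops * bits for tops, bits in digital_units)
    return digital + analog_tops * 8


def controlled_oct_2022(tpp, io_gbps):
    # Both conditions must hold: enough compute AND fast enough chip I/O.
    return tpp >= TPP_THRESHOLD and io_gbps >= IO_THRESHOLD_GBPS


# Assumption: A100-class compute scored as 312 dense BF16 TFLOPS x 16 bits.
ampere_tpp = total_processing_performance([(312, 16)])
print(ampere_tpp)                            # 4992
print(controlled_oct_2022(ampere_tpp, 600))  # True  (A100 interconnect)
print(controlled_oct_2022(ampere_tpp, 400))  # False (A800 interconnect)
```

Because the test requires both conditions, lowering only the interconnect rate is enough to take an otherwise identical chip out of scope.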

Nvidia responded quickly with a substitute for the A100: the A800. The U.S. ban was announced on October 7, 2022, and just one month later Nvidia introduced the A800, which complies with the new regulations, a textbook case of adapting a product to policy.

According to the specifications, the NVIDIA A800 uses the same chip architecture as the Ampere A100 GPU. It comes in three versions: two PCIe versions with 40 GB and 80 GB, and an 80 GB SXM version. These GPUs provide up to 9.7 TFLOPS of FP64, 19.5 TFLOPS of FP64 Tensor Core, 19.5 TFLOPS of FP32, 156 TFLOPS of TF32 (312 TFLOPS with sparsity), 312 TFLOPS of BFLOAT16 (624 TFLOPS with sparsity), and 624 TOPS of INT8 (1248 TOPS with sparsity). The 40 GB version has HBM2 memory with bandwidth of up to 1.555 TB/s, while the 80 GB version has HBM2e memory with bandwidth of up to 2 TB/s.
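These figures also show why compute alone clears the 4800 bar. The sketch below multiplies each dense peak figure by an assumed bit length for that format; the format-to-bit-length pairing and the choice to ignore the sparse figures are assumptions for illustration, not part of NVIDIA's spec sheet. TF32 is left out because its effective bit length is ambiguous.

```python
# Dense (non-sparse) peak throughput listed for the A800, in TFLOPS / TOPS,
# paired with the bit length assumed for each format.
A800_DENSE = {
    "FP64 Tensor Core": (19.5, 64),
    "FP32": (19.5, 32),
    "BF16": (312, 16),
    "INT8": (624, 8),
}


def tpp_by_format(spec):
    """Score each format as throughput x bit length."""
    return {name: tops * bits for name, (tops, bits) in spec.items()}


scores = tpp_by_format(A800_DENSE)
for name, score in scores.items():
    print(f"{name}: {score:g}")
print("best:", max(scores.values()))  # best: 4992
```

The low-precision formats score highest (4992 for both BF16 and INT8), which is above 4800 even though the double-precision figures are far below it.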

To meet the restrictions, the interconnect bandwidth was inevitably cut from the original 600 GB/s to 400 GB/s. A Nvidia spokesperson told Reuters in a statement: "The A800 GPU, which went into production in the third quarter, is another alternative product to the A100 GPU for customers in China. The A800 meets the U.S. government's clear test for reduced export control and cannot be programmed to exceed it."

CCS Insight analyst Wayne Lam commented: "The A800 appears to be a repackaged A100 GPU, designed to evade recent trade restrictions by the Department of Commerce." He also noted that the number 8 is a lucky number in China.

"China is an important market for Nvidia, and reconfiguring products to avoid trade restrictions makes perfect sense from a business perspective," Lam said, adding that for data centers using thousands of chips, the A800's reduced chip-to-chip communication capability is a significant drawback.

Nvidia then applied the same approach to the H100, creating the H800. Just as it cut the A100's 600 GB/s interconnect to 400 GB/s, it reportedly cut the H800's chip-to-chip interconnect rate to about half that of the H100, from 800 GB/s to 400 GB/s. The H800 therefore takes a bigger performance hit than the A800: the A800's interconnect was cut by 33%, the H800's by a full 50%.
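The 33% and 50% figures are the size of each cut divided by the original bandwidth. A short sketch with a hypothetical helper, using the bandwidths stated above:

```python
def percent_cut(before, after):
    """Reduction expressed as a share of the original value, in percent."""
    return (before - after) / before * 100


print(f"A800: {percent_cut(600, 400):.1f}%")  # A800: 33.3%
print(f"H800: {percent_cut(800, 400):.1f}%")  # H800: 50.0%
```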

At that time, a Nvidia spokesperson refused to disclose the differences between the H800 and H100 for the Chinese market, only saying, "Our 800 series products fully comply with export control regulations."

While overseas companies were buying up A100s and H100s, Chinese companies could only choose the cut-down A800 and H800. To some extent, Nvidia's China-specific chips have held back the development of large AI models in China.

H20 and RTX 4090D

For Chinese companies, the A800 and H800 have both pros and cons. The downside is that, with reduced interconnect bandwidth, both chips lose some performance and training slows down significantly. The upside is that both can be ordered through official channels, albeit at a higher cost than foreign companies pay.

However, the A800 and H800 did not last a year. On October 17, 2023, the U.S. Department of Commerce issued new control rules, supplementing and updating the export controls for advanced computing integrated circuits, semiconductor manufacturing equipment, and items supporting supercomputing applications and end uses, which were initially released on October 7, 2022.

The most significant change is to the control parameters. The interim final rule removed "interconnect bandwidth" as a parameter for identifying restricted chips under ECCN 3A090; instead, a chip is restricted if it meets the thresholds of either of two entries in ECCN 3A090, 3A090.a or 3A090.b.

According to the Commerce Department's document, the revised 3A090.a covers integrated circuits with one or more digital processing units that have either a "total processing performance" (TPP) of 4800 or more, or a TPP of 1600 or more combined with a "performance density" of 5.92 or more. Performance density is TPP divided by the applicable die area in square millimeters. The new 3A090.b covers integrated circuits with one or more digital processing units that meet either of the following conditions: a TPP of 2400 or more but less than 4800, together with a performance density of 1.6 or more but less than 5.92; or a TPP of 1600 or more, together with a performance density of 3.2 or more but less than 5.92.
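A minimal sketch of the revised two-tier test, assuming performance density is TPP divided by die area in mm². It encodes only the thresholds listed above and ignores the data-center versus consumer distinctions and the license exception; the function names and the chips in the example are hypothetical.

```python
def performance_density(tpp: float, die_area_mm2: float) -> float:
    """TPP divided by die area in square millimeters."""
    return tpp / die_area_mm2


def classify_oct_2023(tpp: float, die_area_mm2: float) -> str:
    pd = performance_density(tpp, die_area_mm2)
    # 3A090.a: very high TPP, or moderate TPP packed into a small die.
    if tpp >= 4800 or (tpp >= 1600 and pd >= 5.92):
        return "3A090.a"
    # 3A090.b: the two lower bands; chips already caught by .a never reach here.
    if (2400 <= tpp < 4800 and 1.6 <= pd < 5.92) or (tpp >= 1600 and 3.2 <= pd < 5.92):
        return "3A090.b"
    return "not controlled"


# Hypothetical chips given as (TPP, die area in mm^2), not real products.
for tpp, area in [(4992, 800), (1700, 250), (3000, 1000), (2000, 500), (2000, 1000)]:
    print(tpp, area, classify_oct_2023(tpp, area))
```

Note how interconnect bandwidth no longer appears anywhere: a chip with full-speed links but modest compute per square millimeter can fall outside both tiers.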

In addition, the rules establish a new license exception for advanced computing, aimed at consumer ICs with AI functions that fall below the threshold. It applies to two types of products: chips designed or marketed for data centers, and chips not designed or marketed for data center use that have a "total processing performance" of 4800 or higher.

Compared with the October 7, 2022 rules, the new rules once again expand the scope of control. Under the dual criteria of total processing performance and performance density, both original and cut-down chips now fall within export controls. The A800 and H800 were the first to be affected, and other Nvidia products were hit as well: the L40 and L40S, aimed at the inference market, and the consumer RTX 4090 were all included in the sales ban.

This was a heavy blow to Nvidia: its mainstream products at the time could no longer be sold in China. In previous years, the Chinese market had accounted for about 20% to 25% of Nvidia's data center revenue. In the fourth quarter of fiscal 2024, export controls caused that share to plummet to a single-digit percentage.

With no other option, Nvidia picked up the axe again. On November 16, 2023, a month after the new rules were announced, Nvidia launched special GPUs for the Chinese market: the H20, L20, and L2. The H20 is based on Nvidia's Hopper architecture, while the L20 and L2 are based on the Ada architecture.

The L20 and L2 are adapted from the L40 and L4, respectively. Because they are based on an older architecture and see less use in training and inference, they have drawn little attention. The H20 is more interesting. Since the new rules no longer limit interconnect speed, it keeps full-speed 900 GB/s NVLink, but its compute performance is heavily cut. According to analyst Dylan Patel, even if the H20's real-world utilization reaches 90%, its performance in multi-card interconnected setups is only close to 50% of the H100's.

For the consumer market, Nvidia launched an RTX 4090 substitute in December last year: the RTX 4090D. This export-compliant chip is cut down in CUDA core count and power. CUDA cores drop from 16,384 to 14,592, about 10.9% fewer, and power consumption falls from 450 W to 425 W, a decrease of about 5.6%, while all other core specifications remain unchanged.
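Measured against the original values, the 4090D's cuts work out as follows. This sketch uses the 450 W and 425 W power figures above and the publicly reported CUDA core counts of 16,384 (RTX 4090) and 14,592 (RTX 4090D); the helper name is hypothetical.

```python
def percent_cut(before, after):
    """Reduction expressed as a share of the original value, in percent."""
    return (before - after) / before * 100


print(f"CUDA cores: {percent_cut(16384, 14592):.1f}%")  # CUDA cores: 10.9%
print(f"Power:      {percent_cut(450, 425):.1f}%")      # Power:      5.6%
```

Dividing by the new value instead (25 / 425) would give about 5.9%, which overstates the cut.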

Thanks to a slight increase in clock speed, the 4090D is only about 5% slower than the 4090 in some benchmarks. Compared with the AI chips, that gap seems acceptable.

By the end of 2023, these four special chips had somewhat eased Nvidia's awkward position in China, keeping it from having nothing to sell. However, after two rounds of restrictions, large enterprises and small and medium-sized companies alike have started looking for other options: buying domestic chips, setting up servers overseas, or acquiring H100/H200 and A100 chips through unofficial channels, and a sense of helplessness is spreading among Chinese companies.

Old Huang's knife technique

PC DIY enthusiasts who follow gaming GPUs will be familiar with "Old Huang's knife technique": Jensen Huang's habit of cutting one chip into many finely separated product tiers.

A recent example: about a year after the RTX 20 series launched, Nvidia introduced the RTX 20 Super series to sharpen its product segmentation and answer the launch of AMD's RX 5000 series. From just two cores, TU106 and TU104, NVIDIA carved out five graphics cards: the RTX 2060, RTX 2060 Super, RTX 2070, RTX 2070 Super, and RTX 2080. The smallest gap is between the RTX 2060 Super and the RTX 2070, which share the TU106 core: their theoretical performance differs by only about 5%, and benchmark scores and game tests are also very close. NVIDIA took the art of cutting to the extreme.

The modified special-edition chips NVIDIA is now producing are simply a return to this old playbook.

Beyond the B20 mentioned at the beginning, NVIDIA also plans to apply the same technique in the consumer market. According to leaks, a cut-down RTX 5090, the RTX 5090D, is expected to launch in January 2025. It is expected to be based on NVIDIA's Blackwell architecture and TSMC's 4NP process, possibly with reduced core specifications to stay within U.S. export restrictions.

Including these two rumored chips, NVIDIA already has a large special supply lineup in China: A800, H800, H20, L20, L2, RTX 4090D, B20, RTX 5090D.

Some observers are quite optimistic about the prospects of these special-edition chips. Research firm SemiAnalysis estimates that NVIDIA could sell more than 1 million H20 chips in China this year, worth more than $12 billion.

However, NVIDIA still has plenty to worry about. According to a report by a Jefferies analyst, when the United States conducts its annual review of semiconductor export controls in October, it is "very likely" to bar NVIDIA's H20 from sale to China. The analyst said the ban could be implemented through "specific product bans, reducing the upper limit of computing power, and/or limiting memory capacity."

Moreover, compliance chips like the H20 are essentially cut-down versions of existing cores, and the B20 is no different. Dies that could have gone into the H200 and B200 are instead sold as cheaper special editions, whose sales life may be little more than a year. It looks like a losing deal.

But NVIDIA has no other way out. Caught between regulation and the market, it can only strike the best balance it can. The question is: how many Chinese enterprises are willing to pay for special-edition chips that have been cut again and again?
