Among chip products, the CPU is the best known. China's domestic CPU industry has gone through an arduous development and is attracting more and more attention. Domestic CPUs are now common in government, telecommunications, and computing, but the general-purpose CPU market is still dominated by foreign vendors: Intel and AMD on the desktop, ARM and Qualcomm in mobile. It is not easy for domestic CPUs to stand out.
In recent years, domestic CPU companies have kept developing while looking for a way to break this deadlock. Their most common model is to license an existing instruction set architecture, and domestic CPU companies have obtained licenses for ARM, MIPS, and even x86. Licensing an ISA is undoubtedly efficient, but it is like building a house on land prepared by someone else, and the resulting product cannot be called fully independent. A license is ultimately only a license: without ownership of the underlying intellectual property, the licensee may still be restricted.
Introduction to LoongArch
In April 2021, Loongson Zhongke took the lead in domestic autonomy by announcing LoongArch, a fully independent instruction set architecture covering everything from the top-level architecture to the instruction definitions and the ABI standard. This means Loongson's future CPUs will no longer use the MIPS architecture: starting with the 3A5000 launched this year, they use LoongArch. This is undoubtedly an important milestone in the localization of chips.
Based on official information, we drew a microarchitecture diagram of the Loongson 3A5000. It shows that the 3A5000 is roughly divided into four blocks, each containing a processor core and its cache.
According to official information, LoongArch still belongs to the RISC camp and keeps typical RISC features such as 32-bit fixed-length instructions, 32 general-purpose registers, and 32 floating-point/vector registers. LoongArch also makes improvements: it drops the branch delay slot used by earlier RISC designs such as MIPS, computes the target address of direct jump instructions relative to the PC, and enlarges the offset range of PC-relative branches.
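To make the PC-relative point concrete, here is a minimal Python sketch of how a PC-relative direct branch target is computed. It assumes, as the LoongArch manual describes for the unconditional `B` instruction, that the encoded offset is a signed count of 4-byte instructions, 26 bits wide; the 16-bit width used for comparison is the offset width of classic MIPS conditional branches (only the reach is compared, not MIPS's delay-slot-relative base). The helper names are hypothetical and bit-field decoding is omitted.

```python
from typing import Tuple


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def branch_target(pc: int, raw_offset: int, offset_bits: int) -> int:
    """PC-relative target: the offset counts 4-byte instructions, so scale by 4."""
    return pc + (sign_extend(raw_offset, offset_bits) << 2)


def offset_range(offset_bits: int) -> Tuple[int, int]:
    """(most negative, most positive) byte distance a signed offset field can reach."""
    half = 1 << (offset_bits - 1)
    return -half * 4, (half - 1) * 4


if __name__ == "__main__":
    # Forward by 4 instructions (16 bytes).
    print(hex(branch_target(0x1200_0000, 4, 26)))  # 0x12000010
    # An all-ones 26-bit field is -1, i.e. one instruction backwards.
    print(hex(branch_target(0x1200_0000, (1 << 26) - 1, 26)))  # 0x11fffffc
    for width in (16, 26):
        print(width, "bit offset reaches bytes", offset_range(width))
```

A wider offset field lets more calls and jumps within a large binary be encoded as a single direct, PC-relative instruction instead of a multi-instruction address load.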
LoongArch has nearly 2,000 instructions and was designed with compatibility in mind. Compiling the same source code for LoongArch reduces the number of dynamically executed instructions by 10% to 20% compared with compiling it for MIPS, which Loongson previously used, and fewer executed instructions mean higher efficiency and better performance.
LoongArch's designers also studied the characteristics of MIPS, x86, and ARM in depth, so that software for these mainstream architectures can run through binary translation. According to Loongson, translation of MIPS code reaches 100%, achieving cross-platform compatibility. Loongson's goal is to eliminate the barriers between instruction sets by 2025.
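As a back-of-the-envelope check on what a 10% to 20% cut in dynamic instruction count could mean, the sketch below uses the classic relation time = instructions × CPI / frequency. It assumes, purely for illustration, that CPI and clock frequency stay the same for both builds, which real hardware does not guarantee; under that assumption, cutting the instruction count by a fraction r speeds the program up by 1 / (1 - r).

```python
def run_time(instructions: float, cpi: float, freq_hz: float) -> float:
    """Classic CPU performance equation: seconds = instructions * CPI / frequency."""
    return instructions * cpi / freq_hz


def speedup_from_fewer_instructions(reduction: float) -> float:
    """Speedup when the instruction count drops by `reduction` (0.1 == 10%),
    assuming CPI and frequency are unchanged (an illustrative assumption)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)


if __name__ == "__main__":
    # Hypothetical workload: 1e9 instructions at CPI 1.0 on a 2.5 GHz core.
    baseline = run_time(1e9, cpi=1.0, freq_hz=2.5e9)
    for r in (0.10, 0.20):
        reduced = run_time(1e9 * (1 - r), cpi=1.0, freq_hz=2.5e9)
        print(f"{r:.0%} fewer instructions -> {baseline / reduced:.2f}x")
```

Under these assumptions the script prints about 1.11x for a 10% reduction and 1.25x for 20%; in practice changes in CPI can shrink or enlarge that gain.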
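Binary translation can be hard to picture, so here is a deliberately tiny Python sketch of the general idea behind dynamic binary translation: guest code is translated one basic block at a time into host-executable form (here, Python closures), and each translated block is cached so that hot loops are translated only once. The three-instruction guest ISA (`addi`, `bnez`, `halt`) and all function names are made up for illustration; this shows the textbook technique, not how Loongson's translator works internally.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical guest ISA: (opcode, rd, rs, imm). Register r0 is never written.
Instr = Tuple[str, int, int, int]
Block = Callable[[List[int]], int]  # runs on the host, returns next guest PC (-1 = halt)


def make_addi(rd: int, rs: int, imm: int) -> Callable[[List[int]], None]:
    def op(regs: List[int]) -> None:
        regs[rd] = regs[rs] + imm
    return op


def translate_block(code: List[Instr], start: int) -> Block:
    """Translate straight-line guest code from `start` up to the next branch/halt."""
    body: List[Callable[[List[int]], None]] = []
    pc = start
    while code[pc][0] == "addi":
        _, rd, rs, imm = code[pc]
        body.append(make_addi(rd, rs, imm))
        pc += 1
    kind, _, cond_reg, target = code[pc]
    if kind not in ("bnez", "halt"):
        raise ValueError(f"unknown opcode {kind!r} at {pc}")
    fallthrough = pc + 1

    def block(regs: List[int]) -> int:
        for op in body:
            op(regs)
        if kind == "bnez":
            return target if regs[cond_reg] != 0 else fallthrough
        return -1  # halt

    return block


def run(code: List[Instr], regs: List[int]) -> int:
    """Execute guest code, translating each block on first use; returns #translations."""
    cache: Dict[int, Block] = {}
    pc = 0
    while pc != -1:
        if pc not in cache:
            cache[pc] = translate_block(code, pc)
        pc = cache[pc](regs)
    return len(cache)


if __name__ == "__main__":
    program: List[Instr] = [
        ("addi", 1, 0, 5),   # r1 = 5 (loop counter)
        ("addi", 2, 2, 3),   # loop: r2 += 3
        ("addi", 1, 1, -1),  #       r1 -= 1
        ("bnez", 0, 1, 1),   #       if r1 != 0 goto loop
        ("halt", 0, 0, 0),
    ]
    regs = [0] * 4
    blocks = run(program, regs)
    print(regs[2], blocks)  # 15 3
```

The loop body executes five times but is translated only once after its first entry; real translators add register mapping, self-modifying-code detection, and block chaining on top of this basic cache.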
In July 2021, Loongson Zhongke released two processors based on the LoongArch instruction set architecture: the 3A5000 and the 3C5000L. The 3A5000 targets desktops, and the 3C5000L targets servers. The 3A5000 runs at 2.3 GHz to 2.5 GHz and has four cores. Each core uses the 64-bit LA464 in-house microarchitecture, and the chip supports DDR4-3200 memory and integrates a HyperTransport 3.0 controller. The 3C5000L consists of four 3A5000 chips packaged together and has 16 cores.
The 3A5000 has a built-in security module and is effectively immune to the two classic CPU vulnerabilities, Meltdown and Spectre. Like the previous-generation 3A4000, it has built-in encryption and decryption algorithms and a secure trusted module. Its built-in module is also the only CPU-integrated module to have passed the Level 2 type-approval test for China's commercial cryptography standards.
Loongson 3A5000 machine introduction
The Loongson 3A5000 general-purpose processor mainly targets the consumer desktop market, and desktops, notebooks, and all-in-ones based on it will follow. This time we received a desktop machine built around the 3A5000. It uses a classic business-desktop design in mostly black. The front panel provides a power button, two USB 2.0 ports, and two audio input/output jacks.
Loongson 3A5000 machine
The motherboard's rear I/O provides a VGA port, a serial (COM) port, four USB 2.0 ports, two USB 3.2 Gen 1 (5 Gbps) ports, and a wired Ethernet port.
Motherboard I/O
As for other components, the machine uses a 256 GB SATA SSD and two 8 GB DDR4-3200 memory modules. The graphics card is an AMD Radeon HD 8750M, whose I/O provides a VGA port and an HDMI port.
UnilC 2*8GB DDR4 3200MHz memory
AMD Radeon HD 8750M
With the heatsink removed, we can see the star of the show: the Loongson 3A5000. Its chip code is "KMYC70", chosen to commemorate the 70th anniversary of the War to Resist US Aggression and Aid Korea, while the server chip 3C5000L carries the code "CPC100" to celebrate the 100th anniversary of the founding of the Communist Party of China.
In this machine, the 3A5000 is soldered directly onto the motherboard and cannot be replaced by the user.
On the software side, the three major compilers, GCC, LLVM, and Go, as well as the three major virtual machines, Java, JavaScript, and .NET, have all been ported to the Loongson 3A5000. Loongson's own base operating system Loongnix and its industrial-control operating system LoongOS have been released, but the 3A5000 machine we tested runs Tongxin UOS. Because of differences in optimization, the 3A5000's performance under UOS may differ from its performance under Loongnix and other systems.
Host configuration
Tongxin UOS was jointly initiated and developed by several core domestic operating system companies to build a complete, secure, easy-to-use, and stable operating system, and it is also a key ecosystem step for chip localization. It can currently be downloaded from the official website, where interested users can try it. Besides Tongxin UOS, the domestically developed Kylin operating system (Loongson edition) is also a good choice.
Actual test:
Besides the Loongson 3A5000, the test includes three reference processors for comparison: an Intel i5 9500 (six cores, 14 nm), a domestic ARMv8 quad-core 7 nm processor, and a domestic ARMv8 eight-core 14 nm processor. The rest of the main hardware configuration is the same on every machine.
The Intel i5 9500 is a six-core 14 nm processor clocked at 3.0 GHz to 4.4 GHz with a 65 W thermal design power. The domestic ARMv8 quad-core 7 nm processor runs at 2.6 GHz, and its design supports up to 64 cores on a single chip. The other domestic ARMv8 eight-core 14 nm processor is compatible with the 64-bit ARMv8 instruction set and runs at 2.3 GHz.
Note that the four processors have different core counts, so in the multi-core tests each processor runs on all of its cores.
Benchmarks
UnixBench performance test:
Now the testing begins. First, we again use the familiar UnixBench tool. UnixBench is a performance test suite for Unix-like systems (Unix, BSD, Linux) and is widely used to benchmark Linux hosts. It measures system calls, file reads and writes, process handling, graphics, and more, so it evaluates the whole machine rather than the CPU alone.
UnixBench single-core and multi-core performance test
The results show that both the Loongson 3A5000 and the domestic ARMv8 quad-core 7 nm processor perform very well. The 3A5000's single-core score reaches 1685, a clear improvement over the previous-generation 3A4000, and approaches the level of the Intel i5 9500. This matches Loongson's upgrade strategy of first improving single-core performance through design optimization and then using more advanced process technology to increase the core count.
In the multi-core comparison, the 3A5000 scores 4314, essentially on par with the 4387 of the domestic ARMv8 quad-core 7 nm processor, but still well behind the Intel i5 9500. However, the four-core 3A5000 scores more than 600 points higher than the domestic ARMv8 eight-core 14 nm processor.
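The UnixBench figures quoted above also allow a quick scaling check. The sketch below derives the 3A5000's multi-core-to-single-core ratio and its relative gap to the ARMv8 quad-core 7 nm chip from the published numbers (1685, 4314, and 4387). Treating multi/single as a "scaling factor" is our own simplification: UnixBench's multi-copy index also exercises the operating system and I/O paths, so it is not a pure CPU-scaling measure.

```python
from typing import Tuple


def scaling(single: float, multi: float, cores: int) -> Tuple[float, float]:
    """Return (multi/single ratio, ratio per core) as a rough parallel-efficiency figure."""
    ratio = multi / single
    return ratio, ratio / cores


def percent_gap(a: float, b: float) -> float:
    """How much higher b is than a, in percent."""
    return (b - a) / a * 100


if __name__ == "__main__":
    ratio, efficiency = scaling(single=1685, multi=4314, cores=4)  # Loongson 3A5000
    print(f"3A5000: {ratio:.2f}x on 4 cores, {efficiency:.0%} per-core efficiency")
    print(f"ARMv8 quad-core 7nm ahead by {percent_gap(4314, 4387):.1f}%")
```

With the review's numbers this gives a ratio of about 2.56x (roughly 64% per core) and a gap of about 1.7%, which is why the two chips can fairly be called basically level in multi-core UnixBench.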
SPEC 2006 test:
Next, we run a comparative SPEC CPU2006 test. SPEC CPU2006 is a large CPU benchmark suite that stresses the processor, the memory subsystem, and the compiler, and it measures the CPU's basic integer (fixed-point) and floating-point performance. Again, because the processors have different core counts, the multi-core results use each processor's full core count.
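For readers unfamiliar with how a single SPEC CPU2006 number is produced: each benchmark's ratio is the reference machine's run time divided by the measured run time, and the overall integer or floating-point score is the geometric mean of those ratios. The sketch below shows that calculation; the benchmark names and times are hypothetical placeholders, not results from this review.

```python
import math
from typing import Dict, Tuple


def spec_ratio(reference_s: float, measured_s: float) -> float:
    """SPEC-style ratio: reference time / measured time, so higher is better."""
    return reference_s / measured_s


def spec_score(results: Dict[str, Tuple[float, float]]) -> float:
    """Geometric mean of per-benchmark ratios, computed via logs for stability."""
    logs = [math.log(spec_ratio(ref, run)) for ref, run in results.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical (reference_seconds, measured_seconds) pairs, for illustration only.
    example = {
        "bench_a": (1000.0, 50.0),   # ratio 20
        "bench_b": (1000.0, 200.0),  # ratio 5
    }
    print(f"score = {spec_score(example):.1f}")
```

The geometric mean keeps one unusually fast benchmark from dominating the score: ratios of 20 and 5 give 10, not the arithmetic mean of 12.5.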
SPEC CPU2006 BASE performance test
We split the SPEC CPU2006 test into single-core and multi-core runs. The 3A5000 scores 25.1 in single-core integer and 26 in single-core floating point. There is indeed a large gap to the Intel i5 9500, but the single-core integer score is comparable to that of the domestic ARMv8 quad-core 7 nm processor, and the single-core floating-point score is slightly better. Compared with the domestic ARMv8 eight-core 14 nm processor, the 3A5000's single-core integer score is nearly 10 points higher and its single-core floating-point score is nearly twice as high.
In the multi-core test, the Intel i5 9500 again performs best, while the 3A5000's multi-core integer and floating-point scores are higher than those of the domestic ARMv8 quad-core 7 nm processor. Thanks to its larger core count, the domestic ARMv8 eight-core 14 nm processor posts higher multi-core integer and floating-point scores than both the 3A5000 and the ARMv8 quad-core 7 nm processor.
Stream:
Stream is the industry's mainstream memory bandwidth benchmark, and its behavior is simple and controllable. It needs very little CPU computing power but puts heavy pressure on memory bandwidth. As the number of processor cores grows, memory bandwidth does not grow linearly with it, so memory bandwidth becomes increasingly important for making use of multi-core processing power.
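To show what a Stream "Copy" number measures, here is a minimal Python sketch of the Copy kernel (c[i] = a[i]) and of Stream's byte accounting: Copy and Scale count two arrays of 8-byte doubles moved per pass, Add and Triad count three, bandwidth uses the best (shortest) time, and 1 MB = 10^6 bytes. The timing uses the standard-library `array` module, whose slice assignment copies memory in C, but interpreter overhead and cache effects make this only an approximation of the real C benchmark; its results are not comparable to the scores in this review.

```python
import time
from array import array

# Arrays touched per element in each Stream kernel
# (Copy: c = a; Scale: b = s*c; Add: c = a + b; Triad: a = b + s*c).
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def stream_bytes(kernel: str, n: int, elem_size: int = 8) -> int:
    """Bytes Stream counts as moved by one pass of `kernel` over n elements."""
    return ARRAYS_TOUCHED[kernel] * elem_size * n


def bandwidth_mb_s(nbytes: int, seconds: float) -> float:
    """Bandwidth in MB/s with 1 MB = 10**6 bytes, as Stream reports it."""
    return nbytes / seconds / 1e6


def measure_copy(n: int = 2_000_000, trials: int = 5) -> float:
    """Best-of-`trials` Copy bandwidth for n doubles (rough, single-threaded)."""
    a = array("d", bytes(8 * n))  # n zero-initialised doubles
    c = array("d", bytes(8 * n))
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        c[:] = a  # the Copy kernel: c[i] = a[i] for every i
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    return bandwidth_mb_s(stream_bytes("copy", n), best)


if __name__ == "__main__":
    print(f"Copy: {measure_copy():.0f} MB/s (Python approximation)")
```

Arrays should be much larger than the last-level cache for the result to reflect DRAM rather than cache bandwidth, which is why the default uses 16 MB per array.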
Stream memory test
In the Stream Copy sub-test, the Loongson 3A5000 performs quite well, surpassing the Intel i5 9500. Its single-threaded Copy score is 16,864 and its multi-threaded score is 21,873. The domestic ARMv8 eight-core 14 nm and quad-core 7 nm processors score similarly to each other, but their overall Copy performance is slightly below that of the 3A5000.
Application test
Beyond single-core and multi-core benchmarks, the experience of everyday applications reflects the performance differences between processors more directly. Let's look at three common applications: WPS Office, the web browser, and video playback.
WPS:
We installed the same version of Tongxin UOS on the machines with the four processors and used WPS Office to open three large files: a 10 MB document (text + images), a 50 MB document (text + images), and a 50 MB document (text + images + video). The test focuses on how quickly each document opens, as a measure of processor performance. To keep the samples as consistent as possible, each document is opened five times and the average is taken.
WPS Office document opening time comparison (shorter is better)
The tests show that for the 10 MB (text + images) document, the domestic ARMv8 eight-core 14 nm processor is fastest at 1.47 seconds, while the Loongson 3A5000 opens it in 1.54 seconds. For the 50 MB (text + images) document, the domestic ARMv8 quad-core 7 nm processor takes the longest, at 3.01 seconds. For the 50 MB (text + images + video) document, the domestic ARMv8 eight-core 14 nm processor takes the longest, at 4.24 seconds, and the Intel i5 9500 the shortest, at 2.23 seconds. Overall, the Intel i5 9500 performs best, and the Loongson 3A5000 is slightly better than the domestic ARMv8 quad-core 7 nm processor. The domestic ARMv8 eight-core 14 nm processor opens small files quickly, but its performance on large documents is less satisfactory.
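The measurement method described above (open each document five times, then average) is easy to reproduce for any task that can be triggered from a script. The sketch below times an arbitrary callable with `time.perf_counter` and reports the mean and standard deviation; `open_document` is a hypothetical stand-in, since scripting WPS itself is outside the scope of this review.

```python
import statistics
import time
from typing import Callable, List


def time_runs(action: Callable[[], None], runs: int = 5) -> List[float]:
    """Run `action` `runs` times and return each wall-clock duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> str:
    """Mean plus sample standard deviation, to show how stable the runs were."""
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return f"mean {mean:.2f} s, stdev {spread:.2f} s over {len(samples)} runs"


if __name__ == "__main__":
    def open_document() -> None:
        # Hypothetical stand-in for "open the 50 MB document and wait until it renders".
        time.sleep(0.05)

    print(summarize(time_runs(open_document, runs=5)))
```

Reporting the spread alongside the mean shows whether a difference of a few hundredths of a second, such as 1.47 s versus 1.54 s, is larger than the run-to-run noise.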
Browser:
The browser is the application we use every day to view web pages and videos. Today's web pages are packed with images and graphics, which put a considerable load on the CPU. To test browser performance on the four processors, we launched the browser and loaded the iQiyi video website in one step, then compared the loading times. Because the Loongson 3A5000 ships with its own Loongnix browser, we used Loongnix browser V3.1 on the 3A5000 and Firefox on all the other machines.
Time for the browser to open the iQiyi website (shorter is better), unit: seconds
Because the iQiyi website contains video, images, CSS, JavaScript, and so on, loading it also places real demands on processor performance. The Intel i5 9500 loads the page fastest, in 1.4 seconds, and the Loongson 3A5000's 1.78 seconds is also very good. The slowest is the domestic ARMv8 eight-core 14 nm processor, at 2.35 seconds.
Video playback:
The final comparison is video playback. We used the default video player in Tongxin UOS, opened the same 1080p MP4 video file on each machine, and measured how long each of the four processors took to load it.
Video player 1080p MP4 loading time (shorter is better), unit: seconds
The measurements show that the domestic ARMv8 quad-core 7 nm processor loads the video fastest, at 1.43 seconds, followed by the Loongson 3A5000 at 1.64 seconds, while the domestic ARMv8 eight-core 14 nm processor takes 2.09 seconds. It is also worth mentioning that, thanks to the 3A5000's significantly improved performance, its software decoding of 4K high-definition video has improved as well, and high-definition video still plays smoothly without a discrete graphics card.
No building without breaking: continuing to surpass
Loongson Zhongke's long-standing goal is to let Chinese users use fully independent CPUs. After 20 years of development, the arrival of LoongArch has brought Loongson a big step closer to this goal. It is not only a breakthrough for Loongson but also a new milestone for China's independent CPU industry.
The performance of the Loongson 3A5000, based on the independent LoongArch instruction set, is very satisfying. Its short-term weakness is the ecosystem: application software adaptation needs to be strengthened. As a transitional measure, Loongson's binary translation system LAT provides cross-ISA application compatibility and can even run some x86/Windows applications, but building a large software ecosystem that meets the needs of all kinds of applications still requires the cooperation of domestic software vendors.
An ecosystem is sometimes more complicated to build than any single technology. A CPU ecosystem needs hardware, systems, and users. On the hardware side, Loongson already has the LoongArch-based 3A5000; on the system side, domestic operating systems such as Tongxin UOS and Kylin have been adapted. What LoongArch needs most right now is users, meaning not only consumers but also developers. Without the software ecosystem that developers bring, there will not be large numbers of paying consumers, and without consumer spending there will be no funds for continued research and development. Vigorously promoting the ecosystem is therefore LoongArch's most important step at the moment. Apple M1 is seen as a threat to Intel and Windows precisely because of its huge user base and the tens of millions of iOS developers around the world, which is how it became a dark horse in the CPU industry.
Some readers may find it hard to understand why, after so many years of CPU development, a domestic CPU ecosystem still has not been established. Here is an example. Intel was recently reported to be planning a fab in Europe, and according to the reports its total investment over the project's life cycle may exceed 100 billion US dollars. Apart from any policy subsidies from host countries, Intel also invests 10 billion US dollars in chip research and development every year. The actual investment across the entire domestic CPU industry is far lower than that of a company like Intel and falls well short of what all the chip companies need; most of them can barely sustain their research and development. Moreover, foreign CPUs and operating systems have had decades to mature within their software ecosystems and industrial systems. As the saying goes, an error the breadth of a hair can lead a thousand miles astray: one small detail can make a billion-dollar product line obsolete. Without capital, it is simply impossible to compete. The barrier is high, and entering the industry is difficult.
In such an environment, it is no small feat for Loongson to launch its independent instruction set architecture LoongArch. More precisely, companies like Loongson that persist in making independent domestic chips deserve respect.
Without the bitter cold, how could the plum blossom give off its fragrance? The road to independence is extremely difficult, and Loongson's courage is remarkable. Now that Loongson has taken the hardest first step, the next step is building the ecosystem. Loongson Zhongke has already established the LoongArch community, will form a LoongArch alliance, and will open LoongArch for free use. We hope more developers will take part and help domestic independent CPUs go further, and we look forward to Loongson Zhongke bringing us a new, independent domestic CPU ecosystem.