{"id":9745,"date":"2020-03-11T14:28:19","date_gmt":"2020-03-11T06:28:19","guid":{"rendered":"https:\/\/ljdevice.com.tw\/%e6%af%94-frontier-%e5%bf%ab%e4%ba%86-10-%e5%80%8d%ef%bc%8camd-%e8%88%87-hpe-%e8%81%af%e6%89%8b%e6%89%93%e9%80%a0%e6%95%88%e8%83%bd%e9%81%94-2-exaflops-%e7%9a%84%e5%85%a8%e7%90%83%e6%9c%80%e5%bf%ab\/"},"modified":"2020-09-16T11:25:23","modified_gmt":"2020-09-16T03:25:23","slug":"%e6%af%94-frontier-%e5%bf%ab%e4%ba%86-10-%e5%80%8d%ef%bc%8camd-%e8%88%87-hpe-%e8%81%af%e6%89%8b%e6%89%93%e9%80%a0%e6%95%88%e8%83%bd%e9%81%94-2-exaflops-%e7%9a%84%e5%85%a8%e7%90%83%e6%9c%80%e5%bf%ab","status":"publish","type":"post","link":"https:\/\/ljdevice.com.tw\/en\/%e6%af%94-frontier-%e5%bf%ab%e4%ba%86-10-%e5%80%8d%ef%bc%8camd-%e8%88%87-hpe-%e8%81%af%e6%89%8b%e6%89%93%e9%80%a0%e6%95%88%e8%83%bd%e9%81%94-2-exaflops-%e7%9a%84%e5%85%a8%e7%90%83%e6%9c%80%e5%bf%ab\/","title":{"rendered":"El Capitan Supercomputer Detailed: AMD CPUs &#038; GPUs To Drive 2 Exaflops of Compute"},"content":{"rendered":"<p><span class=\"body\">by <a class=\"b\" href=\"https:\/\/www.anandtech.com\/Author\/85\">Ryan Smith<\/a><a href=\"https:\/\/www.anandtech.com\/show\/15581\/el-capitan-supercomputer-detailed-amd-cpus-gpus-2-exaflops#\"> <em>on March 4, 2020 <\/em><\/a><\/span><\/p>\n<p><span class=\"body\"> <span class=\"head\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-9742\" src=\"https:\/\/ljdevice.com.tw\/wp-content\/uploads\/2020\/03\/el-capitan_875x500_0-624x356-300x171.jpg\" alt=\"\" width=\"300\" height=\"171\" \/><\/span><\/span><\/p>\n<p><!--more--><\/p>\n<p>Back in August, the United States Department of Energy and Cray announced plans for a third United States exascale supercomputer, <a href=\"https:\/\/www.llnl.gov\/news\/doennsa-lab-announce-partnership-cray-develop-nnsas-first-exascale-supercomputer\">El Capitan<\/a>. 
Scheduled to be installed at Lawrence Livermore National Laboratory (LLNL) in early 2023, the system is intended primarily (but not exclusively) for use by the National Nuclear Security Administration (NNSA), which uses supercomputers for its ongoing nuclear weapons modeling. At the time the system was announced, the DOE and LLNL confirmed that they would be buying a Shasta system from Cray (now part of HPE); however, the announcement didn\u2019t go into any detail about what hardware would actually be filling one of Cray\u2019s very flexible supercomputers.<\/p>\n<p>But as of today, the wait is over. This afternoon the DOE and HPE are announcing the architectural details of the supercomputer, revealing that AMD will be providing both the CPUs and accelerators (GPUs), as well as revising the performance estimate for the supercomputer. Already expected to be the fastest of the US\u2019s exascale systems, El Capitan was originally commissioned as a 1.5 exaflops system seven months ago. However, thanks to some late configuration changes, the DOE now expects the system to reach 2 exaflops once it\u2019s fully installed, which would cement its place at the top of the US\u2019s supercomputer inventory.<\/p>\n<p align=\"center\"><a href=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_04.jpg\"><img decoding=\"async\" src=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_04_575px.jpg\" alt=\"\" \/><\/a><\/p>\n<p>Overall, El Capitan is the second (<a href=\"https:\/\/www.hpcwire.com\/2019\/08\/13\/cray-wins-nnsa-livermore-el-capitan-exascale-award\/\">and apparently final<\/a>) system being built as part of the US DOE\u2019s CORAL-2 program for supercomputers. Like the similar Frontier system, El Capitan comes with a $600 million price tag and is intended to ensure the US\u2019s leadership in supercomputers in the exascale era. 
LLNL will be using the system to replace <a href=\"https:\/\/computing.llnl.gov\/computers\/sierra\">Sierra<\/a>, their current IBM Power 9 + NVIDIA Volta supercomputer. All told, El Capitan will be 16 times more powerful than the system it replaces. LLNL will be using it primarily for nuclear weapons modeling \u2013 substituting for actual weapon testing \u2013 while the system will also see secondary use as a research system in other fields, particularly those where machine learning can be applied.<\/p>\n<table border=\"1\" width=\"650\" cellspacing=\"0\" cellpadding=\"3\" align=\"center\">\n<tbody>\n<tr class=\"tgrey\">\n<td colspan=\"6\" align=\"center\">US Department of Energy Exascale Supercomputers<\/td>\n<\/tr>\n<tr class=\"tlblue\">\n<td class=\"contentwhite\" align=\"center\" bgcolor=\"#016a96\" width=\"152\"><\/td>\n<td class=\"contentwhite\" align=\"center\" bgcolor=\"#016a96\" width=\"114\">El Capitan<\/td>\n<td class=\"contentwhite\" align=\"center\" bgcolor=\"#016a96\" width=\"114\">Frontier<\/td>\n<td class=\"contentwhite\" align=\"center\" bgcolor=\"#016a96\" width=\"114\">Aurora<\/td>\n<\/tr>\n<tr>\n<td class=\"tlgrey\"><strong>CPU Architecture<\/strong><\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">AMD EPYC &#8220;Genoa&#8221;<br \/>\n(Zen 4)<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">AMD EPYC<br \/>\n(Future Zen)<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Intel Xeon Scalable<\/td>\n<\/tr>\n<tr>\n<td class=\"tlgrey\"><strong>GPU Architecture<\/strong><\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Radeon Instinct<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Radeon Instinct<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Intel Xe<\/td>\n<\/tr>\n<tr>\n<td class=\"tlgrey\"><strong>Performance (RPEAK)<\/strong><\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">2.0 EFLOPS<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">1.5 EFLOPS<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">1 EFLOPS<\/td>\n<\/tr>\n<tr>\n<td 
class=\"tlgrey\"><strong>Power Consumption<\/strong><\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">&lt;40MW<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">~30MW<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">N\/A<\/td>\n<\/tr>\n<tr>\n<td class=\"tlgrey\"><strong>Nodes<\/strong><\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">N\/A<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">100 Cabinets<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">N\/A<\/td>\n<\/tr>\n<tr>\n<td class=\"tlgrey\"><strong>Laboratory<\/strong><\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Lawrence Livermore<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Oak Ridge<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Argonne<\/td>\n<\/tr>\n<tr>\n<td class=\"tlgrey\"><strong>Vendor<\/strong><\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Cray<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Cray<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">Intel<\/td>\n<\/tr>\n<tr>\n<td class=\"tlgrey\"><strong>Year<\/strong><\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">2023<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">2021<\/td>\n<td align=\"center\" bgcolor=\"#f7f7f7\">2021<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>El Capitan is the second exascale supercomputer win for AMD, who is also providing the CPUs and GPUs behind the 1.5 exaflops Frontier system for Oak Ridge National Laboratory. And indeed, at a high level El Capitan looks a whole lot like Frontier from a hardware perspective. With Cray serving as the prime contractor on both systems, El Capitan and Frontier are Cray Shasta systems, employing AMD\u2019s processors along with Cray\u2019s cabinets and their Slingshot interconnect technology. 
However in an interesting turn of events, LLNL is being just a bit more forthcoming about what specific hardware will be in their new supercomputer.<\/p>\n<p align=\"center\"><a href=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_11.jpg\"><img decoding=\"async\" src=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_11_575px.jpg\" alt=\"\" \/><\/a><\/p>\n<p>On the CPU side of matters, AMD will be supplying a standard version of their <a href=\"https:\/\/www.anandtech.com\/show\/13554\/amd-announces-zen-4-microarchitecture\">Zen 4<\/a>-based \u201cGenoa\u201d EPYC processor. As it\u2019s still two generations out from AMD\u2019s current wares, the amount of information on Zen 4\/Genoa is limited, but AMD is promising support for next-generation memory, Infinity Fabric 3, as well as broad promises of both single and multi-threaded performance leadership. Notably, this is a greater level of detail on the CPU than we currently have for Frontier, which is using an unspecified and customized next-generation EPYC CPU.<\/p>\n<p align=\"center\"><a href=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_12.jpg\"><img decoding=\"async\" src=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_12_575px.jpg\" alt=\"\" \/><\/a><\/p>\n<p>Meanwhile on the GPU side of matters, AMD and Cray are continuing to hold their cards rather close. While the companies are confirming that this will use a next-generation AMD GPU using a new architecture, they aren\u2019t naming the architecture or offering too much in the way of details about it. For now, what they are saying is that these GPUs will be using next-generation HBM for their memory, and that they\u2019ll bring support for mixed precision compute for improved deep learning performance.<\/p>\n<p>On the whole, these broad specifications are very close to the GPU slated to be used in Frontier, so El Capitan may very well be using the same GPU, or at least a further derivative of it. 
From the nature of AMD\u2019s comments about the part, it sounds like whatever it is, we should expect to find out more architectural details about it soon.<\/p>\n<p align=\"center\"><a href=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_13.jpg\"><img decoding=\"async\" src=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_13_575px.jpg\" alt=\"\" \/><\/a><\/p>\n<p>But perhaps the biggest part of today\u2019s reveal is the interconnect. For the first time AMD is naming their third-generation Infinity Fabric, which will be used to connect the processors within each blade. Like Frontier, El Capitan will be running in a 4:1 configuration, with four GPUs hooked up to each CPU. For Infinity Fabric 3.0, AMD is promising further improvements to inter-chip bandwidth and latency. However, the most interesting claim is that these IF 3.0 device nodes will support unified memory across the CPU and GPU, which is something AMD doesn\u2019t offer today. Indeed, even Frontier is only slated to offer coherency between the processors, which is a step below a true unified memory model. The devil is in the details of course \u2013 a unified memory system does not necessarily mean fast access to other devices\u2019 memory \u2013 but this stands to be a major leap for AMD, as a unified memory system can improve both the ease of programming such a system and its performance when running heterogeneous workloads.<\/p>\n<p>Finally, as previously mentioned, tying together the nodes will be Cray\u2019s own Slingshot interconnect. Among other things, Slingshot supports adaptive routing, congestion management, and quality-of-service features. 
The interconnect is capable of 200Gb\/sec per port, with individual blades incorporating a port for each GPU in the blade so that other nodes can directly read from and write to a GPU\u2019s memory.<\/p>\n<p align=\"center\"><a href=\"https:\/\/images.anandtech.com\/doci\/15581\/HPE_Cray_Shasta_Exploded_View.jpg\"><img decoding=\"async\" src=\"https:\/\/images.anandtech.com\/doci\/15581\/HPE_Cray_Shasta_Exploded_View_575px.jpg\" alt=\"\" \/><\/a><\/p>\n<p>Unfortunately, the DOE and Cray are not going into quite as much detail on the completed layout of the system. El Capitan is slated to use less than 40MW of power \u2013 and we\u2019re told it\u2019ll be &#8220;fairly substantially under that&#8221; \u2013 however at this time the DOE isn\u2019t disclosing the total number of cabinets. But to put things in perspective, Frontier is slated to use 100 Shasta cabinets, with a total power budget lower than El Capitan\u2019s. So we wouldn\u2019t be too surprised to ultimately find out that part of the reason that El Capitan is 33% faster than Frontier (2 exaflops versus 1.5) is that the DOE is throwing more hardware at it and ordering more cabinets. But whatever the number, it\u2019s going to be enough that El Capitan will be using direct liquid cooling.<\/p>\n<p>Meanwhile, it\u2019s interesting to note that in their press conference, LLNL took the time to mention that part of the performance boost for El Capitan over its initial order was due to the group\u2019s procurement plan. LLNL noted that they used a \u201clate-binding\u201d strategy for El Capitan, deciding on the (Shasta) architecture early, and then picking the specific processors at a later point \u2013 presumably about as late as they could wait to make the decision. LLNL cites this as ultimately giving them better results, as they were able to pick the fastest hardware that could be made available. 
In other words, while the DOE and LLNL announced El Capitan back in August, they only recently decided that it would be AMD filling it.<\/p>\n<p>Overall, El Capitan marks an important second exascale supercomputer win for AMD, while Cray will now be involved in all three US exascale systems. So it\u2019s a big win for both vendors, and a continuation of momentum for AMD, who only just scored its first big supercomputer win in a long while with Frontier last year.<\/p>\n<p align=\"center\"><a href=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_07.jpg\"><img decoding=\"async\" src=\"https:\/\/images.anandtech.com\/doci\/15581\/ElCap_07_575px.jpg\" alt=\"\" \/><\/a><\/p>\n<p>The fact that El Capitan is a derivative of Frontier also means that with all three exascale systems now locked in, it will be NVIDIA who finds themselves on the outside looking in for this generation. As we noted with the Frontier announcement, the Intel Aurora and the AMD Frontier\/El Capitan systems are coming from full-service processor vendors that supply both CPUs and GPUs. Current-generation systems like Summit use mixed vendors \u2013 e.g. IBM + NVIDIA \u2013 so the move to integrated vendors is a big shift for these CPU + accelerator systems. And while it makes a lot of sense for LLNL to order a copy of one of the other exascale systems in the name of efficiency, it should be noted that US DOE supercomputer contracts are as much political as they are technical. The US has a vested interest in supporting a domestic supercomputer industry and ensuring there are viable competitors to help keep costs down (there used to be several), so with three major processor alliances\/vendors in the US, someone was bound to end up the odd man out.<\/p>\n<p>At any rate, El Capitan is scheduled for delivery in early 2023. 
And with AMD\u2019s annual Financial Analyst Day scheduled for tomorrow, hopefully we\u2019ll be getting a better picture of where Genoa fits into AMD\u2019s roadmaps, and perhaps a bit more on what to expect on the hardware that will eventually be powering the world\u2019s fastest supercomputer.<\/p>\n<p>Source:<a href=\"https:\/\/www.anandtech.com\/show\/15581\/el-capitan-supercomputer-detailed-amd-cpus-gpus-2-exaflops\">El Capitan Supercomputer Detailed: AMD CPUs &amp; GPUs To Drive 2 Exaflops of Compute<\/a><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>by Ryan Smith on March 4, 2020<\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[609],"tags":[],"class_list":["post-9745","post","type-post","status-publish","format-standard","hentry","category-industrial-news"],"_links":{"self":[{"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/posts\/9745","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/comments?post=9745"}],"version-history":[{"count":2,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/posts\/9745\/revisions"}],"predecessor-version":[{"id":9747,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/posts\/9745\/revisions\/9747"}],"wp:attachment":[{"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/media?parent=9745"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/categories?post=9745"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/tags?post=9745
"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}