{"id":25621,"date":"2024-04-16T09:33:13","date_gmt":"2024-04-16T01:33:13","guid":{"rendered":"https:\/\/ljdevice.com.tw\/?p=25621\/"},"modified":"2024-04-16T09:33:55","modified_gmt":"2024-04-16T01:33:55","slug":"meta-details-five-nanometer-mtia-chip-accelerating-ai-inference-workloads","status":"publish","type":"post","link":"https:\/\/ljdevice.com.tw\/en\/meta-details-five-nanometer-mtia-chip-accelerating-ai-inference-workloads\/","title":{"rendered":"Meta details five-nanometer MTIA chip for accelerating AI inference workloads"},"content":{"rendered":"<p>APRIL 10 2024<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-25618\" src=\"https:\/\/ljdevice.com.tw\/wp-content\/uploads\/2024\/04\/2024-04-16_092941-300x194.jpg\" alt=\"\" width=\"300\" height=\"194\" \/><!--more--><\/p>\n<p>Meta Platforms Inc. today detailed a new iteration of its MTIA artificial intelligence chip that can run some workloads up to seven times faster than its predecessor.<\/p>\n<p>The first version of the MTIA, or Meta Training and Inference Accelerator, made its debut last May. Despite the name, the chip isn\u2019t optimized for AI training but rather focuses primarily on inference, or the task of running AI models in production. Meta built both the first-generation MTIA and new version detailed today to power internal workloads such as content recommendation algorithms.<\/p>\n<p>The latest iteration of the chip retains the basic design of its predecessor, the company detailed. Meta also carried over some of the software tools used to run AI models on the MTIA. The company\u2019s engineers combined those existing building blocks with hardware enhancements that significantly increase the new chip\u2019s performance.<\/p>\n<p>Like its predecessor, the second-generation MTIA comprises 64 compute modules dubbed PEs that are optimized for AI inference tasks. Each PE has a dedicated cache that it can use to store data. 
Placing memory close to logic circuits reduces the distance data has to cover while moving between them, which shortens travel times and thereby speeds up processing.<\/p>\n<p>Meta originally made the PEs using Taiwan Semiconductor Manufacturing Co. Ltd.\u2019s seven-nanometer process. With the second-generation MTIA, the company has switched to a newer five-nanometer node. It also expanded the cache integrated into each PE compute module from 128 kilobytes to 384 kilobytes.<\/p>\n<p>The MTIA\u2019s onboard cache is based on a memory technology called SRAM. It\u2019s faster than DRAM, the most widely used type of computer memory, which makes it more suitable for powering high-performance chips.<\/p>\n<p>DRAM is made of cells, tiny data storage modules that each comprise a transistor and a kind of miniature battery called a capacitor. SRAM, in contrast, uses a more complex cell design that features six transistors. This architecture makes SRAM significantly faster but also costs more to manufacture and limits the available storage capacity. As a result, the technology has few applications besides powering processors\u2019 onboard cache modules.<\/p>\n<p>\u201cBy focusing on providing outsized SRAM capacity, relative to typical GPUs, we can provide high utilization in cases where batch sizes are limited and provide enough compute when we experience larger amounts of potential concurrent work,\u201d Meta engineers detailed in a\u00a0<a href=\"https:\/\/ai.meta.com\/blog\/next-generation-meta-training-inference-accelerator-AI-MTIA\/#hardware\">blog post<\/a>\u00a0today.<\/p>\n<p>The 64 PE compute modules in Meta\u2019s MTIA chip can not only move data to and from their respective caches, but also share that data with one another. An on-chip network allows the modules to coordinate their work when running AI models. 
Meta says that the network provides more than twice as much bandwidth as the module interconnect layer in the original MTIA, which speeds up processing.<\/p>\n<p>According to the company, another contributor to the new chip\u2019s increased performance is a set of improvements \u201cassociated with pipelining of sparse compute.\u201d In AI inference, sparsity refers to the observation that a sizable portion of the data a neural network processes often isn\u2019t necessary to produce an accurate result. Some AI inference chips can remove this unnecessary data to speed up computations.<\/p>\n<p>\u201cThese PEs provide significantly increased dense compute performance (3.5x over MTIA v1) and sparse compute performance (7x improvement),\u201d Meta detailed.<\/p>\n<p>In its data centers, the company plans to deploy the new MTIA chip as part of racks that are also based on a custom design. Every rack is divided into three sections that each contain 12 hardware modules dubbed boards. Each board, in turn, holds two MTIA chips.<\/p>\n<p>Meta also developed a set of custom software tools to help its developers more easily run AI models on the processor. According to the company, several of those tools were carried over from the original version of the MTIA that it detailed last year.<\/p>\n<p>The core pillar of the software bundle is a system called Triton-MTIA that turns developers\u2019 AI models into a form the chip can run. It\u2019s partly based on Triton, an open-source AI compiler developed by OpenAI that ships with its own programming language. Triton-MTIA also integrates with other open-source technologies including PyTorch, a popular AI development framework created by Meta.<\/p>\n<p>\u201cIt improves developer productivity for writing GPU code and we have found that the Triton language is sufficiently hardware-agnostic to be applicable to non-GPU hardware architectures like MTIA,\u201d Meta\u2019s engineers detailed. 
\u201cThe Triton-MTIA backend performs optimizations to maximize hardware utilization and support high-performance kernels.\u201d<\/p>\n<p>Source:<a href=\"https:\/\/siliconangle.com\/2024\/04\/10\/meta-details-five-nanometer-mtia-chip-accelerating-ai-inference-workloads\/\">Meta details five-nanometer MTIA chip for accelerating AI inference workloads &#8211; SiliconANGLE<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>APRIL 10 2024<\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[609],"tags":[],"class_list":["post-25621","post","type-post","status-publish","format-standard","hentry","category-industrial-news"],"_links":{"self":[{"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/posts\/25621","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/comments?post=25621"}],"version-history":[{"count":3,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/posts\/25621\/revisions"}],"predecessor-version":[{"id":25623,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/posts\/25621\/revisions\/25623"}],"wp:attachment":[{"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/media?parent=25621"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/categories?post=25621"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ljdevice.com.tw\/en\/wp-json\/wp\/v2\/tags?post=25621"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}