Harvard Dropouts Raise $5 Million for LLM Accelerator

MENLO PARK, Calif.—A pair of 21-year-old Harvard dropouts has raised $5.36 million in a seed round for their chip startup Etched.ai, which plans to build an accelerator chip dedicated to large language model (LLM) inference, the founders told EE Times. The round was led by Primary Venture Partners, with participation from MAX Ventures and angels including former eBay CEO Devin Wenig. The seed round values the company at $34 million.

Etched.ai CEO Gavin Uberti told EE Times he had planned to take a year off from Harvard, but ended up taking a job at OctoML working on the Apache TVM open-source compiler and microkernels.

While developing microkernels for Arm Cortex-M4 and Cortex-M7 cores, Uberti noticed that Arm’s instruction set lacked an 8-bit SIMD MAC (multiply-accumulate) instruction, offering only a 16-bit version (the M4 and M7 support many other 8-bit SIMD operations, but an 8-bit SIMD MAC only arrived later with the Helium extension). As a result, 8-bit MAC workloads effectively ran at half speed: the 8-bit operands had to be widened and pushed through the 16-bit instruction, which performs half as many MACs per cycle as a native 8-bit version would.
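
To see why, here is a minimal C sketch of the standard workaround on those cores, in the same spirit as CMSIS-NN’s int8 kernels. The intrinsics (__SXTB16, __SMLAD, __ROR) are real CMSIS intrinsics; the dot_q7 helper itself is hypothetical and assumes 4-byte-aligned inputs whose length is a multiple of four.

```c
#include <stdint.h>
#include "cmsis_compiler.h"  /* CMSIS header providing the intrinsics */

/* Hypothetical int8 dot product for Cortex-M4/M7. With no 8-bit SIMD
 * MAC available, each word of four int8 values is sign-extended into
 * two 16x2 halfword pairs (SXTB16) and pushed through the dual 16-bit
 * MAC (SMLAD): two MACs per instruction instead of the four a native
 * 8-bit SIMD MAC would deliver, which is the half-speed effect
 * Uberti describes. */
int32_t dot_q7(const int8_t *a, const int8_t *b, uint32_t n)
{
    int32_t acc = 0;
    const uint32_t *pa = (const uint32_t *)a;  /* assumes 4-byte alignment */
    const uint32_t *pb = (const uint32_t *)b;

    for (uint32_t i = 0; i < n / 4; i++) {
        uint32_t wa = *pa++;
        uint32_t wb = *pb++;

        /* Sign-extend bytes 0 and 2 into halfwords. */
        uint32_t a02 = __SXTB16(wa);
        uint32_t b02 = __SXTB16(wb);
        /* Rotate by 8, then sign-extend bytes 1 and 3. */
        uint32_t a13 = __SXTB16(__ROR(wa, 8));
        uint32_t b13 = __SXTB16(__ROR(wb, 8));

        /* Two dual 16-bit MACs to cover four 8-bit products. */
        acc = (int32_t)__SMLAD(a02, b02, (uint32_t)acc);
        acc = (int32_t)__SMLAD(a13, b13, (uint32_t)acc);
    }
    return acc;
}
```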

“It could never be fixed, and every time I’d go to work, I’d have to deal with this [oversight], and it made me think with Chris [Zhu, Etched.ai CTO] that we have to be able to do this better,” Uberti said. “At the same time, we saw there was a change happening in the world of language models.”

Uberti was referring to the recent surge in interest in LLMs, such as ChatGPT, which are based on the transformer architecture.

He and Zhu decided to start a chip company to design a more efficient inference architecture for LLMs. While there isn’t an LLM-specific accelerator on the market today, Nvidia has announced software features aimed at transformers, and other accelerator companies have announced support for language and vision transformers. Etched.ai plans to compete with incumbents by specializing further.

“You can’t get the kind of improvements we’re getting by being generalized,” Uberti said. “You have to bet hard on a single architecture, not just on AI, but on something more specific…. We think eventually Nvidia will do this. We think the opportunity is too big to ignore.”

Gavin Uberti (left) and Chris Zhu (right) have dropped out of Harvard to form an AI chip startup focused on LLMs. (Source: Etched.ai)

Specialized accelerator

Uberti cites bitcoin-mining chips as an example of a successful specialized ASIC offering. In the AI accelerator domain, several companies have built specialized architectures for particular workloads. There are a few examples of CNN-focused architectures at the edge (see: Kneron), while specialized data-center architectures have mainly targeted DLRM (deep learning recommendation models), which are notoriously difficult for GPUs to accelerate (see: Neuchips). By contrast, Nvidia already has a fully deployed software feature, the Transformer Engine, in its current H100 GPU, which allows LLM inference to run in 8-bit floating point without further quantization.

There’s also the problem of hyperscalers’ appetite for building their own specialized chips for their own workloads. Meta recently announced it had built its own DLRM inference chip, which is already widely deployed. Google’s TPU and AWS’ Inferentia are built for more general workloads.

Any comparison with recommendation workloads should take into account the timescales involved, Zhu said, since recommendation is relatively mature at this point.

“This is a very recent development—the market for running transformers didn’t really exist six months ago, whereas DLRM has had comparatively longer,” he said. “The world has changed very rapidly, and that is our opportunity.”

On the flip side, the rapid evolution of AI workloads could spell disaster if Etched.ai specializes too much.

“That’s a real risk, and I think it’s turning off a lot of other people from going down this route, but transformers aren’t changing,” Uberti said. “If you look back four years to GPT-2, compared to Meta’s recent LLaMA model, there are just two differences—the size and the activation function. There are differences in how it is trained, but that doesn’t matter for inference.”

The basic components of transformers are fixed, and while there are nuances, Uberti is not worried.

“Innovations don’t come out of thin air,” he said. “There’s still this cycle of things published in academia that takes some time to be integrated.”

Uberti’s examples are gated linear activation units, which appeared in the literature in 2020 but didn’t find their way into Google’s PaLM model until 2022, and 2021’s ALiBi, a method for positional encoding that didn’t see widespread adoption until the end of 2022. That lag is comparable to the 18 to 24 months a typical startup needs to develop a chip from scratch.
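
To illustrate how small these architectural deltas are, here is a rough C sketch of the three ingredients just mentioned. The function names are mine, not from any particular library, but the math follows the published formulas: GPT-2’s tanh-approximated GELU, the SwiGLU-style gated linear unit adopted by PaLM- and LLaMA-class models, and ALiBi’s per-head linear distance penalty.

```c
#include <math.h>

/* GPT-2-era activation: tanh approximation of GELU. */
double gelu(double x)
{
    return 0.5 * x * (1.0 + tanh(0.7978845608 /* sqrt(2/pi) */
                                 * (x + 0.044715 * x * x * x)));
}

/* Gated linear unit (SwiGLU form): silu(gate) * up, where gate and up
 * are two separate linear projections of the same input. */
double swiglu(double x_gate, double x_up)
{
    double silu = x_gate / (1.0 + exp(-x_gate));  /* x * sigmoid(x) */
    return silu * x_up;
}

/* ALiBi: replaces positional embeddings with a linear penalty on the
 * attention scores. Head h (1-indexed, of num_heads total) uses slope
 * 2^(-8h/num_heads); for causal attention, key_pos <= query_pos. */
double alibi_bias(int h, int num_heads, int query_pos, int key_pos)
{
    double slope = pow(2.0, -8.0 * (double)h / (double)num_heads);
    return -slope * (double)(query_pos - key_pos);
}
```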

“The edge industry tells us a lot—the one lesson they’ve learned is not to specialize, that you don’t know what the future holds, place your bets in the wrong place and you could be useless,” Uberti said. “We took that advice and threw it out the window.”

Sohu chip

The partners have been working on ideas for their first chip, codenamed Sohu, which they claim can reach 140× the throughput per dollar of an Nvidia H100 PCIe card when processing GPT-3 tokens.

Sohu will be “a chip that has a lot of memory,” and Uberti hinted that the two-order-of-magnitude figure comes mainly from impressive throughput rather than a drastic cost differential, with the chip designed to support large batch sizes.
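
To make the headline number concrete, here is a back-of-envelope sketch in C. Every figure in it is a placeholder invented for illustration, not an Etched.ai or Nvidia number (the article discloses none); the point is only that a throughput-per-dollar multiple factors into a throughput ratio divided by a price ratio.

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical baseline: an Nvidia H100 PCIe card running GPT-3
     * inference. Both figures are placeholders, not real data. */
    double h100_tokens_per_sec = 1000.0;
    double h100_price_usd      = 30000.0;

    /* Hypothetical specialized chip. Uberti hints the multiple comes
     * mostly from raw throughput (large batch sizes), not price, so
     * this placeholder assumes 112x throughput at 0.8x the price. */
    double sohu_tokens_per_sec = 112000.0;
    double sohu_price_usd      = 24000.0;

    double baseline = h100_tokens_per_sec / h100_price_usd;
    double sohu     = sohu_tokens_per_sec / sohu_price_usd;

    /* 112x throughput / 0.8x price = 140x throughput per dollar. */
    printf("throughput-per-dollar multiple: %.0fx\n", sohu / baseline);
    return 0;
}
```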

The men have “most of the architecture fleshed out” already, Uberti said. The tiled nature of the design should make design faster and minimize complexity, he added. Supporting only one type of model also minimizes the complexity of the software stack, particularly the compiler.

The company’s customers will be anyone who wants to pay less to use ChatGPT. Beyond that, the company is still working out its business model, Uberti said. He confirmed Etched.ai already has “customer dollars committed” but declined to give further details.

Etched.ai plans to spend its seed funding on hiring an initial team, starting RTL front-end development and opening talks with IP providers. So far, Etched.ai has hired Mark Ross as chief architect; Ross served as Cypress CTO in the early 2000s.

The company expects to raise a Series A round, likely at the beginning of next year.

“Most investors are skeptical, and rightfully so, because what they see is a pair of undergrads trying to tackle the semiconductor industry,” Zhu said. “But there is definitely still a non-trivial portion of investors who are impressed and excited by the vision that we’re pitching them and what we think that this can become.”

Etched.ai is aiming to have its Sohu chip available in 2024.