Hailo Debuts Edge GenAI Chip, Raises $120 Million

Edge AI chip startup Hailo has launched a new chip designed to accelerate generative AI models at the edge. The company also raised $120 million in a new funding round.

Hailo CEO Orr Danon told EE Times the new Hailo-10 can run Llama2-7B at up to 10 tokens per second while drawing less than 5 W, or StableDiffusion 2.1 at under 5 seconds per image in the same power envelope.

“The idea is to enable a new class of devices with high performance acceleration, but within the cost and power budget of the edge, which has always been our traditional strength,” Danon said. “We’re showcasing very significant improvements both in performance and power consumption versus integrated NPUs.”

Use cases for the Hailo-10 are varied, but will include AI in the PC and another key market for Hailo: automotive.

Orr Danon (Source: Hailo)

“All tech CEOs are now looking at any product thinking, ‘How can I use this advancement in AI to make my business better?’” Danon said. “There are lots of great ideas and lots of opportunities…[Generative AI] is a theme we’ll see in many markets, but automotive will probably be the fastest one, with natural user interfaces where you feel like you’re talking to a person, or at least, don’t feel like you’re talking to a machine.”

A large language model (LLM)-based system in a vehicle might use Whisper-based voice-to-text before generating a response via a one- to seven-billion-parameter LLM. The first automotive applications for generative AI will include navigation systems and infotainment.

“It doesn’t need to be Shakespeare, it just needs to be something you feel comfortable talking with,” Danon said. “It should respond immediately with something that resembles a natural conversation.”

Most Hailo customers are not interested in running very large models at the edge.

“We are not focusing on the biggest models,” he said. “For edge deployments, you can run relatively large models, but what most customers are interested in is not running 70B parameters—you could do it, but it just wouldn’t be meaningful. They would rather run a more specialized model that’s fit for the edge. With a 70B model, where do you store it? 70 GB of RAM would be more expensive than your edge device, so it doesn’t make sense.”
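Danon's storage argument is simple arithmetic: weight memory scales with parameter count times bits per weight. The sketch below is a rough illustration rather than a Hailo tool. The helper name `model_size_gb` is hypothetical, and the calculation counts weights only, ignoring activations and any key-value cache. Under these assumptions, the 70 GB figure for a 70B model corresponds to one byte (8 bits) per weight.

```python
def model_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Counts weights only; activations and caches are ignored (assumption).
    """
    return num_params * bits_per_weight / 8 / 1e9


# Compare the model sizes discussed in the article at three precisions.
for params in (1e9, 7e9, 70e9):
    for bits in (16, 8, 4):
        size = model_size_gb(params, bits)
        print(f"{params / 1e9:>4.0f}B params at {bits:>2}-bit: {size:6.1f} GB")
```

By the same arithmetic, a seven-billion-parameter model at 4-bit precision needs roughly 3.5 GB for its weights, which is far closer to an edge device's memory budget.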

There are plenty of good models available between one and seven billion parameters today, Danon said, adding that optimization methods like speculative decoding can help deploy good quality models at very low power and reasonable cost.
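For readers unfamiliar with speculative decoding, the idea is that a small, cheap draft model proposes several tokens ahead and the larger target model checks them, keeping the longest run it agrees with. The following is a minimal sketch of the greedy variant only, using toy stand-in "models" (plain Python functions); it does not reflect Hailo's implementation, and every name in it is hypothetical. In a real system the verification step is a single batched pass of the target model, which is where the speedup comes from; here it is written as sequential calls for clarity.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to the next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: generate one token per model call."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: output matches greedy_decode(target, ...)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model verifies the proposal token by token.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)        # draft guessed correctly
            else:
                accepted.append(expected)   # replace first mismatch, stop
                break
        else:
            # All k draft tokens accepted: the target adds one more token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal]


# Toy models (assumptions): the target counts upward modulo 10; the draft
# agrees except after token 4, where it guesses wrong.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


baseline = greedy_decode(toy_target, [0], 12)
speculative = speculative_decode(toy_target, toy_draft, [0], 12)
print(baseline == speculative)
```

Because every kept token is one the target model would have chosen itself, the output quality is unchanged in this greedy setting; the draft model only affects how many tokens are confirmed per verification step.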

“When you look at realistic deployments, that’s where things are headed,” he said. “All the major vendors are announcing optimized models—Google, Microsoft, Meta—and from the Chinese ecosystem too, which is as vibrant as the Western ecosystem. We’re seeing all these [models] coming into play.”

The Hailo-10, designed for generative AI, can achieve 40 TOPS at INT4. (Source: Hailo)

Lower precision

Hailo already has its Hailo-8 accelerator and the Hailo-15 SoC for security cameras, but the Hailo-10 is slightly different.

“We have significantly improved our ability to work with large models, with a dedicated memory interface to the device,” Danon said. “The Hailo-8 is mostly vision focused, Hailo-10 is more genAI but for a mixture of modalities, mixing genAI with transformers and CNNs, etc…all the practical use cases we see are a mixture of these modalities.”

The Hailo-10 supports 4-, 8- and 16-bit integer precision and can achieve 40 TOPS at INT4. Addition of a 4-bit precision capability doubles throughput versus the 8-bit precision of the Hailo-8.

“The majority of customers can work at 4-bit with accuracy close to floating-point models,” Danon said.

The previous-generation Hailo-8 has a theoretical maximum of 26 TOPS at INT8, while the Hailo-10 comes in at around 20 TOPS at INT8. Why is Hailo tackling bigger models with less compute?

“It’s a different balance, because the memory access is much, much wider,” Danon said. “There is a little less on the TOPS side, but we are compensating for that on the architectural side.”
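A back-of-envelope calculation suggests why memory access can matter more than raw TOPS for LLMs. The sketch below assumes token generation is memory-bound, meaning every weight is read from memory once per generated token at batch size 1, and ignores caches and activations. The function names are hypothetical and the bandwidth figures it produces are illustrative estimates, not Hailo specifications.

```python
def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Total bytes occupied by the model weights."""
    return num_params * bits_per_weight / 8


def required_bandwidth_gb_s(num_params: float, bits_per_weight: int,
                            tokens_per_s: float) -> float:
    """Memory bandwidth (GB/s) needed if each token reads all weights once."""
    return weight_bytes(num_params, bits_per_weight) * tokens_per_s / 1e9


def max_tokens_per_s(num_params: float, bits_per_weight: int,
                     bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s for a given memory bandwidth (GB/s)."""
    return bandwidth_gb_s * 1e9 / weight_bytes(num_params, bits_per_weight)


# A 7B-parameter model at the article's quoted 10 tokens per second:
for bits in (8, 4):
    bw = required_bandwidth_gb_s(7e9, bits, 10)
    print(f"7B at {bits}-bit, 10 tokens/s: about {bw:.0f} GB/s of weight reads")
```

Under these assumptions, halving the weight precision from 8 to 4 bits halves the memory traffic per token, so the same memory interface can sustain about twice the token rate.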

While the Hailo-8 already supported common transformer operators, Hailo-10 has improved the efficiency of these operators dramatically, Danon said.

“We have put a lot of emphasis on concurrency and multi-tasking, since many people are looking to do many tasks in parallel on the same device, not just, say, object detection and LLM, it’s a combination,” he said. “We’ve invested a lot in optimizing the pipelines and how the core architecture handles this transition smoothly.”

Vision traction

Hailo also raised an additional $120 million in an extension of its Series C funding, bringing the total raised to $344 million.

The additional capital will be used to invest in both the Hailo-10 and the Hailo-15 product lines, Danon said.

“The Hailo-15 is getting great traction from the AI vision side, both from the analytics perspective as well as image enhancement, super resolution, low light denoising, AI based HDR…these applications we are seeing proliferate to AI PCs, so everything is getting mixed together.”

The funding will also be used to support customers.

“We have over 300 customers, so lots of customer support [is needed],” Danon said. “This includes updating our software on a very frequent basis, adding support for things like genAI and more specific applications that customers are asking us to support and help them with.”

“And we are always working on next silicon,” he added.

Chinese automotive

While Hailo has had automotive on its roadmap since the start, this has always been a difficult segment for chip startups to reach. The Hailo-8 was recently selected, alongside the Renesas R-Car SoC, for Chinese tier-1 iMotion’s iDC High domain controller, which will be deployed later in 2024 by a Chinese automotive OEM. iMotion is developing both the hardware and software stack for this domain controller module. Hailo will offload the “heavy-duty” AI from the main SoC.

The latest petaOPS processors are expensive, and cost is critical, Danon said.

“For the mass market, [petaOPS] are not needed,” he said. “The art is to bring the [capabilities] you need, to make them affordable and low power, otherwise you have another layer of reliability and affordability. [You want] something that can be bought in a standard passenger car, the Corollas of the world, not the Lexuses. The interesting part [of the market] is the Corollas.”

Are Chinese automotive OEMs moving faster on AI than their Western counterparts?

“Absolutely,” Danon said. “I am expecting a reverse in technology flow direction, where we see innovation generally happening in Asia, specifically China, but not only…this is a very interesting change from my perspective, things are happening for real, real products, real capabilities at a very quick pace.”

The Hailo-10 is sampling now and is due for general availability next quarter.