🔥 Discover this insightful post from TechCrunch 📖
📂 **Category**: AI, Exclusive, General Compute, SambaNova
✅ **What You’ll Learn**:
Demand for computing power to run AI models keeps accelerating, but anyone in this field must clear two major hurdles: getting the right chips, and getting them into data centers where they can start generating revenue.
General Compute, a new inference startup (a company that rents out AI processing power, specializing in the phase where trained models run and respond to users), has answers to both questions that highlight where the AI ecosystem is headed. Those answers helped it raise a $15 million seed round at a post-money valuation of $60 million, led by FUSE VC with participation from Carya Venture Partners and Village Global Ventures.
First, which chips? Demand for GPUs is through the roof, but it has become conventional wisdom that they are not the most suitable chips for running AI models once those models have been trained. Inference, the phase in which a model actively generates responses, has different computational requirements than training, and a new class of chips is being designed specifically for it. Nvidia's $20 billion Groq deal in December and Cerebras' $57 billion IPO last week point the way.
With capacity strained at those two companies, General Compute's co-founders, CEO Finn Puklowski and CTO Jason Goodison, saw another option. They are turning to specialized chips made by SambaNova, an Intel-backed chipmaker focused on inference that has fallen somewhat out of the Silicon Valley conversation.
That may change when SambaNova launches its new chips this year. The architecture is more flexible and uses more memory to store context during inference, and SambaNova claims it outperforms not only GPUs but also other specialized chips from the likes of Groq and Cerebras. Puklowski says the new chips will generate between 600 and 700 tokens per second, versus about 250 tokens per second for GPUs.
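To put the quoted rates in perspective, here is a rough back-of-the-envelope sketch of how they translate into wall-clock generation time. It assumes a constant per-stream decode rate and ignores prompt processing, network latency, queuing and batching; the 100,000-token workload is a hypothetical figure, not one from General Compute or SambaNova.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a constant decode rate.

    Simplifying assumption: ignores prefill, network latency and queuing.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Rates quoted in the article; the workload size is hypothetical.
GPU_RATE = 250.0
SN50_LOW, SN50_HIGH = 600.0, 700.0
WORKLOAD_TOKENS = 100_000

if __name__ == "__main__":
    gpu_s = generation_seconds(WORKLOAD_TOKENS, GPU_RATE)
    for rate in (SN50_LOW, SN50_HIGH):
        s = generation_seconds(WORKLOAD_TOKENS, rate)
        # Speedup is simply the ratio of the two decode rates.
        print(f"{rate:.0f} tok/s: {s:.0f} s vs {gpu_s:.0f} s on GPU ({gpu_s / s:.1f}x faster)")
```

Under these assumptions, the quoted rates alone imply roughly a 2.4x to 2.8x reduction in generation time for a single stream, regardless of workload size.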
General Compute has an order in for SambaNova's SN50 chips, valued at $300 million, and says it will be the first new cloud to deploy them.
These chips also help General Compute solve the second big problem, where to put them: they are air-cooled rather than water-cooled and consume less power, so they can be installed in existing data center facilities without new infrastructure investment.
Puklowski is seeking to do colocation deals — arrangements in which General Compute installs its hardware at someone else’s facility — not just with data center providers, but also with cryptocurrency miners looking to reuse their infrastructure because the cost of producing bitcoin has often outpaced its price.
General Compute launched its cloud offering last week, claiming it is already the fastest provider running MiniMax 2.7, a powerful open source LLM.
Joe Hasselman is a venture capitalist who got in on the ground floor of the inference boom when he invested in Groq in 2021. This year he launched a new fund, Evercrest Capital Partners, focused on AI, and made General Compute its first investment. Hasselman sees similarities between SambaNova's partnership with General Compute and CoreWeave's relationship with Nvidia, as well as Groq's pairing of its chipmaking with its own cloud offering.
"They need a healthy mix of customers who will put their chips in environments that will deliver high growth," Hasselman said. "As much as General Compute is betting on SambaNova, SambaNova is betting on General Compute."
The question is which type of compute architecture will hold the most value in the future of AI. Inference clouds are implicit bets on a world of many models and agents, where no single provider dominates and the speed and cost of inference become the key competitive variables. Consider the $113 million Series B raised by OpenRouter this week, which reflects the company's ability to give customers access to multiple models so they can optimize their inference spending.
Speed matters in this calculation, alongside price and capacity. Puklowski wants to turn hour-long coding-agent workloads into five- or 10-minute tasks, and to make customer-service voice agents, which need fast inference to speak naturally, more effective and more economical.
"If you're using ChatGPT and it's giving you 50 tokens per second, that's still a lot faster than we can read," Puklowski told TechCrunch, "and now that things have moved to agent-to-agent, where agents are reading on our behalf or pinging databases, they need to move faster."
🕒 **Posted on**: 1779976766
