Thinking Machines Lab Unveils Interaction Models, Secures NVIDIA Partnership and $2B Seed Round

Mira Murati's Thinking Machines Lab announced on May 11 a new class of AI architecture called "interaction models," designed to handle real-time multimodal interaction natively rather than through external scaffolding. The announcement comes alongside news of a multi-year strategic partnership with NVIDIA to deploy at least one gigawatt of next-generation Vera Rubin systems and a $2 billion seed round valuing the startup at $12 billion.
The interaction models represent a departure from current approaches to multimodal AI. Rather than bolting conversation, vision, and audio capabilities onto transformer architectures through additional layers or external tools, Thinking Machines trained these models from scratch to process audio, video, and text streams simultaneously while responding in real time. The company claims state-of-the-art combined performance in intelligence and responsiveness, though specific benchmark comparisons were not disclosed in the initial announcement.
Technical Architecture and Capabilities
According to the company's research preview, interaction models handle continuous input streams across modalities without the latency penalties typically associated with multimodal processing pipelines. Traditional approaches often require tokenization steps, cross-modal attention mechanisms, or sequential processing that introduces delays between input and response. Thinking Machines' architecture appears designed to eliminate these bottlenecks at the model level rather than optimizing around them.
The real-time interaction capability addresses a persistent challenge in deploying conversational AI systems. Current multimodal models often exhibit noticeable delays when processing mixed audio and visual inputs, particularly in scenarios requiring rapid back-and-forth exchange. For enterprise applications requiring natural human-computer interaction—from customer service to collaborative design tools—response latency remains a significant adoption barrier.
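The latency cost of a cascaded pipeline compounds stage by stage: each component must produce at least its first output before the next can begin. The toy model below makes that additive budget concrete; all stage latencies and component names are illustrative assumptions, not measurements of any real system.

```python
# Toy latency-budget comparison: a sequential multimodal pipeline vs. a
# natively multimodal model. All per-stage latencies are illustrative
# assumptions, not published measurements.

def pipeline_latency_ms(stages):
    """In a sequential pipeline, time-to-first-response is the sum of
    each stage's time-to-first-output."""
    return sum(stages.values())

# Hypothetical first-output latencies for a cascaded voice assistant.
cascaded = {
    "speech_to_text": 300,   # wait for a stable transcript chunk
    "language_model": 250,   # time to first generated token
    "text_to_speech": 150,   # time to first audio frame
}

# A model that consumes audio directly pays only one stage's cost.
native = {
    "interaction_model": 250,
}

print(pipeline_latency_ms(cascaded))  # 700
print(pipeline_latency_ms(native))    # 250
```

Under these assumed numbers the cascade nearly triples time-to-first-response, which is why eliminating the hand-offs at the model level, rather than optimizing each stage, is the architecturally interesting move.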
Thinking Machines has made a research API called Tinker available to researchers, offering LoRA-based fine-tuning of open-source models. The company indicated its first product will include significant open-source components, suggesting a hybrid approach to commercialization along the lines of Hugging Face or Mistral.
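LoRA, the technique Tinker exposes, freezes the pretrained weight matrix and trains only a low-rank additive update, which is why it is cheap enough to offer as a hosted service. The sketch below shows the core arithmetic in NumPy; it is a generic illustration of LoRA (Hu et al., 2021), not Tinker's actual API, and all dimensions are made up for the example.

```python
import numpy as np

# Generic LoRA sketch: the frozen base weight W is augmented by a
# low-rank update delta_W = (alpha / r) * B @ A, where only A and B
# are trained. Dimensions here are arbitrary illustration values.

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, rank r
B = np.zeros((d_out, r))                   # trainable, zero-initialized

def lora_forward(x):
    # Base path plus scaled low-rank path. Because B starts at zero,
    # the adapted model exactly matches the base model before training.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # no drift at initialization

# Trainable parameter count: r * (d_in + d_out) instead of d_in * d_out.
print(r * (d_in + d_out), "vs", d_in * d_out)  # 1024 vs 4096
```

The payoff is the last line: at rank 8 the adapter trains a quarter of the parameters of the full matrix in this toy sizing, and the savings grow with layer width.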
Strategic Partnerships and Funding
NVIDIA's partnership extends beyond a traditional customer relationship to include a direct investment in Thinking Machines' long-term growth. The deployment commitment of at least one gigawatt of Vera Rubin systems signals substantial computational requirements for training and inference at scale. Vera Rubin is NVIDIA's next-generation architecture following Blackwell, though detailed specifications remain under wraps.
The $2 billion seed round positions Thinking Machines among the most heavily capitalized AI startups in history. The $12 billion valuation reflects investor confidence in the team's execution capability, given Murati's track record at OpenAI and the technical credentials of the founding team, which comprises researchers who departed OpenAI following organizational changes in late 2023.
Looking at historical patterns, this funding trajectory resembles the capital intensity we observed during the cloud infrastructure buildout of the 2000s and early 2010s. The difference lies in the compressed timeline—companies like Amazon Web Services and Google Cloud Platform scaled their infrastructure investments over decades, while AI startups now require immediate access to massive compute resources to remain competitive. The gigawatt-scale deployment commitment suggests Thinking Machines anticipates training runs and inference workloads that exceed what most enterprises could economically justify on their own infrastructure.
Market Context and Technical Differentiation
The interaction models announcement comes as the industry grapples with the limitations of current multimodal approaches. OpenAI's GPT-4V, Google's Gemini, and Anthropic's Claude 3 all demonstrate impressive multimodal capabilities, but their architectures still reflect their text-first origins. Voice interfaces typically require separate speech-to-text preprocessing, and video analysis often involves frame sampling rather than continuous processing.
Thinking Machines' approach of training interaction models from scratch to handle real-time multimodal input represents a significant architectural departure. If the performance claims hold up under scrutiny, this could influence how major players approach their next-generation model development. The trade-offs between general-purpose transformer architectures and purpose-built interaction models will likely define competitive positioning over the next 18 to 24 months.
The company's commitment to open source components distinguishes it from purely proprietary approaches favored by most well-funded AI startups. This strategy could accelerate adoption among developers while building ecosystem effects around the core technology. However, balancing open source contributions with the need to capture value from massive infrastructure investments will require careful execution.
Research Publication and Industry Engagement
Thinking Machines has launched a blog series called "Connectionism," with plans to publish research findings, code, and technical documentation regularly. The company's first post addressed building AI models that produce reproducible responses, a persistent challenge in deploying language models for production applications where consistency matters more than creativity.
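One concrete reason reproducibility is hard: floating-point addition is not associative, and the grouping of additions changes with batch size and parallel reduction strategy, so the "same" computation can produce slightly different logits from run to run. A minimal, self-contained illustration:

```python
# Floating-point addition is not associative: regrouping the same three
# addends changes the result, because small terms are absorbed by a
# large intermediate sum. GPU kernels regroup reductions depending on
# batching and parallelism, which is one reason bitwise-identical LLM
# outputs are hard to guarantee across runs.

a = (1e16 + 1.0) + 1.0   # each 1.0 is lost against 1e16 in turn
b = 1e16 + (1.0 + 1.0)   # the combined 2.0 is large enough to register

print(a)       # 1e+16
print(b)       # 1.0000000000000002e+16
print(a == b)  # False
```

Scaled up to billions of additions per forward pass, these low-bit discrepancies can flip a sampled token, after which the two generations diverge entirely.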
The research transparency approach reflects lessons learned from OpenAI's evolution from an open research organization to a more commercially focused entity. By maintaining active research publication while building commercial products, Thinking Machines appears to be positioning itself to attract top-tier talent while contributing to the broader research community.
The interaction models announcement generated significant attention across technical communities, with Murati's social media post receiving over 250,000 views within hours of publication. This level of engagement suggests strong interest in alternative approaches to multimodal AI architecture among practitioners and researchers.
In my view, the convergence of massive funding, strategic partnerships with infrastructure providers, and a focus on architectural innovation rather than incremental improvements positions Thinking Machines as a serious contender in the next phase of AI development. The success of their interaction models approach could fundamentally reshape how we build and deploy conversational AI systems across industries.
The real test will come when the research preview transitions to production systems. Real-time multimodal interaction at scale involves not just model architecture but also edge deployment, bandwidth optimization, and integration challenges that often prove more difficult than laboratory demonstrations suggest.