Serves 100+ open-source models (Llama, Mistral, DBRX, etc.) via a fast, cost-effective API. Optimized with custom kernels, FlashAttention, speculative decoding for low-latency serving.
LoRA and full fine-tuning support. Users upload datasets and get custom-tuned models served on Together infrastructure. Supports RLHF, DPO, and SFT workflows.
1.2 trillion token open training corpus -- one of the largest fully open pretraining datasets. Includes Common Crawl, Wikipedia, GitHub, ArXiv, books, StackExchange. Enables reproducible LLM training.
Managed GPU clusters (NVIDIA H100/A100) for customers who need to pretrain or fine-tune models from scratch. End-to-end training infrastructure with their Composer library.
Together AI sits at the intersection of open-source AI and infrastructure -- understanding how to efficiently train and serve models at scale is foundational for building any specialized AI system. Exposure to 100+ model architectures, the RedPajama dataset work, and custom fine-tuning pipelines directly maps to eventually training a specialized investment AI. SF-based with strong growth trajectory.
Open-source MoE model: 132B total parameters with 36B active parameters, using 16 experts per layer with top-4 routing. Competitive with Llama 2 70B and Mixtral at lower inference cost due to MoE sparsity.
End-to-end model training platform acquired for $1.3B. The Composer library provides distributed training with automatic mixed precision, FSDP, efficient data loading, and curriculum learning. Used to train DBRX and customer models.
Industry-standard open-source ML lifecycle management: experiment tracking (log metrics, params, artifacts), model registry (versioning, stage transitions), model serving. Used by thousands of companies worldwide.
Unified governance layer for data and AI assets across clouds. Manages access control, lineage tracking, and auditing for tables, ML models, and feature stores in a single catalog.
Databricks is uniquely positioned at the intersection of massive data infrastructure and frontier AI. The Mosaic ML acquisition brought serious model training expertise in-house. For building an AI investment system, the ability to process vast amounts of financial data (lakehouse) and train custom models (Mosaic/Composer) on that data is exactly the stack needed. DBRX's MoE architecture is directly relevant to efficient specialized models. Very strong compensation at L6.
Singular mission: build safe superintelligence. No products, no commercial distractions. Deliberately avoiding commercial pressure to focus on the hardest problem in AI.
Ilya's phrase for SSI's strategy: scale models toward superintelligence without the pressure of shipping products or quarterly revenue targets. Focus on getting the fundamentals right before deploying anything.
Research into adversarial evaluation of AI systems to identify failure modes before deployment. Exploring novel cognitive architectures that may go beyond standard Transformer scaling.
Partnership with Google Cloud for access to TPU infrastructure. Suggests large-scale training runs with Google's custom AI accelerators rather than NVIDIA GPUs.
SSI represents the purest possible bet on AGI/ASI. Ilya Sutskever is arguably the single person who has had the most impact on modern deep learning. If SSI succeeds in building safe superintelligence, everything changes -- including investment analysis. The $32B valuation on a ~40-person team with no product shows the market's conviction in Ilya. Being part of this team would be career-defining. Palo Alto HQ is Bay Area accessible.
Domain-specific AI assistants for enterprise verticals: finance, marketing, and sales. Not generic chatbots -- purpose-built AI that understands specific business domains deeply and can act autonomously on tasks within those domains.
Trains custom foundation models from scratch tailored to enterprise domains, then refines them with human feedback loops from domain experts. The approach combines the architectural expertise of the Transformer inventors with domain-specific data and expert reinforcement.
Ashish Vaswani was the lead author (first author) on the "Attention Is All You Need" paper that introduced the Transformer architecture. This is the person most directly responsible for the architecture powering GPT, Claude, Gemini, and every modern LLM.
Essential AI is founded by Ashish Vaswani -- literally the person who invented the Transformer. Working with the inventor of the architecture that powers all of modern AI would provide unparalleled learning. Their "Enterprise Brain" concept for finance is directly relevant to building an investment AI. The domain-specific foundation model approach (custom models + human feedback) is exactly how you would build an AI Warren Buffett. SF-based, early stage with significant equity upside.
Users create and interact with AI characters -- from fictional personas to educational tutors to creative collaborators. Models are specifically fine-tuned for in-character consistency, personality maintenance, and engaging multi-turn dialogue.
After Noam Shazeer and Daniel De Freitas returned to Google in 2024 as part of a licensing deal, Character AI shifted from proprietary models to using Meta's Llama as its base model. Fine-tuning on top of Llama for character-specific behavior and personality.
One of the most popular consumer AI products by engagement metrics. Users spend significant time in conversations -- average session lengths far exceed typical chatbot interactions. Demonstrates product-market fit in conversational AI.
Character AI demonstrated massive consumer adoption of conversational AI (20M MAU). Now running on Llama (Meta's model), which creates a direct connection point for Ravi. The expertise in fine-tuning for specific behaviors and personalities is transferable to building specialized investment AI personas. However, founder departures (Noam Shazeer to Google) change the technical leadership calculus significantly. Menlo Park location is Bay Area accessible.
Open-weight MoE models (Mixtral 8x7B/8x22B), Mistral 7B, Le Chat consumer chatbot. Strong team from Meta FAIR and DeepMind. Archived because headquarters is in Paris, France -- not Bay Area.
Command R/R+ enterprise LLMs, leading embedding/reranking models, Aya multilingual initiative. Founded by Transformer co-author Aidan Gomez. Archived because headquarters is in Toronto, Canada -- not Bay Area.
Jamba SSM-Transformer hybrid architecture (Mamba + attention), Jurassic models, enterprise AI. Co-founded by Stanford Prof. Yoav Shoham and Mobileye founder Amnon Shashua. Archived because headquarters is in Tel Aviv, Israel -- not Bay Area.
Nature-inspired AI: model merging, evolutionary optimization, "The AI Scientist" for automated research. Co-founded by Transformer co-author Llion Jones and Google Brain's David Ha. Archived because headquarters is in Tokyo, Japan -- not Bay Area.