Artificial Intelligence (AI) is often framed as the domain of a few elite companies. But a closer look reveals a massive gap between those who actually build AI and those who put wrappers on it, train it, and resell it.

The number of organizations that actually build AI is shockingly small. Primarily five companies contribute to open-source AI at scale: Meta, Google, Nvidia, Microsoft, and OpenTeams. That said, I also want to point out the smaller but significant contributions from organizations like Stability AI, EleutherAI, and Mistral, which all make meaningful code updates and changes despite their smaller size.

Just because open source AI is free to download doesn’t mean it is free to operate. Like adopting a puppy, you still have non-trivial startup costs and maintenance. However, these expenses are lower than industry developers have traditionally suggested and are shockingly affordable for most enterprises.

These economic factors combine to produce an AI industry whose barriers are, ironically, related to human development and not compute. Just as the best time to plant a tree was 20 years ago, the best time to create new AI developers was a long time ago. It will take a generation to produce an adequate supply for the field, because the lengthy training requires PhD-level know-how in mathematics, statistics, and computer science plus years of coding experience.

On the technical input side of the AI building equation, training costs are collapsing. With an increasingly high-quality library of pre-trained, open source models already available, everyone can have their own AI. Indeed, those who don’t will be at the mercy of those who do.

Shareholders and voters will not buy the story that giving away their competitive and personal data to AI oligarchs is economically necessary or valuable. Communal LLMs are economically inefficient, epistemically noisy, and strategically reckless for any enterprise or government that values performance, accuracy, or control of its intellectual property. The privacy risks inherent in communal models are real. The future of AI will be enterprises owning their own open source AI to save costs, provide accurate answers, and protect their intellectual property. This is the efficient market solution, preferable to communal LLMs that consume massive amounts of power to untangle spurious correlations from superfluous data.

To be sure, black-box AI SaaS providers will still have a business model as “drivers” of AI, as opposed to those who build and deploy it. However, that economic pie will be smaller: F1 drivers still do well financially, but their earnings are nowhere near the economic returns earned by those who build the cars. The upstream builders of AI will always have an advantage when deploying and training models because they possess the advanced know-how to optimize AI and take full advantage of its capabilities.

The AI Industry Pyramid

Below is a visual representation of the AI workforce, from elite maintainers to the broader ecosystem of users:

Detailed Platform Stats

1. PyTorch
– GitHub contributors: ~3,900+
– Monthly active contributors: ~250
– Core Maintainers: 80
– Users: ~4–5 million
– Lead Developer: Meta AI
– Source: https://github.com/pytorch/pytorch

2. TensorFlow
– GitHub contributors: ~3,500+
– Monthly active contributors: ~200
– Core Maintainers: 70
– Users: ~2.5–3.5 million
– Lead Developer: Google
– Source: https://github.com/tensorflow/tensorflow

3. Hugging Face
– GitHub contributors: ~2,600+ (Transformers repo)
– Monthly active contributors: ~150
– Core Maintainers: 50
– Users: ~1.5–2.5 million
– Source: https://github.com/huggingface/transformers
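To make the gap between builders and users concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the core maintainer counts and user ranges listed above; the dictionary layout and the `users_per_maintainer` helper are my own illustrative names, and the results are only as good as the approximate figures that go in.

```python
# Figures copied from the platform stats above.
# User counts are the approximate (low, high) ranges, not exact measurements.
PLATFORMS = {
    "PyTorch": {"core_maintainers": 80, "users": (4_000_000, 5_000_000)},
    "TensorFlow": {"core_maintainers": 70, "users": (2_500_000, 3_500_000)},
    "Hugging Face": {"core_maintainers": 50, "users": (1_500_000, 2_500_000)},
}


def users_per_maintainer(stats):
    """Return (low, high) users per core maintainer, rounded down."""
    low_users, high_users = stats["users"]
    maintainers = stats["core_maintainers"]
    return low_users // maintainers, high_users // maintainers


for name, stats in PLATFORMS.items():
    low, high = users_per_maintainer(stats)
    print(f"{name}: roughly {low:,} to {high:,} users per core maintainer")
```

On these numbers, each core maintainer supports somewhere in the range of 30,000 to 62,500 users, depending on the platform and which end of the user estimate you take.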

Global Ecosystem of Users

AI/ML Engineers (1.2M – 1.8M): Full-time professionals training and deploying models using open source tools
Academic Users (800K – 1.2M): Researchers, grad students, and educators who also train and deploy models
Indie Devs & Hobbyists (400K – 700K): Kaggle participants, open-source users, startup tinkerers
Enterprise Software Devs (3M – 5M): Developers integrating AI into enterprise apps
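For a sense of the overall size of that user base, the four ranges above can simply be added up. The short Python sketch below does that arithmetic. It assumes the segments do not overlap, which the figures above do not guarantee (an academic can also be a hobbyist), so treat the total as an upper-bound style estimate rather than a headcount. The `SEGMENTS` table and `total_range` helper are illustrative names of mine.

```python
# (low, high) estimates for each user segment, copied from the list above.
SEGMENTS = {
    "AI/ML Engineers": (1_200_000, 1_800_000),
    "Academic Users": (800_000, 1_200_000),
    "Indie Devs & Hobbyists": (400_000, 700_000),
    "Enterprise Software Devs": (3_000_000, 5_000_000),
}


def total_range(segments):
    """Sum the low ends and the high ends separately.

    Assumption: segments are disjoint. Any overlap between them
    would make the true total smaller than this sum.
    """
    low = sum(lo for lo, _ in segments.values())
    high = sum(hi for _, hi in segments.values())
    return low, high


low, high = total_range(SEGMENTS)
print(f"Estimated ecosystem size: {low / 1e6:.1f}M to {high / 1e6:.1f}M users")
```

Under that no-overlap assumption, the ecosystem comes to about 5.4 million to 8.7 million users, with enterprise software developers making up more than half of it.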

Industry Impact

• 75%+ of Fortune 500 companies use open-source AI directly or via cloud providers
• Hugging Face Hub sees millions of downloads monthly (https://huggingface.co)
• GitHub’s Octoverse consistently ranks PyTorch and TensorFlow among the top OSS projects (https://octoverse.github.com/)

Top AI Companies: Builders vs. Train & Deployers

Open Source Corporate Builders:
• Meta (PyTorch, Llama)
• Google (TensorFlow, JAX)
• Microsoft (ONNX, DeepSpeed)
• NVIDIA (CUDA, cuDNN, Triton)
• OpenTeams (Nebari, PyTorch, TensorFlow)
Note: these companies can also train and deploy models

Open Source Train & Deploy Companies:
• Amazon, Apple, Salesforce, Oracle, Palantir, Snowflake, Databricks, C3.ai, SAP, OpenAI, Anthropic, xAI
Note: both Mistral and Anthropic claim to have some internal building capabilities, but because their systems are closed-box, this cannot be confirmed

The Collapsing Cost of Training

Availability and Complexity of Open Source LLMs

Sources

• PyTorch contributors: https://github.com/pytorch/pytorch
• TensorFlow contributors: https://github.com/tensorflow/tensorflow
• Hugging Face contributors: https://github.com/huggingface/transformers
• Developer counts from GitHub Octoverse, Stack Overflow Developer Survey
• Enterprise stats from AWS, Azure, GCP, LinkedIn data

Author: Joe Merrill

I'm a VC in Austin, TX.

The Shocking Economics of AI
