AI-Washing: How to Spot Fake AI Companies (Hoffman’s Test)

Reid Hoffman called SpaceX 'not an AI company' and xAI a 'train wreck'. Here's how to apply his AI-washing test when picking tools or stocks.

Jordan West · 2026-06-24 · 8 min read · Beginner

So how do you actually tell if a company calling itself “AI” is the real thing – or just slapping the label on something else? That question got a lot louder in late June 2026.

Reid Hoffman went on the Pioneers of AI podcast and dropped two lines that lit up Hacker News and tech Twitter: SpaceX “isn’t an AI company” and xAI is “a complete train wreck” for building foundational models (Fortune, June 24, 2026). The story is everywhere. But the useful part isn’t the gossip – it’s that Hoffman accidentally handed everyone a working AI-washing detector. This tutorial turns his critique into a 4-signal checklist you can run on any AI vendor, tool, or stock in about ten minutes.

The question hiding inside Hoffman’s quote

Most coverage frames this as Silicon Valley score-settling. Fair – Hoffman is an investor in both Anthropic and OpenAI (as of June 2026, per Fortune and Benzinga), so factor that bias in. But strip the names out and you’re left with a real diagnostic question: what makes a company genuinely an AI company versus one that bought the t-shirt?

That question has teeth. If you’re choosing a SaaS vendor, the answer determines whether their “AI features” will compound over time or stall. If you’re evaluating an IPO, it’s the difference between a moat and a marketing deck.

Why the usual “AI vendor checklist” falls short

Search “AI vendor due diligence” and you’ll find a hundred templates asking about SOC 2 reports, data-handling policies, and SLAs. Useful stuff. But none of it answers the prior question: is this even an AI company in the first place, or is it a hosting business, a consultancy, or a wrapper with an LLM call bolted on the side?

That’s the gap Hoffman walked into. His framework – once you extract it from the podcast – is about structural evidence, not marketing claims. Four signals, ordered by how cheap they are to check.

The 4-signal AI-washing detector

Run these in order. If a company fails the first two, skip the rest.

1. Revenue source test – Where does the money actually come from? Models, APIs, and AI products? Or rent, hardware, services, and acquired subsidiaries?
2. Founder retention test – Are the people who built the core technology still there?
3. Acquisition pattern test – Is the AI strategy organic (built) or bolted on (bought)? How recently?
4. Benchmark presence test – Do their models show up in independent evals, or only in their own marketing?
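One way to make the checklist concrete is to write it down as a small scoring routine. The sketch below is an illustration only: the names `Verdict`, `SIGNALS`, and `run_detector` are made up for this example, and the verdicts you feed in still come from your own reading of public data.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    NA = "n/a"  # the signal does not apply to this company


# The four signals, in the order the checklist says to run them.
SIGNALS = ["revenue_source", "founder_retention", "built_or_bought", "benchmarks"]


def run_detector(verdicts: dict[str, Verdict]) -> dict:
    """Walk the signals in order and stop early if the first two both fail."""
    checked = []
    fails = 0
    for index, name in enumerate(SIGNALS):
        checked.append(name)
        if verdicts[name] is Verdict.FAIL:
            fails += 1
        # "If a company fails the first two, skip the rest."
        if index == 1 and fails == 2:
            break
    return {"checked": checked, "fails": fails}


# Hypothetical company that fails both of the cheap checks.
example = {
    "revenue_source": Verdict.FAIL,
    "founder_retention": Verdict.FAIL,
    "built_or_bought": Verdict.PASS,
    "benchmarks": Verdict.PASS,
}
print(run_detector(example))
# prints {'checked': ['revenue_source', 'founder_retention'], 'fails': 2}
```

Keeping `NA` separate from `PASS` matters: a signal that cannot be evaluated is not evidence in the company's favor.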

Signal 1: Where does the revenue come from?

This is the one Hoffman pushed hardest. “You’re a premium-priced CoreWeave,” he said on the podcast. “I get it. Which is not an AI company.” (Benzinga, June 2026.) Leasing GPUs to AI companies is closer to a real estate business than an AI business. SpaceX has been positioning revenue from leasing AI infrastructure – including to Anthropic – as validation of its AI credentials. Hoffman’s read: that’s landlord money with an AI tenant.

How to check it yourself: pull the most recent investor deck or 10-K. Look at revenue segmentation. If “AI” revenue is mostly compute leasing, professional services, or revenue from an acquired AI subsidiary, that’s the tell. Real AI revenue is API calls, model subscriptions, or product seats where the model is the product.
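Once the segments are labeled, the revenue check is simple arithmetic. A minimal sketch, assuming you have copied segment figures out of a filing by hand: the segment names, the numbers, and the `MODEL_AS_PRODUCT` set are hypothetical placeholders, not data from any real company.

```python
# Segment categories treated here as "real AI revenue" (model is the product).
MODEL_AS_PRODUCT = {"api_calls", "model_subscriptions", "product_seats"}


def ai_revenue_share(segments: dict[str, float]) -> float:
    """Return the fraction of total revenue that comes from model-as-product segments."""
    total = sum(segments.values())
    if total <= 0:
        raise ValueError("total revenue must be positive")
    ai_revenue = sum(value for name, value in segments.items() if name in MODEL_AS_PRODUCT)
    return ai_revenue / total


# Hypothetical figures in millions of dollars, for illustration only.
hypothetical_segments = {
    "compute_leasing": 600.0,
    "professional_services": 150.0,
    "acquired_subsidiary": 150.0,
    "api_calls": 100.0,
}

share = ai_revenue_share(hypothetical_segments)
print(f"Model-as-product share of revenue: {share:.0%}")
# prints Model-as-product share of revenue: 10%
```

The sketch deliberately stops at the ratio. Where “mostly” begins is a judgment call, and the harder work is deciding which bucket each reported segment belongs in.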

Signal 2: Did the founders stay?

Brutal signal. Underrated. All 11 of xAI’s original co-founders had departed by May 2026 – a cascade that began in February with Tony Wu’s resignation (Fortune, June 2026).

Watch out: Founder exits at AI labs are a leading indicator, not a lagging one. The departures usually show up 6-18 months before the public benchmark slide. By the time model rankings drop, the talent has often been gone for over a year.

How to check it yourself: LinkedIn the founding team. Cross-reference with the company’s about page from 18 months ago via the Wayback Machine. If more than half have left and the press releases all say “to pursue other opportunities,” you have your answer.
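The same check can be roughed out in code. This is a sketch under assumptions: `wayback_lookup_url` and `majority_departed` are hypothetical helper names, the query format follows the Internet Archive's availability endpoint as I understand it (verify against their current documentation), and the founder names are placeholders. Reading the archived page and the current team list is still a manual step.

```python
from datetime import date
from urllib.parse import urlencode


def wayback_lookup_url(page_url: str, today: date, months_back: int = 18) -> str:
    """Build an availability query for the snapshot closest to `months_back` months ago."""
    # Count months since year 0, step back, then split into year and month again.
    total_months = today.year * 12 + (today.month - 1) - months_back
    year, month_index = divmod(total_months, 12)
    # Day is pinned to the 1st so the timestamp is always a valid date.
    timestamp = f"{year:04d}{month_index + 1:02d}01"
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return f"https://archive.org/wayback/available?{query}"


def majority_departed(founders_then: set[str], people_now: set[str]) -> bool:
    """True when more than half of the original founders are no longer listed."""
    if not founders_then:
        raise ValueError("need at least one original founder")
    departed = founders_then - people_now
    return len(departed) > len(founders_then) / 2


print(wayback_lookup_url("example.com/about", date(2026, 6, 24)))

# Placeholder names: four founders 18 months ago, one still listed today.
print(majority_departed({"A", "B", "C", "D"}, {"A", "X"}))  # prints True
```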

Signal 3: Built or bought?

Timing matters more than the acquisition itself. SpaceX went public on June 12, 2026 with AI central to its IPO narrative – then acquired Cursor, the AI coding tool, within days (Fortune/Benzinga, June 2026). Hoffman’s framing: “You could almost think of it as the IAC of AI… use the market cap to buy AI companies and try to buy your way into relevance.”

The diagnostic isn’t “did they acquire?” – plenty of real AI companies do. It’s “how close to the IPO, fundraise, or pivot was the acquisition?” If a company buys an AI product days after going public, there was probably no time for an integration plan. The acquisition is the strategy.
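The timing question reduces to a date difference. In the sketch below, the 90-day window is my assumption rather than anything Hoffman specified, the function names are hypothetical, and the acquisition date is a placeholder, since the reporting only says “within days.”

```python
from datetime import date


def days_from_event(event: date, acquisition: date) -> int:
    """Signed number of days from the event (IPO, fundraise, pivot) to the acquisition."""
    return (acquisition - event).days


def is_bolt_on_timing(event: date, acquisition: date, window_days: int = 90) -> bool:
    """Flag acquisitions that land within `window_days` on either side of the event."""
    return abs(days_from_event(event, acquisition)) <= window_days


ipo_date = date(2026, 6, 12)                  # IPO date cited in the article
hypothetical_acquisition = date(2026, 6, 17)  # placeholder date, not a reported one

print(days_from_event(ipo_date, hypothetical_acquisition))    # prints 5
print(is_bolt_on_timing(ipo_date, hypothetical_acquisition))  # prints True
```

The window is symmetric on purpose: an acquisition announced shortly before a fundraise can serve the same narrative purpose as one announced shortly after.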

Signal 4: Independent benchmarks

This check is quick and needs no filings. Does the company’s flagship model show up on independent leaderboards like LMSYS Chatbot Arena, SWE-bench, or GPQA? Or only on slides in their own keynote? Grok models have faced persistent criticism for lagging behind competing models from Anthropic and OpenAI on benchmark performance (Fortune, June 2026) – and that’s not opinion, it’s data anyone can pull from the leaderboards directly.

Think of these four signals like symptoms in a differential diagnosis. One red flag doesn’t tell you much. Three red flags across independent categories – revenue, people, timing, benchmarks – and you’re not looking at bad luck. You’re looking at a pattern.

Worked example: SpaceX post-Cursor

Signal	SpaceX (post-Cursor, as of June 2026)	Verdict
Revenue source	Mostly compute leasing + launch services	Fail – infra, not AI
Founder retention	Original SpaceX founders intact, but no AI-founder lineage	N/A – no original AI team to retain
Built or bought	Cursor acquired days after IPO	Fail – bolt-on timing
Independent benchmarks	No first-party model on public leaderboards	Fail – relies on acquired/leased tech

Three fails, one N/A. You got there from public data in ten minutes, without needing the podcast.
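For anyone keeping these scorecards in a script or spreadsheet export, the tally is a one-liner. This sketch just re-encodes the verdict column from the table above. The variable names are illustrative.

```python
from collections import Counter

# Verdicts copied from the worked-example table (SpaceX post-Cursor, as of June 2026).
worked_example = {
    "Revenue source": "Fail",
    "Founder retention": "N/A",
    "Built or bought": "Fail",
    "Independent benchmarks": "Fail",
}

tally = Counter(worked_example.values())
print(f"{tally['Fail']} fails, {tally['N/A']} N/A, {tally['Pass']} passes")
# prints 3 fails, 1 N/A, 0 passes

# Rule of thumb from the closing "Next action": two or more fails means
# there is a conversation to have.
needs_conversation = tally["Fail"] >= 2
```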

Now the contrarian move: Hoffman also said Cursor “seems to have had its bright star some number of months ago and seems to be fading over the horizon” (Fortune, June 2026). Run the same 4 signals on Cursor itself before its acquisition – strong product revenue, founders present, organic growth, real benchmark presence on coding tasks. Two companies, same headline, opposite diagnostic profiles. The detector doesn’t care about narrative.

Four things to keep in your back pocket

Start with revenue. It’s the cheapest check and catches most AI-washing. Pull the income statement before reading the marketing site.
Weight the source. Hoffman’s framework is sharp, but he has skin in the game – he recently left Microsoft’s board to focus on his startup Manas AI and departed OpenAI’s board in 2023 over conflict-of-interest concerns (Benzinga, June 2026). Use his lens; don’t outsource your conclusion to him.
Watch the verb tense. “We’re using AI” can mean anything. “We trained a model on X data, here are the weights” is structurally different. Grammar tells you which side of the line a company is on.
The detector cuts both ways. Apply it to companies you like too. If your favorite “AI startup” fails three signals, enthusiasm doesn’t change the diagnosis.

One open question I haven’t resolved: where does NVIDIA land here? Almost all revenue is hardware. By signal 1, not an AI company. But nobody seriously disputes their AI credentials. Maybe the detector needs a fifth signal – something like “is the product an irreplaceable input to the AI stack?” I don’t have a clean answer. If you do, that’s worth a follow-up.

FAQ

Is this fair to SpaceX given they own infrastructure Anthropic uses?

Hoffman’s point isn’t that compute infrastructure doesn’t matter – it obviously does. Being the landlord to AI companies is just a different business, with different margins, risks, and moats. Conflating the two inflates valuations.

Can a company fail the detector now and pass later?

Yes – and this is the one worth watching closely. Say a company acquires an AI team today and fails the “built or bought” signal. In 18 months, if that team is still intact, shipping novel work, and showing up on independent benchmarks, the verdict flips. The detector is a snapshot, not a sentence. Re-run it every two quarters, and track the direction as well as the count: a company moving from three fails to one fail is improving, while one stuck on the same fails for three years is stagnating.

Does this work for evaluating AI features inside non-AI companies?

Partially. Revenue and founder signals don’t apply to a feature inside a CRM. But signals 3 and 4 still hold: was the AI feature built or licensed from an API, and does the underlying model have independent benchmark presence? For most feature evaluations, that’s enough.

Next action: Pick one AI vendor your team is evaluating right now, or one AI-labeled stock in your portfolio. Run the 4 signals before end of day. Two or more fails means you have a conversation to have – with your procurement lead, your PM, or your broker.