How to get cited by ChatGPT — the five technical moves that actually move the needle

[ THE SHORT VERSION ]

Half of the buying journey now happens inside ChatGPT, Claude, Perplexity, and Google's AI Overviews. If your brand is not cited there, a meaningful chunk of intent is invisible to you. The good news: the technical mechanics that move the needle are concrete, finite, and cheap to ship.

Most of the writing on "ranking in ChatGPT" mistakes the mechanics. It treats GEO like a re-skin of SEO. It is not. The engines pull from four distinct lanes — live web index, declared entity surface, training data, and partner feeds — and each move you make hits a different lane. Knowing which lane matters changes what you build.

This piece is the technical checklist. Five moves, ordered by return on engineering effort, with the anti-pattern at the end so you do not chase it.

[ FIGURES ]

Figure 1 · Where a citation comes from

When a user asks an AI engine a question, the engine grounds its answer in one of four lanes: (1) live web index, (2) declared entity surface, (3) training data, (4) partner feeds. As an operator you control lanes 1 and 2 directly, lane 3 slowly by being cited by others, and lane 4 indirectly through search-engine relationships.

Figure 2 · Effort vs payoff — the five moves

Moves 1 through 3 are quick wins: cheap to ship, high citation payoff. Moves 4 and 5 are strategic bets: they take longer to compound, but they are the only moves that earn citation in untouched query space. Trying to influence training data directly is wasted spend; do not chase it.

[ EXPLANATION ]

Move 1: deploy llms.txt at the root of your domain. The proposal at llmstxt.org ^[1] specifies a Markdown file, served at /llms.txt, that tells language models about your entity surface: who you are, what you offer, what content matters. It is a proposal rather than a ratified standard, and how much weight the engines give it is still unsettled (see Open Questions), but we have seen citation surface respond within four to eight weeks of deployment in production. Cost: one file. Hosting: same as robots.txt. A domain that wants to be cited has no reason to skip it.
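As a rough illustration of Move 1, the sketch below renders a file in the layout the llmstxt.org proposal describes: an H1 with the site name, a blockquote summary, then H2 sections holding Markdown link lists. The `Link` class, the `render_llms_txt` helper, and all of the Example Co data are hypothetical names invented for this example; they are not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One entry in an llms.txt link list (hypothetical helper type)."""
    title: str
    url: str
    note: str = ""


def render_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt body: H1 name, blockquote summary, H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Placeholder site data, for illustration only.
llms_txt = render_llms_txt(
    name="Example Co",
    summary="Example Co builds widgets and publishes guides on widget maintenance.",
    sections={
        "Services": [
            Link("Widget audits", "https://example.com/services/audits", "scope and pricing"),
        ],
        "Guides": [Link("Widget care", "https://example.com/guides/care")],
    },
)
print(llms_txt)
```

Write the returned string to `/llms.txt` at the web root, next to robots.txt. Generating the file from the same data that drives your sitemap is one way to keep it from drifting out of date.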

Move 2: ship Schema.org coverage on every route. Organization on the root, Service on each capability page, Article on every post, FAQPage on the routes that answer questions, Review and AggregateRating when you have verified data. The vocabulary ^[2] is well documented, and Google's Rich Results Test and the Schema Markup Validator at validator.schema.org catch errors in minutes (the older Structured Data Testing Tool has been retired). Structured data gives an engine a machine-readable statement of what a page is, so it does not have to infer that from your prose; clean schema is the foundation that everything else compounds on.
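A minimal sketch of Move 2, assuming you emit JSON-LD from your page templates. `Organization`, `FAQPage`, `Question`, and `Answer` are real Schema.org types; the `json_ld_script` helper and every Example Co value are placeholders made up for illustration.

```python
import json


def json_ld_script(data: dict) -> str:
    """Wrap a Schema.org object in a JSON-LD script tag for the page head."""
    # Escape "</" so a string value can never close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Root route: who the organisation is (placeholder values).
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com/",
    "sameAs": ["https://www.linkedin.com/company/example-co"],
}

# A route that answers questions: one Question/Answer pair per entry.
faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What does a widget audit cover?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "An audit covers inventory, wear, and replacement planning.",
            },
        }
    ],
}

print(json_ld_script(organization))
print(json_ld_script(faq_page))
```

Building the objects as plain dictionaries and serialising them in one place keeps the markup valid JSON by construction, which removes the most common class of validator errors.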

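Move 3: allow the bots in robots.txt. GPTBot ^[3], ClaudeBot ^[4], PerplexityBot ^[5], Google-Extended ^[6]. Each vendor documents its user-agent token and how to allow or block it. Note that Google-Extended is a robots.txt control token rather than a separate crawler: it governs whether content Google has crawled may be used for its generative AI models, and Google states that it does not affect inclusion in Search. Most domains we audit are silently blocking one or more of these, usually GPTBot, because the default robots.txt template treated it like a generic scraper. Opt in explicitly. If you have proprietary content you do not want cited, gate it behind auth, not behind robots.txt.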
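One way to audit Move 3 is to run your robots.txt through Python's standard-library `urllib.robotparser` and ask, per user-agent token, whether a given URL is fetchable. This is a sketch: the `audit_robots` helper, the sample file, and the example.com URL are assumptions for illustration, and the standard-library parser applies the first matching rule, which can differ from how a particular vendor's crawler resolves conflicting rules.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def audit_robots(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, for each AI user-agent token, whether robots.txt allows `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


# Sample file reproducing the common failure: GPTBot blocked, the rest allowed.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

report = audit_robots(SAMPLE_ROBOTS, "https://example.com/guides/care")
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

For the sample file this reports GPTBot as blocked and the other three tokens as allowed. To audit a live site, fetch the real robots.txt body and pass it in as `robots_txt`.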

Move 4: name the expert and date the content. Add a Person schema ^[7] author with a verifiable role, publish the datePublished and dateModified fields, link to the author's LinkedIn or another sameAs profile. The engines weight named expert authorship heavily because it gives them a citation anchor that survives quoting. Anonymous brand-voice content rarely surfaces in answers; named operator content does.
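Move 4 might look like the sketch below, which builds `Article` markup with a `Person` author, both date fields, and `sameAs` profile links. The Schema.org property names (`author`, `jobTitle`, `sameAs`, `datePublished`, `dateModified`) are real; the `article_json_ld` helper, the date-order check, and the author details are hypothetical.

```python
import json
from datetime import date


def article_json_ld(
    headline: str,
    author_name: str,
    author_title: str,
    author_profiles: list[str],
    published: date,
    modified: date,
) -> dict:
    """Build Article markup with a named Person author and explicit dates."""
    if modified < published:
        raise ValueError("dateModified must not precede datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": author_title,
            "sameAs": author_profiles,
        },
    }


# Placeholder author and dates, for illustration only.
article = article_json_ld(
    headline="How to maintain a widget",
    author_name="Jane Example",
    author_title="Head of Widget Engineering",
    author_profiles=["https://www.linkedin.com/in/jane-example"],
    published=date(2026, 1, 15),
    modified=date(2026, 2, 1),
)
print(json.dumps(article, indent=2))
```

Rejecting a modification date earlier than the publication date is a cheap guard against one of the easier mistakes to ship when dates come from two different CMS fields.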

Move 5: write with specific numbers and primary sources. Engines prefer to quote sentences that contain specific quantities, dated facts, or named sources. "Most teams" is invisible; "17 agents in three weeks" gets quoted verbatim. The piece you are reading is structured this way intentionally: every paragraph aims for a quotable line, every claim points at a primary source.
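A crude editorial check for Move 5 can be sketched as a script that flags sentences using a vague quantifier without any digit. The word list, the sentence splitter, and the `flag_vague_sentences` name are all assumptions chosen for illustration; this is a drafting aid, not a model of how any engine scores text.

```python
import re

# Vague quantifiers to look for (an illustrative, incomplete list).
VAGUE = re.compile(r"\b(most|many|some|several|a lot of|numerous)\b", re.IGNORECASE)
HAS_DIGIT = re.compile(r"\d")


def flag_vague_sentences(text: str) -> list[str]:
    """Return sentences that contain a vague quantifier and no digit."""
    # Naive split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if VAGUE.search(s) and not HAS_DIGIT.search(s)]


draft = (
    "Most teams ship slowly. "
    "We deployed 17 agents in three weeks. "
    "Many clients saw gains."
)
flagged = flag_vague_sentences(draft)
for sentence in flagged:
    print("Needs a number or a source:", sentence)
```

On the sample draft this flags the first and third sentences and passes the second. Numbers written out as words are not detected, so treat a flag as a prompt to look again rather than a verdict.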

The anti-pattern at the bottom of the chart: trying to "game training data" by publishing thousands of low-quality pages or buying citation links. Training data is updated on the model providers' cadence, not yours, and providers filter and deduplicate open-web data before each cutoff, which is the pass that thin pages tend to fail. Money spent here is mostly wasted; the five moves above compound faster.

[ PERSPECTIVES ]

Camp A — GEO is just SEO 2.0

Citation lives downstream of indexing. Clean Schema.org and authoritative content rank well on Google AND get cited by Claude. Build the same technical foundation, the rest follows.

Camp B — Only training data matters

Some practitioners argue live retrieval is a small fraction of citations and the real money is influencing what shows up in the next training cutoff. Ship anchor content that ranks well in 2026 so it is in the 2027 training set.

Camp C — Wait for the engines to stabilise

Citation mechanics are still volatile. Google's AI Overviews launched, contracted, expanded. ChatGPT search rolled out, paused, re-rolled. Some argue: ship Schema and llms.txt now, defer the content investment until 2027.

Where we land

A and B are not in conflict — moves 1 through 5 above hit both lanes. C is the most expensive position to hold. Citation share is being decided right now; the operators investing in 2026 will sit at the top of the citation graph for the next decade.

[ OPEN QUESTIONS ]

01 Does llms.txt actually influence citation selection in production, or is it a polite hint the engines mostly ignore today?

02 How do you measure citation share when no major engine publishes reliable citation telemetry?

03 What is the right opt-out posture for high-value proprietary content: block GPTBot or accept citation in exchange for visibility?

04 Does named-expert authorship matter equally for B2B and consumer queries, or is it weighted differently per vertical?

05 When citation goes mainstream and every domain ships these five moves, what is the next moat: schema density, recency, or authority?

[ REFERENCES ]

[1] llms.txt: proposed Markdown entity declaration for LLM-friendly sites.
[2] Schema.org: official structured-data vocabulary used by every major search and answer engine.
[3] OpenAI: GPTBot crawler documentation and opt-in instructions.
[4] Anthropic: ClaudeBot and Claude-User crawler documentation.
[5] Perplexity: PerplexityBot crawler documentation.
[6] Google: Google-Extended controls for generative AI training opt-out.
[7] Schema.org: Person type for named-expert authorship markup.
