Search is splitting into two channels: classic search (Google, Bing) and AI answer engines (ChatGPT, Perplexity, Claude, Gemini). Startups that optimize only titles and backlinks miss how LLMs choose what to cite.
This guide covers llms.txt, AI crawlers vs robots.txt, GEO without buzzwords, and a validation workflow. For technical basics, see the startup site health checklist.
What is llms.txt?
llms.txt is a voluntary convention - a small markdown file that tells AI systems:
- Which pages are canonical for your product story
- How you prefer attribution (name, URL)
- What to deprioritize (drafts, internal docs)
Common locations:
https://yoursite.com/llms.txt
https://yoursite.com/.well-known/llms.txt
It does not replace robots.txt, Terms of Service, or copyright law. Think of it as a courtesy map for LLM crawlers - similar in spirit to how sitemap.xml helps search crawlers.
Minimal example structure:
# Your Product Name
> One-sentence positioning.
## Docs
- [Search guide](https://yoursite.com/docs/search)
## Optional
- [Pricing](https://yoursite.com/pricing)
Needle publishes llms.txt and llm.txt as references - not as a standard you must copy verbatim.
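Before publishing, it can help to confirm that the file parses the way you expect. The following is a minimal sketch, assuming the file follows the structure of the example above (one `#` title, one `>` summary line, `##` sections with `- [title](url)` links). The names `parse_llms_txt` and `LlmsTxt` are hypothetical helpers for illustration, not part of any official tooling.

```python
import re
from dataclasses import dataclass, field

# Matches a markdown list item of the form "- [Title](https://example.com/page)".
LINK_RE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Section name -> list of (link title, url), in file order.
    sections: dict[str, list[tuple[str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the simple layout shown above; anything else is ignored."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and not doc.summary:
            doc.summary = line[1:].strip()
        else:
            match = LINK_RE.match(line)
            if match and current is not None:
                doc.sections[current].append((match["title"], match["url"]))
    return doc


EXAMPLE = """\
# Your Product Name
> One-sentence positioning.
## Docs
- [Search guide](https://yoursite.com/docs/search)
## Optional
- [Pricing](https://yoursite.com/pricing)
"""

if __name__ == "__main__":
    parsed = parse_llms_txt(EXAMPLE)
    print(parsed.title, "-", parsed.summary)
    for name, links in parsed.sections.items():
        print(name, links)
```

An empty `title` or a section with no links is a quick signal that the file drifted from the layout you intended.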
robots.txt and AI user-agents
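Many sites now deal with dedicated AI user-agents: GPTBot, ClaudeBot, PerplexityBot, and others. Google-Extended is a robots.txt control token rather than a separate crawler, but you set rules for it the same way. Your robots.txt can allow or disallow each of them per path.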
| Decision | When |
|---|---|
| Allow marketing + docs | You want AI citations for discovery |
| Disallow app/dashboard | Authenticated or user-data routes |
| Disallow after legal review | Regulated industries - consult counsel |
After any robots change: verify you did not block /, /pricing, or /sitemap.xml. Use the Google Indexing Checker and GEO analyzer.
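The post-change check can also be scripted locally. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `/app/` and `/dashboard/` paths, and the helper name `blocked_paths` are illustrative assumptions, not a recommended policy. One caveat: Python's parser applies the first matching rule in file order, while some crawlers use the most specific rule, so treat this as a sanity check rather than a guarantee of how every bot will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: AI agents may read marketing and docs pages,
# nobody may crawl the authenticated app.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /app/
Disallow: /dashboard/
Allow: /

User-agent: *
Disallow: /app/
Disallow: /dashboard/
"""

MUST_STAY_OPEN = ["/", "/pricing", "/sitemap.xml"]
AI_AGENTS = ["GPTBot", "Google-Extended", "ClaudeBot", "PerplexityBot"]


def blocked_paths(robots_txt, agents=AI_AGENTS, paths=MUST_STAY_OPEN):
    """Return (agent, path) pairs that the given robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path)
        for agent in agents
        for path in paths
        if not parser.can_fetch(agent, path)
    ]


if __name__ == "__main__":
    problems = blocked_paths(ROBOTS_TXT)
    if problems:
        for agent, path in problems:
            print(f"BLOCKED: {agent} cannot fetch {path}")
    else:
        print("OK: key paths are open to the listed agents")
```

Running this against the robots.txt you are about to deploy catches the common mistake of a stray `Disallow: /` before it reaches production.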
GEO in practice (four pillars)
- Clear positioning - Who you help, in one sentence, above the fold.
- Quotable structure - H2/H3, lists, FAQs models can extract.
- Trust signals - Honest comparisons, pricing facts, updated dates.
- Entity consistency - Same name, domain, and description across pages and Organization JSON-LD.
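One way to keep entity details consistent is to generate the Organization JSON-LD from a single source of truth that your templates also use. This is a minimal sketch: the `BRAND` values are placeholders, the helper names are hypothetical, and only the basic schema.org `Organization` properties `name`, `url`, and `description` are used.

```python
import json

# Single source of truth for entity details (placeholder values).
BRAND = {
    "name": "Your Product Name",
    "url": "https://yoursite.com",
    "description": "One-sentence positioning.",
}


def organization_jsonld(brand):
    """Serialize a schema.org Organization object from the brand record."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": brand["name"],
        "url": brand["url"],
        "description": brand["description"],
    }
    # Escape "</" so the JSON cannot close the surrounding <script> tag.
    return json.dumps(data, indent=2).replace("</", "<\\/")


def script_tag(brand):
    """Wrap the JSON-LD in the script element that goes in the page head."""
    return (
        '<script type="application/ld+json">\n'
        f"{organization_jsonld(brand)}\n"
        "</script>"
    )


if __name__ == "__main__":
    print(script_tag(BRAND))
```

Rendering the page title and meta description from the same `BRAND` record keeps the visible copy and the structured data from drifting apart.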
GEO does not excuse broken TLS or missing meta tags - fix site health first.
Pair GEO with conversation demand
AI models echo how the web talks about problems. That language often appears in Reddit and HN threads before it appears on your homepage.
Weekly habit:
- Run GEO analyzer after template changes.
- Scan Trending Problems for category phrasing.
- Update the homepage FAQ with phrasing close to how buyers describe the problem (paraphrase real threads; never fabricate quotes).
See positioning from real phrases.
Validation workflow
- Run GEO & LLM Site Analyzer on your production URL.
- Fix what it flags: missing llms.txt (if you chose to publish one), broken Open Graph (OG) tags, invalid JSON-LD.
- Re-run after a launch, a pricing change, or a major pillar post.
- Spot-check AI answers manually: search your product category in Perplexity/ChatGPT - are facts accurate?
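Two of the fixes in this workflow, broken OG tags and invalid JSON-LD, can be checked locally with the standard library. The sketch below is an assumption about what a basic check might look like, not a description of how the analyzer works. It treats the four properties the Open Graph protocol marks as required (`og:title`, `og:type`, `og:image`, `og:url`) as mandatory, and "invalid JSON-LD" here means only that the block is not parseable JSON; it does not validate against schema.org.

```python
import json
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


class HeadAudit(HTMLParser):
    """Collect og:* meta tags and raw JSON-LD script bodies from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.jsonld_blocks = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.og[prop] = attr.get("content") or ""
        elif tag == "script" and attr.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buf))
            self._in_jsonld = False


def audit(html):
    """Return a list of human-readable problems; empty means none found."""
    parser = HeadAudit()
    parser.feed(html)
    parser.close()
    problems = [f"missing {p}" for p in REQUIRED_OG if not parser.og.get(p)]
    for index, block in enumerate(parser.jsonld_blocks):
        try:
            json.loads(block)
        except json.JSONDecodeError as err:
            problems.append(f"invalid JSON-LD block {index}: {err.msg}")
    return problems


if __name__ == "__main__":
    sample = '<meta property="og:title" content="Your Product Name">'
    print(audit(sample))
```

Feed it the rendered HTML of the production page (not the template source) so that values injected at build time are included.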
What not to do
- Keyword-stuff hidden text for "AI" - crawlers and humans both punish it.
- Block all AI bots without understanding which agents your buyers use.
- Publish llms.txt pointing to 404 docs - worse than omitting the file.
- Ignore classic SEO - Google Search still drives meaningful SaaS traffic, and Google Search Console (GSC) is where you measure it.
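The dead-link mistake in the list above is easy to catch before publishing. The sketch below pulls every markdown link out of an llms.txt body and reports the ones that do not resolve. The helper names are hypothetical, and `status_of` is injectable so the check can be exercised without network access. Note that some servers reject `HEAD` requests (for example with 405), which this sketch would report as a dead link, so confirm any hit manually.

```python
import re
import urllib.error
import urllib.request

# Captures the URL part of markdown links: [title](https://...)
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def http_status(url, timeout=10.0):
    """Return the HTTP status for a HEAD request, or 0 if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; keep the status code.
        return err.code
    except (urllib.error.URLError, TimeoutError):
        return 0


def dead_links(llms_txt, status_of=http_status):
    """Return (url, status) for every linked URL outside the 2xx/3xx range."""
    failures = []
    for url in URL_RE.findall(llms_txt):
        status = status_of(url)
        if not 200 <= status < 400:
            failures.append((url, status))
    return failures


if __name__ == "__main__":
    body = "- [Search guide](https://yoursite.com/docs/search)\n"
    # Stubbed status function so the demo runs offline.
    print(dead_links(body, status_of=lambda url: 404))
```

Running this in CI against the deployed file keeps the llms.txt from silently pointing at pages that were renamed or removed.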