What is an llms.txt and What’s in Ours
As large language models increasingly consume, summarize, and reason over web content without browsing it the way humans do, the way websites communicate intent has become a technical problem rather than a design one.
Navigation menus, internal linking, visual hierarchy, and even traditional SEO signals are optimized for humans and search engines. They are not optimized for systems that ingest content in fragments, out of order, or via secondary datasets.
The llms.txt convention exists to address this gap.
This article explains why llms.txt exists, what problem it is meant to solve, how it should be structured, and how we implemented it at EnterraHost as a real world, multi product example.
This is not an announcement. It is a reference.
The Context Problem LLMs Face
Large language models do not experience websites.
They do not scroll.
They do not infer hierarchy from layout.
They do not reliably traverse navigation structures.
Instead, they often encounter:
- Individual pages
- Extracted documentation files
- Partial mirrors
- Context stripped content blocks
When that happens, three common failure modes appear:
- Product flattening: multi feature tools are reduced to a single label, often incorrectly.
- Scope confusion: documentation is detached from the software it describes.
- Authority ambiguity: machines cannot easily distinguish official documentation from commentary or third party summaries.
llms.txt is an attempt to solve these problems with a single, explicit artifact.
What llms.txt Is Intended to Be
At its core, llms.txt is a machine oriented index of responsibility.
It answers a small number of important questions clearly:
- Who maintains this site?
- What software or services does it represent?
- Where is the authoritative documentation?
- What content should be treated as descriptive rather than promotional?
It does not attempt to control model behavior.
It does not replace sitemaps or robots.txt.
It does not guarantee ingestion or compliance.
Its value comes from clarity, restraint, and accuracy.
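The shape of the format, as described at llmstxt.org, is simple enough to show in full: an H1 title, a blockquote summary, optional context paragraphs, and sections of annotated links. The names and URLs below are illustrative, not taken from any real file:

```markdown
# ExampleCo

> ExampleCo maintains the Widget library, an open source data validation tool.

Widget validates structured data against user defined schemas. It is a library, not a hosted service.

## Docs

- [Widget overview](https://example.com/docs/widget.md): What the library does and how it is structured
- [Schema reference](https://example.com/docs/schemas.md): Supported schema types and validation rules
```

Everything in the file is plain Markdown, which is part of why it is easy for machines to consume out of context.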
What llms.txt Should Not Become
As with any emerging convention, misuse is already common.
Based on early adoption, llms.txt should not be:
- A marketing document
- A feature list
- A sitemap mirror
- A keyword dump
- A claim heavy positioning statement
Anything that requires interpretation or persuasion reduces its usefulness.
Machines need orientation, not arguments.
Why We Chose to Publish llms.txt
EnterraHost develops multiple software products across different platforms and use cases:
- A WordPress and WooCommerce optimization plugin suite
- A privacy focused CAPTCHA integration for the MyBB forum platform
- A native duplication tool for WooCommerce products and WordPress content
Each product:
- Solves a different problem
- Has different users
- Has different documentation requirements
Without an explicit index, it is easy for that ecosystem to be misunderstood when consumed programmatically.
We published llms.txt to provide a single canonical reference point that explains what we build and where the technical truth lives.
Design Constraints We Applied
Before writing the file, we imposed strict constraints.
No Marketing Language
If a sentence would not belong in technical documentation, it does not belong in llms.txt.
Explicit Product Naming
Each product is named once, clearly, with no aliases or branding variations.
Documentation First
Links point to explanations, not landing pages.
Stable URLs Only
Every link is expected to remain valid long term.
Scoped Descriptions
Each product is described in one sentence that defines function, not value.
The Structure of Our llms.txt
Our file is intentionally simple.
Header and Responsibility Statement
# Enterrahost
> Enterrahost develops Blue Raven (a WordPress/WooCommerce optimization plugin suite), Cloudflare Turnstile for MyBB (bot protection for the MyBB forum platform), and Enterra Quick Clone (a one-click product and post duplicator for WooCommerce and WordPress).
This establishes ownership and scope immediately.
There are no adjectives, rankings, or claims that require interpretation.
Context Paragraph
Blue Raven combines AI product creation, SEO, analytics, media optimization, and site monitoring (EnterraMon) in a single plugin. Turnstile for MyBB provides privacy-friendly CAPTCHA protection for forums.
This paragraph exists to prevent misclassification.
Without it, our Blue Raven WordPress plugin risks being reduced to “an SEO plugin,” which would be incomplete and misleading. Without similar contextual content in your own llms.txt, your project, site, or business may likewise be flattened to a single label or misrepresented by LLMs.
Documentation Index
The documentation section is the most important part of the file.
Every link points to a focused, self contained technical document covering:
- Core system behavior
- Individual tool modules
- Monitoring and analytics components
- Bot behavior and robots.txt interaction
- Structured data and SEO controls
Each document is narrow by design.
If an LLM reads only one of them, it should still understand what system it belongs to and why it exists.
Why We Broke Documentation Into Many Small Files
Instead of linking to a single documentation hub, we intentionally exposed individual documents.
This mirrors how both developers and machines reason about systems:
- Smaller units are easier to classify
- Purpose is clearer
- Context leakage is reduced
Large, generic documentation pages are difficult to summarize accurately.
Focused documents age better.
What We Deliberately Excluded
Some content does not belong in llms.txt and was intentionally excluded:
- Blog posts
- Pricing pages
- Changelogs
- Marketing copy
- Affiliate or comparison content
llms.txt should answer one question only:
“What is this site responsible for, and where is the authoritative explanation?”
Everything else is noise.
llms.txt and Discovery
Publishing llms.txt is only the first step.
Because the convention is still emerging, discovery currently happens through:
- Community maintained directories
- Specification adjacent sites
- Developer curated lists
Documenting how llms.txt spreads is part of making it useful.
As adoption grows, discoverability mechanisms will evolve, but early clarity matters more than reach.
Where llms.txt Is Defined and Discovered
The llms.txt convention is documented and maintained at its canonical reference site, https://llmstxt.org, which defines the intent, format, and guiding principles behind the file. Because adoption is still early, discovery currently relies on a small number of community maintained directories that index published llms.txt files for reference and experimentation. At the time of writing, these include llmstxt.site and directory.llmstxt.cloud, both of which act as informal registries rather than authoritative gatekeepers. This ecosystem is evolving, and the long term value of llms.txt depends less on where it is listed and more on the clarity and accuracy of the information it exposes.
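Because the file is plain Markdown, consuming one requires very little machinery. The sketch below is a minimal Python parser for the index shape described at llmstxt.org; the dictionary field names are our own choice, not part of any specification:

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt file into a title, summary, and link index."""
    title = None
    summary_lines = []
    links = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()          # H1: site or project name
        elif line.startswith("> "):
            summary_lines.append(line[2:].strip())  # blockquote summary
        else:
            # Link-list entries of the form "- [Name](url): description"
            m = re.match(r"-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?", line)
            if m:
                links.append({"name": m.group(1),
                              "url": m.group(2),
                              "description": m.group(3) or ""})
    return {"title": title, "summary": " ".join(summary_lines), "links": links}
```

A parser this small is the point: a well written llms.txt should not need anything more sophisticated to be understood.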
Static vs Generated llms.txt Files
A hand written llms.txt is sufficient for small, static sites.
For larger or evolving software projects, static files create a maintenance problem:
- Documentation changes
- New modules are added
- URLs shift over time
A well structured llms.txt should reflect the current state of a system, not a snapshot from launch day.
This is not a tooling discussion, but a design consideration.
The format rewards correctness over cleverness.
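One lightweight way to keep a generated file current is to render it from a manifest that lives in version control next to the documentation, so that adding a document and indexing it happen in the same commit. This is a minimal sketch of that idea, not required tooling; all names are illustrative:

```python
def render_llms_txt(site_name, summary, docs):
    """Render an llms.txt index from a structured manifest.

    `docs` is a list of (title, url, one_sentence_description) tuples
    maintained alongside the documentation itself.
    """
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Docs", ""]
    for title, url, description in docs:
        lines.append(f"- [{title}]({url}): {description}")
    return "\n".join(lines) + "\n"
```

Because the manifest is data rather than prose, it is harder for marketing language to creep in and easier to diff when documentation changes.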
llms.txt vs llms-full.txt
We treat llms.txt as an index, not an archive.
For deeper systems, an expanded llms-full.txt can exist alongside it to expose:
- More exhaustive documentation
- Extended architectural notes
- Additional reference material
The concise file should always remain readable in under a minute.
If it cannot, it has failed its purpose.
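The "readable in under a minute" rule can even be checked mechanically, for example in CI. The 200 words-per-minute threshold below is our assumption about skim-reading speed, not part of any specification:

```python
def readable_in_a_minute(text, words_per_minute=200):
    """Return True if the index can plausibly be skimmed in one minute.

    Counts whitespace-separated words; the threshold is a rough
    reading-speed assumption, not a spec requirement.
    """
    return len(text.split()) <= words_per_minute
```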
Common Mistakes to Avoid
From early implementations across the web:
- Treating llms.txt as an SEO opportunity
- Linking to homepages instead of explanations
- Using vague or inflated descriptions
- Mixing marketing and documentation
- Assuming brand recognition
This file will be read out of context.
Assume zero trust and zero familiarity.
How We Expect This File to Be Used
We do not assume universal adoption.
We assume:
- Some models will read it
- Some tools will reference it
- Some documentation authors will cite it
That is enough.
The value of llms.txt is not in control, but in having a canonical answer available when context matters.
Automating llms.txt Without Turning It Into Marketing
One of the practical challenges with llms.txt is maintenance.
The file only remains useful if it accurately reflects the current state of a site’s technical documentation. On sites where documentation evolves, modules are added, or URLs change, a manually maintained file quickly becomes stale.
To address this problem without compromising the design principles outlined above, we released php-llmscan, an open source command line tool that generates llms.txt and its associated documentation index directly from an existing sitemap.
The goal of the tool is narrow by design.
It does not attempt to summarize an entire site.
It does not generate marketing copy.
It does not decide what a product is.
Instead, it enforces the same constraints a careful human author would apply.
How php-llmscan Works
The tool operates in three explicit stages:
- Discovery: it parses a site’s sitemap and fetches each listed page using a declared user agent.
- Qualification: each page is evaluated to determine whether it contains technical documentation, such as feature explanations, configuration details, or factual system behavior. Marketing pages, legal content, blog posts, and general overviews are deliberately excluded.
- Normalization: qualified pages are converted into clean, neutral Markdown files intended for machine consumption. Promotional language, calls to action, and subjective claims are removed. Only factual descriptions and structured explanations remain.
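The shape of that pipeline can be sketched in a few lines. This is an illustration of the three stages, not php-llmscan’s actual implementation (which is written in PHP); the hint words, exclusion paths, and promotional phrases are invented for the example:

```python
import re
import xml.etree.ElementTree as ET

# Illustrative heuristics only; a real tool would use richer signals.
DOC_HINTS = ("configuration", "parameter", "behavior", "install", "api")
EXCLUDE_PATHS = ("/blog/", "/pricing", "/privacy", "/terms")

def discover(sitemap_xml):
    """Stage 1: extract page URLs from a sitemap document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

def qualifies(url, text):
    """Stage 2: keep pages that look like technical documentation."""
    if any(part in url for part in EXCLUDE_PATHS):
        return False
    body = text.lower()
    return any(hint in body for hint in DOC_HINTS)

def normalize(text):
    """Stage 3: drop promotional sentences, keep factual ones."""
    promo = re.compile(r"\b(buy now|sign up today|best-in-class)\b", re.I)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences if s and not promo.search(s))
```

The important property is that every stage is a filter: nothing is summarized or rewritten into new claims, so the output can only be as promotional as the input allows.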
From this output, the tool generates a single llms.txt file that acts as an index, linking to each documentation file with a short, factual description.
The result mirrors the structure recommended by llmstxt.org: a concise responsibility statement followed by a documentation index, nothing more.
Design Constraints Built Into the Tool
The implementation intentionally encodes the same rules described earlier in this article:
- Documentation is favored over landing pages
- Each page is described in a single factual sentence
- No feature ranking or persuasive language is allowed
- Links are expected to remain stable over time
The tool is designed to run on standard PHP hosting environments and does not require Python or external build systems. Configuration is explicit and local, including sitemap source, output paths, and AI provider selection.
Reference Implementation, Not a Requirement
php-llmscan is not a requirement for adopting llms.txt.
It exists as a reference implementation that demonstrates one way to scale the practice without violating its intent. Small or static sites can and should continue to publish hand written files. Larger systems benefit from automation only when it preserves correctness and restraint.
The tool’s source code is publicly available for inspection and adaptation:
https://github.com/enterrahost/php-llmscan
As with llms.txt itself, its usefulness depends entirely on how responsibly it is used.
Final Thoughts
llms.txt is still early.
Its usefulness will be determined less by specification changes and more by how responsibly it is used.
If you publish one:
- Be precise
- Be boring
- Be honest
- Prefer documentation over promotion
That is how it remains useful over time.
Reference Implementation
You can view our live llms.txt and documentation structure here:
https://enterrahost.com/llms.txt
If you are building software, plugins, or APIs and care about how machines interpret your work, this is a problem worth taking seriously.
We will continue to evolve our approach as the ecosystem matures.