What is an llms.txt and What’s in Ours
As large language models increasingly consume, summarize, and reason over web content without browsing it the way humans do, the way websites communicate intent has become a technical problem rather than a design one.
Navigation menus, internal linking, visual hierarchy, and even traditional SEO signals are optimized for humans and search engines. They are not optimized for systems that ingest content in fragments, out of order, or via secondary datasets.
The llms.txt convention exists to address this gap.
This article explains why llms.txt exists, what problem it is meant to solve, how it should be structured, and how we implemented it at EnterraHost as a real world, multi product example.
This is not an announcement. It is a reference.
The Context Problem LLMs Face
Large language models do not experience websites.
They do not scroll.
They do not infer hierarchy from layout.
They do not reliably traverse navigation structures.
Instead, they often encounter:
- Individual pages
- Extracted documentation files
- Partial mirrors
- Context stripped content blocks
When that happens, three common failure modes appear:
- Product flattening: multi feature tools are reduced to a single label, often incorrectly.
- Scope confusion: documentation is detached from the software it describes.
- Authority ambiguity: machines cannot easily distinguish official documentation from commentary or third party summaries.
llms.txt is an attempt to solve these problems with a single, explicit artifact.
What llms.txt Is Intended to Be
At its core, llms.txt is a machine oriented index of responsibility.
It answers a small number of important questions clearly:
- Who maintains this site?
- What software or services does it represent?
- Where is the authoritative documentation?
- What content should be treated as descriptive rather than promotional?
It does not attempt to control model behavior.
It does not replace sitemaps or robots.txt.
It does not guarantee ingestion or compliance.
Its value comes from clarity, restraint, and accuracy.
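The shape of the format, as described at llmstxt.org, is simple enough to show in full: an H1 title, a blockquote summary, optional context paragraphs, and sections of annotated links. The names and URLs below are illustrative, not taken from any real file:

```markdown
# ExampleCo

> ExampleCo maintains the Widget library, an open source data validation tool.

Widget validates structured data against user defined schemas. It is a library, not a hosted service.

## Docs

- [Widget overview](https://example.com/docs/widget.md): What the library does and how it is structured
- [Schema reference](https://example.com/docs/schemas.md): Supported schema types and validation rules
```

Everything in the file is plain Markdown, which is part of why it is easy for machines to consume out of context.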
What llms.txt Should Not Become
As with any emerging convention, misuse is already common.
Based on early adoption, llms.txt should not be:
- A marketing document
- A feature list
- A sitemap mirror
- A keyword dump
- A claim heavy positioning statement
Anything that requires interpretation or persuasion reduces its usefulness.
Machines need orientation, not arguments.
Why We Chose to Publish llms.txt
EnterraHost develops multiple software products across different platforms and use cases:
- A WordPress and WooCommerce optimization plugin suite
- A privacy focused CAPTCHA integration for the MyBB forum platform
- A native duplication tool for WooCommerce products and WordPress content
Each product:
- Solves a different problem
- Has different users
- Has different documentation requirements
Without an explicit index, it is easy for that ecosystem to be misunderstood when consumed programmatically.
We published llms.txt to provide a single canonical reference point that explains what we build and where the technical truth lives.
Design Constraints We Applied
Before writing the file, we imposed strict constraints.
No Marketing Language
If a sentence would not belong in technical documentation, it does not belong in llms.txt.
Explicit Product Naming
Each product is named once, clearly, with no aliases or branding variations.
Documentation First
Links point to explanations, not landing pages.
Stable URLs Only
Every link is expected to remain valid long term.
Scoped Descriptions
Each product is described in one sentence that defines function, not value.
The Structure of Our llms.txt
Our file is intentionally simple.
Header and Responsibility Statement
# Enterrahost
> Enterrahost develops Blue Raven (a WordPress/WooCommerce optimization plugin suite), Cloudflare Turnstile for MyBB (bot protection for the MyBB forum platform), and Enterra Quick Clone (a one-click product and post duplicator for WooCommerce and WordPress).
This establishes ownership and scope immediately.
There are no adjectives, rankings, or claims that require interpretation.
Context Paragraph
Blue Raven combines AI product creation, SEO, analytics, media optimization, and site monitoring (EnterraMon) in a single plugin. Turnstile for MyBB provides privacy-friendly CAPTCHA protection for forums.
This paragraph exists to prevent misclassification.
Without it, our Blue Raven WordPress plugin risks being reduced to “an SEO plugin,” which would be incomplete and misleading. Without similar contextual content in your own llms.txt, your project, site, or business may likewise be flattened to a single label or misrepresented by LLMs.
Documentation Index
The documentation section is the most important part of the file.
Every link points to a focused, self contained technical document covering:
- Core system behavior
- Individual tool modules
- Monitoring and analytics components
- Bot behavior and robots.txt interaction
- Structured data and SEO controls
Each document is narrow by design.
If an LLM reads only one of them, it should still understand what system it belongs to and why it exists.
Why We Broke Documentation Into Many Small Files
Instead of linking to a single documentation hub, we intentionally exposed individual documents.
This mirrors how both developers and machines reason about systems:
- Smaller units are easier to classify
- Purpose is clearer
- Context leakage is reduced
Large, generic documentation pages are difficult to summarize accurately.
Focused documents age better.
What We Deliberately Excluded
Some content does not belong in llms.txt and was intentionally excluded:
- Blog posts
- Pricing pages
- Changelogs
- Marketing copy
- Affiliate or comparison content
llms.txt should answer one question only:
“What is this site responsible for, and where is the authoritative explanation?”
Everything else is noise.
llms.txt and Discovery
Publishing llms.txt is only the first step.
Because the convention is still emerging, discovery currently happens through:
- Community maintained directories
- Specification adjacent sites
- Developer curated lists
Documenting how llms.txt spreads is part of making it useful.
As adoption grows, discoverability mechanisms will evolve, but early clarity matters more than reach.
Where llms.txt Is Defined and Discovered
The llms.txt convention is documented and maintained at its canonical reference site, https://llmstxt.org, which defines the intent, format, and guiding principles behind the file. Because adoption is still early, discovery currently relies on a small number of community maintained directories that index published llms.txt files for reference and experimentation. At the time of writing, these include llmstxt.site and directory.llmstxt.cloud, both of which act as informal registries rather than authoritative gatekeepers. This ecosystem is evolving, and the long term value of llms.txt depends less on where it is listed and more on the clarity and accuracy of the information it exposes.
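Because the file is plain Markdown, consuming one requires very little machinery. The sketch below is a minimal Python parser for the index shape described at llmstxt.org; the dictionary field names are our own choice, not part of any specification:

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt file into a title, summary, and link index."""
    title = None
    summary_lines = []
    links = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()          # H1: site or project name
        elif line.startswith("> "):
            summary_lines.append(line[2:].strip())  # blockquote summary
        else:
            # Link-list entries of the form "- [Name](url): description"
            m = re.match(r"-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?", line)
            if m:
                links.append({"name": m.group(1),
                              "url": m.group(2),
                              "description": m.group(3) or ""})
    return {"title": title, "summary": " ".join(summary_lines), "links": links}
```

A parser this small is the point: a well written llms.txt should not need anything more sophisticated to be understood.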
Static vs Generated llms.txt Files
A hand written llms.txt is sufficient for small, static sites.
For larger or evolving software projects, static files create a maintenance problem:
- Documentation changes
- New modules are added
- URLs shift over time
A well structured llms.txt should reflect the current state of a system, not a snapshot from launch day.
This is not a tooling discussion, but a design consideration.
The format rewards correctness over cleverness.
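One lightweight way to keep a generated file current is to render it from a manifest that lives in version control next to the documentation, so that adding a document and indexing it happen in the same commit. This is a minimal sketch of that idea, not required tooling; all names are illustrative:

```python
def render_llms_txt(site_name, summary, docs):
    """Render an llms.txt index from a structured manifest.

    `docs` is a list of (title, url, one_sentence_description) tuples
    maintained alongside the documentation itself.
    """
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Docs", ""]
    for title, url, description in docs:
        lines.append(f"- [{title}]({url}): {description}")
    return "\n".join(lines) + "\n"
```

Because the manifest is data rather than prose, it is harder for marketing language to creep in and easier to diff when documentation changes.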
llms.txt vs llms-full.txt
We treat llms.txt as an index, not an archive.
For deeper systems, an expanded llms-full.txt can exist alongside it to expose:
- More exhaustive documentation
- Extended architectural notes
- Additional reference material
The concise file should always remain readable in under a minute.
If it cannot, it has failed its purpose.
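The "readable in under a minute" rule can even be checked mechanically, for example in CI. The 200 words-per-minute threshold below is our assumption about skim-reading speed, not part of any specification:

```python
def readable_in_a_minute(text, words_per_minute=200):
    """Return True if the index can plausibly be skimmed in one minute.

    Counts whitespace-separated words; the threshold is a rough
    reading-speed assumption, not a spec requirement.
    """
    return len(text.split()) <= words_per_minute
```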
Common Mistakes to Avoid
From early implementations across the web:
- Treating llms.txt as an SEO opportunity
- Linking to homepages instead of explanations
- Using vague or inflated descriptions
- Mixing marketing and documentation
- Assuming brand recognition
This file will be read out of context.
Assume zero trust and zero familiarity.
How We Expect This File to Be Used
We do not assume universal adoption.
We assume:
- Some models will read it
- Some tools will reference it
- Some documentation authors will cite it
That is enough.
The value of llms.txt is not in control, but in having a canonical answer available when context matters.
Automating llms.txt Without Turning It Into Marketing
One of the practical challenges with llms.txt is maintenance.
The file only remains useful if it accurately reflects the current state of a site’s technical documentation. On sites where documentation evolves, modules are added, or URLs change, a manually maintained file quickly becomes stale.
To address this problem without compromising the design principles outlined above, we released php-llmscan, an open source command line tool that generates llms.txt and its associated documentation index directly from an existing sitemap.
The goal of the tool is narrow by design.
It does not attempt to summarize an entire site.
It does not generate marketing copy.
It does not decide what a product is.
Instead, it enforces the same constraints a careful human author would apply.
How php-llmscan Works
The tool operates in three explicit stages:
- Discovery: it parses a site’s sitemap and fetches each listed page using a declared user agent.
- Qualification: each page is evaluated to determine whether it contains technical documentation, such as feature explanations, configuration details, or factual system behavior. Marketing pages, legal content, blog posts, and general overviews are deliberately excluded.
- Normalization: qualified pages are converted into clean, neutral Markdown files intended for machine consumption. Promotional language, calls to action, and subjective claims are removed. Only factual descriptions and structured explanations remain.
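The shape of that pipeline can be sketched in a few lines. This is an illustration of the three stages, not php-llmscan’s actual implementation (which is written in PHP); the hint words, exclusion paths, and promotional phrases are invented for the example:

```python
import re
import xml.etree.ElementTree as ET

# Illustrative heuristics only; a real tool would use richer signals.
DOC_HINTS = ("configuration", "parameter", "behavior", "install", "api")
EXCLUDE_PATHS = ("/blog/", "/pricing", "/privacy", "/terms")

def discover(sitemap_xml):
    """Stage 1: extract page URLs from a sitemap document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

def qualifies(url, text):
    """Stage 2: keep pages that look like technical documentation."""
    if any(part in url for part in EXCLUDE_PATHS):
        return False
    body = text.lower()
    return any(hint in body for hint in DOC_HINTS)

def normalize(text):
    """Stage 3: drop promotional sentences, keep factual ones."""
    promo = re.compile(r"\b(buy now|sign up today|best-in-class)\b", re.I)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences if s and not promo.search(s))
```

The important property is that every stage is a filter: nothing is summarized or rewritten into new claims, so the output can only be as promotional as the input allows.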
From this output, the tool generates a single llms.txt file that acts as an index, linking to each documentation file with a short, factual description.
The result mirrors the structure recommended by llmstxt.org: a concise responsibility statement followed by a documentation index, nothing more.
Design Constraints Built Into the Tool
The implementation intentionally encodes the same rules described earlier in this article:
- Documentation is favored over landing pages
- Each page is described in a single factual sentence
- No feature ranking or persuasive language is allowed
- Links are expected to remain stable over time
The tool is designed to run on standard PHP hosting environments and does not require Python or external build systems. Configuration is explicit and local, including sitemap source, output paths, and AI provider selection.
Reference Implementation, Not a Requirement
php-llmscan is not a requirement for adopting llms.txt.
It exists as a reference implementation that demonstrates one way to scale the practice without violating its intent. Small or static sites can and should continue to publish hand written files. Larger systems benefit from automation only when it preserves correctness and restraint.
The tool’s source code is publicly available for inspection and adaptation:
https://github.com/enterrahost/php-llmscan
As with llms.txt itself, its usefulness depends entirely on how responsibly it is used.
Final Thoughts
llms.txt is still early.
Its usefulness will be determined less by specification changes and more by how responsibly it is used.
If you publish one:
- Be precise
- Be boring
- Be honest
- Prefer documentation over promotion
That is how it remains useful over time.
Reference Implementation
You can view our live llms.txt and documentation structure here:
https://enterrahost.com/llms.txt
If you are building software, plugins, or APIs and care about how machines interpret your work, this is a problem worth taking seriously.
We will continue to evolve our approach as the ecosystem matures.