Open knowledge format: Google's AI memory standard


A minimal YAML spec that turns scattered wikis and PDFs into a structured, Git-trackable knowledge base that LLMs can query without hallucinating the details.

The context problem that nobody is solving is not about model size or token limits. It is about structure. Every company has documentation. Most of it is invisible to AI agents.

That wiki entry explaining how churn is calculated. The Slack thread where someone clarified the definition of an "active user." The PDF policy document from three years ago that still governs how specific datasets get classified. These documents exist. Agents cannot reliably use them.

The reason is not that LLMs are not smart enough. It is that the documents are formatted for human readers, not machine consumers. No consistent schema. No predictable metadata. No guaranteed structure that a retrieval pipeline can rely on.

Google Cloud's Open Knowledge Format (OKF) is an attempt to solve this at the specification level.

**What OKF actually is**

OKF v0.1 is a minimal, open-source specification for organizing human-curated knowledge into structured directories that AI agents can reliably query. It is not a database, not a framework, and not a vendor SDK. It is a format standard.

The core idea is simple: your internal documentation should live in a Git repository as plain Markdown files, each with a small YAML frontmatter block that classifies and tags the document. The rest is entirely up to the author.

A typical OKF repository looks like this:

```
my-ai-memory/
├── catalog.yaml
├── metrics/
│   ├── monthly_active_users.md
│   └── revenue_churn.md
└── api_endpoints/
    └── user_registration.md
```
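To make the layout concrete, here is a minimal Python sketch of the discovery step an indexer might start with: walk the repository and collect every Markdown document. It assumes, as in the tree above, that each OKF document is a `.md` file and that `catalog.yaml` is a separate file. The function name `list_okf_documents` is hypothetical. The sketch does not interpret `catalog.yaml`, whose contents are not covered here.

```python
from pathlib import Path


def list_okf_documents(root: str) -> list[str]:
    """Return repository-relative paths of all Markdown documents under root.

    Hypothetical helper: assumes every OKF document is a `.md` file and
    skips everything else (including catalog.yaml).
    """
    base = Path(root)
    return sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*.md")
        if path.is_file()
    )
```

For the tree above, `list_okf_documents("my-ai-memory")` would return the three `.md` paths in sorted order, starting with `api_endpoints/user_registration.md`.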

**Anatomy of a file**

Each file in an OKF directory has two parts. The YAML frontmatter block at the top declares what the document is. The Markdown body below it contains whatever the author wants to write.

Here is what a real metrics file looks like:

```markdown
---
type: business_metric
title: Monthly Revenue Churn Rate
tags: [finance, core-metrics]
resource: bq://project.finance.churn_daily
timestamp: 2026-06-12T00:00:00Z
---

# Revenue Churn Definition

Formula: Churn = (Lost ARR / Starting ARR) * 100

## Data Caveats

- Excludes transactional test accounts.
- Updated nightly at 03:00 UTC.
```

The frontmatter tells an indexer exactly what kind of document this is, what external dataset it references, and when it was last confirmed accurate. The body tells a human or an agent what the metric actually means in plain language.
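The split between frontmatter and body is easy to see in code. Below is a minimal, dependency-free Python sketch that separates the `---`-delimited block from the Markdown body. It understands only the flat `key: value` pairs and inline `[a, b]` lists used in the example above. It is not a general YAML parser, and a real indexer would more likely use a YAML library such as PyYAML. The function name `split_frontmatter` is hypothetical.

```python
def _parse_value(raw: str):
    """Parse a plain scalar or an inline list such as [finance, core-metrics]."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        inner = raw[1:-1].strip()
        return [item.strip() for item in inner.split(",")] if inner else []
    return raw


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split an OKF file into (metadata, markdown_body).

    Assumes the file opens with a line containing only '---' and that the
    frontmatter ends at the next such line. Only flat 'key: value' pairs
    are supported (a simplifying assumption, not full YAML).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        raise ValueError("frontmatter block is not closed with '---'")
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so values like timestamps and
        # bq:// URIs keep their own colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = _parse_value(value)
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```

Run against the churn file above, this would yield `meta["type"] == "business_metric"`, `meta["tags"] == ["finance", "core-metrics"]`, and a body that starts at `# Revenue Churn Definition`.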

**Only one rule**

The only mandatory field in an OKF file is `type`. Everything else is optional. That single field classifies the document into a category that indexing systems use to pre-filter before retrieval runs.

Optional fields extend the signal: `title` adds a human-readable name, `tags` add an array of category labels that retrieval queries can filter or match against, and `resource` links the document to a raw data source or system database. Nothing else is enforced.

This is the unusual part of the spec. Most context management frameworks overconstrain the format in an attempt to be comprehensive. OKF takes the opposite bet: enforce only what is necessary for interoperability, leave everything else to the author.
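Because `type` is the only hard requirement, a conformance check stays small. The Python sketch below treats a missing or empty `type` as an error. As an assumption, it also checks that the optional fields have plausible shapes (`tags` as a list of strings, `title` and `resource` as strings). Those shape checks are illustrative and are not described here as part of the spec. `validate_okf_metadata` is a hypothetical name.

```python
OPTIONAL_STRING_FIELDS = ("title", "resource")


def validate_okf_metadata(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the metadata passes.

    Only the `type` check reflects the spec's single mandatory rule. The
    optional-field shape checks are illustrative assumptions.
    """
    errors = []
    doc_type = meta.get("type")
    if not isinstance(doc_type, str) or not doc_type.strip():
        errors.append("missing required field: type")
    for field in OPTIONAL_STRING_FIELDS:
        if field in meta and not isinstance(meta[field], str):
            errors.append(f"{field} should be a string")
    tags = meta.get("tags")
    if tags is not None and not (
        isinstance(tags, list) and all(isinstance(t, str) for t in tags)
    ):
        errors.append("tags should be a list of strings")
    # Any other keys are left alone: the spec enforces nothing else.
    return errors
```

Unknown keys pass through untouched, which mirrors the spec's minimal-opinion stance: a team can add its own fields without breaking validation.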

**The knowledge graph**

OKF files can reference each other using standard Markdown relative links: `[MAU](./monthly_active_users.md)`. When a machine parser follows these links, it builds a dependency graph automatically.

This means your `revenue_churn.md` file can link to the MAU definition it relies on. A retrieval pipeline querying for churn context can traverse the graph and pull in the linked MAU document too. You get connected, contextual retrieval without building a custom graph database.

The graph structure emerges from the links you would write anyway. You do not build it separately. You just use standard Markdown.
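Here is one way a parser could turn those relative links into a graph and walk it. This Python sketch assumes documents are held in memory as a mapping from repository-relative path to Markdown body. It uses a simple regular expression for inline `[text](target.md)` links, so reference-style links and links with titles or `#anchors` are not handled, and it resolves each target against the linking file's directory. The names `build_link_graph` and `related_documents` are hypothetical.

```python
import posixpath
import re
from collections import deque

# Inline Markdown links whose target ends in .md, e.g. [MAU](./monthly_active_users.md)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)\)")


def build_link_graph(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each document path to the set of known documents it links to."""
    graph = {}
    for path, body in docs.items():
        base_dir = posixpath.dirname(path)
        targets = set()
        for target in LINK_RE.findall(body):
            if "://" in target:
                continue  # skip absolute URLs; only relative links form the graph
            resolved = posixpath.normpath(posixpath.join(base_dir, target))
            if resolved in docs:
                targets.add(resolved)
        graph[path] = targets
    return graph


def related_documents(graph: dict[str, set[str]], start: str, max_depth: int = 2) -> list[str]:
    """Breadth-first traversal from start, returning linked documents up to max_depth hops."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order
```

A churn query that lands on `metrics/revenue_churn.md` could then call `related_documents(graph, "metrics/revenue_churn.md")` to pull in the MAU definition it links to. The depth limit keeps a densely linked repository from dragging the whole knowledge base into one prompt.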

**Three principles**

The OKF spec is built on three design principles that distinguish it from heavier context frameworks.

The first is minimal opinion. OKF only enforces the metadata interoperability layer. The text content, internal structure, and style of each document are entirely at the author's discretion. A financial team can write bullet-point summaries. An engineering team can write formal API specifications. Both live in the same directory and get indexed by the same pipeline.

The second is producer-consumer independence. The system separates who writes documents from what consumes them. A file can be hand-authored, generated by a code pipeline, or scraped from a wiki endpoint. The consumer side (indexing engines, RAG systems, LLM tools) does not care. The format is the contract.

The third is vendor neutrality. OKF does not bind you to a specific database, agent framework, or model provider. The format runs on flat files. A Git repository is sufficient infrastructure. Any LLM, any RAG engine, any retrieval pipeline that can read text files can consume it.
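Producer-consumer independence means a pipeline can emit OKF files as easily as a person can write them. As a minimal Python sketch, the function below renders a metadata dictionary and a Markdown body into the file layout shown earlier. The name `render_okf_document` is hypothetical. It writes only the flat scalars and inline lists used in the example. Values that need YAML quoting (for instance, ones containing `: `, `#`, or brackets) are out of scope, and multi-line values are rejected.

```python
def _render_value(value) -> str:
    """Render a scalar, or a list of scalars in YAML inline (flow) style."""
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(str(item) for item in value) + "]"
    return str(value)


def render_okf_document(meta: dict, body: str) -> str:
    """Serialize metadata and a Markdown body into a single OKF file string.

    `type` is required, mirroring the spec's one mandatory field; it is
    written first, followed by the remaining keys in insertion order.
    """
    if not meta.get("type"):
        raise ValueError("OKF metadata requires a 'type' field")
    ordered = {"type": meta["type"], **{k: v for k, v in meta.items() if k != "type"}}
    lines = ["---"]
    for key, value in ordered.items():
        rendered = _render_value(value)
        if "\n" in rendered:
            raise ValueError(f"multi-line value for {key!r} is not supported")
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.strip() + "\n"
```

Written to disk and committed, the output is an ordinary file in the repository, so a hand-authored document and a generated one look the same to any consumer.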

**What the numbers show**

The performance case is measurable. Standard raw document chunking, the default approach of splitting PDFs and wikis into arbitrary text segments, reaches about 52% retrieval accuracy with average lookup times around 650 milliseconds.

Structurally organized OKF directories, where metadata pre-filtering eliminates irrelevant documents before semantic search runs, hit 91% accuracy with lookups under 180 milliseconds. The metadata acts as a first-pass filter that prevents the retrieval model from wading through noise.
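The two-stage pattern described above, a cheap metadata filter followed by a more expensive relevance ranking, can be sketched in a few lines of Python. Here a plain word-overlap score stands in for semantic (embedding) search so the example stays self-contained. In a real pipeline that step would call an embedding model. The `OkfDoc` type and the `retrieve` function are hypothetical, and the sketch illustrates the pattern only; it says nothing about the accuracy or latency figures quoted above.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OkfDoc:
    path: str
    type: str
    body: str
    tags: list[str] = field(default_factory=list)


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(docs: list[OkfDoc], query: str, doc_type: Optional[str] = None,
             tags: Optional[set[str]] = None, top_k: int = 3) -> list[str]:
    """Pre-filter on frontmatter metadata, then rank the survivors by relevance."""
    # Stage 1: metadata pre-filter using cheap equality and set checks.
    candidates = [
        d for d in docs
        if (doc_type is None or d.type == doc_type)
        and (not tags or tags & set(d.tags))
    ]
    # Stage 2: relevance ranking. Word overlap is a stand-in for embedding similarity.
    query_words = _words(query)
    scored = [(len(query_words & _words(d.body)), d.path) for d in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [path for score, path in scored[:top_k] if score > 0]
```

Without `doc_type`, a query for "churn" also surfaces an API document that merely mentions the word; with `doc_type="business_metric"`, that noise never reaches the ranking stage.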

**Connecting to MCP**

OKF directories integrate cleanly with the Model Context Protocol. You expose your OKF directory as an MCP server, and any compatible LLM client can query it natively using tool calls. The flow is three steps: Markdown source files in a Git repo, an MCP server that indexes and serves them, and LLM clients that query via standard tool calls.

Google Cloud's Knowledge Catalog provides a managed version of this. But because the format is open and vendor-neutral, you can build it on any MCP-compatible server, run it locally, or host it as a simple static site with a search layer on top.
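To show what the server step does, here is a stripped-down Python sketch of the tool-serving part of such a server. It answers the JSON-RPC 2.0 `tools/list` and `tools/call` requests that MCP clients send and exposes one hypothetical tool, `search_okf`, which filters documents by `type`. It deliberately omits the MCP initialization handshake and the transport (stdio or HTTP), so it is not a complete server; in practice an official MCP SDK would handle those parts. The tool name, its input schema, and the in-memory `DOCS` index are illustrative assumptions.

```python
import json

# Hypothetical in-memory index: path -> (type, body). A real server would
# build this from the Git repository at startup.
DOCS = {
    "metrics/revenue_churn.md": ("business_metric", "Churn = (Lost ARR / Starting ARR) * 100"),
    "api_endpoints/user_registration.md": ("api_endpoint", "Handler for new user sign-ups"),
}

SEARCH_TOOL = {
    "name": "search_okf",  # hypothetical tool name
    "description": "Return OKF documents whose frontmatter type matches.",
    "inputSchema": {
        "type": "object",
        "properties": {"type": {"type": "string"}},
        "required": ["type"],
    },
}


def handle_request(request: dict) -> dict:
    """Answer one JSON-RPC 2.0 request for the MCP tools/list and tools/call methods."""
    req_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": [SEARCH_TOOL]}}
    if method == "tools/call":
        params = request.get("params", {})
        if params.get("name") != SEARCH_TOOL["name"]:
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": "unknown tool"}}
        wanted = params.get("arguments", {}).get("type")
        matches = [
            {"path": path, "body": body}
            for path, (doc_type, body) in sorted(DOCS.items())
            if doc_type == wanted
        ]
        # Tool results are returned as a list of content items; here, one text item.
        return {"jsonrpc": "2.0", "id": req_id,
                "result": {"content": [{"type": "text", "text": json.dumps(matches)}],
                           "isError": False}}
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": -32601, "message": f"method not found: {method}"}}
```

Because the handler only reads an index built from flat files, the same logic works whether it runs on a laptop or behind a hosted endpoint.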

**Why this matters**

Every organization already has the raw material for a useful AI knowledge base. The problem has never been a shortage of documentation. It has been the absence of a consistent format that both humans and machines can reliably use.

OKF solves the format problem without requiring a migration to a new platform, a new database, or a new vendor relationship. It is a Git repository. Every team already knows how to work in one.

If your team adopts OKF for internal documentation, you can version your AI's knowledge base alongside your code. New hires contribute to it the same way they contribute to a wiki. Pull request reviews become audits of what the AI knows. The entire context management problem becomes a documentation workflow problem, and teams already know how to run those.

That is a meaningful shift. Most AI context problems today require infrastructure teams and custom pipelines. OKF moves the problem back into the hands of the people who know the domain best: the teams who actually wrote the documentation in the first place.
