Lexical Retrieval Extension
Optional PDPP extension defining the public lexical retrieval surface at GET /v1/search.
Overview
The lexical retrieval extension defines a small, optional, discoverable, grant-safe public surface that lets applications and agents search records by text across the streams a caller is authorized to read. It is not part of core PDPP: implementations MAY expose it, and clients MUST NOT assume it exists unless the resource server explicitly advertises it (see Discovery).
The extension is intentionally lexical-only in v1. It does not expose semantic / vector retrieval, embeddings, body-DSL POST /v1/search, portable relevance calibration, or connector-specific search semantics — those are out of scope. See Non-goals.
For the long-form contract, see the canonical spec at openspec/changes/add-lexical-retrieval-extension/specs/lexical-retrieval/spec.md in the repo. This page is the developer-facing companion.
Authentication and Versioning
Same as the Data Query API:
Authorization: Bearer <access_token>
PDPP-Version: 2026-03-28
Both client tokens (third-party apps holding a grant) and owner tokens (the resource owner performing self-export) are accepted. Per-mode behavior differs (see Owner-mode semantics) but the request shape is identical.
Request-Id is echoed in the response.
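As an illustration of the two headers above, here is a minimal sketch that builds (but does not send) a search request with Python's standard library. The function name build_search_request and the host used in the usage note are hypothetical; only the header names and the version string come from this page.

```python
import urllib.request

PDPP_VERSION = "2026-03-28"  # version string taken from the header example above


def build_search_request(base_url: str, access_token: str, query_string: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/search request carrying the required headers."""
    url = f"{base_url.rstrip('/')}/v1/search?{query_string}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            # Client tokens and owner tokens are sent the same way.
            "Authorization": f"Bearer {access_token}",
            "PDPP-Version": PDPP_VERSION,
        },
    )
```

For example, build_search_request("https://rs.example", token, "q=fee") yields a request object that can be passed to urllib.request.urlopen.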
Endpoint
GET /v1/search
A dedicated cross-stream search endpoint. The reference's _ref/search is a separate, reference-only operator-jump helper for traces / grants / runs and is not the public lexical retrieval surface; the two share neither shape nor backing.
Query parameters
| Parameter | Type | Description |
|---|---|---|
| q | string | Required. The lexical query. |
| limit | integer | Page size. Default 25, max 100. |
| cursor | string | Opaque pagination cursor from a previous response's next_cursor. Search cursors are not interchangeable with record-list cursors or with changes_since values. |
| streams[] | string (repeated) | Optional stream-scope narrowing. Omit to search every authorized stream that participates in the extension. See Owner-mode semantics for the per-mode meaning. |
Anything else is rejected with invalid_request_error. In particular:
- connector_id is not a public parameter on this surface in v1. Owner-mode search fans out across all owner-visible connectors internally; each result carries the originating connector_id so clients can hydrate.
- filter[…], fields, expand[], expand_limit[…], order=, rank=, boost=, embedding/vector/semantic params, and connector-specific semantics are explicitly out of scope.
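The parameter rules above can be sketched as a small client-side query builder. This is an illustration, not part of the extension: the function name is hypothetical, the lower bound of 1 on limit is an assumption (the page only states the default and the maximum), and a real client would prefer the max_limit the server advertises over the hard-coded 100.

```python
from urllib.parse import urlencode

MAX_LIMIT = 100  # documented v1 maximum; a server's advertised max_limit would take precedence


def build_search_query(q, limit=None, cursor=None, streams=None, **extra):
    """Return the query string for GET /v1/search, rejecting unsupported parameters early."""
    if extra:
        # The server would reject these anyway; failing early is only a client convenience.
        raise ValueError(f"unsupported v1 parameters: {sorted(extra)}")
    if not q:
        raise ValueError("q is required")
    pairs = [("q", q)]
    if limit is not None:
        if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
        pairs.append(("limit", str(limit)))
    if cursor is not None:
        pairs.append(("cursor", cursor))
    for stream in streams or []:
        pairs.append(("streams[]", stream))  # repeated once per stream
    return urlencode(pairs)
```

Passing connector_id="x" (or any other keyword outside the table) raises ValueError before a request is made.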
Result shape
{
"object": "list",
"url": "/v1/search",
"has_more": true,
"next_cursor": "<opaque>",
"data": [
{
"object": "search_result",
"stream": "messages",
"record_key": "msg_123",
"connector_id": "https://registry.pdpp.org/connectors/messaging-app",
"record_url": "/v1/streams/messages/records/msg_123",
"emitted_at": "2026-04-23T12:34:56Z",
"score": { "kind": "bm25", "value": -0.42, "order": "lower_is_better" },
"matched_fields": ["text"],
"snippet": { "field": "text", "text": "…overdraft fee…" }
}
]
}
Required fields on every result
- object: "search_result"
- stream
- record_key
- connector_id: required because the resource server scopes owner reads per connector. Even client-token callers receive connector_id (it mirrors the connector identity already encoded in the grant).
- emitted_at
- score: when advertised, a typed implementation-relative lexical score. The reference emits { "kind": "bm25", "order": "lower_is_better" } using SQLite FTS5 BM25 values.
- matched_fields: a non-empty subset of the stream's declared query.search.lexical_fields intersected with the caller's authorized fields.
Optional fields
- record_url: when present, resolves to the canonical GET /v1/streams/{stream}/records/{record_key} endpoint. For owner-token callers on a per-connector resource server (the reference today), the URL includes ?connector_id=<canonical>.
- snippet: a { field, text } pair drawn from a matched_fields entry. Implementations MAY omit snippet per result. Snippet text never quotes ungranted field content (see Grant safety).
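A client that wants to sanity-check responses against the required and optional fields listed above could use something like the following sketch. The function name and the problem strings are hypothetical; the checks themselves mirror the field rules on this page.

```python
REQUIRED_KEYS = ("object", "stream", "record_key", "connector_id", "emitted_at", "matched_fields")


def check_search_result(result: dict, score_advertised: bool) -> list:
    """Return a list of problems found in one search_result; an empty list means none were found."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in result]
    if "object" in result and result["object"] != "search_result":
        problems.append('object is not "search_result"')
    if "matched_fields" in result and not result["matched_fields"]:
        problems.append("matched_fields must be non-empty")
    if score_advertised and "score" not in result:
        problems.append("score is advertised but absent")
    snippet = result.get("snippet")  # optional, so absence is fine
    if snippet is not None and snippet.get("field") not in result.get("matched_fields", []):
        problems.append("snippet.field is not one of matched_fields")
    return problems
```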
What is intentionally limited
- No portable relevance calibration. Scores are implementation-relative. Clients may use the advertised kind and order, but MUST NOT compare values across servers or implementation changes unless a later capability advertises stronger calibration.
- No hydrated record payload. The extension returns candidate references; clients use the existing single-record read endpoint (or the record_url) to hydrate.
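Because results are references rather than payloads, a client needs a path to hydrate each hit. The sketch below prefers the server-provided record_url and otherwise builds the canonical single-record path. The helper name and the owner_mode flag are hypothetical, and appending connector_id in the fallback is an assumption that mirrors what the reference is described as putting in record_url for owner tokens.

```python
from urllib.parse import quote, urlencode


def hydration_path(result: dict, owner_mode: bool = False) -> str:
    """Pick the path a client would GET to hydrate one search_result."""
    if result.get("record_url"):
        return result["record_url"]  # server-provided; already carries any needed scope
    stream = quote(result["stream"], safe="")
    record_key = quote(result["record_key"], safe="")
    path = f"/v1/streams/{stream}/records/{record_key}"
    if owner_mode:
        # Assumption: mirror the ?connector_id=<canonical> form described for owner tokens.
        path += "?" + urlencode({"connector_id": result["connector_id"]})
    return path
```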
Grant safety
For caller C and grant G, the extension searches only over (stream, field) pairs where:
- stream is in G,
- field is readable under G's effective field projection for stream, AND
- stream declares field in its query.search.lexical_fields.
Concretely:
- Streams outside the grant contribute zero hits.
- Fields outside the grant projection are never searched for the caller (no "filter-later" pattern).
- matched_fields is a non-empty subset of the searchable ∩ authorized intersection.
- snippet.text contains only substrings drawn from that intersection.
- A stream whose searchable ∩ authorized intersection is empty contributes zero hits, and the response does not signal a per-stream error for that case.
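The searchable ∩ authorized intersection can be computed before any index is consulted, which is what rules out a "filter-later" pattern. The following is a minimal sketch under assumed data shapes: grant maps each granted stream to its set of readable fields, and lexical_fields maps each stream to its declared query.search.lexical_fields. Both shapes and the function name are hypothetical.

```python
def searchable_scope(grant: dict, lexical_fields: dict) -> dict:
    """Map each granted stream to the fields that are both readable and declared searchable.

    grant:          {stream: set of field names readable under the grant}  (assumed shape)
    lexical_fields: {stream: list declared in query.search.lexical_fields}
    """
    scope = {}
    for stream, readable in grant.items():  # streams outside the grant are never visited
        allowed = [field for field in lexical_fields.get(stream, []) if field in readable]
        if allowed:  # an empty intersection contributes nothing and is not an error
            scope[stream] = allowed
    return scope
```

A search implementation would then query only the (stream, field) pairs present in the returned mapping.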
Errors
Same error envelope as the Data Query API.
| Code | HTTP | When |
|---|---|---|
| invalid_request | 400 | Missing q, unsupported v1 parameter (e.g. connector_id, filter[…], rank), or streams[] omitted when it is required because the server's advertisement reports cross_stream: false. |
| grant_stream_not_allowed | 403 | Client tokens only. A streams[] entry names a stream not in the grant. |
| invalid_cursor | 410 | Cursor refers to an expired or unknown snapshot. |
Owner-token streams[] is not a hard authorization check — naming a stream that no owner-visible connector exposes simply yields zero hits.
Owner-mode semantics
The reference implementation (and other resource servers that scope owner reads per connector) handles owner-token search as follows:
- The request shape is identical to client-token search. There is no public connector_id parameter in v1.
- The server fans out across every owner-visible connector internally and merges results.
- streams[] is a soft filter: it narrows to a stream name shared across owner-visible connectors. Naming a stream that no owner-visible connector exposes yields zero hits, not an error.
- Each search_result carries connector_id so the caller can hydrate each hit through the correct per-connector owner read scope.
- record_url, when emitted, includes ?connector_id=<canonical> so a plain GET against the URL hits the correct per-connector scope.
For client tokens, search is naturally scoped to the connector encoded in the grant; connector_id on results mirrors that grant identity.
Discovery
Server-level: extension advertisement
The extension advertises itself in the resource-server metadata document (RFC 9728) under a capabilities.lexical_retrieval block:
{
"resource": "https://example.com",
"...": "...",
"capabilities": {
"lexical_retrieval": {
"supported": true,
"endpoint": "/v1/search",
"cross_stream": true,
"snippets": true,
"default_limit": 25,
"max_limit": 100,
"score": {
"supported": true,
"kind": "bm25",
"order": "lower_is_better",
"value_semantics": "implementation_relative"
}
}
}
}
When supported: true, all six base keys (supported, endpoint, cross_stream, snippets, default_limit, max_limit) are required. When score.supported: true, each result includes the typed score object. The advertisement is reachable without a bearer token.
A resource server that does not expose the extension SHALL omit capabilities.lexical_retrieval entirely or set supported: false. Clients MUST NOT assume /v1/search is available unless the advertisement says so.
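A client-side check of the advertisement might look like the sketch below. It takes the already-decoded metadata document as a dict; the function name is hypothetical, and fetching the document is left out.

```python
BASE_KEYS = ("supported", "endpoint", "cross_stream", "snippets", "default_limit", "max_limit")


def lexical_search_config(metadata: dict):
    """Return the lexical_retrieval block if the extension is advertised as supported, else None."""
    block = (metadata.get("capabilities") or {}).get("lexical_retrieval")
    if not isinstance(block, dict) or block.get("supported") is not True:
        return None  # omitted or supported: false, so /v1/search must not be assumed
    missing = [key for key in BASE_KEYS if key not in block]
    if missing:
        # supported: true requires all six base keys.
        raise ValueError(f"advertisement is missing required keys: {missing}")
    return block
```

A caller would only issue GET requests against the returned endpoint when the result is not None.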
Stream-level: query.search.lexical_fields
Each participating stream declares its searchable fields in its existing per-stream metadata (GET /v1/streams/{stream}):
{
"object": "stream_metadata",
"name": "posts",
"query": {
"search": {
"lexical_fields": ["title", "selftext"]
}
}
}
v1 accepts only top-level scalar string fields declared in the stream's schema.properties. Nested paths, arrays, blob references, and unknown fields are rejected by the manifest validator. A stream that does not participate in lexical retrieval SHALL omit query.search entirely (there is no "search-aware but searches nothing" form).
The advertisement does not enumerate per-stream fields; clients discover them through the existing stream-metadata endpoint.
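The v1 field restriction can be approximated with a check like the one below. This is a sketch and not the manifest validator itself: it assumes schema.properties follows the JSON Schema convention of a "type" key per property, and the function name is hypothetical.

```python
def invalid_lexical_fields(schema_properties: dict, lexical_fields: list) -> list:
    """Return the declared lexical fields that would not qualify as top-level string fields."""
    rejected = []
    for name in lexical_fields:
        # Only exact top-level keys are looked up, so nested paths such as "a.b" never match.
        prop = schema_properties.get(name)
        if not isinstance(prop, dict) or prop.get("type") != "string":
            rejected.append(name)  # unknown field, or a non-string type such as array or object
    return rejected
```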
Pagination
?cursor=<opaque>
Pagination is opaque. Cursors are not interchangeable with record-list (/v1/streams/.../records?cursor=…) or changes_since cursors. Within a single search session (same q, same streams[], same grant), cursoring is stable enough to avoid duplication and infinite loops. Across a server restart, snapshot expiry, or grant change, a request using the old cursor MAY fail with invalid_cursor; the client recovers by issuing a fresh search.
The cursor format is implementation-defined — clients MUST treat it as opaque.
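The recovery behavior can be sketched as a pagination loop that restarts from the first page when a cursor is rejected. Everything here is illustrative: fetch_page is a hypothetical caller-supplied function that performs the HTTP request and raises InvalidCursor when the server answers with invalid_cursor, and the single-restart default is an arbitrary choice.

```python
class InvalidCursor(Exception):
    """Raised by the caller-supplied fetch function when the server reports invalid_cursor."""


def search_all(fetch_page, q, streams=(), max_restarts=1):
    """Collect every result for one query, restarting from page one if the cursor is rejected.

    fetch_page(q, streams, cursor) is a hypothetical callable returning the decoded
    list envelope: {"data": [...], "has_more": bool, "next_cursor": str}.
    """
    restarts = 0
    while True:
        results, cursor = [], None  # a fresh search always starts without a cursor
        try:
            while True:
                page = fetch_page(q, streams, cursor)
                results.extend(page["data"])
                if not page.get("has_more"):
                    return results
                cursor = page["next_cursor"]  # opaque: passed back verbatim, never parsed
        except InvalidCursor:
            if restarts >= max_restarts:
                raise
            restarts += 1  # discard partial results and issue a fresh search
```

Discarding partial results on restart avoids mixing hits from two different snapshots; a client that can tolerate duplicates might keep them instead.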
Ranking
Results are returned in relevance-oriented order. Higher-positioned results SHOULD generally be more relevant than lower-positioned results. The advertised BM25 score is implementation-relative and uses order: "lower_is_better" in the reference. The extension intentionally does not define portable score calibration, semantic reranking, recency blending, or per-connector custom weighting in v1.
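Since the score is typed, a client that needs to compare two scores from the same server can honor the advertised order instead of assuming a direction. A minimal sketch follows; the function name is hypothetical, and "higher_is_better" is an assumed counterpart value, since this page only shows lower_is_better.

```python
def better_score(a: dict, b: dict) -> dict:
    """Return whichever of two typed scores from the same server ranks better."""
    if (a["kind"], a["order"]) != (b["kind"], b["order"]):
        raise ValueError("scores of different kind or order are not comparable")
    if a["order"] == "lower_is_better":
        return a if a["value"] <= b["value"] else b
    if a["order"] == "higher_is_better":  # assumed value; not shown on this page
        return a if a["value"] >= b["value"] else b
    raise ValueError(f"unknown order: {a['order']}")
```

This is only meaningful within one server and one implementation version; response order remains the primary relevance signal.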
Non-goals
Out of scope for v1; future extensions or revisions may address them separately:
- Semantic / vector retrieval.
- Embeddings or embedding versioning.
- Cross-connector entity resolution.
- Generic boolean / predicate query DSL.
- Connector-specific search semantics on the public surface.
- Portable score calibration.
- A POST /v1/search body-DSL surface (reserved as a possible future extension).
- Mandatory promotion of this extension to core PDPP.
See also
- Data Query API: the core record-read contract this extension complements.
- Semantic Retrieval Extension (Experimental): a sibling experimental extension at GET /v1/search/semantic. Unstable; use lexical retrieval when stability matters.
- Approved spec: openspec/changes/add-lexical-retrieval-extension/specs/lexical-retrieval/spec.md.
- Implementation tranche: openspec/changes/implement-lexical-retrieval-extension/.