Restore Postgres BM25 Top-K Search

In progress | Tasks 5/29 | restore-postgres-bm25-topk-search

openspec/changes/restore-postgres-bm25-topk-search/

Artifacts

Official change artifacts tracked under openspec/.

The reference Postgres lexical search path is now fast and honest about its limits, but for broad queries it still ranks only a bounded candidate window. That is an interim quality compromise: callers can tell that recall is bounded, but the implementation can still miss a better lexical match that falls outside the window.
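The gap described above can be illustrated with a minimal Python sketch. It is not the reference implementation: the function name `rank_top_k`, the document ids, and the scores are all hypothetical, and the window is modeled simply as "the first N candidates in retrieval order".

```python
from typing import Callable, List, Optional, Sequence


def rank_top_k(
    docs: Sequence[str],
    score: Callable[[str], float],
    k: int,
    window: Optional[int] = None,
) -> List[str]:
    """Return the k best-scoring docs.

    If `window` is set, only the first `window` docs (in retrieval order)
    are scored, which models a bounded candidate window.
    """
    candidates = docs if window is None else docs[:window]
    return sorted(candidates, key=score, reverse=True)[:k]


# Hypothetical lexical scores; "d" is the best match but is retrieved last.
scores = {"a": 0.2, "b": 0.5, "c": 0.1, "d": 0.9}
docs = list(scores)  # retrieval order: a, b, c, d

full = rank_top_k(docs, scores.__getitem__, k=1)
windowed = rank_top_k(docs, scores.__getitem__, k=1, window=3)
```

With these made-up inputs, full ranking selects `d`, while the windowed ranking never scores `d` and selects `b` instead. That is the kind of miss the recall disclosure is meant to make visible.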

Design

The reference implementation currently supports SQLite lexical retrieval through FTS5 bm25() and Postgres lexical retrieval through a derived lexical_search_index table with a generated tsvector and ts_rank_cd ranking. Recent Postgres acceleration work made broad search reliable by adding scoped GIN indexing, bounded fan-out, and bounded lexical candidate windows. Recent recall-disclosure work made that compromise honest by returning meta.count_accuracy and meta.recall.

Tasks (5/29)

Spec Deltas (2)

Affected capabilities

Capability specs this change proposes to modify.

Lexical Retrieval

Lexical retrieval responses SHALL report exact recall only when the active backend ranks the full grant-authorized lexical match set before pagination. A backend that applies a bounded candidate window, approximate prefilter, or any unproven pre-ranking truncation SHALL report incomplete recall using meta.recall.ranking_scope: "candidate_window" or "unknown" as appropriate.

lexical-retrieval
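The lexical retrieval requirement could be sketched in Python as follows. This is an illustration under stated assumptions, not the spec's schema: it assumes the field is spelled `ranking_scope` with values `candidate_window` and `unknown`, and the `exact` key, the `full_match_set` value, and the `RankingStrategy` enum are hypothetical names introduced only for this example.

```python
from enum import Enum


class RankingStrategy(Enum):
    """How the active backend chose what to rank (hypothetical enum)."""

    FULL_MATCH_SET = "full_match_set"      # ranks every authorized match before pagination
    CANDIDATE_WINDOW = "candidate_window"  # ranks only a bounded candidate window
    UNPROVEN = "unproven"                  # approximate prefilter or unproven truncation


def recall_meta(strategy: RankingStrategy) -> dict:
    """Build a recall block; only full-match-set ranking may claim exact recall."""
    if strategy is RankingStrategy.FULL_MATCH_SET:
        # "exact" and "full_match_set" are placeholder names, not from the spec.
        return {"exact": True, "ranking_scope": "full_match_set"}
    if strategy is RankingStrategy.CANDIDATE_WINDOW:
        return {"exact": False, "ranking_scope": "candidate_window"}
    # Anything the backend cannot prove is reported as unknown, never as exact.
    return {"exact": False, "ranking_scope": "unknown"}
```

The design choice worth noting is the default branch: a strategy that is not provably full ranking falls through to incomplete recall, so a new or misclassified backend cannot claim exactness by accident.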

Reference Implementation Architecture

The reference implementation SHALL treat any pg_search / ParadeDB BM25 lexical backend as an optional Postgres runtime capability. The default Postgres lexical backend SHALL remain the native scoped FTS path unless configuration explicitly enables the BM25 backend and startup proves the required extension and index are usable. If the BM25 backend is disabled, unavailable, not ready, or fails at query time, the reference SHALL fall back to native Postgres lexical retrieval without changing the public /v1/search response shape.

reference-implementation-architecture
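The fallback rule in the architecture requirement might look roughly like the Python sketch below. Every name here (`LexicalBackends`, `bm25_enabled`, `bm25_ready`, the `results` key) is hypothetical, and the two search callables stand in for the real native and BM25 query paths, whose actual interfaces are not described in this change.

```python
from dataclasses import dataclass
from typing import Callable, List

SearchFn = Callable[[str], List[str]]


@dataclass
class LexicalBackends:
    native_search: SearchFn
    bm25_search: SearchFn
    bm25_enabled: bool = False  # explicit configuration opt-in
    bm25_ready: bool = False    # set at startup once extension and index are proven usable

    def search(self, query: str) -> dict:
        results = None
        if self.bm25_enabled and self.bm25_ready:
            try:
                results = self.bm25_search(query)
            except Exception:
                # Query-time failure: fall through to the native path.
                results = None
        if results is None:
            results = self.native_search(query)
        # Same response shape regardless of which backend answered.
        return {"results": results}


def native(query: str) -> List[str]:
    return [f"native:{query}"]


def failing_bm25(query: str) -> List[str]:
    raise RuntimeError("bm25 index not usable")


backends = LexicalBackends(native, failing_bm25, bm25_enabled=True, bm25_ready=True)
response = backends.search("postgres")
```

In this usage example the BM25 callable raises, so the native path answers and the caller receives the same `{"results": [...]}` shape it would have received either way. A real implementation would presumably also record which backend served the request, but that is outside what this change describes.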