Design — port the explore-timeline server foundation to main
Context
main already has the dual-dialect storage plumbing this port needs — the
storage-backend.js abstraction, isPostgresStorageBackend(), the records
table, the emitted_at column, and idx_records_lookup. What it lacks is the
semantic-time layer: the semantic_time column, its expression index, the
explore-timeline-substrate.ts merged-timeline queries, the rs.explore.timeline
operation, and the GET /_ref/explore/records route. Those live only on the
deploy branch, so the live fast feed runs on un-merged server code. This change
ports that layer onto main verbatim in contract.
The bucket-aggregate over-time-chart endpoint depends on this layer (the expression index and the substrate's scope plumbing). Per the Codex gate, when a migration is discovered necessary the build stops and re-gates; it was, so the foundation lands first as its own change and the bucket endpoint builds on it later. No bucket-aggregate requirement appears in this change.
Migration safety rationale
The whole point of this design is that the migration is safe to run against a live, multi-million-row records table at boot — no lock-bloat, no mass rewrite, no assumption of an empty database, and identical semantics across SQLite and Postgres.
1. semantic_time is an ADDITIVE column
The column is TEXT NOT NULL DEFAULT ''. Adding it is metadata-only on both
engines (no per-row rewrite, because the default is a constant empty string, not
a computed expression). It does not touch any existing column, index, or
constraint, and nothing reads it as authoritative until a real value is written.
2. The migration is IDEMPOTENT and does not assume an empty DB
- SQLite:
migrateRecordSemanticTimeColumn(raw)guards withhasTableColumn(raw, 'records', 'semantic_time')and only then runsALTER TABLE records ADD COLUMN semantic_time TEXT NOT NULL DEFAULT ''. The inlineCREATE TABLE IF NOT EXISTS records (...)schema is a no-op on a pre-existing table, so theALTER(not the table DDL) is what adds the column on an already-populated database. - Postgres:
ALTER TABLE records ADD COLUMN IF NOT EXISTS semantic_time TEXT NOT NULL DEFAULT ''. - Re-running the migration on a DB that already has the column is a no-op on both backends. The migration is correct whether the records table is empty or holds millions of rows.
3. Backfill is LAZY — no bulk UPDATE of records
- The read path COALESCEs:
COALESCE(NULLIF(semantic_time, ''), emitted_at). Existing rows keep'', so they fall back toemitted_atand the merged timeline sort is no worse than the prioremitted_atorder until real semantic values land. This is the honesty guarantee: pre-migration rows are not mis-attributed; they sort exactly as they did before. - New and re-emitted rows get a real
semantic_timeat write time via the upsert (ON CONFLICT(connector_instance_id, stream, record_key) DO UPDATE SET ... semantic_time = excluded.semantic_time). No batch job rewrites old rows in this change; a chunked per-record semantic backfill (Step B) is deliberately out of scope and can run later without changing this contract. - This keeps the boot migration
O(1)against the live table — the documented reason forDEFAULT ''over a computed default or a massUPDATE.
4. The keyset index is an EXPRESSION index matching the read ORDER BY EXACTLY
- The merged-timeline read sorts by
COALESCE(NULLIF(semantic_time, ''), emitted_at) DESC, record_key DESC. A plainsemantic_timeindex does not back that expression, so the planner would seq-scan + sort the whole records table on every page. The index is therefore an expression index over the exactCOALESCE(NULLIF(...))ordering key. - Created
IF NOT EXISTSand only after the column is guaranteed present (the inline schema block intentionally does not create it, because on a pre-existing table thesemantic_timecolumn may not exist yet and aCREATE INDEXreferencing it would fail withno such column). It is created in the post-migration index block. - Dual-dialect: SQLite
idx_records_semantic_time, Postgresidx_pg_records_semantic_time. Both index the same expression key. The Postgres expression index is verified viaEXPLAINto produce an Index Scan with no Sort on the hot path before any backfill.
5. Dual-dialect throughout
Every piece is implemented for both SQLite and Postgres: the column add, the
expression index, the upsert write, and the substrate read deps
(sqliteExploreTimelineDeps / postgresExploreTimelineDeps, dispatched by
buildExploreTimelineDeps()). The substrate speaks directly to storage and uses
only parameterized placeholders for values; column/table names come from the
fixed schema. Dual-backend substrate tests pin parity.
Merged-timeline contract (ported verbatim)
- k-way merge across all
(connector_instance_id, stream)partitions, no partition cap (every record reachable — a silent cap would violate the feed's full-visibility contract). - The composite cursor anchors to
MAX(id)(the monotonic ingest sequence) at first-page time, so records backfilled with an oldemitted_atafter the snapshot do not leak into already-returned pages; they are surfaced vianew_since_snapshotinstead. Membership/pagination stays anchored on the monotonic id; only the sort uses semantic time. - Paging forward yields strictly non-increasing semantic time with no duplicates;
each record carries both
connector_id(type) andconnector_instance_id(instance). The O(partition-count) cursor blob is held server-side behind a short opaque handle (URL-length / HTTP 431 avoidance), unchanged from deploy.
Alternatives considered
- Bucket on
emitted_aton main today (avoidsemantic_time/substrate entirely): rejected.emitted_atis ingest time, so the feed and any future chart would mis-attribute records to ingest dates rather than authored dates — an honesty problem (the chart would not match the feed's semantic-time grouping). Sub-SLVP; this is why the migration is necessary. - Bundle the bucket endpoint into this PR: rejected. Mixing foundation with a new feature violates the semantic-boundary rule; the bucket aggregate gets its own change on top of this foundation.
Direction follow-on
direction=asc is the server half of the Explore sort cell. It is a real
keyset traversal over the same semantic-time key, not a reversal of one loaded
page. The substrate flips the partition seek predicate and ORDER BY for both
SQLite and Postgres. The operation flips the k-way merge comparator and stores
the direction in the composite cursor, so every page of an oldest-first walk
continues ascending from the same snapshot. The existing nowCeiling clamp stays
in force in both directions; future records remain in the Upcoming projection.
Out of scope
- The bucket-aggregate / over-time-chart count endpoint (separate, later change).
- Any chunked per-record semantic backfill (Step B) of historical rows.
- Frontend consumption / chart wiring (separate frontend change).
Acceptance checks
- Both backends: adding the column is idempotent (re-running the migration is a
no-op) and works on a non-empty records table without a bulk
UPDATE. - Both backends: the expression index exists after migration; Postgres
EXPLAINof the merged-timeline read shows an Index Scan, no Sort. - A row with
semantic_time = ''sorts byemitted_at; a row with a realsemantic_timesorts by that value (COALESCE fallback proven). - New writes set
semantic_timevia the upsert; no historical row is rewritten. GET /_ref/explore/recordsreturns a merged cross-source feed ordered by semantic time with a composite/handle cursor; paging forward is strictly non-increasing, non-duplicated, and uncapped across partitions.- Dual-backend substrate conformance tests pass.
openspec validate port-explore-timeline-server-foundation --strictpasses.