Reference Implementation Architecture

Created Jun 26, 2026. Source: openspec/changes/add-explore-merged-timeline/specs/reference-implementation-architecture/spec.md

ADDED Requirements

Requirement: Reference implementation SHALL expose a durable owner-session explore-records endpoint

The reference implementation SHALL expose GET /_ref/explore/records as a durable, owner-session-authenticated reference route that returns a page of time-ordered records merged across all of the owner's (connector_instance_id, stream) partitions. This route is a reference/operator surface, NOT a PDPP Core protocol route, and SHALL NOT be reachable over /mcp or with a grant-scoped token. Its response shape is designed exclusively for the console Explore canvas and clients MUST NOT depend on it as a stable external protocol.

Scenario: The endpoint is gated to owner sessions

WHEN a request to GET /_ref/explore/records lacks a valid owner session
THEN the reference SHALL reject it with a 401 response
AND defining this endpoint SHALL NOT make any explore-records capability reachable over /mcp or with a grant-scoped token

Scenario: First-page response carries a snapshot anchor and merged record list

WHEN an authenticated owner session requests GET /_ref/explore/records without a cursor parameter
THEN the response SHALL have object: "list" and include:
- data: an array of ExploreTimelineRecord objects, each carrying connector_id (connector TYPE, e.g. "amazon"), connector_instance_id (connection INSTANCE, e.g. "cin_..."), stream, record_key, emitted_at, and data
- has_more: true when more records exist beyond this page, false when all records in the snapshot have been returned
- next_cursor: an OPAQUE cursor string when has_more is true; null when the feed is exhausted. Clients MUST treat it as opaque and pass it back verbatim; they MUST NOT parse or depend on its internal form. The reference implementation returns a short server-side handle (prefix ecr1_) that maps to the composite cursor payload stored server-side (see the cursor-transport requirement below). Raw base64url v3 blob cursors are still accepted for backward compatibility. A stale v2 cursor, whose keyset key was emitted_at, is rejected as invalid_cursor so that the tab re-anchors on a fresh snapshot.
- snapshot_at: an ISO-8601 timestamp corresponding to the ingest-sequence anchor captured at first-page time
- new_since_snapshot: an integer count of records ingested after the snapshot anchor, for use as an "N new" affordance in the UI
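
The response fields listed above can be sketched as TypeScript types. This is a minimal sketch rather than the reference implementation's definition: the field names come from the list above, while the type name ExploreRecordsPage, the helper cursorIsConsistent, and the choice of Record<string, unknown> for data are assumptions made for illustration.

```typescript
// Hypothetical client-side view of the first-page response.
interface ExploreTimelineRecord {
  connector_id: string;          // connector TYPE, e.g. "amazon"
  connector_instance_id: string; // connection INSTANCE, e.g. "cin_..."
  stream: string;
  record_key: string;
  emitted_at: string;            // ingest time, ISO-8601
  data: Record<string, unknown>; // assumption: an object keyed by field name
}

interface ExploreRecordsPage {
  object: "list";
  data: ExploreTimelineRecord[];
  has_more: boolean;
  next_cursor: string | null;    // opaque: pass back verbatim, never parse
  snapshot_at: string;           // ISO-8601
  new_since_snapshot: number;
}

// Checks the stated relationship between has_more and next_cursor:
// a cursor string when more records exist, null when the feed is exhausted.
function cursorIsConsistent(page: ExploreRecordsPage): boolean {
  if (page.has_more) {
    return typeof page.next_cursor === "string" && page.next_cursor.length > 0;
  }
  return page.next_cursor === null;
}
```

A console client could run a check like cursorIsConsistent in development builds to catch a malformed page early, without ever inspecting the cursor's contents.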

Scenario: The console assembler derives a set-descriptor that constrains Explore canvas claims

The set-descriptor is assembled by the console (operator-ui) layer, NOT returned by the endpoint. The endpoint returns only the raw merged page (has_more, next_cursor, new_since_snapshot, etc.); the descriptor's kind is a console/lens decision (the same endpoint response renders as complete_chronological under the recent lens but as a bounded descriptor under the time-range or search lenses), so it cannot be authored server-side without the endpoint knowing the console's render mode.

WHEN the console renders the merged timeline page from GET /_ref/explore/records under the recent lens
THEN the operator-ui assembler SHALL derive a descriptor field typed as a discriminated union with kind: "complete_chronological" from the endpoint response (has_more, next_cursor, new_since_snapshot)
AND the console Explore canvas SHALL switch on descriptor.kind and SHALL NOT claim completeness or ordering that the descriptor does not carry
AND the discriminated union shape SHALL be the load-bearing enforcement mechanism: the renderer MUST NOT claim "newest first" or "complete" for any set whose descriptor does not carry kind: "complete_chronological"
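
One way the console-side descriptor could be modelled is shown below. It is a sketch under assumptions: only kind: "complete_chronological" is named in this spec, so the "bounded" kind name, the Lens values, the camelCase field names, and the functions deriveDescriptor and describeSet are all hypothetical.

```typescript
type Lens = "recent" | "time_range" | "search"; // hypothetical lens identifiers

// Discriminated union: only the first variant may back a
// "newest first" or "complete" claim in the UI.
type SetDescriptor =
  | {
      kind: "complete_chronological";
      hasMore: boolean;
      nextCursor: string | null;
      newSinceSnapshot: number;
    }
  | { kind: "bounded"; lens: "time_range" | "search" }; // hypothetical kind name

interface PageSignals {
  has_more: boolean;
  next_cursor: string | null;
  new_since_snapshot: number;
}

// The console (not the endpoint) picks the kind from its current lens.
function deriveDescriptor(lens: Lens, page: PageSignals): SetDescriptor {
  if (lens === "recent") {
    return {
      kind: "complete_chronological",
      hasMore: page.has_more,
      nextCursor: page.next_cursor,
      newSinceSnapshot: page.new_since_snapshot,
    };
  }
  return { kind: "bounded", lens };
}

// The renderer switches on kind, so an ordering or completeness claim
// is only reachable for the variant that carries it.
function describeSet(descriptor: SetDescriptor): string {
  if (descriptor.kind === "complete_chronological") {
    return descriptor.hasMore ? "Newest first, more available" : "Newest first, complete";
  }
  return "Bounded result set (" + descriptor.lens + ")";
}
```

Because hasMore and nextCursor exist only on the complete_chronological variant, a renderer that tries to make a completeness claim for a bounded set fails to type-check, which is the enforcement property the scenario asks for.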

Requirement: The endpoint SHALL scope pagination, counts, and cursor to the selected connection/stream set

GET /_ref/explore/records SHALL accept optional connection / connection_id and stream query parameters (comma-separated or repeated) that scope the merged timeline to the selected (connector_instance_id, stream) set. The scope SHALL be applied at the PARTITION-ENUMERATION layer (so the k-way merge only walks selected partitions) AND to the new_since_snapshot count, so that pagination, has_more, next_cursor, and new_since_snapshot all describe the SAME selected set. Scoping SHALL NOT be implemented by paging a global feed and trimming records client-side, because that produces sparse or empty pages for a selected source whose records are absent from the global page while the descriptor still claims completeness.

Scenario: Selecting a single connection scopes the entire paged feed to it

WHEN an authenticated owner session requests GET /_ref/explore/records with a connection parameter naming one connector instance
THEN exhaustively paging the cursor SHALL return EVERY record in that connection's partitions and NO record from any unselected connection
AND has_more, next_cursor, and new_since_snapshot SHALL describe only the selected connection's set, not the owner's global corpus
AND no page SHALL be sparse or empty solely because the selected connection's records were absent from a globally-paged window

Scenario: An omitted or empty scope means the full owner-visible set

WHEN an authenticated owner session requests GET /_ref/explore/records with no connection or stream parameter (or an empty value)
THEN the endpoint SHALL enumerate every visible (connector_instance_id, stream) partition exactly as the unscoped merged timeline does
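
The scope parameters in this requirement could be parsed along the following lines before partition enumeration. This is an illustrative sketch: the ExploreScope shape and the function names are assumptions, and the input is modelled as a list of [name, value] pairs (for example, the entries of a parsed query string) so that repeated parameters are preserved.

```typescript
interface ExploreScope {
  connections: string[] | null; // null means "no filter": every visible connection
  streams: string[] | null;     // null means "no filter": every visible stream
}

type QueryPairs = ReadonlyArray<readonly [string, string]>;

// Collects values for any of the given parameter names, accepting both
// repeated parameters and comma-separated lists; empty values are ignored.
function collectValues(pairs: QueryPairs, names: ReadonlyArray<string>): string[] | null {
  const seen = new Set<string>();
  for (const [name, value] of pairs) {
    if (names.indexOf(name) === -1) continue;
    for (const part of value.split(",")) {
      const trimmed = part.trim();
      if (trimmed !== "") seen.add(trimmed);
    }
  }
  return seen.size > 0 ? Array.from(seen) : null;
}

function parseScope(pairs: QueryPairs): ExploreScope {
  return {
    connections: collectValues(pairs, ["connection", "connection_id"]),
    streams: collectValues(pairs, ["stream"]),
  };
}
```

The resulting scope would then be applied when enumerating partitions and when counting new_since_snapshot, so that both describe the same selected set; filtering the returned page afterwards would not satisfy the requirement.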

Requirement: The merged timeline SHALL order by each record's SEMANTIC time, anchored by the ingest sequence for membership

The merged timeline SHALL order records by each record's SEMANTIC time (when the thing happened) and NOT by its ingest time (emitted_at). Semantic time is read from the record's data using the stream manifest's consent_time_field (preferred), then cursor_field, and is coerced to an ISO-8601 instant. Numeric Unix epochs are accepted: values below 1e12 are interpreted as seconds, values at or above 1e12 as milliseconds. When no semantic field is declared, or the value is missing or unparseable, semantic time falls back to emitted_at, so semantic time is ALWAYS populated and ordering degrades gracefully to ingest order. The per-partition ORDER BY and keyset seek SHALL both use semantic time; the substrate computes it as COALESCE(NULLIF(semantic_time, ''), emitted_at). The snapshot anchor for MEMBERSHIP (id <= snapshotSeq) SHALL remain the monotonic ingest sequence: ordering and membership are DIFFERENT keys, and semantic time, which is not monotonic, MUST NOT be used as the membership anchor.
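
The coercion rule above can be sketched as follows. This is not the reference implementation: the function names and the ManifestTimeFields type are hypothetical, and the sketch assumes that the first declared field is used (consent_time_field if declared, otherwise cursor_field) and that an unusable value falls straight back to emitted_at.

```typescript
interface ManifestTimeFields {
  consent_time_field?: string;
  cursor_field?: string;
}

// Coerces a raw field value to an ISO-8601 instant, or null when unusable.
function coerceInstant(value: unknown): string | null {
  let date: Date | null = null;
  if (typeof value === "number" && Number.isFinite(value)) {
    // Epoch heuristic from the requirement: below 1e12 is seconds, otherwise milliseconds.
    date = new Date(value < 1e12 ? value * 1000 : value);
  } else if (typeof value === "string" && value.trim() !== "") {
    date = new Date(value);
  }
  if (date === null || Number.isNaN(date.getTime())) return null;
  return date.toISOString();
}

// Always returns a value, so ordering degrades to ingest order
// rather than leaving a record without a sort key.
function semanticTime(
  manifest: ManifestTimeFields,
  data: Record<string, unknown>,
  emittedAt: string,
): string {
  const field = manifest.consent_time_field ?? manifest.cursor_field;
  if (field === undefined) return emittedAt;
  const coerced = coerceInstant(data[field]);
  return coerced !== null ? coerced : emittedAt;
}
```

For example, a create_time of 1700000000 (seconds) and 1700000000000 (milliseconds) both coerce to the same instant, while a record with no create_time keeps its emitted_at as the sort key.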

Scenario: Records sort by semantic time even when ingested out of semantic order

WHEN records are ingested such that their emitted_at (ingest) order differs from their semantic time order — e.g. a backfill ingests many records in one run so their emitted_at clusters while their authored create_time spans months
THEN exhaustively paging GET /_ref/explore/records SHALL return them ordered by semantic time DESC (newest authored first), NOT by emitted_at

Requirement: The composite cursor SHALL encode per-partition keyset positions anchored on the monotonic ingest sequence

The composite cursor PAYLOAD SHALL encode the position of every live (connector_instance_id, stream) partition as a keyset tuple (semantic time, record_key), where semantic time is COALESCE(NULLIF(semantic_time, ''), emitted_at), plus a snapshot anchor on the MONOTONIC INGEST SEQUENCE (MAX(id); BIGSERIAL in Postgres, AUTOINCREMENT rowid in SQLite), not on the keyset key. The payload SHALL be a base64url-encoded JSON blob at schema version 3 (the keyset key changed from emitted_at to semantic time at v3). This payload is the INTERNAL cursor state; how it is conveyed to the client in next_cursor is the cursor-transport requirement below.

Scenario: Paging the composite cursor forward yields strictly older, non-duplicated records

WHEN an authenticated owner session pages GET /_ref/explore/records by passing the next_cursor from a prior response
THEN every record in the new page SHALL have a SEMANTIC time less than or equal to that of every record in the prior page (non-increasing semantic-time order)
AND no record from the prior page SHALL appear in the new page (no duplicates)
AND records from multiple (connector_instance_id, stream) partitions SHALL appear interleaved in the correct semantic-time order

Scenario: The snapshot anchor excludes records ingested after the first page

WHEN a new record is ingested into any partition AFTER the first page of a cursor was issued
THEN that record SHALL NOT appear in any subsequent page of the SAME cursor, regardless of its emitted_at value
AND the new record SHALL be counted in new_since_snapshot when a fresh first-page request is issued after the ingest
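
The interaction between keyset positions, the k-way merge, and the snapshot anchor can be illustrated with a small in-memory model. It is a sketch under assumptions, not the substrate's query: the Row and CompositeCursor shapes and their field names are hypothetical, semantic times are assumed to be normalized ISO-8601 strings (so string comparison matches time order), and ties are assumed to break on record_key in descending order.

```typescript
interface SortKey { semanticTime: string; recordKey: string; }
interface Row extends SortKey { id: number; partition: string; } // id = ingest sequence
interface CompositeCursor {
  v: 3;
  snapshotSeq: number;                     // membership anchor: id <= snapshotSeq
  positions: Record<string, SortKey>;      // last key returned per partition
}

// Negative when a sorts before b in newest-first order.
function compareDesc(a: SortKey, b: SortKey): number {
  if (a.semanticTime !== b.semanticTime) return a.semanticTime < b.semanticTime ? 1 : -1;
  if (a.recordKey !== b.recordKey) return a.recordKey < b.recordKey ? 1 : -1;
  return 0;
}

function readPage(rows: ReadonlyArray<Row>, cursor: CompositeCursor | null, limit: number) {
  // First page: capture the anchor. Later pages: reuse the cursor's anchor.
  const snapshotSeq = cursor ? cursor.snapshotSeq : rows.reduce((max, r) => Math.max(max, r.id), 0);
  const positions: Record<string, SortKey> = cursor ? { ...cursor.positions } : {};

  const queues = new Map<string, Row[]>();
  for (const row of rows) {
    if (row.id > snapshotSeq) continue;                        // ingested after the snapshot
    const pos = positions[row.partition];
    if (pos !== undefined && compareDesc(row, pos) <= 0) continue; // already returned
    const queue = queues.get(row.partition) ?? [];
    queue.push(row);
    queues.set(row.partition, queue);
  }
  const lists = Array.from(queues.values());
  for (const list of lists) list.sort(compareDesc);

  const page: Row[] = [];
  while (page.length < limit) {
    let bestQueue: Row[] | undefined;
    let bestHead: Row | undefined;
    for (const list of lists) {                                // pick the newest head across partitions
      const head = list[0];
      if (head !== undefined && (bestHead === undefined || compareDesc(head, bestHead) < 0)) {
        bestHead = head;
        bestQueue = list;
      }
    }
    if (bestHead === undefined || bestQueue === undefined) break;
    bestQueue.shift();
    page.push(bestHead);
    positions[bestHead.partition] = { semanticTime: bestHead.semanticTime, recordKey: bestHead.recordKey };
  }

  const hasMore = lists.some((list) => list.length > 0);
  const next: CompositeCursor | null = hasMore ? { v: 3, snapshotSeq, positions } : null;
  return { page, next };
}
```

In this model a row ingested after page 1 has an id above snapshotSeq, so it is skipped on every later page of the same cursor even if its semantic time falls between rows that were already returned.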

Scenario: An invalid or stale cursor returns a typed error

WHEN an authenticated owner session provides a cursor string that is an unknown/expired server-side handle, OR a raw blob that is not valid base64url JSON, has an incompatible schema version (e.g. a pre-fix v2 cursor whose keyset key was emitted_at), or is missing required fields
THEN the endpoint SHALL return HTTP 400 with error code invalid_cursor
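
A validation step for an already-decoded cursor payload might look like the sketch below. The payload field names (v, snapshotSeq, positions) and the error object shape are assumptions for illustration; this spec fixes only the schema version (3), the HTTP status (400), and the error code invalid_cursor. Decoding the base64url JSON is left to the caller, and a decode failure would map to the same error.

```typescript
interface InvalidCursor { status: 400; error: "invalid_cursor"; }
interface CursorPayloadV3 {
  v: 3;
  snapshotSeq: number;
  positions: Record<string, unknown>;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accepts the parsed JSON of a raw blob (or the stored payload behind a handle)
// and returns either a typed v3 payload or the typed error.
function validateCursorPayload(decoded: unknown): CursorPayloadV3 | InvalidCursor {
  const invalid: InvalidCursor = { status: 400, error: "invalid_cursor" };
  if (!isPlainObject(decoded)) return invalid;
  if (decoded["v"] !== 3) return invalid; // e.g. a stale v2 cursor keyed on emitted_at
  const seq = decoded["snapshotSeq"];
  if (typeof seq !== "number" || !Number.isInteger(seq) || seq < 0) return invalid;
  const positions = decoded["positions"];
  if (!isPlainObject(positions)) return invalid;
  return { v: 3, snapshotSeq: seq, positions };
}
```

Returning one uniform error for every failure mode keeps the client's handling simple: on invalid_cursor it drops the cursor and requests a fresh first page.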

Requirement: The endpoint SHALL support re-rendering page 1 pinned to a cursor's original snapshot (rewind)

The endpoint SHALL accept a rewind request parameter that, when truthy ("1"/"true") AND a cursor is supplied, re-renders PAGE 1 pinned to that cursor's ORIGINAL snapshot. The operation SHALL decode the cursor for its snapshotSeq (and display snapshotAt), DISCARD the cursor's per-partition positions, and re-enumerate all partitions from the start under the SAME id <= snapshotSeq membership bound. A new snapshot SHALL NOT be captured. This exists so the console "Load more" accumulator can re-render page 1 against the SAME snapshot as later pages, using the ingest-sequence anchor (snapshotSeq) for membership and never a display-timestamp (emitted_at) proxy. A record ingested AFTER the original snapshot (its id > snapshotSeq) therefore can never appear on a rewound page 1, even when its emitted_at lands inside page 1's window, so it can never displace an original page-1 row (the "Load more hides records above" class of bug).

Scenario: Rewinding a page-1 cursor re-renders the original snapshot's page 1

WHEN an authenticated owner session requests GET /_ref/explore/records with a prior page's cursor AND rewind=1
THEN the response SHALL be page 1 of the snapshot encoded in that cursor (membership id <= snapshotSeq), with the partition positions reset to the start
AND the snapshot SHALL NOT be re-captured (snapshot_at stays the cursor's)
AND a record ingested after that snapshot SHALL be excluded from the rewound page even when its emitted_at falls inside page 1's window

Scenario: Rewind without a cursor is a no-op

WHEN an authenticated owner session requests GET /_ref/explore/records with rewind=1 but NO cursor
THEN the endpoint SHALL behave exactly as a normal first-page request (capture a fresh snapshot), because there is no prior snapshot to pin to
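
The three request modes implied by this requirement (fresh first page, resume from a cursor, rewind to a cursor's page 1) can be summarized as a small decision function. This is an illustrative sketch; the ReadPlan type, the DecodedCursor shape, and planRequest are hypothetical names.

```typescript
interface DecodedCursor { snapshotSeq: number; }

type ReadPlan =
  | { snapshot: "fresh"; positions: "start" }
  | { snapshot: "pinned"; snapshotSeq: number; positions: "start" | "resume" };

function planRequest(cursor: DecodedCursor | null, rewindParam: string | undefined): ReadPlan {
  const rewind = rewindParam === "1" || rewindParam === "true";
  if (cursor === null) {
    // Rewind without a cursor is a no-op: there is no prior snapshot to pin to.
    return { snapshot: "fresh", positions: "start" };
  }
  return {
    snapshot: "pinned",
    snapshotSeq: cursor.snapshotSeq,       // membership stays id <= snapshotSeq
    positions: rewind ? "start" : "resume", // rewind discards per-partition positions
  };
}
```

Only the positions differ between a rewind and a normal continuation; the membership bound is identical, which is what keeps a rewound page 1 consistent with the later pages already loaded.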

Requirement: The URL `next_cursor` SHALL be opaque; the reference SHALL return a server-side handle

The next_cursor value SHALL be treated as OPAQUE by clients, who MUST pass it back verbatim and MUST NOT parse it. The reference implementation SHALL return a short server-side HANDLE (prefix ecr1_) that maps to the composite cursor payload persisted server-side, keeping the URL bounded regardless of partition count — the payload grows with the partition count, so returning it inline in the URL overflows reverse-proxy URL limits at scale (HTTP 431). The reference SHALL still accept a raw base64url v3 blob cursor (a cursor that does not begin with the handle prefix) for backward compatibility with cursors issued before the handle transport. An unknown or expired handle — or a stale v2 cursor — SHALL return HTTP 400 invalid_cursor.

Scenario: A many-partition feed returns a bounded-length next_cursor

WHEN an authenticated owner session pages GET /_ref/explore/records for an owner whose corpus spans many (connector_instance_id, stream) partitions
THEN the next_cursor value SHALL be a short opaque handle whose length does NOT grow with the partition count
AND passing that handle back SHALL resume pagination over the same snapshot, reaching every record with no silent cap (the handle resolves to the full composite payload server-side)
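
The handle transport can be sketched with an in-memory store. This is a simplified model, not the reference's persistence layer: the CursorHandleStore class is hypothetical, handles are derived from a counter purely for readability (a real store would more plausibly use unguessable random identifiers), and expiry is not modelled.

```typescript
const HANDLE_PREFIX = "ecr1_";

class CursorHandleStore {
  private readonly payloads = new Map<string, string>();
  private counter = 0;

  // Persists the composite payload and returns a short handle for next_cursor.
  issue(payload: string): string {
    this.counter += 1;
    const handle = HANDLE_PREFIX + this.counter.toString(36);
    this.payloads.set(handle, payload);
    return handle;
  }

  // Returns the composite payload for a cursor string, or null when the
  // handle is unknown (the caller maps null to HTTP 400 invalid_cursor).
  resolve(cursor: string): string | null {
    if (cursor.indexOf(HANDLE_PREFIX) === 0) {
      const stored = this.payloads.get(cursor);
      return stored !== undefined ? stored : null;
    }
    // No handle prefix: treat as a raw blob issued before the handle transport.
    return cursor;
  }
}
```

The handle's length depends only on how many handles have been issued, never on the size of the payload it points to, so a feed with thousands of partitions produces the same short next_cursor as a feed with one.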

Requirement: Partition enumeration SHALL NOT apply any LIMIT

The reference implementation SHALL enumerate ALL distinct (connector_instance_id, stream) pairs the owner has records in with no LIMIT clause on the partition query, so that every record the owner holds is reachable by exhaustively paging the composite cursor. A silent cap that hides records in overflow partitions is a violation of this requirement.

Scenario: All partitions are enumerated regardless of count

WHEN the owner has records across N distinct (connector_instance_id, stream) partitions for any finite N
THEN the partition enumeration query SHALL return all N partitions
AND exhaustively paging the composite cursor to completion SHALL yield every record in the owner's corpus, with no record permanently unreachable

Scenario: A corpus spanning many partitions returns records from all of them

WHEN the owner's corpus spans records from P1, P2, and P3 partitions (different connector instances and/or streams)
THEN exhaustively paging GET /_ref/explore/records SHALL return records from all three partitions interleaved by semantic time (newest first)
AND no partition SHALL be silently excluded regardless of its ordinal position among the enumerated partitions
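
As an in-memory analogue of the partition query, the sketch below enumerates every distinct (connector_instance_id, stream) pair without any cap. The type and function names are hypothetical; in the reference this step is a database query, and the point carried over is only that nothing truncates the result.

```typescript
interface PartitionKey {
  connector_instance_id: string;
  stream: string;
}

// Returns every distinct pair present in the rows, in first-seen order.
// There is deliberately no limit parameter.
function enumeratePartitions(rows: ReadonlyArray<PartitionKey>): PartitionKey[] {
  const seen = new Map<string, PartitionKey>();
  for (const row of rows) {
    // JSON-encode the pair so ids containing separator characters cannot collide.
    const key = JSON.stringify([row.connector_instance_id, row.stream]);
    if (!seen.has(key)) {
      seen.set(key, { connector_instance_id: row.connector_instance_id, stream: row.stream });
    }
  }
  return Array.from(seen.values());
}
```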

Requirement: The merged timeline SHALL carry both connector TYPE and connection INSTANCE identity on every record

Every ExploreTimelineRecord in the data array SHALL carry both:

- connector_id: the connector TYPE identifier (e.g. "amazon"), used by the UI to resolve display labels and manifest metadata
- connector_instance_id: the specific connection INSTANCE identifier (e.g. "cin_..."), used by the UI to construct per-connection peek/record-detail reads and connection-scoped URLs

The UI SHALL use connector_id for display labels and SHALL use connector_instance_id for API reads. The raw connector_instance_id value SHALL NOT be rendered as a display name.

Scenario: A record from a multi-account connector carries distinct instance identity

WHEN the owner has two connections of the same connector type (e.g. two Amazon accounts) and both have records in the merged feed
THEN each record in data SHALL carry the specific connector_instance_id of the connection it came from
AND the two records SHALL have distinct connector_instance_id values even though they share the same connector_id
AND the console Explore canvas SHALL use the distinct connector_instance_id values to route peek reads to the correct connection scope
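
The split between display identity and read identity could look like this on the console side. It is a sketch with hypothetical names (IdentifiedRecord, displayLabel, peekTarget), and the fallback to connector_id when no label is registered is an assumption; the spec only rules out showing the raw connector_instance_id.

```typescript
interface IdentifiedRecord {
  connector_id: string;          // connector TYPE
  connector_instance_id: string; // connection INSTANCE
  stream: string;
  record_key: string;
}

// Display labels are resolved from the connector TYPE only.
function displayLabel(record: IdentifiedRecord, labelsByConnectorId: Map<string, string>): string {
  return labelsByConnectorId.get(record.connector_id) ?? record.connector_id;
}

// Reads are addressed by the connection INSTANCE, so two accounts of the
// same connector type route to different scopes.
function peekTarget(record: IdentifiedRecord) {
  return {
    connectorInstanceId: record.connector_instance_id,
    stream: record.stream,
    recordKey: record.record_key,
  };
}
```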