Reference Implementation Architecture
ADDED Requirements
Requirement: Reference implementation SHALL expose a durable owner-session explore-records endpoint
The reference implementation SHALL expose GET /_ref/explore/records as a durable,
owner-session-authenticated reference route that returns a page of time-ordered records
merged across all of the owner's (connector_instance_id, stream) partitions. This route
is a reference/operator surface, NOT a PDPP Core protocol route, and SHALL NOT be
reachable over /mcp or with a grant-scoped token. Its response shape is designed
exclusively for the console Explore canvas and clients MUST NOT depend on it as a stable
external protocol.
Scenario: The endpoint is gated to owner sessions
- WHEN a request to
GET /_ref/explore/recordslacks a valid owner session - THEN the reference SHALL reject it with a 401 response
- AND defining this endpoint SHALL NOT make any explore-records capability
reachable over
/mcpor with a grant-scoped token
Scenario: First-page response carries a snapshot anchor and merged record list
- WHEN an authenticated owner session requests
GET /_ref/explore/recordswithout a cursor parameter - THEN the response SHALL have
object: "list"and include:data: an array ofExploreTimelineRecordobjects, each carryingconnector_id(connector TYPE, e.g."amazon"),connector_instance_id(connection INSTANCE, e.g."cin_..."),stream,record_key,emitted_at, anddatahas_more:truewhen more records exist beyond this page,falsewhen all records in the snapshot have been returnednext_cursor: an OPAQUE cursor string whenhas_moreistrue;nullwhen the feed is exhausted. Clients MUST treat it as opaque and pass it back verbatim — they MUST NOT parse or depend on its internal form. The reference implementation returns a short server-side handle (prefixecr1_) that maps to the composite cursor payload stored server-side (see the cursor-transport requirement below); raw base64url v3 blob cursors are still accepted for backward compatibility (a stale v2 cursor, whose keyset key wasemitted_at, is rejected asinvalid_cursorso the tab re-anchors a fresh snapshot).snapshot_at: an ISO-8601 timestamp corresponding to the ingest-sequence anchor captured at first-page timenew_since_snapshot: an integer count of records ingested after the snapshot anchor, for use as an "N new" affordance in the UI
Scenario: The console assembler derives a set-descriptor that constrains Explore canvas claims
The set-descriptor is assembled by the console (operator-ui) layer, NOT returned by the
endpoint. The endpoint returns only the raw merged page (has_more, next_cursor,
new_since_snapshot, etc.); the descriptor's kind is a console/lens decision (the same
endpoint response renders as complete_chronological under the recent lens but as a
bounded descriptor under the time-range or search lenses), so it cannot be authored
server-side without the endpoint knowing the console's render mode.
- WHEN the console renders the merged timeline page from
GET /_ref/explore/recordsunder the recent lens - THEN the operator-ui assembler SHALL derive a
descriptorfield typed as a discriminated union withkind: "complete_chronological"from the endpoint response (has_more,next_cursor,new_since_snapshot) - AND the console Explore canvas SHALL switch on
descriptor.kindand SHALL NOT claim completeness or ordering that the descriptor does not carry - AND the discriminated union shape SHALL be the load-bearing enforcement
mechanism: the renderer MUST NOT claim "newest first" or "complete" for any set
whose descriptor does not carry
kind: "complete_chronological"
Requirement: The endpoint SHALL scope pagination, counts, and cursor to the selected connection/stream set
GET /_ref/explore/records SHALL accept optional connection / connection_id and
stream query parameters (comma-separated or repeated) that scope the merged timeline to
the selected (connector_instance_id, stream) set. The scope SHALL be applied at the
PARTITION-ENUMERATION layer (so the k-way merge only walks selected partitions) AND to the
new_since_snapshot count, so that pagination, has_more, next_cursor, and
new_since_snapshot all describe the SAME selected set. Scoping SHALL NOT be implemented
by paging a global feed and trimming records client-side, because that produces sparse or
empty pages for a selected source whose records are absent from the global page while the
descriptor still claims completeness.
Scenario: Selecting a single connection scopes the entire paged feed to it
- WHEN an authenticated owner session requests
GET /_ref/explore/recordswith aconnectionparameter naming one connector instance - THEN exhaustively paging the cursor SHALL return EVERY record in that connection's partitions and NO record from any unselected connection
- AND
has_more,next_cursor, andnew_since_snapshotSHALL describe only the selected connection's set, not the owner's global corpus - AND no page SHALL be sparse or empty solely because the selected connection's records were absent from a globally-paged window
Scenario: An omitted or empty scope means the full owner-visible set
- WHEN an authenticated owner session requests
GET /_ref/explore/recordswith noconnectionorstreamparameter (or an empty value) - THEN the endpoint SHALL enumerate every visible
(connector_instance_id, stream)partition exactly as the unscoped merged timeline does
Requirement: The merged timeline SHALL order by each record's SEMANTIC time, anchored by the ingest sequence for membership
The merged timeline SHALL order records by each record's SEMANTIC time — when the
thing happened — and NOT by its ingest time (emitted_at). Semantic time is the
stream manifest's consent_time_field (preferred) then cursor_field, read from the
record's data, coerced to an ISO-8601 instant (numeric Unix epochs included:
seconds below 1e12, milliseconds at/above), falling back to emitted_at when no
semantic field is declared or the value is missing/unparseable — so semantic time is
ALWAYS populated and ordering degrades gracefully to ingest order. The per-partition
ORDER BY and keyset seek SHALL both use semantic time; the substrate computes it as
COALESCE(NULLIF(semantic_time, ''), emitted_at). The snapshot anchor for MEMBERSHIP
(id <= snapshotSeq) SHALL remain the monotonic ingest sequence — ordering and
membership are DIFFERENT keys, and semantic time (not monotonic) MUST NOT be used as
the membership anchor.
Scenario: Records sort by semantic time even when ingested out of semantic order
- WHEN records are ingested such that their
emitted_at(ingest) order differs from their semantic time order — e.g. a backfill ingests many records in one run so theiremitted_atclusters while their authoredcreate_timespans months - THEN exhaustively paging
GET /_ref/explore/recordsSHALL return them ordered by semantic time DESC (newest authored first), NOT byemitted_at
Requirement: The composite cursor SHALL encode per-partition keyset positions anchored on the monotonic ingest sequence
The composite cursor PAYLOAD SHALL encode the position of every live
(connector_instance_id, stream) partition as a keyset tuple (semantic time —
COALESCE(NULLIF(semantic_time, ''), emitted_at), record_key) plus a snapshot
anchor on the MONOTONIC INGEST SEQUENCE (MAX(id) — BIGSERIAL in Postgres,
AUTOINCREMENT rowid in SQLite), not on the keyset key. The payload SHALL be a
base64url-encoded JSON blob at schema version 3 (the keyset key changed from
emitted_at to semantic time at v3). This payload is the INTERNAL cursor state; how
it is conveyed to the client in next_cursor is the cursor-transport requirement
below.
Scenario: Paging the composite cursor forward yields strictly older, non-duplicated records
- WHEN an authenticated owner session pages
GET /_ref/explore/recordsby passing thenext_cursorfrom a prior response - THEN every record in the new page SHALL have a SEMANTIC time less than or equal to any record in the prior page (strictly non-increasing semantic-time order)
- AND no record from the prior page SHALL appear in the new page (no duplicates)
- AND records from multiple (connector_instance_id, stream) partitions SHALL appear interleaved in the correct semantic-time order
Scenario: The snapshot anchor excludes records ingested after the first page
- WHEN a new record is ingested into any partition AFTER the first page of a cursor was issued
- THEN that record SHALL NOT appear in any subsequent page of the SAME cursor,
regardless of its
emitted_atvalue - AND the new record SHALL be counted in
new_since_snapshotwhen a fresh first-page request is issued after the ingest
Scenario: An invalid or stale cursor returns a typed error
- WHEN an authenticated owner session provides a
cursorstring that is an unknown/expired server-side handle, OR a raw blob that is not valid base64url JSON, has an incompatible schema version (e.g. a pre-fix v2 cursor whose keyset key wasemitted_at), or is missing required fields - THEN the endpoint SHALL return HTTP 400 with error code
invalid_cursor
Requirement: The endpoint SHALL support re-rendering page 1 pinned to a cursor's original snapshot (rewind)
The endpoint SHALL accept a rewind request parameter that, when truthy
("1"/"true") AND a cursor is supplied, re-renders PAGE 1 pinned to that
cursor's ORIGINAL snapshot: the operation SHALL decode the cursor for its
snapshotSeq (and display snapshotAt), DISCARD the cursor's per-partition
positions, and re-enumerate all partitions from the start under the SAME
id <= snapshotSeq membership bound. A new snapshot SHALL NOT be captured. This
exists so the console "Load more" accumulator can re-render page 1 against the SAME
snapshot as later pages, using the ingest-sequence anchor (snapshotSeq) for
membership — never a display-timestamp (emitted_at) proxy. A record ingested
AFTER the original snapshot (its id > snapshotSeq) therefore can never appear on
a rewound page 1, even when its emitted_at lands inside page 1's window, so it can
never displace an original page-1 row (the "Load more hides records above" class).
Scenario: Rewinding a page-1 cursor re-renders the original snapshot's page 1
- WHEN an authenticated owner session requests
GET /_ref/explore/recordswith a prior page'scursorANDrewind=1 - THEN the response SHALL be page 1 of the snapshot encoded in that cursor
(membership
id <= snapshotSeq), with the partition positions reset to the start - AND the snapshot SHALL NOT be re-captured (
snapshot_atstays the cursor's) - AND a record ingested after that snapshot SHALL be excluded from the rewound
page even when its
emitted_atfalls inside page 1's window
Scenario: Rewind without a cursor is a no-op
- WHEN an authenticated owner session requests
GET /_ref/explore/recordswithrewind=1but NOcursor - THEN the endpoint SHALL behave exactly as a normal first-page request (capture a fresh snapshot), because there is no prior snapshot to pin to
Requirement: The URL next_cursor SHALL be opaque; the reference MAY return a server-side handle
The next_cursor value SHALL be treated as OPAQUE by clients, who MUST pass it back
verbatim and MUST NOT parse it. The reference implementation SHALL return a short
server-side HANDLE (prefix ecr1_) that maps to the composite cursor payload
persisted server-side, keeping the URL bounded regardless of partition count — the
payload grows with the partition count, so returning it inline in the URL overflows
reverse-proxy URL limits at scale (HTTP 431). The reference SHALL still accept a raw
base64url v3 blob cursor (a cursor that does not begin with the handle prefix) for
backward compatibility with cursors issued before the handle transport. An unknown or
expired handle — or a stale v2 cursor — SHALL return HTTP 400 invalid_cursor.
Scenario: A many-partition feed returns a bounded-length next_cursor
- WHEN an authenticated owner session pages
GET /_ref/explore/recordsfor an owner whose corpus spans many (connector_instance_id, stream) partitions - THEN the
next_cursorvalue SHALL be a short opaque handle whose length does NOT grow with the partition count - AND passing that handle back SHALL resume pagination over the same snapshot, reaching every record with no silent cap (the handle resolves to the full composite payload server-side)
Requirement: Partition enumeration SHALL NOT apply any LIMIT
The reference implementation SHALL enumerate ALL distinct (connector_instance_id, stream) pairs the owner has records in with no LIMIT clause on the partition query, so that every record the owner holds is reachable by exhaustively paging the composite cursor. A silent cap that hides records in overflow partitions is a violation of this requirement.
Scenario: All partitions are enumerated regardless of count
- WHEN the owner has records across N distinct (connector_instance_id, stream) partitions for any finite N
- THEN the partition enumeration query SHALL return all N partitions
- AND exhaustively paging the composite cursor to completion SHALL yield every record in the owner's corpus, with no record permanently unreachable
Scenario: A corpus spanning many partitions returns records from all of them
- WHEN the owner's corpus spans records from P1, P2, and P3 partitions (different connector instances and/or streams)
- THEN exhaustively paging
GET /_ref/explore/recordsSHALL return records from all three partitions interleaved by semantic time (newest first) - AND no partition SHALL be silently excluded regardless of its ordinal position among the enumerated partitions
Requirement: The merged timeline SHALL carry both connector TYPE and connection INSTANCE identity on every record
Every ExploreTimelineRecord in the data array SHALL carry both:
connector_id: the connector TYPE identifier (e.g."amazon"), used by the UI to resolve display labels and manifest metadataconnector_instance_id: the specific connection INSTANCE identifier (e.g."cin_..."), used by the UI to construct per-connection peek/record-detail reads and connection-scoped URLs
The UI SHALL use connector_id for display labels and SHALL use
connector_instance_id for API reads. The raw connector_instance_id value SHALL
NOT be rendered as a display name.
Scenario: A record from a multi-account connector carries distinct instance identity
- WHEN the owner has two connections of the same connector type (e.g. two Amazon accounts) and both have records in the merged feed
- THEN each record in
dataSHALL carry the specificconnector_instance_idof the connection it came from - AND the two records SHALL have distinct
connector_instance_idvalues even though they share the sameconnector_id - AND the console Explore canvas SHALL use the distinct
connector_instance_idvalues to route peek reads to the correct connection scope