Planning
The internal project view for the PDPP reference implementation. Active changes, durable capability specs, and change-local working notes — rendered directly from the repository.
Active changes
Sorted by status, then most recently modified.
- Add Mock Reference Demo Instance
The current /sandbox is a useful static walkthrough, but it does not provide the reviewer experience we actually need: a hosted, credential-free reference instance that feels like the real dashboard and exposes callable AS/RS-shaped APIs. Internal reviewers, implementers, and agents should be able to explore PDPP end to end without Docker, .env.local, connector auth, or private owner data.
add-mock-reference-demo-instance · 0/26 tasks · affects: reference-demo-instance, reference-surface-topology, reference-web-bridge-contract · updated · in progress
- Design Host Browser Bridge For Docker
Browser-backed polyfill connectors can require human interaction: Cloudflare challenges, OTP prompts, "is this you?" confirmations, and archive-export verification steps. In a native local deployment, the headed browser appears on the owner's desktop. In Docker, the connector process can launch Chrome inside the container, but the owner cannot see or control that browser.
design-host-browser-bridge-for-docker · 24/29 tasks · affects: reference-implementation-architecture · updated · in progress
- hydrate-first-party-blob-streams
The reference already exposes a grant-safe blob transport through blob_ref.fetch_url and GET /v1/blobs/{blob_id}, but first-party connectors mostly emit metadata-only records for files and attachments. Reviewers can discover that an email, statement, receipt, or uploaded file exists, but cannot fetch the bytes through PDPP even when the source account makes those bytes available.
hydrate-first-party-blob-streams · 10/27 tasks · affects: reference-implementation-architecture · updated · in progress
- Add Reference Web Dark Mode
The reference web app is light-only. The owner uses the dashboard as a sustained operator surface (records, runs, deployment diagnostics, search), and prolonged light-mode use is uncomfortable. The brand and shadcn primitives already encode every color through semantic CSS variables (--background, --foreground, --muted, --border, --primary, --success, --destructive, etc.) and Tailwind 4 already exposes a dark variant via @custom-variant dark (&:is(.dark *)). The missing pieces are an actual .dark token set, a flicker-free toggle, and a fix for a few legacy color-mix(... white) mixes in the brand CSS that bake in the assumption that the page background is white.
add-reference-web-dark-mode · 17/20 tasks · affects: reference-surface-topology · updated · in progress
- Add polyfill connector system
The reference implementation today has sample polyfill connectors (Spotify, GitHub, Reddit) backed by seed fixtures. It does not yet have living polyfill connectors against real platforms for a real user, running on a real schedule, with a real human-in-the-loop interaction channel.
add-polyfill-connector-system · 85/90 tasks · affects: polyfill-runtime · updated · in progress
- Add Polyfill Layer Two Stream Coverage
add-polyfill-connector-system has become a mixture of shipped MVP infrastructure, live connector bug notes, and a large Layer 2 stream backlog. That makes it hard for workers to improve connector coverage without touching unrelated runtime or governance work.
add-polyfill-layer-two-stream-coverage · 13/18 tasks · affects: reference-implementation-architecture · updated · in progress
- Add Reddit Pilot Real Shape Fixture
The connector fixture scrubber pipeline shipped with two pilot shapes: a browser DOM capture (Amazon) and an API JSON capture (GitHub). Reddit is now the third distinct shape — a records-level JSONL stream emitted directly from runConnector() — and has no committed real-shape fixture. Its integration tests use synthetic listings, which miss drift between the hand-crafted shapes and what Reddit's old-reddit JSON actually serves.
add-reddit-pilot-real-shape-fixture · 0/20 tasks · affects: reference-implementation-governance · updated · in progress
- Add Reference Runtime Spec
The cleanup audit found that reference-runtime behavior is proven by tests and active program work but lacks a durable canonical OpenSpec capability. Scheduler behavior, runtime validation, browser-profile binding, filesystem bindings, connector runtime logging, and inbox/notification behavior should not graduate from add-polyfill-connector-system as unbounded implementation history.
add-reference-runtime-spec · 6/7 tasks · affects: reference-implementation-runtime · updated · in progress
- Define Hybrid Retrieval
The reference now exposes lexical and semantic retrieval with scores, but assistants still have to call both endpoints and merge results client-side. A server-side hybrid endpoint would make the recall layer simpler, safer, and easier to evaluate.
define-hybrid-retrieval · 20/23 tasks · affects: hybrid-retrieval, lexical-retrieval, semantic-retrieval · updated · in progress
- Make Reference Queries Inspectable
swap-sqlite-driver bundled two different goals: replacing the crash-prone SQLite driver and extracting SQL into inspectable artifacts. The driver swap has landed; query extraction remains valuable, but it should now be evaluated on inspectability and maintainability rather than native-driver stability.
make-reference-queries-inspectable · 13/15 tasks · affects: reference-implementation-architecture · updated · in progress
- Retire Browser Daemon
The polyfill connector subsystem has two browser-launch paths that no production runtime uses:
retire-browser-daemon · 29/32 tasks · affects: polyfill-runtime · updated · in progress
- Make Public Sandbox Functional
/sandbox currently reads like a placeholder for work we intend to do. Public reviewers and prospective implementers need a concrete, useful surface that lets them experience PDPP with simulated data immediately, without connecting real accounts or running the reference stack.
make-public-sandbox-functional · 18/18 tasks · affects: reference-surface-topology · updated · complete
- Add Schema Followups
The add-schema-validation-coverage change (committed as a3e1c8a) wired schema validation into eleven connectors and surfaced three followups it explicitly did not address:
add-schema-followups · 20/20 tasks · affects: reference-implementation-architecture, reference-implementation-governance · updated · complete
- Add Schema Validation Coverage
Connector schema coverage was uneven before this change. Five connectors (amazon, chase, chatgpt, reddit, usaa) shipped a schemas.ts with validateRecord; six others (github, gmail, ynab, codex, claude_code, slack) had no shape-check at all despite emitting hundreds of thousands of records into the local owner database. The connector authoring guide §3 calls schema-validation the floor — "a connector must never emit a record that looks right but is wrong" — but in practice that floor only existed for some.
add-schema-validation-coverage · 25/25 tasks · affects: reference-implementation-architecture, reference-implementation-governance · updated · complete
- Expand First Party Parent Child Relations
expand[] is implemented and grant-safe, but only a small set of first-party parent-child relations are enabled. Assistants still have to do N+1 reads for common records such as Slack messages with attachments/reactions or other safe child collections.
expand-first-party-parent-child-relations · 17/17 tasks · affects: reference-implementation-architecture · updated · complete
- Polish Assistant Query Api Discovery
The assistant feedback shows the query layer is now powerful but still hard to self-discover: a capable client can use range filters, search filters, aggregations, blobs, and changes_since, but too much of the correct shape is learned by trial-and-error.
polish-assistant-query-api-discovery · 17/17 tasks · affects: reference-implementation-architecture · updated · complete
- Polish Reference Api Discovery Seams
A cold-start integrator (human or agent) hitting the reference AS/RS today has no obvious entry point. Probing /, /health, /v1 returns uniform 404s. Two independent fresh-eyes assessments (see tmp/pdpp-review-memo.md) both wasted significant time before reaching the well-known endpoint and /v1/schema. One observer concluded the server was non-functional. The recall surface, query API, and discovery endpoint behind the bearer are strong; the rough edges are at the seam an unauthenticated probe sees first.
polish-reference-api-discovery-seams · 18/18 tasks · affects: reference-implementation-architecture · updated · complete
Protocol semantics: grants, queries, authorization metadata, and capability contracts.
Open questions, plans, audits, and research grouped by workstream.
Recent notes
Design Host Browser Bridge For Docker · 1 note · 1 working note · updated
hydrate-first-party-blob-streams · 1 note · 1 working note · updated
Add polyfill connector system · 55 notes · 24 open questions · 1 plan · 2 strategy notes · 2 audits · 1 research note · 14 connector notes · 11 working notes · updated
Status: researching · Owner: owner/runtime · Created: 2026-04-19 · Updated: 2026-04-25 · Related: add-polyfill-connector-system; add-reference-runtime-spec; credential-bootstrap-automation-open-question.md; raw-provenance-capture-open-question.md; external-tool-dependencies-open-question.md; connector-configuration-open-question.md
"We are unable to complete your request. Our system is currently unavailable. Please try again later."
If the answer is "yes, with constraints," PDPP could support a new lane between:
| | Activity streams | Authored artifacts |
|---|---|---|
| Volume | High (thousands → millions) | Low (tens → hundreds) |
| Mutation | Append-only (sometimes tombstones) | Mutable; edited over time |
| Cursor | Timestamp / monotonic | Revision or content-hash |
| Consent weight | Sensitive by volume (bulk access) | Sensitive by leverage (encodes user strategy) |
| Disclosure framing | "Your Gmail messages" | "Your custom ChatGPT prompts" |
| Restoration | Server retains source of truth; re-collectable | Lost forever if not preserved |
Status: sprint-needed · Owner: project owner · Created: 2026-04-19 · Updated: 2026-04-24 · Related: openspec/changes/add-polyfill-connector-system/tasks.md (Gmail attachment blob collection), openspec/changes/add-polyfill-connector-system/design-notes/layer-2-coverage-gmail-ynab-usaa-github.md, pdpp-trust-model-framing.md
Status: sprint-needed · Owner: project owner · Created: 2026-04-19 · Updated: 2026-04-24 · Related: design-notes/source-instances-and-multi-account-configurations-2026-04-24.md (repo root)
Layer 2 coverage audits (see layer-2-coverage-gmail-ynab-usaa-github.md, layer-2-coverage-chatgpt-claude-codex.md) surfaced identity/social data in every connector inspected:
Status: sprint-needed · Owner: project owner · Created: 2026-04-20 · Updated: 2026-04-24 · Related: openspec/changes/add-polyfill-connector-system/design-notes/partial-run-semantics-open-question.md, openspec/changes/add-polyfill-connector-system/design-notes/gap-recovery-execution-open-question.md, pdpp-trust-model-framing.md
Several connectors in today's fleet (Gmail, ChatGPT, USAA, Slack) have had hours of debugging that similar infrastructure could have cut to minutes. This pattern will repeat every time we fix a bug or add a connector.
| Class | Example | Spec'd today? |
|---|---|---|
| 1. Runtime bindings | network, filesystem, interactive | ✅ yes: runtime_requirements.bindings |
| 2. Language-level deps | npm packages, Go modules, Python imports | ❌ no: implicit in the connector package |
| 3. External tool binaries | slackdump, osxphotos, ffmpeg, pandoc, playwright browsers | ❌ no: invisible to spec |
Status: sprint-needed · Owner: project owner · Created: 2026-04-20 · Updated: 2026-04-24 · Related: openspec/changes/add-polyfill-connector-system/design-notes/partial-run-semantics-open-question.md, openspec/changes/add-polyfill-connector-system/design-notes/cursor-finality-and-gap-awareness-open-question.md, pdpp-trust-model-framing.md
The short-term answer to "human-attended browser access for connectors in Docker" is specifically host-browser control, not noVNC, not browser streaming, not a connector-worker protocol. The follow-up work should be scoped around a deliberately configured local bridge:
Catches broken extractors, undeclared columns, null-where-required values, and declared fields that never populate. A recent run surfaced ChatGPT dropping ~67% of messages, broken Gmail snippet/references/content_type fields, and USAA manifest drift. The conformance harness already runs this.
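A minimal sketch of the three record-level checks named above (undeclared columns, null-where-required, declared-but-never-populated). The `FieldSpec` and `ShapeReport` types and the `checkShape` function are hypothetical illustrations, not the conformance harness's actual API.

```typescript
// Hypothetical shapes for illustration; the real harness types may differ.
type FieldSpec = { required: boolean };
type StreamRecord = Record<string, unknown>;

interface ShapeReport {
  undeclared: string[];        // keys seen in records but absent from the declaration
  nullWhereRequired: string[]; // required fields that were null/undefined at least once
  neverPopulated: string[];    // declared fields with no non-null value in any record
}

function checkShape(
  declared: Record<string, FieldSpec>,
  records: StreamRecord[],
): ShapeReport {
  const declaredNames = Object.keys(declared);
  const isDeclared = (key: string): boolean =>
    Object.prototype.hasOwnProperty.call(declared, key);

  const undeclared = new Set<string>();
  const nullWhereRequired = new Set<string>();
  const populated = new Set<string>();

  for (const record of records) {
    // Any key the manifest does not declare is drift.
    for (const key of Object.keys(record)) {
      if (!isDeclared(key)) undeclared.add(key);
    }
    // Track which declared fields ever carry a value.
    for (const name of declaredNames) {
      const value = record[name];
      if (value === null || value === undefined) {
        if (declared[name].required) nullWhereRequired.add(name);
      } else {
        populated.add(name);
      }
    }
  }

  return {
    undeclared: Array.from(undeclared).sort(),
    nullWhereRequired: Array.from(nullWhereRequired).sort(),
    neverPopulated: declaredNames.filter((name) => !populated.has(name)),
  };
}
```

With an empty record set every declared field is reported as never populated, which is probably the right default for a stream that emitted nothing.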
One row per setting: {id, key, category, value, value_type: "string" | "number" | "boolean" | "json" | "enum", value_enum_options, description, last_modified, source_connector}. Pro: queryable across sources; portable; renders cleanly in disclosure UI. Con: nested/collection settings (Gmail filters, muted-channel lists) collapse into opaque JSON, losing typing where it matters most.
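To make the trade-off concrete, here is a sketch of the one-row-per-setting shape as a TypeScript type. The snake_case field names, the choice to store `value` as a serialized string, and the `decodeSetting` helper are assumptions for illustration only.

```typescript
type SettingValueType = "string" | "number" | "boolean" | "json" | "enum";

// Assumed row shape: field names and optionality are illustrative.
interface SettingRow {
  id: string;
  key: string;
  category: string;
  value: string; // serialized; interpreted according to value_type
  value_type: SettingValueType;
  value_enum_options?: string[];
  description?: string;
  last_modified?: string;
  source_connector: string;
}

// Decode the serialized value. Scalars come back typed; collections
// come back as whatever JSON.parse yields, which is where typing is lost.
function decodeSetting(row: SettingRow): unknown {
  switch (row.value_type) {
    case "number":
      return Number(row.value);
    case "boolean":
      return row.value === "true";
    case "json":
      return JSON.parse(row.value);
    default:
      return row.value; // "string" and "enum" stay as text
  }
}
```

A scalar such as a boolean toggle round-trips cleanly, while a list of Gmail filters can only be stored with `value_type: "json"` and decodes to `unknown`, which is the "opaque JSON" cost the note describes.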
reference-implementation/server/index.js:1054:
spec-core.md Tier 1 RS requirement #12:
Status: sprint-needed · Owner: project owner · Created: 2026-04-20 · Updated: 2026-04-24 · Related: openspec/changes/add-polyfill-connector-system/design-notes/cursor-finality-and-gap-awareness-open-question.md, openspec/changes/add-polyfill-connector-system/design-notes/gap-recovery-execution-open-question.md, pdpp-trust-model-framing.md
Most consumer platforms expose at least one of the following surfaces for personal data:
Re-extraction cost, audit fidelity, and self-export completeness all pivot on whether the RS holds the upstream artifact or only the parsed record. Owners who self-export raw receive something qualitatively different from owners who receive only the extractor's output: raw is auditable against the source and re-parseable when the extractor improves; the parsed record is frozen at whatever shape the extractor happened to produce on ingest day.
GET /v1/streams without a connector_id query parameter returns:
The question: is one-DB-per-owner a PDPP spec requirement, a reference-implementation convention, or an incidental choice?
Meanwhile, anyone who builds on PDPP will re-embed the same 800k records. That's a staggering amount of duplicated work across implementations, and it is the same forcing function the blob-hydration note already named for binaries ("don't make every consumer re-derive expensive things that are identical across consumers"). Embeddings and BM25 indexes fall in the same class.
Many platforms expose a three-step web flow for obtaining durable API credentials:
Status: sprint-needed · Owner: project owner · Created: 2026-04-19 · Updated: 2026-04-24 · Related: openspec/changes/add-polyfill-connector-system/design-notes/credential-bootstrap-automation-open-question.md, pdpp-trust-model-framing.md
Status: captured · Owner: project owner · Created: 2026-04-20 · Updated: 2026-04-24 · Related: openspec/changes/add-polyfill-connector-system/design-notes/partial-run-semantics-open-question.md, openspec/changes/add-polyfill-connector-system/design-notes/cursor-finality-and-gap-awareness-open-question.md, openspec/changes/add-polyfill-connector-system/design-notes/gap-recovery-execution-open-question.md, openspec/changes/add-polyfill-connector-system/design-notes/blob-hydration-open-question.md, openspec/changes/add-polyfill-connector-system/design-notes/credential-storage-open-question.md
The open question note establishes that PDPP could plausibly support a lane for:
Status: captured · Owner: connector worker · Created: 2026-04-24 · Updated: 2026-04-24 · Related: openspec/changes/add-polyfill-connector-system
Status: audit complete; reverified by query-api-gap-audit · Original branch: audit-query-api-readiness · Verification branch: query-api-gap-audit · Verification worktree: /home/tnunamak/code/pdpp-query-api-gap-audit · Scope: read-only audit of reference-implementation/server, query docs/specs, OpenSpec artifacts, and packages/polyfill-connectors/manifests.
Year-freezing. A year's orders don't change once closed. Once we've scraped year Y and the count matches for 2 consecutive runs, Y is "frozen" — skip it on future runs. Only the current year + last 60 days of the prior year need re-scraping each run.
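The year-freezing rule can be sketched as a small planning function. This is one reading of the rule, under stated assumptions: "count matches for 2 consecutive runs" is taken to mean the last two observed counts are equal, and "last 60 days of the prior year" is taken as the 60 days before January 1 of the current year. `YearHistory`, `ScrapePlan`, and `planScrape` are hypothetical names.

```typescript
// Hypothetical bookkeeping: order counts seen for one year on past runs, oldest first.
interface YearHistory {
  year: number;
  counts: number[];
}

interface ScrapePlan {
  fullYears: number[];        // years to scrape in full on this run
  priorYearTailSince: string; // ISO date; re-scrape the prior year from here onward
}

const TAIL_DAYS = 60;
const DAY_MS = 24 * 60 * 60 * 1000;

// A closed year is frozen once its two most recent counts agree.
function isFrozen(h: YearHistory, currentYear: number): boolean {
  if (h.year >= currentYear) return false; // the open year is never frozen
  const n = h.counts.length;
  return n >= 2 && h.counts[n - 1] === h.counts[n - 2];
}

function planScrape(history: YearHistory[], currentYear: number): ScrapePlan {
  const fullYears = history
    .filter((h) => h.year < currentYear && !isFrozen(h, currentYear))
    .map((h) => h.year);
  fullYears.push(currentYear); // always re-scrape the current year
  fullYears.sort((a, b) => a - b);

  const tailStart = new Date(Date.UTC(currentYear, 0, 1) - TAIL_DAYS * DAY_MS);
  return {
    fullYears,
    priorYearTailSince: tailStart.toISOString().slice(0, 10),
  };
}
```

For example, with 2023 and 2025 stable across their last two runs and 2024 still changing, a 2026 run would scrape 2024 and 2026 in full and revisit the prior year only from the tail date onward.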
The common assumption, and what our initial research pointed at, was that Chase uses Akamai Bot Manager Premier, which detects headless Chromium at login. Four findings disprove that for this specific flow:
Financial transactions are a high-value polyfill stream for the same reason USAA and YNAB are: reconciliation, life-history analysis, owner self-export, audit. Chase is the largest US retail bank by active checking accounts and is a natural parallel to USAA for demonstrating that PDPP polyfill connectors generalize across institutions.
All fetches to /backend-api/ MUST go through page.evaluate(fetch) inside the browser context. Node.js fetch will be 403'd by Cloudflare. Non-negotiable.
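A minimal sketch of that rule. `PageLike` is a hypothetical structural stand-in for the one Playwright `Page` method used (`page.evaluate(pageFunction, arg)`), kept local so the example is self-contained; `fetchBackendJson` and the response shape are also illustrative.

```typescript
// Hypothetical stand-in for the part of Playwright's Page used here.
interface PageLike {
  evaluate<R, A>(pageFunction: (arg: A) => R | Promise<R>, arg: A): Promise<R>;
}

interface BackendResponse {
  status: number;
  body: unknown;
}

async function fetchBackendJson(page: PageLike, path: string): Promise<BackendResponse> {
  if (!path.startsWith("/backend-api/")) {
    throw new Error(`expected a /backend-api/ path, got ${path}`);
  }
  // The callback runs inside the page, so fetch() here is the browser's
  // fetch (page origin, page cookies), not Node's.
  return page.evaluate(async (p: string) => {
    const res = await fetch(p);
    const body: unknown = res.ok ? await res.json() : null;
    return { status: res.status, body };
  }, path);
}
```

Routing every call through one helper like this makes it harder for a stray Node-side `fetch` to slip into a connector.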
Three streams each, following Claude Code's shape for consistency where possible:
Library: imapflow (Node). Handles CONDSTORE natively, tolerates Gmail's lack of QRESYNC, clean async/await API, maintained.
Node.js v24+ readline.createInterface() treats U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) as line terminators. This tracks ECMA-262's definition of line terminators.
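One place this can matter is JSONL: `JSON.stringify` leaves U+2028 and U+2029 unescaped inside strings, so a reader that treats them as line terminators would cut a record in two. A minimal sketch of a splitter that breaks on LF only (the `splitJsonl` name is hypothetical, and it assumes the whole buffer is in memory):

```typescript
// Split a JSONL buffer on "\n" only, so U+2028/U+2029 inside string
// values stay part of their record.
function splitJsonl(text: string): unknown[] {
  return text
    .split("\n")
    .map((line) => line.replace(/\r$/, "")) // tolerate CRLF line endings
    .filter((line) => line.length > 0)      // skip blank lines and the trailing newline
    .map((line) => JSON.parse(line));
}
```

An alternative, not shown here, is to escape the two code points at write time so any line reader is safe.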
USAA deprecated OFX/QFX in mid-2023 but retains CSV export via UI. Flow per account:
1. Navigate to /my/accounts
2. Click account name
3. Click "I want to" menu (upper-left of account detail page)
4. Select "Export"
5. Choose CSV, date range, download
Evidence gathered from live session recon during the overnight run.
USAA's CSV export UI hard-caps at ~18 months. Empirically on 2026-04-19: "10/19/2024 accepted, 04/19/2024 rejected." Requesting older ranges leaves the form in "Fix From Date" state and submit button never enables. This is documented in the connector at packages/polyfill-connectors/connectors/usaa/index.js around line 350.
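A sketch of clamping a requested From Date to that window before filling the form. The 18-month constant is the empirical figure from above, not a documented limit, and `earliestExportDate`, `clampFromDate`, and `formatUsDate` are hypothetical helpers rather than the connector's actual code.

```typescript
// Approximate cap observed on 2026-04-19; an assumption, not a documented limit.
const EXPORT_WINDOW_MONTHS = 18;

// Earliest From Date the form is expected to accept, computed in UTC.
// Note: for days that do not exist in the target month (e.g. the 31st),
// Date rolls forward into the following month.
function earliestExportDate(today: Date, windowMonths: number = EXPORT_WINDOW_MONTHS): Date {
  const d = new Date(Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate()));
  d.setUTCMonth(d.getUTCMonth() - windowMonths);
  return d;
}

// Pull a too-old From Date forward to the edge of the window.
function clampFromDate(requested: Date, today: Date): Date {
  const earliest = earliestExportDate(today);
  return requested.getTime() < earliest.getTime() ? earliest : requested;
}

// MM/DD/YYYY, matching the date format quoted above.
function formatUsDate(d: Date): string {
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(d.getUTCDate()).padStart(2, "0");
  return `${mm}/${dd}/${d.getUTCFullYear()}`;
}
```

Run against the dates quoted above, a request for 04/19/2024 made on 2026-04-19 is clamped to 10/19/2024, the date the form accepted.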
Built scaffolds for 13 additional connectors beyond the original 5-MVP (YNAB, Gmail, ChatGPT, USAA, Amazon). All have full manifests; implementations vary from complete (API-based with available creds) to scaffolded-pending-wiring (browser-based needing live session).
Rationale: valuable for reconciliation. The GPS location at which a payee was last used can match a bank-statement merchant to a specific Amazon/Uber location.
Status: open · Owner: Tim · Created: 2026-04-24 · Updated: 2026-04-24 · Related: add-polyfill-connector-system; reference runtime controller; protocol-violation diagnostics
Status: captured · Owner: connector-live-smoke-triage worker · Created: 2026-04-24 · Updated: 2026-04-24 · Related: openspec/changes/add-polyfill-connector-system
Tim's list (verbatim, then expanded):
Slackdump's -chan-types public,private,im,mpim lets the operator say "give me public channels and DMs but skip group DMs." This is not a resources filter (that's by ID) and not a streams filter (that's by record kind). It's a sub-type within a stream.
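To illustrate the distinction, here is a sketch of a selection that carries a sub-type filter alongside the existing ID filter. The `StreamSelection` shape and its `subtypes` field are hypothetical, as are `parseChanTypes` and `selects`; only the four channel-type values come from the flag quoted above.

```typescript
type ChannelType = "public" | "private" | "im" | "mpim";

const CHANNEL_TYPES: ChannelType[] = ["public", "private", "im", "mpim"];

// Hypothetical selection shape: `resources` filters by ID, `subtypes`
// filters by kind within the stream.
interface StreamSelection {
  stream: string;
  resources?: string[];
  subtypes?: ChannelType[];
}

interface ChannelRecord {
  id: string;
  type: ChannelType;
}

// Parse a comma-separated value such as "public,im" into channel types.
function parseChanTypes(value: string): ChannelType[] {
  return value.split(",").map((raw) => {
    const t = raw.trim() as ChannelType;
    if (CHANNEL_TYPES.indexOf(t) === -1) {
      throw new Error(`unknown channel type: ${raw}`);
    }
    return t;
  });
}

// A record passes only if it satisfies every filter that is present.
function selects(sel: StreamSelection, record: ChannelRecord): boolean {
  if (sel.resources && sel.resources.indexOf(record.id) === -1) return false;
  if (sel.subtypes && sel.subtypes.indexOf(record.type) === -1) return false;
  return true;
}
```

The two filters compose: `resources` narrows to specific channels, and `subtypes` narrows by kind without the operator having to enumerate IDs.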
49,173 real records across 20 streams from 4 platforms, all your actual data, ingested into PDPP RS. YNAB + Gmail + ChatGPT from last night, USAA added today (5 streams: accounts, transactions, statements, inbox_messages, credit_card_billing).
The A++ follow-up audit found that several connectors with obvious parent/child stream relationships were historically child-first:
Several auto-login helpers use page.waitForTimeout(ms) as a synchronization primitive. This is an explicit Playwright anti-pattern — the docs call it out as "strongly discouraged" because it makes tests flaky, slow, and brittle to timing variance. It's in our code because the helpers were adapted from pre-existing scrapers that used the pattern, and we preserved working behavior while extending them.
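A sketch of the usual replacement: wait on the condition itself instead of a fixed delay. `WaitablePage` and `LocatorLike` are hypothetical structural stand-ins for the Playwright calls used (`page.locator(selector)` and `locator.waitFor({ state, timeout })`), and the helper name and default timeout are illustrative.

```typescript
// Hypothetical stand-ins for the parts of Playwright's Page and Locator used here.
interface LocatorLike {
  waitFor(options?: {
    state?: "attached" | "detached" | "visible" | "hidden";
    timeout?: number;
  }): Promise<void>;
}

interface WaitablePage {
  locator(selector: string): LocatorLike;
}

// Before (anti-pattern): await page.waitForTimeout(3000) and hope the form rendered.
// After: resolve as soon as the element is visible, fail loudly if it never is.
async function waitForVisible(
  page: WaitablePage,
  selector: string,
  timeoutMs: number = 15000,
): Promise<void> {
  await page.locator(selector).waitFor({ state: "visible", timeout: timeoutMs });
}
```

The timeout here is an upper bound rather than a fixed cost, so fast pages proceed immediately and slow ones fail with a clear error instead of a silent race.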
generated/private pilot.
Honest audit turned up 5 classes of gap across the connector fleet. This document tracks the fix-all pass.
Current config (reference-implementation/server/db.js):
Binding on every connector past, present, and future.