Generalize Local Connector Bounded Reads

Tasks: 10/10
Created openspec/changes/generalize-local-connector-bounded-reads/tasks.md

1. Contract

  • Add the filesystem/local-DB bounded-read requirement to local-agent-collector-completeness.
  • Add a manifest- or registry-driven regression guard for local connector whole-file and unbounded .all() reads.
  • Add reviewed exceptions for small per-artifact reads with explicit reasons.
  • Add a reviewed accumulator requirement for logical-unit summaries that must not retain raw source payloads.
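
A minimal sketch of what the regression guard with reviewed exceptions could look like. The task list does not show the real guard or its manifest format, so every name here (`ReviewedException`, `Violation`, `findUnboundedReads`, the pattern list) is hypothetical. It scans connector source text line by line and only excuses a flagged read when an exception entry with a non-empty reason covers that file and pattern.

```typescript
// Hypothetical shapes: the real guard's manifest/registry format is not
// shown in the task list, so these are illustrative only.
interface ReviewedException {
  file: string;    // path relative to the connectors root
  pattern: string; // name of the flagged pattern, e.g. "readFile"
  reason: string;  // must be non-empty for the exception to count
}

interface Violation {
  file: string;
  line: number; // 1-based line number within the file
  pattern: string;
}

// Patterns treated as potentially unbounded reads. The lookbehind keeps
// Promise.all(...) from being mistaken for a database .all() call.
const FLAGGED_PATTERNS: ReadonlyArray<{ name: string; regex: RegExp }> = [
  { name: "readFile", regex: /\breadFile(Sync)?\s*\(/ },
  { name: ".all()", regex: /(?<!Promise)\.all\s*\(/ },
];

export function findUnboundedReads(
  sources: Record<string, string>,
  exceptions: ReviewedException[],
): Violation[] {
  const violations: Violation[] = [];
  for (const [file, text] of Object.entries(sources)) {
    text.split("\n").forEach((content, index) => {
      for (const { name, regex } of FLAGGED_PATTERNS) {
        if (!regex.test(content)) continue;
        // An exception only applies when it names this file and pattern
        // and carries an explicit, non-blank reason.
        const excused = exceptions.some(
          (e) =>
            e.file === file && e.pattern === name && e.reason.trim() !== "",
        );
        if (!excused) {
          violations.push({ file, line: index + 1, pattern: name });
        }
      }
    });
  }
  return violations;
}
```

A test built on this would read the connector sources from disk, call `findUnboundedReads`, and fail when the returned list is non-empty. Line-based regex matching is a deliberately coarse choice: it can miss reads that are aliased or split across lines, which is a reasonable trade for a guard that stays cheap and dependency-free.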

2. High-Risk Connectors

  • Convert imessage local database reads from unbounded .all() to row iteration.
  • Convert twitter_archive archive parsing away from whole-file array materialization. Done: connectors/twitter_archive/archive-stream.ts streams tweets.js, tweet.js, and direct-messages.js through createReadStream and the vetted, dependency-free @streamparser/json parser (paths: ['$.*']; keepStack: false releases each element once it has been emitted). The two readFile exceptions were removed from the guard, which still passes. On-disk fixtures covering escaped, nested, and unicode cases live under __fixtures__/archive-files/. archive-stream.test.ts covers streaming equivalence, chunk boundaries, legacy fallback, empty/missing/malformed input, and end-to-end subprocess runs. The prior blocker is resolved; see the Resolution section of research/twitter-archive-streaming-blocker-2026-06-17.md.
  • Convert large Slack dump row reads to row iteration or document bounded query exceptions.
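
The move from unbounded `.all()` to row iteration, combined with an accumulator that keeps only summary fields, might look like the sketch below. It is not the connectors' actual code: `RowStatement`, `MessageRow`, `ChatSummary`, and `summarizeChats` are hypothetical names, and the statement interface is a dependency-free stand-in (better-sqlite3's `Statement` offers `all()` and `iterate()` with this general shape, but the task list does not say which driver the connectors use).

```typescript
// Stand-in for a prepared statement. Declared locally so the sketch has
// no dependencies; the connectors' real driver is an assumption.
interface RowStatement<Row> {
  all(): Row[];            // materializes every row at once (unbounded)
  iterate(): Iterable<Row>; // yields one row at a time
}

// Hypothetical row and summary shapes for illustration.
interface MessageRow {
  chatId: string;
  sentAt: number; // epoch milliseconds
  text: string;   // raw payload; must not be retained by the accumulator
}

interface ChatSummary {
  chatId: string;
  messageCount: number;
  firstSentAt: number;
  lastSentAt: number;
}

export function summarizeChats(stmt: RowStatement<MessageRow>): ChatSummary[] {
  // Memory grows with the number of chats (logical units), not with the
  // number of rows or the size of each row's text.
  const byChat = new Map<string, ChatSummary>();
  for (const row of stmt.iterate()) {
    const summary = byChat.get(row.chatId);
    if (summary === undefined) {
      byChat.set(row.chatId, {
        chatId: row.chatId,
        messageCount: 1,
        firstSentAt: row.sentAt,
        lastSentAt: row.sentAt,
      });
    } else {
      summary.messageCount += 1;
      summary.firstSentAt = Math.min(summary.firstSentAt, row.sentAt);
      summary.lastSentAt = Math.max(summary.lastSentAt, row.sentAt);
    }
    // `row` goes out of scope here, so its text can be collected.
  }
  return Array.from(byChat.values());
}
```

The accumulator copies scalar fields out of each row rather than storing the row object, so no reference to the raw `text` payload survives an iteration. The same pattern would apply to the Slack dump reads wherever a bounded query exception is not justified.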

3. Validation

  • Run targeted polyfill connector tests for changed connectors.
  • Run pnpm --filter @pdpp/polyfill-connectors typecheck.
  • Run openspec validate generalize-local-connector-bounded-reads --strict.