ADR-0008: Split Ingest Into Connector, Recipe, and Job

  • Status: Accepted
  • Date: 2026-06-21

Context

runex currently exposes inbound sources as DataSourceSpec.

That object works well for concrete business adapters such as obsidian-thought, media-snapshot, and wechat-session: each source self-describes its params, capability requirements, watchability, ontology bundles, and adapter factory.

But the name "datasource" now carries too many meanings:

  • transport/interface: file, HTTP, SQLite, CSV, local command
  • business interpretation: Thought, WeChatConversation, media work, lead
  • mapping: raw field names to typed CanonicalItem fields and links
  • run mode: one-shot import, watch, sync, deletion propagation

This makes the ingest surface unclear. A one-off thought import, a WeChat conversation pull, a CSV upload, a SQL query, and an HTTP JSON endpoint all look like peers even though they sit at different layers of the pipeline.

At the same time, ADR-0002 and ADR-0004 already establish two constraints:

  • external adapters must not be collapsed into kernels
  • CanonicalItem remains the typed graph boundary

So the solution cannot be "just make ingest a kernel" or "pass raw JSON through the DSL".

Decision

We split inbound ingest into three explicit concepts:

  1. Connector: how raw external records are enumerated
  2. Recipe: how those raw records become ontology semantics
  3. Ingest job: how a connector and recipe are run together

The preferred shape is:

SourceEndpoint -> RawRecord -> MappingRecipe -> CanonicalItem -> Store
connector         raw fact     semantic map     typed item       graph

CanonicalItem remains the only typed boundary consumed by the store ingest pipeline. The new split happens before that boundary.
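The flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the `CanonicalItem` fields, the `run_once` helper, the dict-backed store, and the sample records are all hypothetical and are not runex APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional

RawRecord = dict[str, Any]  # hypothetical alias: one raw fact from a connector


@dataclass(frozen=True)
class CanonicalItem:
    # Hypothetical minimal shape, for illustration only.
    supertag: str
    natural_key: str
    fields: dict[str, Any]


def run_once(
    enumerate_raw: Callable[[], Iterable[RawRecord]],
    to_item: Callable[[RawRecord], Optional[CanonicalItem]],
    store: dict[tuple[str, str], CanonicalItem],
) -> int:
    """Project every raw record into the store; return how many items were written."""
    written = 0
    for raw in enumerate_raw():      # connector side: transport only
        item = to_item(raw)          # recipe side: semantics only
        if item is None:             # the recipe filtered this record out
            continue
        # Keying by (supertag, natural key) makes a rerun overwrite, not duplicate.
        store[(item.supertag, item.natural_key)] = item
        written += 1
    return written


def sample_connector() -> Iterable[RawRecord]:
    # Made-up records standing in for a real source endpoint.
    return [{"id": "t1", "text": "first"}, {"id": "t2", "text": "second"}]


def thought_recipe(raw: RawRecord) -> Optional[CanonicalItem]:
    return CanonicalItem("Thought", raw["id"], {"body": raw["text"]})


store: dict[tuple[str, str], CanonicalItem] = {}
run_once(sample_connector, thought_recipe, store)
run_once(sample_connector, thought_recipe, store)  # second run adds nothing new
```

Only `thought_recipe` constructs a `CanonicalItem`; the connector function never sees the type, which is the point of placing the split before the typed boundary.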

Definitions

Connector

A connector owns transport/interface concerns:

  • file enumeration
  • CSV parsing
  • HTTP JSON fetch
  • SQLite read-only query
  • local command JSON output
  • WeChat CLI or wxark server access

A connector declares:

  • params
  • required capabilities
  • raw record shape
  • optional watch/sync support

It does not know which supertag a record becomes.
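As a rough sketch of what a connector declaration might look like, the following models the four declared items as a small dataclass next to a CSV enumerator. The `ConnectorSpec` field names and the `fs.read` capability string are assumptions made for illustration; only the `file-csv` name comes from this ADR.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class ConnectorSpec:
    # Hypothetical field names mirroring the four declared items above.
    name: str
    params: tuple[str, ...]
    required_capabilities: tuple[str, ...]
    raw_record_shape: str
    supports_watch: bool = False


FILE_CSV = ConnectorSpec(
    name="file-csv",
    params=("path",),
    required_capabilities=("fs.read",),  # assumed capability name
    raw_record_shape="mapping of column name to string value",
)


def enumerate_csv(stream: TextIO) -> Iterator[dict[str, str]]:
    """Yield one raw record per CSV row; no supertag or ontology knowledge here."""
    for row in csv.DictReader(stream):
        yield dict(row)


# An in-memory stream stands in for an opened file.
rows = list(enumerate_csv(io.StringIO("id,text\nt1,hello\nt2,world\n")))
```

The enumerator returns plain dictionaries keyed by column name, so any recipe that accepts that shape can consume it.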

Recipe

A recipe owns semantic interpretation:

  • accepted raw record shape
  • target supertag
  • ontology bundles to load
  • natural key expectations
  • field mapping
  • link mapping
  • domain-specific filtering

It does not perform direct I/O.
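A recipe can be pictured as a pure mapping function plus its declarations. The sketch below is an assumption-laden illustration: `MappingRecipe`, its field names, the `thought-bundle` bundle name, and the simplified `CanonicalItem` shape are hypothetical, and the filtering rule (drop records without a natural key) is just one possible policy.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CanonicalItem:
    # Hypothetical minimal shape, for illustration only.
    supertag: str
    natural_key: str
    fields: dict[str, Any]
    links: dict[str, str]


@dataclass(frozen=True)
class MappingRecipe:
    target_supertag: str
    ontology_bundles: tuple[str, ...]
    natural_key_field: str
    field_map: dict[str, str]  # raw field name -> typed field name
    link_map: dict[str, str]   # raw field name -> link name

    def apply(self, raw: dict[str, Any]) -> Optional[CanonicalItem]:
        """Map one raw record to a typed item, or return None to filter it out."""
        key = raw.get(self.natural_key_field)
        if not key:
            return None  # no natural key, so the record cannot be identified
        fields = {dst: raw[src] for src, dst in self.field_map.items() if src in raw}
        links = {name: str(raw[src]) for src, name in self.link_map.items() if raw.get(src)}
        return CanonicalItem(self.target_supertag, str(key), fields, links)


thought_recipe = MappingRecipe(
    target_supertag="Thought",
    ontology_bundles=("thought-bundle",),  # assumed bundle name
    natural_key_field="id",
    field_map={"text": "body", "created": "created_at"},
    link_map={"author_id": "author"},
)

item = thought_recipe.apply({"id": "t1", "text": "hello", "author_id": "u7"})
skipped = thought_recipe.apply({"text": "record without an id"})
```

`apply` touches only its input dictionary, which keeps the recipe free of direct I/O and easy to unit test.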

Ingest Job

An ingest job binds:

  • one connector configuration
  • one recipe
  • one run mode (import, watch, or sync)
  • optional cursor/deletion policy

This is the unit a product or agent can save, inspect, rerun, or schedule.
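One way to make a job saveable and inspectable is to keep it as plain data. The following is a minimal sketch, assuming a JSON representation; the `IngestJob` field names, the `thought-from-csv` recipe name, and the `propagate_deletions` flag are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Optional


class RunMode(str, Enum):
    IMPORT = "import"
    WATCH = "watch"
    SYNC = "sync"


@dataclass(frozen=True)
class IngestJob:
    connector: str                    # name of the connector to run
    connector_params: dict[str, Any]  # that connector's configuration
    recipe: str                       # name of the recipe to apply
    mode: RunMode
    cursor: Optional[str] = None      # optional resume position
    propagate_deletions: bool = False

    def to_json(self) -> str:
        data = asdict(self)
        data["mode"] = self.mode.value  # store the plain string, not the enum
        return json.dumps(data, sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "IngestJob":
        data = json.loads(text)
        data["mode"] = RunMode(data["mode"])  # rejects unknown run modes
        return IngestJob(**data)


job = IngestJob(
    connector="file-csv",
    connector_params={"path": "thoughts.csv"},
    recipe="thought-from-csv",  # assumed recipe name
    mode=RunMode.IMPORT,
)
restored = IngestJob.from_json(job.to_json())
```

Because the job holds names and parameters rather than live objects, it can be stored, shown to a user, and rerun later against whatever the registry resolves those names to.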

Compatibility

Existing DataSourceSpec remains supported as a preset business source.

Conceptually, a DataSourceSpec is now treated as a convenience bundle:

DataSourceSpec ~= ConnectorSpec + MappingRecipeSpec + default job policy

This lets existing sources keep working while new generic ingest features can be added at the correct layer.

The registry should expose both:

  • concrete/preset datasources for compatibility
  • core connectors and recipes for the clearer ingest model
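The bundle relationship could be modelled roughly as follows. This is a sketch only: `PresetDataSource`, `Registry`, and the `file-markdown` and `thought-from-markdown` names are hypothetical, and registering a preset's parts automatically is one possible design, not something this ADR prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetDataSource:
    # A preset names its parts plus a default job policy.
    name: str
    connector: str
    recipe: str
    default_mode: str = "import"


class Registry:
    def __init__(self) -> None:
        self.connectors: set[str] = set()
        self.recipes: set[str] = set()
        self.presets: dict[str, PresetDataSource] = {}

    def add_preset(self, preset: PresetDataSource) -> None:
        # The preset stays addressable by its old name, and its parts
        # also become visible under the connector/recipe model.
        self.presets[preset.name] = preset
        self.connectors.add(preset.connector)
        self.recipes.add(preset.recipe)

    def expand(self, name: str) -> tuple[str, str, str]:
        """Resolve a preset name to (connector, recipe, default run mode)."""
        preset = self.presets[name]
        return (preset.connector, preset.recipe, preset.default_mode)


registry = Registry()
registry.add_preset(
    PresetDataSource("obsidian-thought", "file-markdown", "thought-from-markdown")
)
expanded = registry.expand("obsidian-thought")
```

Callers that only know the preset name keep working, while newer callers can list `registry.connectors` and `registry.recipes` directly.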

Why

The engine's core job is not "support many source names".

Its core job is:

enumerate external facts, interpret them as ontology objects, and idempotently project them into the graph so reactive semantics can run

The connector/recipe split keeps each concern in the place where it can be tested and evolved independently.

It also removes a false equivalence:

  • file-csv is not the same kind of thing as obsidian-thought
  • http-json is not the same kind of thing as media-snapshot
  • wechat-cli and wxark-http are two ways to obtain the same business conversation facts

Alternatives Considered

1. Keep adding concrete DataSourceSpec entries

Rejected as the sole model.

Reason:

  • every CSV/HTTP/SQL import becomes a new adapter even when only the mapping differs
  • run mode and transport concerns stay mixed with business semantics
  • products cannot offer a clean "connect this source, choose a recipe" flow

2. Collapse ingest into kernels

Rejected.

Reason:

  • violates ADR-0002 (external adapters must not be collapsed into kernels)
  • hides adapter semantics in effect functions
  • encourages raw payload movement through DSL effects

3. Let recipes output raw JSON and make upsert generic

Rejected.

Reason:

  • violates ADR-0004 (CanonicalItem remains the typed graph boundary)
  • weakens typed fields and links
  • pushes identity and reference conventions into payload discipline

4. Replace all existing datasources immediately

Rejected.

Reason:

  • unnecessary churn
  • higher regression risk
  • current business adapters remain useful presets

Consequences

Positive

  • ingest vocabulary matches the real architecture
  • generic CSV/HTTP/SQL/file imports become possible without bespoke adapters
  • WeChat CLI and wxark HTTP can share one semantic mapper
  • products can inspect connectors, recipes, and saved jobs separately
  • CanonicalItem remains the graph boundary

Negative / Tradeoffs

  • the registry gains more explicit concepts
  • adapter authors must decide whether they are writing a connector, a recipe, or a preset source
  • some existing DataSourceSpec docs need compatibility wording

We accept this because the extra names reflect real boundaries rather than invented ceremony.

Design Rules

  1. Connectors may require capabilities; recipes must not perform direct I/O.
  2. Recipes may emit CanonicalItem; connectors must not.
  3. Watch/cursor/deletion behavior belongs to connectors or ingest jobs, not to recipes.
  4. Ontology bundles are declared by recipes or preset datasources.
  5. Existing preset datasources must continue to expose the old manifest surface until a product-facing migration exists.
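Rules 1 through 4 are mechanical enough that a registry could check them at registration time. The sketch below assumes a hypothetical `Declaration` record; in particular, treating "a recipe declares capabilities" as a proxy for "a recipe performs direct I/O" is an assumption, not something this ADR states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    # Hypothetical summary of what a registered component declares.
    kind: str  # "connector", "recipe", or "preset"
    name: str
    capabilities: tuple[str, ...] = ()
    ontology_bundles: tuple[str, ...] = ()
    emits_canonical_item: bool = False
    supports_watch: bool = False


def violations(d: Declaration) -> list[str]:
    """Return the design rules a declaration breaks; an empty list means it passes."""
    problems: list[str] = []
    if d.kind == "recipe" and d.capabilities:
        problems.append("rule 1: recipes must not perform direct I/O")
    if d.kind == "connector" and d.emits_canonical_item:
        problems.append("rule 2: connectors must not emit CanonicalItem")
    if d.kind == "recipe" and d.supports_watch:
        problems.append("rule 3: watch behavior belongs to connectors or jobs")
    if d.kind == "connector" and d.ontology_bundles:
        problems.append("rule 4: bundles are declared by recipes or presets")
    return problems


ok = violations(Declaration("connector", "file-csv", capabilities=("fs.read",)))
bad = violations(
    Declaration("recipe", "leaky-recipe", capabilities=("net.http",), supports_watch=True)
)
```

Presets are deliberately exempt from these checks here, since a preset bundles both a connector and a recipe and may therefore declare capabilities and bundles together.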