ADR-0008: Split Ingest Into Connector, Recipe, and Job
- Status: Accepted
- Date: 2026-06-21
Context
runex currently exposes inbound sources as DataSourceSpec.
That object works well for concrete business adapters such as obsidian-thought, media-snapshot, and wechat-session: each source self-describes its params, capability requirements, watchability, ontology bundles, and adapter factory.
But the name "datasource" now carries too many meanings:
- transport/interface: file, HTTP, SQLite, CSV, local command
- business interpretation: Thought, WeChatConversation, media work, lead
- mapping: raw field names to typed CanonicalItem fields and links
- run mode: one-shot import, watch, sync, deletion propagation
This makes the ingest surface unclear. A one-off thought import, a WeChat conversation pull, a CSV upload, a SQL query, and an HTTP JSON endpoint all look like peers even though they sit at different layers of the pipeline.
At the same time, ADR-0002 and ADR-0004 already establish two constraints:
- external adapters must not be collapsed into kernels
- CanonicalItem remains the typed graph boundary
So the solution cannot be "just make ingest a kernel" or "pass raw JSON through the DSL".
Decision
We split inbound ingest into three explicit concepts:
- Connector: how raw external records are enumerated
- Recipe: how those raw records become ontology semantics
- Ingest job: how a connector and recipe are run together
The preferred shape is:

    SourceEndpoint -> RawRecord -> MappingRecipe -> CanonicalItem -> Store
    connector         raw fact     semantic map     typed item       graph

CanonicalItem remains the only typed boundary consumed by the store ingest pipeline. The new split happens before that boundary.
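
The flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the `CanonicalItem` fields, the `run_pipeline` helper, the dict-backed store, and the demo connector and recipe are all hypothetical stand-ins, not runex APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Iterator, Optional

RawRecord = dict[str, Any]  # untyped fact, exactly as a connector produced it


@dataclass(frozen=True)
class CanonicalItem:
    # Hypothetical stand-in; the real CanonicalItem has its own definition.
    supertag: str
    natural_key: str
    fields: dict[str, Any] = field(default_factory=dict)


Store = dict[tuple[str, str], CanonicalItem]


def run_pipeline(
    connector: Callable[[], Iterable[RawRecord]],
    recipe: Callable[[RawRecord], Optional[CanonicalItem]],
    store: Store,
) -> int:
    """Enumerate raw records, map each one, and upsert the typed items."""
    written = 0
    for raw in connector():
        item = recipe(raw)
        if item is None:  # the recipe filtered this record out
            continue
        # Keying by (supertag, natural key) makes a rerun overwrite, not duplicate.
        store[(item.supertag, item.natural_key)] = item
        written += 1
    return written


def demo_connector() -> Iterator[RawRecord]:
    yield {"id": "t1", "text": "first thought"}
    yield {"id": "t2", "text": ""}


def demo_recipe(raw: RawRecord) -> Optional[CanonicalItem]:
    if not raw["text"]:
        return None
    return CanonicalItem("Thought", raw["id"], {"body": raw["text"]})


store: Store = {}
first_run = run_pipeline(demo_connector, demo_recipe, store)
second_run = run_pipeline(demo_connector, demo_recipe, store)
```

Running the pipeline twice leaves one item in the store, because the store only ever sees typed items keyed by supertag and natural key.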
Definitions
Connector
A connector owns transport/interface concerns:
- file enumeration
- CSV parsing
- HTTP JSON fetch
- SQLite read-only query
- local command JSON output
- WeChat CLI or wxark server access
A connector declares:
- params
- required capabilities
- raw record shape
- optional watch/sync support
It does not know which supertag a record becomes.
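
As a rough sketch of what a connector declaration could look like, the example below pairs a spec with a CSV enumerator. The `ConnectorSpec` field names, the `fs.read` capability name, and the `enumerate_csv` helper are assumptions made for illustration; only the connector name `file-csv` appears in this ADR.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class ConnectorSpec:
    # Hypothetical field names that mirror the declarations listed above.
    name: str
    params: tuple[str, ...]
    required_capabilities: tuple[str, ...]
    raw_record_shape: str
    supports_watch: bool = False


FILE_CSV = ConnectorSpec(
    name="file-csv",
    params=("path",),
    required_capabilities=("fs.read",),  # assumed capability name
    raw_record_shape="one dict per row, keyed by header column",
)


def enumerate_csv(stream: TextIO) -> Iterator[dict[str, str]]:
    """Yield each CSV row as a raw record, with no semantic interpretation."""
    for row in csv.DictReader(stream):
        yield dict(row)


# An in-memory stream stands in for the file the connector would open.
sample = io.StringIO("name,phone\nAda,555-0100\nLin,555-0101\n")
records = list(enumerate_csv(sample))
```

The enumerator returns header-keyed rows and nothing more; whether a row is a lead, a thought, or something else is left to a recipe.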
Recipe
A recipe owns semantic interpretation:
- accepted raw record shape
- target supertag
- ontology bundles to load
- natural key expectations
- field mapping
- link mapping
- domain-specific filtering
It does not perform direct I/O.
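
A recipe can be pictured as pure data plus a pure mapping function. The sketch below assumes a hypothetical `MappingRecipe` shape and a simplified `CanonicalItem`; the `Lead` supertag, the `crm` bundle, and the field names are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

RawRecord = dict[str, Any]


@dataclass(frozen=True)
class CanonicalItem:
    # Hypothetical stand-in for the real typed item.
    supertag: str
    natural_key: str
    fields: dict[str, Any]
    links: dict[str, str]


@dataclass(frozen=True)
class MappingRecipe:
    target_supertag: str
    ontology_bundles: tuple[str, ...]
    natural_key_field: str
    field_map: dict[str, str]  # raw field name -> typed field name
    link_map: dict[str, str]   # raw field name -> link name
    keep: Callable[[RawRecord], bool] = lambda raw: True  # domain filter

    def apply(self, raw: RawRecord) -> Optional[CanonicalItem]:
        """Map one raw record to a typed item, or None if it is filtered out."""
        if not self.keep(raw):
            return None
        return CanonicalItem(
            supertag=self.target_supertag,
            natural_key=str(raw[self.natural_key_field]),
            fields={dst: raw[src] for src, dst in self.field_map.items() if src in raw},
            links={dst: str(raw[src]) for src, dst in self.link_map.items() if src in raw},
        )


lead_recipe = MappingRecipe(
    target_supertag="Lead",
    ontology_bundles=("crm",),
    natural_key_field="email",
    field_map={"name": "full_name"},
    link_map={"company": "works_at"},
    keep=lambda raw: bool(raw.get("email")),
)
item = lead_recipe.apply({"email": "ada@example.com", "name": "Ada", "company": "acme"})
skipped = lead_recipe.apply({"name": "no email here"})
```

Nothing in `apply` opens a file or a socket, which is what lets the same recipe be tested with hand-written records.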
Ingest Job
An ingest job binds:
- one connector configuration
- one recipe
- one run mode (import, watch, or sync)
- optional cursor/deletion policy
This is the unit a product or agent can save, inspect, rerun, or schedule.
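
Because a job is meant to be saved and rerun, one plausible representation is a small serializable record. The sketch below is an assumption about shape only: `IngestJob`, its field names, and the connector and recipe names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Optional


class RunMode(Enum):
    IMPORT = "import"
    WATCH = "watch"
    SYNC = "sync"


@dataclass(frozen=True)
class IngestJob:
    connector: str                    # which connector to run
    connector_params: dict[str, Any]  # its configuration
    recipe: str                       # which recipe interprets the records
    mode: RunMode
    cursor: Optional[str] = None      # optional resume position
    propagate_deletions: bool = False

    def to_json(self) -> str:
        data = asdict(self)
        data["mode"] = self.mode.value  # store the enum as its plain string
        return json.dumps(data, sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "IngestJob":
        data = json.loads(text)
        data["mode"] = RunMode(data["mode"])
        return IngestJob(**data)


job = IngestJob(
    connector="file-csv",
    connector_params={"path": "leads.csv"},
    recipe="lead-from-csv",  # hypothetical recipe name
    mode=RunMode.SYNC,
    propagate_deletions=True,
)
restored = IngestJob.from_json(job.to_json())
```

The saved form names a connector and a recipe without embedding either, so each can be inspected or replaced on its own.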
Compatibility
Existing DataSourceSpec remains supported as a preset business source.
Conceptually, a DataSourceSpec is now treated as a convenience bundle:

    DataSourceSpec ~= ConnectorSpec + MappingRecipeSpec + default job policy

This lets existing sources keep working while new generic ingest features can be added at the correct layer.
The registry should expose both:
- concrete/preset datasources for compatibility
- core connectors and recipes for the clearer ingest model
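
One way to picture a registry that exposes both surfaces is sketched below. The `Registry` and `PresetSource` classes are hypothetical, as is the `wechat-conversation` recipe name; the sketch only shows a preset resolving to parts that are also listed in their own right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetSource:
    """A DataSourceSpec viewed as connector + recipe + default job policy."""
    connector: str
    recipe: str
    default_mode: str = "import"


class Registry:
    def __init__(self) -> None:
        self.connectors: set[str] = set()
        self.recipes: set[str] = set()
        self.presets: dict[str, PresetSource] = {}

    def add_preset(self, name: str, preset: PresetSource) -> None:
        # A preset may only bundle parts the registry already knows about.
        if preset.connector not in self.connectors:
            raise KeyError(f"unknown connector: {preset.connector}")
        if preset.recipe not in self.recipes:
            raise KeyError(f"unknown recipe: {preset.recipe}")
        self.presets[name] = preset

    def manifest(self) -> dict[str, list[str]]:
        """List presets alongside the core parts, so callers can use either."""
        return {
            "datasources": sorted(self.presets),
            "connectors": sorted(self.connectors),
            "recipes": sorted(self.recipes),
        }


registry = Registry()
registry.connectors.update({"wechat-cli", "wxark-http"})
registry.recipes.add("wechat-conversation")  # hypothetical recipe name
registry.add_preset("wechat-session", PresetSource("wechat-cli", "wechat-conversation"))
manifest = registry.manifest()
```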
Why
The engine's core job is not "support many source names".
Its core job is:
enumerate external facts, interpret them as ontology objects, and idempotently project them into the graph so reactive semantics can run
The connector/recipe split keeps each concern in the place where it can be tested and evolved independently.
It also removes a false equivalence:
- file-csv is not the same kind of thing as obsidian-thought
- http-json is not the same kind of thing as media-snapshot
- wechat-cli and wxark-http are two ways to obtain the same business conversation facts
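
The last point can be illustrated with two transports feeding one mapper. Both connector functions below are stand-ins that return canned records instead of calling a CLI or an HTTP server, and the raw field names are assumptions.

```python
from typing import Any

RawRecord = dict[str, Any]


def wechat_cli_records() -> list[RawRecord]:
    # Stand-in for a connector that would shell out to a local CLI.
    return [{"conversation_id": "c1", "title": "Team chat"}]


def wxark_http_records() -> list[RawRecord]:
    # Stand-in for a connector that would call an HTTP server.
    return [{"conversation_id": "c1", "title": "Team chat"}]


def to_conversation(raw: RawRecord) -> dict[str, Any]:
    """One semantic mapper; it never learns which transport produced the record."""
    return {
        "supertag": "WeChatConversation",
        "natural_key": raw["conversation_id"],
        "fields": {"title": raw["title"]},
    }


from_cli = [to_conversation(r) for r in wechat_cli_records()]
from_http = [to_conversation(r) for r in wxark_http_records()]
```

Both paths produce the same typed result, so the transport choice stays a connector concern.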
Alternatives Considered
1. Keep adding concrete DataSourceSpec entries
Rejected as the sole model.
Reason:
- every CSV/HTTP/SQL import becomes a new adapter even when only the mapping differs
- run mode and transport concerns stay mixed with business semantics
- products cannot offer a clean "connect this source, choose a recipe" flow
2. Collapse ingest into kernels
Rejected.
Reason:
- violates ADR-0002
- hides adapter semantics in effect functions
- encourages raw payload movement through DSL effects
3. Let recipes output raw JSON and make upsert generic
Rejected.
Reason:
- violates ADR-0004
- weakens typed fields and links
- pushes identity and reference conventions into payload discipline
4. Replace all existing datasources immediately
Rejected.
Reason:
- unnecessary churn
- higher regression risk
- current business adapters remain useful presets
Consequences
Positive
- ingest vocabulary matches the real architecture
- generic CSV/HTTP/SQL/file imports become possible without bespoke adapters
- WeChat CLI and wxark HTTP can share one semantic mapper
- products can inspect connectors, recipes, and saved jobs separately
- CanonicalItem remains the graph boundary
Negative / Tradeoffs
- the registry gains more explicit concepts
- adapter authors must decide whether they are writing a connector, a recipe, or a preset source
- some existing DataSourceSpec docs need compatibility wording
We accept this because the extra names reflect real boundaries instead of inventing ceremony.
Design Rules
- Connectors may require capabilities; recipes may not perform direct I/O.
- Recipes may emit CanonicalItem; connectors may not.
- Watch/cursor/deletion behavior belongs to connectors or ingest jobs.
- Ontology bundles are declared by recipes or preset datasources.
- Existing preset datasources must continue to expose the old manifest surface until a product-facing migration exists.
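
Several of these rules could be checked mechanically at registration time. The sketch below is one possible encoding, not an existing runex check: the `Declaration` shape is hypothetical, and treating "declares required capabilities" as a proxy for "performs direct I/O" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    # Hypothetical summary of what a registered component claims to do.
    kind: str  # "connector", "recipe", or "preset"
    required_capabilities: tuple[str, ...] = ()
    emits_canonical_items: bool = False
    ontology_bundles: tuple[str, ...] = ()
    owns_watch_behavior: bool = False


def violations(decl: Declaration) -> list[str]:
    """Return the design rules a declaration breaks; an empty list means it passes."""
    found = []
    if decl.kind == "recipe" and decl.required_capabilities:
        found.append("recipes may not perform direct I/O")
    if decl.kind == "connector" and decl.emits_canonical_items:
        found.append("connectors may not emit CanonicalItem")
    if decl.kind == "recipe" and decl.owns_watch_behavior:
        found.append("watch/cursor/deletion belongs to connectors or ingest jobs")
    if decl.kind == "connector" and decl.ontology_bundles:
        found.append("ontology bundles are declared by recipes or presets")
    return found


good_recipe = Declaration("recipe", emits_canonical_items=True, ontology_bundles=("crm",))
bad_connector = Declaration("connector", emits_canonical_items=True, ontology_bundles=("crm",))
```

Presets are deliberately left unchecked here, since a preset bundles both roles and must keep its existing manifest surface.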