ADR-0008: Split Ingest Into Connector, Recipe, and Job

Status: Accepted
Date: 2026-06-21

Context

runex currently exposes inbound sources as DataSourceSpec.

That object works well for concrete business adapters such as obsidian-thought, media-snapshot, and wechat-session: each source self-describes its params, capability requirements, watchability, ontology bundles, and adapter factory.

But the name "datasource" now carries too many meanings:

transport/interface: file, HTTP, SQLite, CSV, local command
business interpretation: Thought, WeChatConversation, media work, lead
mapping: raw field names to typed CanonicalItem fields and links
run mode: one-shot import, watch, sync, deletion propagation

This makes the ingest surface unclear. A one-off thought import, a WeChat conversation pull, a CSV upload, a SQL query, and an HTTP JSON endpoint all look like peers even though they sit at different layers of the pipeline.

At the same time, ADR-0002 and ADR-0004 already establish two constraints:

external adapters must not be collapsed into kernels
CanonicalItem remains the typed graph boundary

So the solution cannot be "just make ingest a kernel" or "pass raw JSON through the DSL".

Decision

We split inbound ingest into three explicit concepts:

Connector: how raw external records are enumerated
Recipe: how those raw records become ontology semantics
Ingest job: how a connector and recipe are run together

The preferred shape is:

SourceEndpoint -> RawRecord -> MappingRecipe -> CanonicalItem -> Store
connector         raw fact     semantic map     typed item       graph

CanonicalItem remains the only typed boundary consumed by the store ingest pipeline. The new split happens before that boundary.
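As a rough illustration of that flow, the sketch below pushes raw records through a recipe function and upserts the resulting typed items by natural key. Every name in it (RawRecord as a dict, the CanonicalItem fields, run_pipeline, thought_recipe, the dict-backed store) is an assumption made for the example, not a runex API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

# Every name here is an illustrative assumption, not a runex API.
RawRecord = Dict[str, Any]  # untyped fact, exactly as the connector saw it


@dataclass
class CanonicalItem:
    supertag: str
    natural_key: str
    fields: Dict[str, Any] = field(default_factory=dict)


Store = Dict[Tuple[str, str], CanonicalItem]


def run_pipeline(
    enumerate_raw: Callable[[], Iterable[RawRecord]],         # connector side
    to_item: Callable[[RawRecord], Optional[CanonicalItem]],  # recipe side
    store: Store,
) -> int:
    """Map raw records to typed items and upsert them by natural key."""
    upserts = 0
    for raw in enumerate_raw():
        item = to_item(raw)
        if item is None:  # the recipe filtered this record out
            continue
        # Keying on (supertag, natural_key) makes a rerun overwrite, not duplicate.
        store[(item.supertag, item.natural_key)] = item
        upserts += 1
    return upserts


def thought_recipe(raw: RawRecord) -> Optional[CanonicalItem]:
    """Hypothetical recipe: records without an id are dropped."""
    if "id" not in raw:
        return None
    return CanonicalItem("Thought", str(raw["id"]), {"text": raw.get("text", "")})


records = [{"id": "t1", "text": "draft"}, {"id": "t1", "text": "final"}, {"note": "no id"}]
store: Store = {}
run_pipeline(lambda: records, thought_recipe, store)
run_pipeline(lambda: records, thought_recipe, store)  # rerun leaves one item in the store
```

Only CanonicalItem values reach the store in this sketch; the raw dicts never cross that boundary.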

Definitions

Connector

A connector owns transport/interface concerns:

file enumeration
CSV parsing
HTTP JSON fetch
SQLite read-only query
local command JSON output
WeChat CLI or wxark server access

A connector declares:

params
required capabilities
raw record shape
optional watch/sync support

It does not know which supertag a record becomes.
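A connector declaration along these lines might look like the following sketch. The ConnectorSpec fields, the "fs.read" capability string, and the _read_csv helper are hypothetical; only the name file-csv comes from this ADR.

```python
import csv
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Tuple

RawRecord = Dict[str, Any]


@dataclass(frozen=True)
class ConnectorSpec:
    """Hypothetical connector manifest: transport concerns only, no supertag."""
    name: str
    params: Tuple[str, ...]
    required_capabilities: Tuple[str, ...]
    raw_shape: str  # free-form description of the records it yields
    supports_watch: bool
    enumerate_records: Callable[[Dict[str, Any]], Iterator[RawRecord]]


def _read_csv(params: Dict[str, Any]) -> Iterator[RawRecord]:
    # Direct I/O lives here, on the connector side of the split.
    with open(params["path"], newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            yield dict(row)


FILE_CSV = ConnectorSpec(
    name="file-csv",
    params=("path",),
    required_capabilities=("fs.read",),  # assumed capability name
    raw_shape="one dict per CSV row, keyed by header name",
    supports_watch=False,
    enumerate_records=_read_csv,
)
```

The rows come back as plain dicts of strings; deciding whether a row is a lead, a thought, or something else is left to whichever recipe is paired with the connector.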

Recipe

A recipe owns semantic interpretation:

accepted raw record shape
target supertag
ontology bundles to load
natural key expectations
field mapping
link mapping
domain-specific filtering

It does not perform direct I/O.
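One way to picture a recipe is as a pure mapping object, sketched below. The MappingRecipe class, its field names, and the simplified CanonicalItem are assumptions for illustration; the actual CanonicalItem type is defined elsewhere in runex and may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, Optional, Tuple

RawRecord = Dict[str, Any]


@dataclass
class CanonicalItem:
    """Simplified stand-in for the real typed item."""
    supertag: str
    natural_key: str
    fields: Dict[str, Any]
    links: Dict[str, str]


@dataclass(frozen=True)
class MappingRecipe:
    """Hypothetical recipe: semantics only, no file or network access."""
    accepts: FrozenSet[str]                 # raw keys the recipe requires
    supertag: str
    ontology_bundles: Tuple[str, ...]
    natural_key: str                        # raw key that identifies a record
    field_map: Dict[str, str]               # raw key -> typed field name
    link_map: Dict[str, str] = field(default_factory=dict)  # raw key -> link name
    keep: Optional[Callable[[RawRecord], bool]] = None       # domain filter

    def apply(self, raw: RawRecord) -> Optional[CanonicalItem]:
        missing = self.accepts - set(raw)
        if missing:
            raise ValueError(f"raw record is missing keys: {sorted(missing)}")
        if self.keep is not None and not self.keep(raw):
            return None  # filtered out by the domain rule
        return CanonicalItem(
            supertag=self.supertag,
            natural_key=str(raw[self.natural_key]),
            fields={dst: raw[src] for src, dst in self.field_map.items() if src in raw},
            links={name: str(raw[src]) for src, name in self.link_map.items() if raw.get(src)},
        )


lead_recipe = MappingRecipe(
    accepts=frozenset({"email", "name"}),
    supertag="Lead",
    ontology_bundles=("crm",),
    natural_key="email",
    field_map={"name": "full_name"},
    link_map={"company": "works_at"},
)
item = lead_recipe.apply({"email": "ada@example.com", "name": "Ada", "company": "acme"})
```

Because apply takes a dict and returns a value, the recipe can be unit-tested without any connector or store present.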

Ingest Job

An ingest job binds:

one connector configuration
one recipe
one run mode (import, watch, or sync)
optional cursor/deletion policy

This is the unit a product or agent can save, inspect, rerun, or schedule.
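To make "save, inspect, rerun" concrete, the sketch below models a job as a small serializable record. The IngestJob field names, the JSON layout, and the example recipe name are assumptions; the ADR only fixes what a job binds, not how it is stored.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class RunMode(Enum):
    IMPORT = "import"
    WATCH = "watch"
    SYNC = "sync"


@dataclass(frozen=True)
class IngestJob:
    """Hypothetical saved job: one connector config, one recipe, one run mode."""
    connector: str
    connector_params: Dict[str, Any]
    recipe: str
    mode: RunMode = RunMode.IMPORT
    cursor: Optional[str] = None
    deletion_policy: Optional[str] = None

    def to_json(self) -> str:
        data = asdict(self)
        data["mode"] = self.mode.value  # store the enum as its string value
        return json.dumps(data, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "IngestJob":
        data = json.loads(text)
        data["mode"] = RunMode(data["mode"])  # raises ValueError on unknown modes
        return cls(**data)


job = IngestJob(
    connector="file-csv",
    connector_params={"path": "leads.csv"},
    recipe="lead-from-csv",  # hypothetical recipe name
    mode=RunMode.SYNC,
    cursor="row:120",
)
restored = IngestJob.from_json(job.to_json())
```

The job holds references and policy only; neither the connector's I/O code nor the recipe's mapping logic is copied into it.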

Compatibility

Existing DataSourceSpec remains supported as a preset business source.

Conceptually, a DataSourceSpec is now treated as a convenience bundle:

DataSourceSpec ~= ConnectorSpec + MappingRecipeSpec + default job policy

This lets existing sources keep working while new generic ingest features can be added at the correct layer.

The registry should expose both:

concrete/preset datasources for compatibility
core connectors and recipes for the clearer ingest model
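A registry that exposes both surfaces could be sketched roughly as follows. PresetSource stands in for a legacy DataSourceSpec viewed as a bundle; the class names, the "file-markdown" connector, and the "thought" recipe are hypothetical, while obsidian-thought is one of the existing sources named in this ADR.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class PresetSource:
    """A preset names a connector, a recipe, and a default job policy."""
    name: str
    connector: str
    recipe: str
    default_mode: str = "import"


@dataclass
class Registry:
    connectors: Set[str] = field(default_factory=set)
    recipes: Set[str] = field(default_factory=set)
    presets: Dict[str, PresetSource] = field(default_factory=dict)

    def add_preset(self, preset: PresetSource) -> None:
        # A preset may only bundle parts the registry already knows about.
        if preset.connector not in self.connectors:
            raise KeyError(f"unknown connector: {preset.connector}")
        if preset.recipe not in self.recipes:
            raise KeyError(f"unknown recipe: {preset.recipe}")
        self.presets[preset.name] = preset

    def resolve(self, name: str) -> Tuple[str, str, str]:
        """Expand a preset name into (connector, recipe, default run mode)."""
        preset = self.presets[name]
        return preset.connector, preset.recipe, preset.default_mode


registry = Registry(connectors={"file-markdown"}, recipes={"thought"})
registry.add_preset(PresetSource("obsidian-thought", "file-markdown", "thought"))
parts = registry.resolve("obsidian-thought")
```

Callers that only know the preset name keep working, while newer callers can list connectors and recipes directly.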

Why

The engine's core job is not "support many source names".

Its core job is:

enumerate external facts, interpret them as ontology objects, and idempotently project them into the graph so reactive semantics can run

The connector/recipe split keeps each concern in the place where it can be tested and evolved independently.

It also removes a false equivalence:

file-csv is not the same kind of thing as obsidian-thought
http-json is not the same kind of thing as media-snapshot
wechat-cli and wxark-http are two ways to obtain the same business conversation facts

Alternatives Considered

1. Keep adding concrete `DataSourceSpec` entries

Rejected as the sole model.

Reasons:

every CSV/HTTP/SQL import becomes a new adapter even when only the mapping differs
run mode and transport concerns stay mixed with business semantics
products cannot offer a clean "connect this source, choose a recipe" flow

2. Collapse ingest into kernels

Rejected.

Reasons:

violates ADR-0002
hides adapter semantics in effect functions
encourages raw payload movement through DSL effects

3. Let recipes output raw JSON and make upsert generic

Rejected.

Reasons:

violates ADR-0004
weakens typed fields and links
pushes identity and reference conventions into payload discipline

4. Replace all existing datasources immediately

Rejected.

Reasons:

unnecessary churn
higher regression risk
current business adapters remain useful presets

Consequences

Positive

ingest vocabulary matches the real architecture
generic CSV/HTTP/SQL/file imports become possible without bespoke adapters
WeChat CLI and wxark HTTP can share one semantic mapper
products can inspect connectors, recipes, and saved jobs separately
CanonicalItem remains the graph boundary

Negative / Tradeoffs

the registry gains more explicit concepts
adapter authors must decide whether they are writing a connector, a recipe, or a preset source
some existing DataSourceSpec docs need compatibility wording

We accept this because the extra names reflect real boundaries instead of inventing ceremony.

Design Rules

Connectors may require capabilities; recipes must not perform direct I/O.
Recipes may emit CanonicalItem; connectors must not.
Watch/cursor/deletion behavior belongs to connectors or ingest jobs.
Ontology bundles are declared by recipes or preset datasources.
Existing preset datasources must continue to expose the old manifest surface until a product-facing migration exists.
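Several of these rules are mechanical enough to check at registration time. The sketch below assumes manifests are plain dicts with keys such as required_capabilities, supertag, watch, cursor, and ontology_bundles; those key names and the check_manifest function are illustrative, not the existing manifest surface.

```python
from typing import Any, Dict, List


def check_manifest(kind: str, manifest: Dict[str, Any]) -> List[str]:
    """Return rule violations for a connector or recipe manifest (empty if none)."""
    problems: List[str] = []
    if kind == "recipe":
        if manifest.get("required_capabilities"):
            problems.append("recipes must not perform direct I/O")
        if "watch" in manifest or "cursor" in manifest:
            problems.append("watch/cursor behavior belongs to connectors or ingest jobs")
    elif kind == "connector":
        if "supertag" in manifest:
            problems.append("connectors must not emit CanonicalItem or pick a supertag")
        if "ontology_bundles" in manifest:
            problems.append("ontology bundles are declared by recipes or preset datasources")
    else:
        raise ValueError(f"unknown manifest kind: {kind}")
    return problems


ok = check_manifest("connector", {"params": ["path"], "required_capabilities": ["fs.read"]})
bad = check_manifest("recipe", {"supertag": "Lead", "required_capabilities": ["net.http"]})
```

Here ok is an empty list and bad reports the direct I/O violation; preset datasources are left out of the check because they are allowed to bundle both sides.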
