Reader Node
What Reader is and why it exists
Reader is the boundary through which external information enters a NodeFox workflow. In a directed-graph system, downstream behavior depends heavily on how input arrives, how it is shaped, and how failures are surfaced. Reader exists so that ingestion is explicit, inspectable, and testable instead of hidden inside custom code blocks.
In practice, Reader should be treated as a contract boundary rather than a convenience utility. It is where you decide what source of truth is allowed, what data shape can move forward, and what fallback route should execute if upstream data is unavailable or malformed.
Execution behavior
Reader emits payloads into output slots, and those payloads determine which downstream nodes become eligible to run. Depending on variant and configuration, Reader may execute once per run, once per item in a batch window, or once per iteration in list processing. Because NodeFox is deterministic at the orchestration layer, keep the Reader output structure stable and normalize it in a Code or Data node immediately, before any policy branching.
Reader does not replace validation. It fetches or loads data. Validation, policy decisions, and write authorization should happen downstream in Decision and Writer-adjacent paths.
Variants and configuration details
Text variant
Text is used when source content is plain text and may need chunking. It is common for ingestion of notes, tickets, logs, or free-form documents.
| Field | What it controls | Practical guidance |
|---|---|---|
| Path | Source path in registered storage handles | Use stable, environment-aware naming so the same network can run across staging and production. |
| Split By | Delimiter used for segmentation | Use deterministic delimiters (\n, \n\n, custom tokens) and document the expected format in workflow notes. |
| Batch | Number of segments emitted per cycle | Keep batch sizes bounded to avoid sudden fan-out cost spikes. |
| Emit On Complete | Whether completion signal is emitted after final segment | Enable when downstream paths depend on explicit completion behavior. |
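The way Split By, Batch, and Emit On Complete interact can be sketched in Python. This is an illustration under stated assumptions, not NodeFox's implementation: the function name `emit_batches`, the `("batch", ...)` and `("complete", ...)` event shapes, and the choice to drop empty segments are all hypothetical.

```python
from typing import Iterator, List, Tuple


def emit_batches(
    text: str,
    split_by: str = "\n",
    batch: int = 2,
    emit_on_complete: bool = True,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield ("batch", segments) once per cycle, then an optional ("complete", [])."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    # Assumption: empty segments (for example from a trailing delimiter) are dropped.
    segments = [s for s in text.split(split_by) if s.strip()]
    for start in range(0, len(segments), batch):
        yield ("batch", segments[start:start + batch])
    if emit_on_complete:
        # Explicit completion signal for downstream paths that wait on it.
        yield ("complete", [])


events = list(emit_batches("a\nb\nc\n", split_by="\n", batch=2))
```

With three segments and a batch size of two, the sketch produces two batch events followed by one completion event. A bounded batch size caps how many segments any single cycle can fan out.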
CSVExcel variant
CSVExcel is used for structured table ingestion from CSV or spreadsheet-like sources.
| Field | What it controls | Practical guidance |
|---|---|---|
| Path | CSV/XLSX source path | Keep file naming conventions consistent with run timestamping and source ownership. |
| Target Row Start/End | Row window to process | Useful for partial replays, checkpoints, and controlled backfills. |
| Search Column / Search Value | Optional in-file filtering | Use only for deterministic key lookup patterns; avoid brittle fuzzy matching at ingest stage. |
| Column Selections | Mapping from source columns to output slots | Document these mappings so downstream nodes can rely on stable slot semantics. |
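A rough Python equivalent of the row window, exact-match search, and column-to-slot mapping is shown below, using the standard `csv` module. It is a sketch only: `read_rows`, the slot names `slot_1` and `slot_2`, and the convention that rows are numbered from 1 (excluding the header) with an inclusive window are assumptions, not documented NodeFox behavior.

```python
import csv
import io
from typing import Dict, Iterator, Optional, TextIO


def read_rows(
    source: TextIO,
    row_start: int,
    row_end: int,
    column_selections: Dict[str, str],
    search_column: Optional[str] = None,
    search_value: Optional[str] = None,
) -> Iterator[Dict[str, str]]:
    """Yield one slot mapping per matching data row inside the row window."""
    reader = csv.DictReader(source)
    # Assumption: data rows are numbered from 1 and the window is inclusive.
    for row_number, row in enumerate(reader, start=1):
        if row_number < row_start:
            continue
        if row_number > row_end:
            break
        # Exact-match key lookup only; no fuzzy matching at the ingest stage.
        if search_column is not None and row[search_column] != search_value:
            continue
        yield {slot: row[column] for column, slot in column_selections.items()}


sample = io.StringIO("id,status,amount\n1,open,10\n2,closed,20\n3,open,30\n4,open,40\n")
rows = list(
    read_rows(
        sample,
        row_start=2,
        row_end=4,
        column_selections={"id": "slot_1", "amount": "slot_2"},
        search_column="status",
        search_value="open",
    )
)
```

Here the window covers rows 2 through 4, the search keeps only rows whose `status` is exactly `open`, and each surviving row is emitted as a mapping keyed by slot name. Replaying a different window only requires changing `row_start` and `row_end`.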
File variant
File is used for binary content such as PDFs, images, archives, or raw blobs.
| Field | What it controls | Practical guidance |
|---|---|---|
| Path | Binary source location | Route large binaries through controlled branches and avoid loading oversized payloads into unnecessary paths. |
API variant
API makes HTTP requests to bring external data into the workflow and is typically used for system integrations, data pulls, and status lookups.
| Field | What it controls | Practical guidance |
|---|---|---|
| URL | Endpoint (supports $N interpolation) | Keep URL construction deterministic and avoid optional query fields that change behavior unpredictably. |
| HTTP Method | Request method (GET, POST, PUT, PATCH, DELETE) | Use method semantics that match source system contracts; do not overload a single endpoint for multiple actions without clear branch logic. |
| API Key Reference | Credential reference from integrations/settings | Never hardcode secrets in node text fields. |
| Headers | Request header map | Keep auth and version headers explicit and environment-specific. |
| Response Type | Expected parse mode (JSON, Text, Blob) | Align response parsing to downstream slot expectations to reduce schema drift. |
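To make the idea of deterministic URL construction concrete, the sketch below shows one way `$N` placeholders could be resolved. The exact semantics of `$N` are not specified here, so this assumes that `$1`, `$2`, and so on refer to input slot values by 1-based position; the function name `interpolate_url` and the example endpoint are hypothetical.

```python
import re
from typing import Sequence
from urllib.parse import quote


def interpolate_url(template: str, slots: Sequence[str]) -> str:
    """Replace $1, $2, ... with URL-encoded slot values (1-based by assumption)."""

    def substitute(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        if index < 1 or index > len(slots):
            # Fail loudly instead of emitting a URL with a silent gap.
            raise KeyError(f"no value for ${index}")
        # Encode every reserved character so a slot value cannot alter the URL structure.
        return quote(str(slots[index - 1]), safe="")

    return re.sub(r"\$(\d+)", substitute, template)


url = interpolate_url("https://api.example.com/orders/$1?region=$2", ["A 17", "eu"])
```

Raising on a missing slot, rather than substituting an empty string, keeps a malformed request from reaching the source system and gives the workflow a clear failure to route.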
CloudFile variant
CloudFile retrieves objects from cloud-backed storage providers.
| Field | What it controls | Practical guidance |
|---|---|---|
| Provider | Cloud provider connection target | Isolate by workspace/environment to reduce blast radius. |
| File Identifier | Object key or file id | Keep identifier resolution explicit in upstream branches. |
| API Key Reference | Credential path | Pair with key rotation policy and provider-side scope limits. |
How to use Reader well in production
Reader performs best when followed by explicit normalization and validation. A common robust pattern is Reader -> Code(normalize) -> Data(extract) -> Decision(validate/policy). This keeps ingestion concerns separated from business logic and preserves traceability when incidents occur.
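The Reader -> Code(normalize) -> Data(extract) -> Decision(validate/policy) pattern can be modeled as three small functions applied to a Reader payload. This is a conceptual sketch in plain Python, not NodeFox node code: the field names `order_id` and `amount`, and the branch names `accept`, `reject`, and `fallback`, are invented for illustration.

```python
from typing import Any, Dict, Optional


def normalize(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Code step: trim and lower-case keys so downstream lookups are stable."""
    return {str(key).strip().lower(): value for key, value in raw.items()}


def extract(record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Data step: keep only the fields the workflow contract names."""
    return {"order_id": record.get("order_id"), "amount": record.get("amount")}


def decide(fields: Dict[str, Optional[Any]]) -> str:
    """Decision step: return the name of the branch to follow."""
    if fields["order_id"] is None:
        return "fallback"
    amount = fields["amount"]
    # Booleans are excluded explicitly because bool is a subclass of int.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) or amount < 0:
        return "reject"
    return "accept"


reader_output = {" Order_ID ": "A-17", "Amount": 42.5}
branch = decide(extract(normalize(reader_output)))
```

Because each step has a single responsibility, an incident can be traced to the stage that produced the unexpected value: a wrong key points to normalization, a missing field to extraction, and a wrong route to the decision rules.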
For high-volume ingestion, combine Reader batching with Buffer/Stack and explicit retry limits. For policy-sensitive workflows, route all upstream failure modes to deterministic fallback paths instead of allowing ambiguous null-like payloads to flow into high-impact branches.
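One way to picture explicit retry limits with a deterministic fallback is a wrapper that always returns a tagged envelope instead of a bare null-like value. The sketch below is an assumption-laden illustration: `read_with_fallback`, the `route` values `ok` and `fallback`, and the envelope keys are hypothetical, and backoff between attempts is omitted for brevity.

```python
from typing import Any, Callable, Dict


def read_with_fallback(fetch: Callable[[], Any], max_attempts: int = 3) -> Dict[str, Any]:
    """Call fetch at most max_attempts times and always return a tagged envelope."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        try:
            payload = fetch()
        except Exception as exc:
            last_error = f"attempt {attempt}: {exc}"
            continue
        # Treat null-like payloads as failures so they never reach policy branches.
        if payload in (None, "", {}, []):
            last_error = f"attempt {attempt}: empty payload"
            continue
        return {"route": "ok", "payload": payload, "attempts": attempt}
    return {"route": "fallback", "payload": None, "error": last_error, "attempts": max_attempts}


calls = {"count": 0}


def flaky() -> Dict[str, str]:
    """Hypothetical source that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("upstream timed out")
    return {"status": "ready"}


result = read_with_fallback(flaky, max_attempts=3)
```

Downstream branching then keys off `route` rather than inspecting the payload, so an exhausted retry budget and an empty response both land on the same explicit fallback path.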
Common mistakes to avoid
A common mistake is to treat Reader as an all-in-one ETL and policy layer. That usually creates brittle graphs where routing logic depends on inconsistent source shapes. Another frequent issue is overloading one Reader configuration for multiple source contracts; this reduces reproducibility and complicates rollback.
When in doubt, duplicate Reader boundaries by source contract and keep each configuration narrow and explicit.