"Agentic" became a marketing word in 2024 and a load-bearing engineering decision in 2026. We get the question often: is ReAct an actual architectural choice for document extraction, or a fashionable name for "we call an LLM in a loop"? This piece is the long answer.
The three default approaches — and where they break
Most document-extraction systems are built on one of three patterns. Each works for a narrow class of problems and fails on the others.
1. Classical OCR + rules
Tesseract or AWS Textract or Google Vision pull text out of an image; regex and positional rules turn the text into structured fields. This works beautifully for standardized forms — a US W-2 from one payroll provider, a specific bank's statement layout. It breaks the moment the layout changes. Every new template is a new rule set, and a vendor with this architecture either freezes their template list or lives in permanent catch-up mode.
2. Pure RAG (retrieval-augmented generation)
OCR the document into chunks, retrieve relevant chunks for each field, ask the LLM to produce the answer. RAG works well for free-text documents with conversational queries — contracts, policies — and badly for structured documents where the LLM is expected to find a specific number on a specific line. RAG also struggles when fields require cross-referencing between document sections (compute YTD as a check on period totals) because retrieval is per-query, not per-document.
3. Single-pass LLM extraction
Send the whole document (as image or extracted text) to a vision-language model with a schema, get JSON back. This is the simplest and the fastest path to a demo. It also has the highest variance in production — a single LLM call has no opportunity to verify, retry, or cross-check, and the output reflects whatever the model "decided" in one inference. For compliance-critical workflows where one wrong number is one denied loan or one rejected claim, single-pass is too brittle.
Each of these works in the corner of the problem space it was optimized for. Document AI in 2026 needs something that holds up across the full space: multiple document types, unfamiliar layouts, fields that require validation, and an audit trail that survives scrutiny.
What ReAct means in a documents context
ReAct — "Reason + Act" — comes from the 2022 paper of the same name (Yao et al). The core idea is simple: an LLM doesn't just generate the answer; it produces a reasoning trace interleaved with calls to external tools, and uses tool results to inform subsequent reasoning. The model acts and observes, like an agent rather than a function.
Translated to document extraction, ReAct means the system has a small number of capabilities — extract a field, search the document, validate against rules, query a knowledge base, ask a clarifying sub-LLM — and a planner that decides which to invoke when. A single extraction request becomes a short trace of reasoning steps and tool calls, each of which produces an observation that feeds the next step.
"A single extraction request becomes a short trace of reasoning steps and tool calls, each of which produces an observation that feeds the next step."
The benefit isn't that the LLM is smarter; it's that the system can do things a single LLM call can't: try a second extraction approach if the first fails validation, look up the correct line number in a different schedule before answering, ask a smaller specialized model to handle MRZ-style structured data while the larger reasoning model orchestrates. The architecture is the integration of these capabilities, not the capabilities themselves.
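To make the loop concrete, here is a minimal sketch of the reason-act-observe cycle as it might look for extraction. The names (`plan_next_step`, the tool registry, the step budget) are illustrative assumptions, not fluex's actual API; the point is the shape: each observation is appended to the trace and fed back into the next planning call.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    thought: str        # the planner's reasoning for this step
    tool: str           # which capability to invoke
    tool_input: dict    # arguments for the tool
    observation: dict | None = None  # filled in after the tool runs

@dataclass
class Trace:
    steps: list[Step] = field(default_factory=list)

def run_react(document: bytes,
              schema: dict,
              plan_next_step: Callable[[bytes, dict, Trace], Step | None],
              tools: dict[str, Callable[..., dict]],
              max_steps: int = 8) -> Trace:
    """Reason-act-observe loop: plan a step, run the tool, feed the
    observation back into the next planning call, and stop when the planner
    returns None or the step budget is exhausted."""
    trace = Trace()
    for _ in range(max_steps):
        step = plan_next_step(document, schema, trace)          # reason
        if step is None:                                        # planner says we're done
            break
        step.observation = tools[step.tool](**step.tool_input)  # act + observe
        trace.steps.append(step)
    return trace
```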
How fluex implements ReAct
Our extraction pipeline has four roles, each backed by a specialized component:
The planner
A reasoning-strong LLM (currently Claude or GPT-4-class, configurable per tenant) receives the document and the requested schema. It produces an extraction plan: which fields to fetch first, which fields depend on others, which need validation steps, where cross-references are expected. The plan is structured — JSON, not free text — and is cached on a per-document-type-and-schema basis so the planner doesn't re-run the same plan for similar documents.
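A sketch of what a structured plan and its cache key might look like. The field names are illustrative assumptions, not fluex's schema; the essential point is that the plan is data, keyed on document type plus a hash of the requested schema so similar documents reuse it.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class PlannedField:
    name: str               # schema field to extract
    extractor: str          # which extractor path to use ("vlm", "ocr", "mrz", ...)
    depends_on: list[str]   # fields that must be extracted first
    validators: list[str]   # deterministic checks to run afterwards

@dataclass
class ExtractionPlan:
    document_type: str
    fields: list[PlannedField]
    cross_references: list[tuple[str, str]]  # pairs of fields expected to agree

def plan_cache_key(document_type: str, schema: dict) -> str:
    """Plans are cached per (document type, schema): a canonical JSON dump of the
    schema is hashed so two requests for the same fields on the same form type
    hit the same cached plan instead of re-running the planner."""
    schema_hash = hashlib.sha256(
        json.dumps(schema, sort_keys=True).encode()
    ).hexdigest()[:16]
    return f"{document_type}:{schema_hash}"
```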
The extractors
One or more extraction calls, each scoped to a subset of the schema and a region of the document. Some fields go through a vision-language model; some go through a specialized OCR + structured-prompt path; some go through a deterministic parser (e.g., MRZ on passports has a specification — no LLM needed). Extractors are interchangeable — a tenant configured to avoid OpenAI gets Anthropic-only extractors; a tenant in a no-LLM workflow gets a deterministic-only extractor for documents that support it.
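The interchangeability falls out naturally if every extractor honors one small contract. A sketch under that assumption, with hypothetical names: the planner picks an implementation per field set, and tenant policy decides which implementations are registered at all.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class FieldResult:
    name: str
    value: str | None
    confidence: float         # 0.0-1.0, before validators adjust it
    source_page: int | None   # where in the document the value was found

class Extractor(Protocol):
    """Every extractor path, LLM-backed or deterministic, honors one contract."""
    def extract(self, document: bytes, fields: list[str]) -> list[FieldResult]: ...

def build_registry(tenant_policy: dict[str, bool],
                   available: dict[str, Extractor]) -> dict[str, Extractor]:
    """Tenant policy gates which implementations exist for this tenant; an
    Anthropic-only tenant simply never has the OpenAI-backed extractor registered,
    and the planner can only choose from what is in the registry."""
    return {name: impl for name, impl in available.items()
            if tenant_policy.get(name, False)}
```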
The validators
After extraction, validators run. These are deterministic Python — IRS arithmetic rules, MRZ checksum, bank-statement balance reconciliation, format checks. Validators produce a confidence delta on each field: a field that passes its validator gets confidence boosted; one that fails has confidence reduced and gets flagged for the post-processor.
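As an illustration, a minimal validator in the spirit described: bank-statement balance reconciliation as a deterministic check that returns a confidence delta rather than a hard verdict. The rule itself (opening balance plus transactions equals closing balance) is standard; the delta values and field names are assumptions.

```python
from decimal import Decimal

def reconcile_balances(fields: dict[str, Decimal],
                       transactions: list[Decimal],
                       tolerance: Decimal = Decimal("0.01")) -> dict:
    """Deterministic check: opening balance + sum(transactions) should equal the
    closing balance. Returns a confidence delta so the post-processor, not the
    validator, decides what to do with a failure."""
    expected = fields["opening_balance"] + sum(transactions, Decimal("0"))
    diff = abs(expected - fields["closing_balance"])
    passed = diff <= tolerance
    return {
        "validator": "balance_reconciliation",
        "passed": passed,
        "difference": str(diff),
        # boost confidence on a pass, cut it and flag on a failure
        "confidence_delta": 0.10 if passed else -0.40,
        "flag_for_post_processor": not passed,
    }
```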
The post-processor
The post-processor is where ReAct earns its keep. Given a partially-validated extraction, it decides what to do next: re-extract a failing field with a different prompt, ask a clarifying sub-LLM to resolve ambiguity ("which page contains Schedule C line 31?"), apply business rules ("if total is missing, use the largest line item" — only if explicitly allowed), or escalate to a human review queue. Each of these is a tool the planner could have anticipated, but real documents arrive with surprises that the initial plan didn't account for.
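A sketch of that decision step under the same assumptions: the post-processor looks at each flagged field and picks one of a small set of moves. The thresholds and the business-rule gate are illustrative; the real point is that every branch is an explicit, recorded decision rather than an implicit model behavior.

```python
from enum import Enum

class Action(Enum):
    ACCEPT = "accept"
    RE_EXTRACT = "re_extract_with_focused_prompt"
    CLARIFY = "ask_clarifying_sub_llm"
    BUSINESS_RULE = "apply_business_rule"
    HUMAN_REVIEW = "escalate_to_human_review"

def decide(field_name: str,
           confidence: float,
           attempts: int,
           tenant_allows_business_rules: bool) -> Action:
    """One flagged field at a time: retry cheaply first, fall back to business
    rules only when the tenant has opted in, and escalate to a human once the
    automated options are exhausted."""
    if confidence >= 0.90:
        return Action.ACCEPT
    if attempts == 0:
        return Action.RE_EXTRACT      # second pass with a focused prompt
    if attempts == 1:
        return Action.CLARIFY         # e.g. "which page contains Schedule C line 31?"
    if tenant_allows_business_rules:
        return Action.BUSINESS_RULE   # e.g. "if total is missing, use the largest line item"
    return Action.HUMAN_REVIEW
```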
The full trace — planner output, extractor calls, validator results, post-processor decisions — is the audit record. We'll come back to that.
Three worked examples
Example 1: A W-2 with a layout the model hasn't seen
An employee's W-2 from a small payroll provider with a non-standard layout. A single-pass LLM gets 11 of 14 boxes right but misses Boxes 12 and 14 (which contain employer-specific codes). A pure-OCR system would fail on field positions because the layout is unfamiliar.
The ReAct trace:
- Planner: "This is a W-2. Extract boxes 1-14, expect codes in boxes 12 and 14, validate arithmetic across boxes 1, 3, 5."
- Extractor: returns boxes 1-11 with high confidence, boxes 12 and 14 with low confidence.
- Validator: arithmetic on boxes 1, 3, 5 passes. Boxes 12 and 14 are flagged.
- Post-processor: "Re-extract boxes 12 and 14 with a prompt that lists known IRS code letters and asks for explicit code identification."
- Second extractor pass: returns box 12 codes (DD: $8,420; D: $19,500) and box 14 (NY state-disability).
- Validator: confirms box 12 codes are valid IRS codes.
- Result: 14 of 14 boxes correct, total latency 4.2 seconds, two LLM calls.
A single-pass system would have shipped 11/14 with a confidence number that didn't reflect the missing fields. ReAct trades 1.5 seconds and a second LLM call for full accuracy.
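A hedged sketch of what that second, focused pass might look like. The prompt wording and helper name are hypothetical; the IRS Box 12 code letters shown (D for 401(k) elective deferrals, DD for employer-sponsored health coverage) are real, but the list is abbreviated here for illustration.

```python
# Focused second pass for the flagged W-2 boxes: constrain the model with the
# known code vocabulary instead of asking it to re-read the whole form.
KNOWN_BOX_12_CODES = {
    "D": "401(k) elective deferrals",
    "DD": "employer-sponsored health coverage",
    # abbreviated for illustration; the full IRS code list would be included here
}

def build_box12_reextraction_prompt(page_text: str) -> str:
    code_list = "\n".join(f"  {c}: {desc}" for c, desc in KNOWN_BOX_12_CODES.items())
    return (
        "Box 12 of this W-2 contains one or more code/amount pairs.\n"
        f"Known code letters include:\n{code_list}\n"
        'Return JSON: [{"code": ..., "amount": ...}]. '
        "If a code is not in the list above, return it anyway and mark it unknown.\n\n"
        f"Relevant page text:\n{page_text}"
    )
```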
Example 2: A 1040 with a Schedule C cross-reference
A self-employed taxpayer's 1040 line 8 ("other income") references a Schedule C net profit. The 1040 says line 8 = $48,200; the Schedule C net profit says $47,800. Off by $400. Single-pass extraction would return the 1040 value with high confidence and never notice.
The ReAct trace:
- Planner: "1040 detected. Schedule C is referenced from line 8. After 1040 extraction, validate the Schedule C cross-reference."
- Extractor: pulls 1040 fields and Schedule C fields.
- Validator: detects line-8 vs Schedule-C-net-profit mismatch ($400 difference).
- Post-processor: "Re-extract Schedule C line 31 with a focused prompt; if confirmed, flag the 1040 for human review with both values surfaced."
- Result: returned to the underwriter with a clear note that line 8 disagrees with Schedule C, both values surfaced, source pages cited. The underwriter caught a filing error by the customer that they would otherwise have missed. The audit trail proves the catch.
This kind of cross-validation is the structural advantage of ReAct over pure RAG. Retrieval-only systems answer per-query; they don't proactively verify across the document.
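As a sketch of that structural advantage, here is a cross-reference check of the kind the planner scheduled in this example. The field names and the $1 tolerance are assumptions for illustration, not fluex's rule set.

```python
from decimal import Decimal

def check_1040_schedule_c_crossref(form_1040: dict[str, Decimal],
                                   schedule_c: dict[str, Decimal],
                                   tolerance: Decimal = Decimal("1.00")) -> dict:
    """Document-wide check, not per-query retrieval: compare the 1040 'other
    income' figure against Schedule C line 31 net profit and surface both
    values on a mismatch instead of silently preferring one."""
    lhs = form_1040["line_8_other_income"]
    rhs = schedule_c["line_31_net_profit"]
    mismatch = abs(lhs - rhs) > tolerance
    return {
        "validator": "1040_schedule_c_crossref",
        "passed": not mismatch,
        "values": {"1040_line_8": str(lhs), "schedule_c_line_31": str(rhs)},
        "confidence_delta": -0.35 if mismatch else 0.10,
        "flag_for_post_processor": mismatch,
    }
```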
Example 3: A KYC passport with a face-match failure
A KYC verification request: passport image plus selfie. Passport extraction succeeds with high confidence. Face-match returns 0.62 — below the configured 0.80 threshold.
The ReAct trace:
- Planner: "Passport extraction + face-match + liveness. If face-match is borderline, run a second face-match against the passport's secondary photo (chip data) before deciding."
- Extractor: passport data + photo, OK. Selfie face-match returns 0.62.
- Validator: 0.62 < 0.80, flagged.
- Post-processor: "Try the chip-data secondary photo if MRZ confirms it's available."
- Second extractor: passport chip data is present; second face-match against chip photo returns 0.91.
- Result: PASS with both face-match scores surfaced and the chip-data path documented in the audit trail. The customer who would have been falsely rejected at 0.62 is correctly onboarded; the audit trail proves the additional check was performed.
A single-pass system has no way to ask a follow-up question. ReAct makes follow-ups a first-class architectural primitive.
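A minimal sketch of that follow-up as a decision rule. The 0.80 threshold comes from the example above; the borderline band, parameter names, and the face-match callable are assumptions.

```python
from typing import Callable

def face_match_with_fallback(match_selfie_to_photo: Callable[[str], float],
                             chip_available: bool,
                             threshold: float = 0.80,
                             borderline: float = 0.50) -> dict:
    """First-class follow-up: if the printed-photo match is borderline and the
    MRZ indicates a chip is present, try the chip-data photo before deciding."""
    primary = match_selfie_to_photo("printed_photo")
    scores = {"printed_photo": primary}
    if primary >= threshold:
        return {"result": "pass", "scores": scores}
    if chip_available and primary >= borderline:   # borderline, not an outright fail
        secondary = match_selfie_to_photo("chip_photo")
        scores["chip_photo"] = secondary
        if secondary >= threshold:
            return {"result": "pass", "scores": scores}
    return {"result": "human_review", "scores": scores}
```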
Tradeoffs: cost, latency, complexity
Cost
ReAct uses more LLM tokens than single-pass. In our production data, the average extraction uses 1.4 LLM calls (the 0.4 is post-processor escalations). On the documents where ReAct earns its keep — non-trivial extractions with cross-references or validation edge cases — the multiplier is closer to 2x. The cost increase is real but bounded: doubling a $0.001 extraction cost is $0.002, not a structural concern at fluex's per-page pricing.
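The arithmetic in a few lines, with the per-call cost treated as an illustrative assumption rather than fluex's actual pricing; the 1.4-call average and roughly 2x worst case come from the paragraph above.

```python
# Back-of-envelope cost model for ReAct vs single-pass extraction.
cost_per_llm_call = 0.001          # dollars per call, illustrative assumption
avg_calls_per_extraction = 1.4     # 1 extraction call + 0.4 post-processor escalations
hard_case_calls = 2.0              # cross-references / validation edge cases

print(f"typical:   ${cost_per_llm_call * avg_calls_per_extraction:.4f} per extraction")
print(f"hard case: ${cost_per_llm_call * hard_case_calls:.4f} per extraction")
# typical:   $0.0014 per extraction
# hard case: $0.0020 per extraction
```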
Latency
Single-pass extraction averages 1.8 seconds in our pipeline. ReAct extraction averages 2.7 seconds for typical documents and up to 6 seconds for documents requiring the full post-processing escalation. For sync use cases (in-app KYC, embedded payments), we keep the typical case under 3 seconds and surface escalations as async webhook callbacks. For async workflows (lending packages, claims), the extra second isn't visible to anyone.
Complexity
This is the real cost of ReAct. The pipeline is genuinely more complex than a single LLM call. Components have to be testable independently. The planner → extractor → validator → post-processor contract has to be stable across iterations. And the failure modes are richer: the planner can produce a bad plan, the post-processor can escalate a recoverable case to humans, validators can disagree with extractors; none of these exists in a single-pass system. Engineering investment in observability and component-level tests is higher up-front.
We made the call that the complexity is worth it. Document extraction is a long-tail problem — the 5% of weird documents create most of the customer-visible failures, and ReAct is structurally better at the long tail than single-pass approaches.
When ReAct is overkill
An honest section. ReAct isn't always the right answer.
- One document type, one layout, high volume. If you're processing one form from one issuer, build a deterministic parser. Tesseract + regex hits 99%+ accuracy on a single, stable layout for a fraction of the cost of any LLM-based system.
- Free-text knowledge retrieval where validation isn't structural. Question-answering against a contract corpus is RAG's home turf. ReAct adds complexity without much benefit when the answers are paragraphs, not numbers.
- Single-pass is sufficient for low-stakes outputs. Not every document AI use case is loan underwriting. A receipt-categorization workflow where errors are cheap can run on single-pass extraction and skip the post-processor entirely.
fluex configures the architecture per workflow. A KYC workflow uses the full ReAct pipeline; a simple receipt-categorization workflow runs single-pass with a cheap validation step. The platform is the same; the configuration adapts.
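A sketch of what that per-workflow configuration could look like. The keys and values are illustrative, not fluex's actual config format; the point is that the same platform dials pipeline depth up or down per workflow.

```python
# Illustrative per-workflow configuration: same platform, different pipeline depth.
WORKFLOWS = {
    "kyc_passport": {
        "pipeline": "react",        # planner + extractors + validators + post-processor
        "validators": ["mrz_checksum", "face_match", "liveness"],
        "face_match_threshold": 0.80,
        "escalation": "human_review_queue",
    },
    "receipt_categorization": {
        "pipeline": "single_pass",  # one extraction call, cheap validation, no post-processor
        "validators": ["total_format_check"],
        "escalation": None,         # errors are cheap; accept best effort
    },
}
```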
Audit and debuggability — the underrated win
The architectural argument for ReAct is usually framed in terms of accuracy. The operational argument is at least as important: the trace is the audit record.
For every extraction, fluex stores the planner output, every extractor call (with prompt hash and model version), every validator result, and every post-processor decision — immutable, queryable, retained per the customer's policy. When a customer asks "why did you extract this number from this document on this date," we answer it as a database query, not an investigation. When an auditor pulls a sample, the trace is the evidence. When we ship a model upgrade, replaying the previous month's traces against the new model tells us the regression delta before the new model touches production traffic.
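A sketch of the kind of record that turns "why did you extract this number" into a query rather than an investigation. Field names and helpers are assumptions; the essential properties are that every step carries a prompt hash and model version, the record is immutable once written, and stored traces can be filtered per field without log archaeology.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)             # immutable once written
class TraceStep:
    role: str                       # "planner" | "extractor" | "validator" | "post_processor"
    model_version: str | None       # None for deterministic steps
    prompt_hash: str | None         # hash of the exact prompt sent, not the prompt itself
    output: dict
    recorded_at: str

def record_step(role: str, output: dict,
                model_version: str | None = None,
                prompt_hash: str | None = None) -> TraceStep:
    return TraceStep(role, model_version, prompt_hash, output,
                     datetime.now(timezone.utc).isoformat())

def explain(trace: list[TraceStep], field_name: str) -> list[dict]:
    """Answer 'why did you extract this value?' by returning every recorded
    step whose output mentions the field, in order."""
    return [asdict(s) for s in trace if field_name in str(s.output)]
```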
Single-pass extraction makes none of this easy. The audit trail collapses to "we sent the document and the LLM returned this." Cross-referencing model behavior across a quarter's worth of traffic requires log archaeology that ReAct's structured traces obviate. We've written about this elsewhere — see our piece on SOC 2 architecture for AI startups for the audit framing, and look for our follow-up on tracing agentic extraction for the engineering specifics.
Closing
ReAct isn't a buzzword for "LLM in a loop." It's a specific architectural choice that trades complexity for the ability to verify, retry, cross-check, and reason about failures. For document AI on real-world documents — where the long tail dominates the customer experience — that tradeoff is worth making.
The complexity isn't free. ReAct demands more engineering investment in observability, component-level testing, and audit infrastructure than single-pass systems. The teams that do this work end up with a pipeline that handles weird documents gracefully and produces an audit trail that holds up to scrutiny. The teams that skip it discover the long tail the hard way, in production, on the documents that matter most.
For a hands-on look at fluex's ReAct pipeline, see the API reference or talk to our team. For the security and audit posture that goes with it, see our trust page.