Why US tax returns are a pain
A complete US tax return is rarely one form. It's a 1040 plus an unpredictable mix of schedules (A, B, C, D, E, K-1) and 1099s (NEC, MISC, INT, DIV, B, R), sometimes 30+ pages of forms with field codes that humans read at a glance and OCR systems routinely misread. Lenders and underwriters need every line to be accurate: a misread Schedule C net profit can turn an approval into a denial. Manual data entry takes hours per return, and legacy OCR systems reach acceptable accuracy on the 1040 itself but fall over on the schedules.
How fluex does it
fluex extracts US tax forms into a flat, normalized schema keyed by IRS line number, so line_22_total_income on a 1040 is the same field whether the customer used TurboTax, H&R Block, or filed on paper. Every line is returned with a confidence score, a bounding-box reference back to the source page, and a validation result against IRS arithmetic rules. Schedule attachments are parsed and linked to their parent form automatically.
Sample extraction output
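As a rough illustration of the record shape described above, one extracted line could be modeled as follows. Only line_22_total_income comes from the text; the class, the attribute names, the second field name, and every value are made up for this sketch and are not fluex's actual output format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedLine:
    """One extracted value. All attribute names are illustrative assumptions."""
    value: float
    confidence: float                          # 0.0 to 1.0
    page: int                                  # 1-based source page
    bbox: tuple[float, float, float, float]    # (x0, y0, x1, y1) on that page
    validation: str                            # "passed", "failed", or "not_checked"


# Hypothetical flat record: one entry per IRS line, keyed by field name.
# The numbers are invented for the example.
sample_record: dict[str, ExtractedLine] = {
    "line_22_total_income": ExtractedLine(
        value=84250.0, confidence=0.98, page=1,
        bbox=(412.0, 530.5, 560.0, 545.0), validation="passed",
    ),
    "schedule_c_line_31_net_profit": ExtractedLine(
        value=18400.0, confidence=0.71, page=7,
        bbox=(430.0, 610.0, 560.0, 624.0), validation="not_checked",
    ),
}


def needs_review(record, min_confidence=0.9):
    """Return field names that are low-confidence or failed validation."""
    return sorted(
        name for name, line in record.items()
        if line.confidence < min_confidence or line.validation == "failed"
    )
```

With the invented values above, needs_review(sample_record) returns only the Schedule C field, which is the kind of line a reviewer would check against the page and bounding box.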
What you get out of the box
Every line, every schedule
Full 1040, Schedules A/B/C/D/E/K-1, and the 1099 family. Results come back as flat JSON keyed by IRS line number.
Arithmetic validation
IRS line arithmetic rules are checked automatically. Inconsistencies are flagged before the document reaches your model.
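To make the idea concrete, here is a minimal sketch of a line-arithmetic check. The field names, the rule table, and the one-dollar tolerance are assumptions for illustration, not fluex's rule set; the relationships themselves are the ordinary 1040 definitions (adjusted gross income is total income minus adjustments, and the deduction total is the sum of its parts).

```python
from decimal import Decimal

# Each rule reads: target == sum(plus fields) - sum(minus fields).
# Field names are hypothetical placeholders.
RULES = [
    ("adjusted_gross_income", ["total_income"], ["adjustments_to_income"]),
    ("total_deductions", ["standard_or_itemized_deduction", "qbi_deduction"], []),
]


def check_arithmetic(fields, rules=RULES, tolerance=Decimal("1")):
    """Return (target, expected, actual) for every rule that does not hold.

    Differences up to `tolerance` are accepted to allow for whole-dollar rounding.
    Rules with any missing input are skipped rather than failed.
    """
    failures = []
    for target, plus, minus in rules:
        needed = [target, *plus, *minus]
        if any(name not in fields for name in needed):
            continue
        expected = (
            sum((fields[n] for n in plus), Decimal(0))
            - sum((fields[n] for n in minus), Decimal(0))
        )
        actual = fields[target]
        if abs(expected - actual) > tolerance:
            failures.append((target, expected, actual))
    return failures
```

A typical catch is a digit transposition: if total income is 84250, adjustments are 3000, and the extracted AGI reads 81520 instead of 81250, the check reports the AGI field with the expected and actual values.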
Multi-year stitching
Process 2 or 3 years of returns at once and get a stitched view ready for underwriting (income trend, AGI history, deduction patterns).
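A stitched view of this kind could be built from per-year flat records roughly as follows. The function name, the field names, and the output layout are assumptions made for the sketch; the stitched format fluex returns is not specified here.

```python
def stitch_years(returns_by_year, fields=("total_income", "adjusted_gross_income")):
    """Build a per-field history and year-over-year change from flat yearly records.

    `returns_by_year` maps a tax year to a dict of field name -> numeric value.
    """
    years = sorted(returns_by_year)
    stitched = {}
    for name in fields:
        history = {year: returns_by_year[year].get(name) for year in years}
        changes = {}
        for prev, curr in zip(years, years[1:]):
            before, after = history[prev], history[curr]
            if before is None or after is None or before == 0:
                # Percent change is undefined for a missing value or a zero base.
                changes[curr] = None
            else:
                changes[curr] = round((after - before) / abs(before) * 100, 1)
        stitched[name] = {"history": history, "yoy_change_pct": changes}
    return stitched
```

For example, AGI values of 64000, 80000, and 88000 for 2021 through 2023 give year-over-year changes of 25.0 and 10.0 percent, regardless of the order in which the returns were supplied.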
PII-aware audit trail
SSNs and dependent identifiers are redacted in metadata by default. Retention is configurable from 0 to 7 years.
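For a sense of what redaction in metadata can look like, the sketch below masks SSNs written in the usual 3-2-4 hyphenated form. It is an illustrative assumption, not fluex's redaction logic: the pattern, the function name, and the keep_last4 option are all hypothetical, and a production redactor would need to handle more formats.

```python
import re

# Matches SSNs written as 123-45-6789. Unhyphenated nine-digit runs are left
# alone on purpose, since they collide with other identifiers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def redact_ssns(text, keep_last4=True):
    """Replace hyphenated SSNs in `text` with a masked form."""
    def mask(match):
        return f"***-**-{match.group(1)}" if keep_last4 else "***-**-****"
    return SSN_PATTERN.sub(mask, text)
```

Keeping the last four digits is a common choice when reviewers still need to tell two filers apart; passing keep_last4=False masks the whole number.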
Integration patterns
For mortgage lenders and underwriters who need fast tax-return parsing, fluex's async mode handles 50- to 100-page returns in under 30 seconds end-to-end. Direct integrations exist for Encompass, Blend, and lender CRMs. The REST API can also drop structured JSON straight into your underwriting model.
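On the client side, an async integration usually comes down to submitting a document and polling until the result is ready. The endpoints and response fields are not documented here, so the sketch below takes the status call as an injected function and assumes a hypothetical response shape with a "state" of "processing", "completed", or "failed".

```python
import time


def wait_for_result(fetch_status, timeout_s=60.0, interval_s=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports a terminal state or the timeout passes.

    `fetch_status` is any callable returning a dict such as
    {"state": "completed", "result": {...}}; the shape is an assumption.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status["state"] == "completed":
            return status["result"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "extraction failed"))
        if clock() >= deadline:
            raise TimeoutError("extraction did not finish in time")
        sleep(interval_s)
```

Injecting sleep and clock keeps the helper easy to test without real delays; in production the defaults are used and fetch_status wraps whatever HTTP client the team already has.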
Compliance & trust
Tax returns are highly sensitive. fluex stores them encrypted at rest with per-tenant keys, and retention is configurable (default 90 days, can be set to 0). PII is redacted in audit metadata by default. A HIPAA BAA is available on Enterprise for healthcare-adjacent workflows. See our trust page for the full posture: encryption, tenant isolation, sub-processors, GDPR DPA, CCPA, and SOC 2 Type II (in progress).
Get started
Pay-per-page pricing means you can start an evaluation today without an annual commit. Most teams ship their first tax-return extraction into production within a week.