Processing Tracking

Invofox tracks the main elements mentioned in our core concepts through a simplified state system:

  • Imports: Operations processing one or more files
  • Files: Individual uploaded files
  • Documents: Processed outputs with extracted data

Each element progresses through clearly defined states, generating webhook events when states change. This provides real-time visibility into your document processing pipeline.

For API endpoints to retrieve current state information, see Information Retrieval.

Import Processing

States and Transition Logic

  • processing - Import created and files are being processed
  • processed - All files have finished processing (either processed or rejected)

Transition Logic: An Import moves to processed when every File in the Import has reached either processed or rejected state.
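
The rule above can be sketched as a small check. This is an illustration only, not Invofox's implementation: the function name `import_state` is hypothetical, and treating an Import with no Files as still processing is an assumption.

```python
from typing import Iterable

# File states that count as "finished" for the purposes of the Import rule.
FINAL_FILE_STATES = {"processed", "rejected"}


def import_state(file_states: Iterable[str]) -> str:
    """Derive the Import state from the states of its Files."""
    states = list(file_states)
    # Assumption: an Import with no Files registered yet is still processing.
    if states and all(state in FINAL_FILE_STATES for state in states):
        return "processed"
    return "processing"
```

For example, `import_state(["processed", "rejected"])` yields `"processed"`, while a single File still in `pending` keeps the Import in `"processing"`.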

Import Lifecycle

Only the processed state triggers webhook events for Imports.

Webhook Events

  • import.processed - Triggered when an Import reaches processed state

File Processing

States and Transition Logic

  • pending - File registered from Import or extracted from ZIP
  • processing - File ingested and being processed internally
  • rejected - File rejected before processing (invalid format, configuration issues, corruption)
  • processed - File has generated Documents and all Documents are processed

Transition Logic: A File moves to processed when:

  • Single Document: The associated Document reaches processed state
  • Multiple Documents (Splitter): All associated Documents reach processed state
  • ZIP File: All extracted Files reach either processed or rejected state
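
The three cases above could be checked on the client side roughly as follows. This is a minimal sketch under stated assumptions: `FileSnapshot` and its fields are hypothetical local bookkeeping, not an Invofox API structure, and a File with no Documents or children yet is assumed not to be finished.

```python
from dataclasses import dataclass, field
from typing import List

FINAL_FILE_STATES = {"processed", "rejected"}


@dataclass
class FileSnapshot:
    """Hypothetical local view of a File (not an Invofox API structure)."""
    is_zip: bool = False
    document_statuses: List[str] = field(default_factory=list)
    child_file_states: List[str] = field(default_factory=list)


def ready_for_processed(snapshot: FileSnapshot) -> bool:
    """Return True when the conditions for the processed state are met."""
    if snapshot.is_zip:
        children = snapshot.child_file_states
        # ZIP File: every extracted File must be processed or rejected.
        return bool(children) and all(s in FINAL_FILE_STATES for s in children)
    docs = snapshot.document_statuses
    # Single or multiple Documents: every Document must be processed.
    return bool(docs) and all(s == "processed" for s in docs)
```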

File Lifecycle

Webhook events are triggered when Files reach final states: rejected or processed.

File Error Codes

When a File reaches rejected status, it includes an error field containing one of the following error codes:

Error Code                                  Description
ERR_DOWNLOADING_FILE                        Error downloading file from source
ERR_UNSUPPORTED_FILE_FORMAT                 File format not supported
ERR_UNSUPPORTED_UNCOMPRESSED_FILE_FORMAT    Uncompressed file format not supported (extracted from archive)
ERR_ZIP_ENCRYPTED                           ZIP file is password-protected or encrypted

Note: Additional error codes may be added in future versions. The error codes are returned as strings in the error field for programmatic handling.
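
Because new codes may appear, client code should not assume the list is closed. The sketch below maps the documented codes to their descriptions and falls back gracefully for anything else. The names `KNOWN_FILE_ERRORS` and `describe_file_error` are hypothetical, and the fallback message is an illustrative choice.

```python
# Codes and descriptions copied from the table in this section.
KNOWN_FILE_ERRORS = {
    "ERR_DOWNLOADING_FILE": "Error downloading file from source",
    "ERR_UNSUPPORTED_FILE_FORMAT": "File format not supported",
    "ERR_UNSUPPORTED_UNCOMPRESSED_FILE_FORMAT": (
        "Uncompressed file format not supported (extracted from archive)"
    ),
    "ERR_ZIP_ENCRYPTED": "ZIP file is password-protected or encrypted",
}


def describe_file_error(code: str) -> str:
    """Return a readable description, tolerating codes added in later versions."""
    return KNOWN_FILE_ERRORS.get(code, f"Unrecognized file error code: {code}")
```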

Webhook Events

  • file.rejected - Triggered when a File reaches rejected state
  • file.processed - Triggered when a File reaches processed state

Document Processing

States and Transition Logic

Documents have two state systems:

Processing Status (internal):

  • processing - Document assigned ID and being processed
  • processed - Automatic processing completed

Public States (workflow):

  • pendingCorrection - Document awaiting manual corrections or review
  • approved - Document approved (manual or automatic)
  • discarded - Document discarded (manual or automatic)
  • processing - ⚠️ Deprecated - Legacy public state, use processing status instead
  • exported - ⚠️ Deprecated - Legacy exported state
  • error - ⚠️ Deprecated - Legacy error state

Note: The Document workflow system, including the publicState field and its related events (approved, discarded, updated), is currently being reevaluated and refined. Implementation details may change in future versions.

Transition Logic: A Document moves from processing to processed status when all automatic extraction and processing logic completes. Separately, documents can be assigned public workflow states (pendingCorrection, approved, discarded) for business process management.
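
One way to keep the two state systems apart in client code is to model them as separate fields. The sketch below is an assumption-laden illustration: `DocumentState`, `status`, and `public_state` are hypothetical local names (only `publicState` appears in this document), and allowing the public state to be unset is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

ACTIVE_PUBLIC_STATES = {"pendingCorrection", "approved", "discarded"}
DEPRECATED_PUBLIC_STATES = {"processing", "exported", "error"}


@dataclass
class DocumentState:
    """Hypothetical local model that keeps both state systems separate."""
    status: str                         # processing status: "processing" or "processed"
    public_state: Optional[str] = None  # workflow state; assumed to be optional


def data_available(doc: DocumentState) -> bool:
    """Extracted data is complete only once the processing status is processed."""
    return doc.status == "processed"


def public_state_kind(doc: DocumentState) -> str:
    """Classify the workflow state as unset, active, deprecated, or unknown."""
    if doc.public_state is None:
        return "unset"
    if doc.public_state in ACTIVE_PUBLIC_STATES:
        return "active"
    if doc.public_state in DEPRECATED_PUBLIC_STATES:
        return "deprecated"
    return "unknown"
```

Keeping the checks separate avoids reading the deprecated public `processing` state as if it were the processing status.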

Document Lifecycle

Webhook events are triggered for both processing states and public workflow states.

Processing Status Flow: processing → processed

Public State Flow (separate from processing): a Document may be assigned pendingCorrection, approved, or discarded.

Webhook Events

Processing Events:

  • document.created - Triggered when a Document is created and immediately enters processing status
    • Document is assigned an ID and enters the processing pipeline
    • Note: There are no intermediate states: the Document is created and immediately enters processing status. At this point the Document is empty, and the API returns 202 Accepted to indicate that the Document is reserved but not yet available for retrieval. Full Document data becomes available after the document.processed event.
  • document.processed - Triggered when a Document reaches processed status
    • Automatic extraction and processing completed
    • Document status becomes processed
    • Contains all extracted information
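
Since a Document is empty between document.created and document.processed, a client may want to defer retrieval until the second event arrives. The tracker below is a minimal sketch of that idea. The class name, the method names, and the assumption that each event can be tied to a Document ID are all hypothetical, because the payload structure is not described in this section.

```python
from typing import Dict


class DocumentReadiness:
    """Hypothetical helper that records which Documents are safe to fetch."""

    def __init__(self) -> None:
        self._ready: Dict[str, bool] = {}

    def handle(self, event_name: str, document_id: str) -> None:
        if event_name == "document.created":
            # setdefault keeps a Document ready if its processed event
            # happened to be delivered first.
            self._ready.setdefault(document_id, False)
        elif event_name == "document.processed":
            self._ready[document_id] = True

    def can_fetch(self, document_id: str) -> bool:
        """True only after document.processed has been seen for this ID."""
        return self._ready.get(document_id, False)
```

Using `setdefault` for the created event is a defensive choice in case events are delivered out of order. Whether that can happen is not stated here.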

Workflow Events (🚧 Work in Progress):

⚠️ Important: The following workflow events are proposals and are currently being reevaluated. Implementation details and event structures may change.

  • document.updated - 🚧 Proposal: Triggered when document fields are modified (including publicState changes)
    • Sends updated document data
    • Can occur multiple times during document lifecycle
    • Includes changes to extracted fields and public state transitions
  • document.approved - ⚠️ Triggered when a Document’s publicState changes to approved
    • Currently under reevaluation
  • document.discarded - ⚠️ Triggered when a Document’s publicState changes to discarded
    • Currently under reevaluation
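
Given that the workflow events may change, a receiver can route events by name and keep the provisional ones clearly separated. The router below is a sketch: `EventRouter`, the handler signature, and the payload being a dictionary are assumptions. Only the event names come from this document.

```python
from typing import Any, Callable, Dict, List

# Event names listed in this document.
STABLE_EVENTS = {
    "import.processed",
    "file.rejected",
    "file.processed",
    "document.created",
    "document.processed",
}
# Workflow events described as proposals or under reevaluation.
PROVISIONAL_EVENTS = {"document.updated", "document.approved", "document.discarded"}

Handler = Callable[[Dict[str, Any]], None]


class EventRouter:
    """Hypothetical dispatcher keyed by event name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event_name: str, handler: Handler) -> None:
        # Reject typos at registration time.
        if event_name not in STABLE_EVENTS | PROVISIONAL_EVENTS:
            raise ValueError(f"Unknown event name: {event_name}")
        self._handlers.setdefault(event_name, []).append(handler)

    def dispatch(self, event_name: str, payload: Dict[str, Any]) -> int:
        """Run registered handlers and return how many were called."""
        # Events without handlers (including future ones) are ignored.
        handlers = self._handlers.get(event_name, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

Ignoring unregistered events at dispatch time means a newly introduced event does not break an existing receiver.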

For webhook configuration and implementation details, see Webhook Implementation.

Special Cases

These special processing scenarios affect how state transitions work for complex file types and configurations.

Document Splitting

When a File generates multiple Documents (e.g., a multi-page PDF):

  1. The File remains in processing state while it is processed; this is when its Documents are created
  2. Each Document follows its own independent lifecycle
  3. The File reaches processed only when all of its Documents reach processed state

This splitting can happen automatically using Invofox’s Splitter premium feature, which intelligently detects document boundaries and creates separate Document records.

The File’s metadata will include information about page splits as shown in ImportInfo Structure.

For Document state details and API structure, see Document Structure.

ZIP Files

When a ZIP file is uploaded through one of the multiple-file upload endpoints:

  1. The ZIP file is accepted and immediately moves from pending to processing status
  2. The ZIP is extracted, creating child Files
  3. The parent File reaches processed only when all child Files reach processed or rejected
  4. Child Files follow the normal File lifecycle independently
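
To follow a ZIP upload, it can help to summarize the child Files and see how far the parent is from completion. The function below is an illustrative sketch: representing each child as a `(state, error_code)` pair is an assumption made for the example, not the structure of the file.processed payload.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Optional, Tuple


def summarize_children(
    children: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, Any]:
    """Summarize child Files given hypothetical (state, error_code) pairs."""
    states: Counter = Counter()
    errors: Counter = Counter()
    for state, error_code in children:
        states[state] += 1
        if state == "rejected" and error_code:
            errors[error_code] += 1
    total = sum(states.values())
    finished = states["processed"] + states["rejected"]
    return {
        "total": total,
        "processed": states["processed"],
        "rejected": states["rejected"],
        # The parent is done once every child is processed or rejected.
        "parent_done": total > 0 and finished == total,
        "errors": dict(errors),
    }
```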

Important: ZIP files are only supported through the multiple-file upload endpoints (Direct Upload and URL Upload). ZIP files are processed asynchronously and move directly from pending to processing state upon acceptance.

For advanced ZIP processing workflows, you can combine Splitter + Classifier premium features to automatically separate and identify different document types within ZIP archives.

The parent File’s file.processed event will include information about all extracted child Files. For compressed file metadata structure, see File API Structure.