Information Retrieval

This section provides the API endpoints and data structures for retrieving information about your Imports, Files, and Documents. For real-time updates on processing status, use webhooks instead of polling these endpoints.

When to use these APIs:

  • Initial synchronization with current state
  • Recovery from missed webhook events
  • Bulk status checking for dashboards
  • Retrieving detailed entity structures and metadata

Import API

Retrieve information about your import operations using our REST API.

List Imports: GET /v1/imports/
Get Import: GET /v1/imports/{id}
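The following is a minimal sketch of calling these two endpoints. Several parts are hypothetical and should be replaced with your own values: the base URL (`https://api.example.com`), the bearer-token `Authorization` header, and the helper names `importsUrl`, `listImports`, and `getImport`. The fetch function is passed in as a parameter, so the sketch does not depend on a particular HTTP client.

```typescript
// Hypothetical host; replace with the API base URL for your environment.
const BASE_URL = "https://api.example.com";

// Minimal shape of a fetch-style function, so any HTTP client can be plugged in.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// GET /v1/imports/ when no id is given, GET /v1/imports/{id} otherwise.
function importsUrl(baseUrl: string, id?: string): string {
  return id === undefined
    ? `${baseUrl}/v1/imports/`
    : `${baseUrl}/v1/imports/${encodeURIComponent(id)}`;
}

async function getJson(fetchFn: FetchLike, url: string, apiKey: string): Promise<unknown> {
  const res = await fetchFn(url, {
    method: "GET",
    // Assumption: the API key is sent as a bearer token.
    headers: { Authorization: `Bearer ${apiKey}`, Accept: "application/json" },
  });
  if (res.status !== 200) {
    // Non-200 statuses (401, 404, 429, ...) are described under Error Handling.
    throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  }
  return res.json();
}

// Hypothetical convenience wrappers for the two documented endpoints.
function listImports(fetchFn: FetchLike, apiKey: string): Promise<unknown> {
  return getJson(fetchFn, importsUrl(BASE_URL), apiKey);
}

function getImport(fetchFn: FetchLike, apiKey: string, id: string): Promise<unknown> {
  return getJson(fetchFn, importsUrl(BASE_URL, id), apiKey);
}
```

In Node 18+ or a browser, the global `fetch` satisfies `FetchLike`, so a call could look like `getImport(fetch, apiKey, "imp_123")`. The ID `"imp_123"` here is a placeholder.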

Import Structure

type Import = {
  id: string;
  accountId: string;
  environmentId: string;
  channel: "api" | "upload" | "email";
  status: "processing" | "processed";
  createdAt: string; // ISO 8601 timestamp
  documents: number; // Total count of documents extracted from all files
  files: File[]; // Array of files (see File structure below)
  clientData?: object; // Custom metadata from upload
  providerInfo?: object; // Provider-specific information
};

Notes:

  • An Import lists all of its files, including ZIP files and their extracted contents
  • The documents field provides a total count of all documents extracted across all files
  • The channel field indicates how the import was created (API, web upload, or email)
  • Each file in the files array follows the File structure documented below
  • Compressed files (ZIP) have compressed: true and typically have an empty documentIds array
  • Files extracted from ZIP archives include parentFileId referencing the original compressed file
  • clientData contains custom metadata provided during upload (empty object if none)
  • Rejected files include an error field containing the error code for programmatic handling
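The notes above can be turned into a small client-side check. This sketch makes three assumptions. First, it works on an already-parsed Import object and types only the fields it uses. Second, it assumes `documents` equals the total number of `documentIds` across all files. Third, the names `summarizeImport` and `ImportFile` are hypothetical. `ImportFile` mirrors the File structure and is named differently so it does not clash with the built-in DOM `File` type.

```typescript
type FileStatus = "pending" | "processing" | "processed" | "rejected";

// Subset of the File structure used here (named ImportFile to avoid the DOM File type).
interface ImportFile {
  id: string;
  filename: string;
  status: FileStatus;
  compressed: boolean;
  documentIds: string[];
  parentFileId?: string;
  error?: string;
}

// Subset of the Import structure used here.
interface Import {
  id: string;
  status: "processing" | "processed";
  documents: number;
  files: ImportFile[];
}

interface ImportSummary {
  documentIds: string[]; // every document ID found on the Import's files
  rejected: { fileId: string; filename: string; error?: string }[];
  archiveIds: string[]; // IDs of compressed (ZIP) files
  countMatches: boolean; // documentIds.length === imp.documents
}

function summarizeImport(imp: Import): ImportSummary {
  const documentIds: string[] = [];
  const rejected: ImportSummary["rejected"] = [];
  const archiveIds: string[] = [];

  for (const file of imp.files) {
    // ZIP containers typically carry no documents themselves; their extracted files do.
    if (file.compressed) archiveIds.push(file.id);
    documentIds.push(...file.documentIds);
    if (file.status === "rejected") {
      // error holds a code such as "ERR_UNSUPPORTED_FILE_FORMAT" for programmatic handling.
      rejected.push({ fileId: file.id, filename: file.filename, error: file.error });
    }
  }

  return {
    documentIds,
    rejected,
    archiveIds,
    // Assumption: `documents` is the sum of documentIds across all files.
    countMatches: documentIds.length === imp.documents,
  };
}
```

ZIP files typically have an empty `documentIds` array, so including them in the loop does not inflate the count.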

For Import state definitions and transition logic, see Import Processing.

File API

Access file processing information and status updates.

File Structure

type File = {
  id: string; // Unique identifier
  accountId: string;
  environmentId: string;
  companyId?: string;
  importId: string;
  filename: string;
  mimeType: string; // MIME type (e.g., "application/pdf", "image/jpeg")
  status: "pending" | "processing" | "processed" | "rejected";
  compressed: boolean; // true for ZIP files
  documentIds: string[]; // Array of document IDs extracted from this file
  parentFileId?: string; // Reference to parent ZIP file (if extracted from ZIP)
  error?: string; // Error code when status is "rejected" (e.g., "ERR_UNSUPPORTED_FILE_FORMAT")
  clientData?: object; // Custom metadata from upload
  createdAt: string; // ISO 8601 timestamp
  updatedAt: string; // ISO 8601 timestamp
};

Notes:

  • Each file includes a documentIds array (naming convention: a field holding an array of IDs uses the singular noun plus an “Ids” suffix)
  • The mimeType field indicates the file format (required field)
  • Compressed files (ZIP) have compressed: true and empty documentIds array
  • Files extracted from ZIP include parentFileId field referencing the original compressed file
  • Rejected files (status: “rejected”) include an error field containing the error code for programmatic handling
  • All responses use id field as primary identifier (not _id)
  • clientData contains custom metadata provided during upload (empty object if none)
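Extracted files point back to their archive through `parentFileId`, so a flat files array can be regrouped into a per-archive view. This is a minimal sketch, and `groupByArchive` and the `ImportFile` subset type are hypothetical names. A nested archive (a ZIP inside a ZIP) becomes its own group here rather than a child of the outer archive.

```typescript
// Subset of the File structure (named ImportFile to avoid the DOM File type).
interface ImportFile {
  id: string;
  filename: string;
  mimeType: string;
  status: "pending" | "processing" | "processed" | "rejected";
  compressed: boolean;
  documentIds: string[];
  parentFileId?: string;
  error?: string;
}

interface ArchiveGroup {
  archive: ImportFile; // the compressed (ZIP) file
  children: ImportFile[]; // non-compressed files whose parentFileId points at it
}

function groupByArchive(files: ImportFile[]): { archives: ArchiveGroup[]; standalone: ImportFile[] } {
  // Index every compressed file by id; Map preserves insertion order.
  const groups = new Map<string, ArchiveGroup>();
  for (const f of files) {
    if (f.compressed) groups.set(f.id, { archive: f, children: [] });
  }

  const standalone: ImportFile[] = [];
  for (const f of files) {
    if (f.compressed) continue; // archives are group heads, not children
    const parent = f.parentFileId === undefined ? undefined : groups.get(f.parentFileId);
    if (parent) {
      parent.children.push(f);
    } else {
      // Uploaded directly, or its parent ZIP is not in this list.
      standalone.push(f);
    }
  }
  return { archives: Array.from(groups.values()), standalone };
}
```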

For File state definitions and transition logic, see File Processing.

Documents API

Document processing can introduce latency, so information retrieved by Document ID may not yet reflect the latest processing results. Use webhooks for real-time updates.

List Documents: GET /v1/documents/
Get Document: GET /v1/documents/{id}

Document Structure

Full document structure is available in the API reference documentation.

For Document state definitions and transition logic, see Document Processing.
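One use case listed at the top of this section is recovery from missed webhook events. One approach is to compare the document IDs recorded on an Import's files with the IDs your system has already received through webhooks, then fetch only the gaps with GET /v1/documents/{id}. This sketch only builds the URLs and assumes the following: `missingDocumentUrls`, the `seen` set, and the base URL are hypothetical, and the Document response itself is not modeled because its structure is defined in the API reference.

```typescript
// Subset of the File structure needed here.
interface FileDocs {
  documentIds: string[];
}

// Returns GET /v1/documents/{id} URLs for document IDs not yet seen via webhooks.
function missingDocumentUrls(baseUrl: string, files: FileDocs[], seen: Set<string>): string[] {
  const urls: string[] = [];
  const queued = new Set<string>(); // avoids requesting the same ID twice
  for (const file of files) {
    for (const id of file.documentIds) {
      if (seen.has(id) || queued.has(id)) continue;
      queued.add(id);
      urls.push(`${baseUrl}/v1/documents/${encodeURIComponent(id)}`);
    }
  }
  return urls;
}
```

A document fetched this way may still be mid-processing, so its state can lag behind what a later webhook reports.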

Technical Implementation

Best Practices

  • Use webhooks first - API polling should supplement webhook notifications, not replace them
  • Handle pagination - Large result sets are paginated for performance, so retrieve every page rather than only the first
  • Handle rate limits - Implement exponential backoff for rate limit responses
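The rate-limit practice can be sketched as a wrapper that retries a request while it returns HTTP 429, doubling the wait each time up to a cap. The names `withRateLimitRetry`, `maxAttempts`, `baseDelayMs`, and `maxDelayMs`, and their default values, are illustrative assumptions rather than documented limits. The sketch also does not read any rate-limit response headers, since none are documented here.

```typescript
interface HttpResult {
  status: number;
}

type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `request` while it returns 429, waiting baseDelayMs, then 2x, 4x, ... (capped at maxDelayMs).
async function withRateLimitRetry<T extends HttpResult>(
  request: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000,
  maxDelayMs = 30000,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let delay = baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    const res = await request();
    if (res.status !== 429 || attempt >= maxAttempts) {
      return res; // success, a non-rate-limit error, or out of attempts
    }
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
}
```

The caller still inspects the final status. A 429 returned after the last attempt means the limit persisted for the whole retry window.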

Error Handling

The API returns standard HTTP status codes:

  • 200 - Success
  • 400 - Bad request (validation errors)
  • 401 - Unauthorized (check API key)
  • 404 - Resource not found
  • 429 - Rate limit exceeded
  • 500 - Internal server error

Error responses include detailed information to help with debugging and resolution.
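One way to act on these codes is to map each one to a handling strategy. `classifyStatus` is a hypothetical helper, and its categories are assumptions of this sketch. In particular, retrying 500 responses is an assumption, because only 429 is documented here as a rate-limit signal.

```typescript
type Handling =
  | "success"
  | "fix-request" // 400: validation error; retrying unchanged will not help
  | "check-credentials" // 401: check the API key
  | "not-found" // 404: no resource with that ID
  | "retry-with-backoff" // 429 and, as an assumption, 500
  | "unexpected";

function classifyStatus(status: number): Handling {
  switch (status) {
    case 200:
      return "success";
    case 400:
      return "fix-request";
    case 401:
      return "check-credentials";
    case 404:
      return "not-found";
    case 429: // rate limit exceeded
    case 500: // assumption: treat internal errors as transient
      return "retry-with-backoff";
    default:
      return "unexpected";
  }
}
```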

Polling Strategies

Polling is not recommended in production environments. Use webhooks for real-time updates instead.

When webhooks are not available, use these polling patterns:

  • Initial load - Poll every 5-10 seconds for active processing
  • Background sync - Poll every 30-60 seconds for status updates
  • Exponential backoff - Gradually increase polling intervals (e.g., 5s → 10s → 20s → 40s → 80s → 160s → 300s) up to 5 minutes maximum, as document processing can be complex and time-consuming
  • Batch requests - Use list endpoints to reduce API calls
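The exponential-backoff pattern above can be sketched as a delay schedule plus a polling loop. The schedule doubles from 5 s and is capped at 300 s, which reproduces the 5s → 10s → 20s → 40s → 80s → 160s → 300s sequence. The sketch polls a single File and treats `processed` and `rejected` as final; that is an assumption, and File Processing describes the actual transition rules. The names `backoffSchedule` and `pollFileUntilDone`, and the injected `fetchStatus` and `sleep` functions, are hypothetical.

```typescript
type FileStatus = "pending" | "processing" | "processed" | "rejected";
type Sleep = (ms: number) => Promise<void>;

// Doubling delays in seconds, capped at maxSeconds: 5, 10, 20, 40, 80, 160, 300.
function backoffSchedule(startSeconds = 5, maxSeconds = 300): number[] {
  if (startSeconds <= 0) throw new Error("startSeconds must be positive");
  const delays: number[] = [];
  for (let d = startSeconds; ; d *= 2) {
    if (d >= maxSeconds) {
      delays.push(maxSeconds);
      return delays;
    }
    delays.push(d);
  }
}

// Assumption: "processed" and "rejected" are final states for a File.
const isFinal = (s: FileStatus): boolean => s === "processed" || s === "rejected";

// Polls until the file is final; once the schedule is exhausted, keeps waiting the cap.
async function pollFileUntilDone(
  fetchStatus: () => Promise<FileStatus>,
  sleep: Sleep,
  maxPolls = 50,
): Promise<FileStatus> {
  const schedule = backoffSchedule();
  for (let i = 0; i < maxPolls; i++) {
    const status = await fetchStatus();
    if (isFinal(status)) return status;
    const seconds = schedule[Math.min(i, schedule.length - 1)];
    await sleep(seconds * 1000);
  }
  throw new Error(`File still not final after ${maxPolls} polls`);
}
```

When tracking many files, follow the batch-requests advice: one GET /v1/imports/{id} per interval returns every file of the Import in a single response, which avoids one call per file.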