The PDF Batch Process API handles batch processing of PDF documents for extraction and analysis.
Edge Function: pdf-batch-process
Base URL: https://bgbavxtjlbvgplozizxu.supabase.co/functions/v1/pdf-batch-process
All requests require authentication via Supabase Auth:
Authorization: Bearer <supabase_access_token>
Create a new batch processing job for multiple documents.
Method: POST
Path: /
Request:
{
  documents: Array<{
    documentId: string,
    extractionType: 'markdown' | 'tables' | 'images' | 'all',
    priority?: 'low' | 'normal' | 'high'
  }>,
  workspaceId?: string,
  userId?: string,
  options?: {
    includeImages?: boolean,
    includeMetadata?: boolean,
    chunkSize?: number,
    overlapSize?: number,
    outputFormat?: 'json' | 'markdown',
    maxConcurrent?: number,
    notifyOnComplete?: boolean,
    webhookUrl?: string
  }
}
Response:
{
  success: true,
  data: {
    batchId: string,
    status: 'queued',
    totalDocuments: number,
    processedDocuments: 0,
    failedDocuments: 0,
    estimatedCompletionTime?: string,
    results: Array<{
      documentId: string,
      status: 'pending',
      extractionId?: string,
      error?: string,
      processingTime?: number
    }>
  }
}
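For TypeScript callers, the request schema above can be written as types, together with a client-side pre-check. This is a sketch under assumptions: `validateBatchRequest` and `EXTRACTION_TYPES` are hypothetical names, and treating an empty `documents` array as invalid is a guess based on the documented 400 cause ("invalid documents array"); the server may apply other rules.

```typescript
// Request body for POST /pdf-batch-process, transcribed from the schema above.
type ExtractionType = 'markdown' | 'tables' | 'images' | 'all';
type Priority = 'low' | 'normal' | 'high';

interface BatchDocument {
  documentId: string;
  extractionType: ExtractionType;
  priority?: Priority;
}

interface BatchRequest {
  documents: BatchDocument[];
  workspaceId?: string;
  userId?: string;
  options?: {
    includeImages?: boolean;
    includeMetadata?: boolean;
    chunkSize?: number;
    overlapSize?: number;
    outputFormat?: 'json' | 'markdown';
    maxConcurrent?: number;
    notifyOnComplete?: boolean;
    webhookUrl?: string;
  };
}

const EXTRACTION_TYPES: string[] = ['markdown', 'tables', 'images', 'all'];

// Hypothetical pre-check mirroring the documented 400 causes.
// Rejecting an empty documents array is an assumption, not a documented rule.
function validateBatchRequest(req: BatchRequest): string[] {
  const problems: string[] = [];
  if (!Array.isArray(req.documents) || req.documents.length === 0) {
    problems.push('documents must be a non-empty array');
    return problems;
  }
  req.documents.forEach((doc, i) => {
    if (!doc.documentId) {
      problems.push(`documents[${i}]: missing documentId`);
    }
    if (EXTRACTION_TYPES.indexOf(doc.extractionType) === -1) {
      problems.push(`documents[${i}]: invalid extractionType`);
    }
  });
  return problems;
}
```

Running the check before sending lets a client report every malformed document at once instead of receiving a single 400.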
Example:
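// Assumes `supabase` is a supabase-js client (created with createClient) for a
// signed-in user; invoke() sends that user's access token with the request.
const { data, error } = await supabase.functions.invoke('pdf-batch-process', {
  body: {
    documents: [
      {
        documentId: 'doc-123',
        extractionType: 'all',
        priority: 'high'
      },
      {
        documentId: 'doc-456',
        extractionType: 'markdown',
        priority: 'normal'
      }
    ],
    workspaceId: 'workspace-789',
    options: {
      includeImages: true,
      includeMetadata: true,
      chunkSize: 1000,
      overlapSize: 200,
      outputFormat: 'json',
      maxConcurrent: 3,
      notifyOnComplete: true
    }
  }
});
if (error) throw error;
// `data` is the response envelope shown above
const batchId = data.data.batchId;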
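The same POST can be made without supabase-js. The sketch below assumes the `{ success, data }` and `{ success: false, error, statusCode? }` envelopes documented on this page; `createBatch` and `FetchLike` are hypothetical names, and the fetch function is passed in so the code runs against the global `fetch` or a test stub.

```typescript
// Structural type for a fetch-compatible function: the global fetch
// (browsers, Node 18+) satisfies it, and so does a simple test stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string }
) => Promise<{ status: number; json(): Promise<unknown> }>;

// Response envelopes as documented on this page.
type Envelope<T> =
  | { success: true; data: T }
  | { success: false; error: string; statusCode?: number };

interface CreatedBatch {
  batchId: string;
  status: string;
  totalDocuments: number;
  processedDocuments: number;
  failedDocuments: number;
  estimatedCompletionTime?: string;
  results: Array<{ documentId: string; status: string }>;
}

const FUNCTION_URL =
  'https://bgbavxtjlbvgplozizxu.supabase.co/functions/v1/pdf-batch-process';

// Hypothetical helper: POSTs a batch request and returns the `data` object,
// throwing on a { success: false } envelope.
async function createBatch(
  fetchFn: FetchLike,
  accessToken: string,
  body: object
): Promise<CreatedBatch> {
  const res = await fetchFn(FUNCTION_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  const payload = (await res.json()) as Envelope<CreatedBatch>;
  if (!payload.success) {
    // statusCode is optional in the error envelope; fall back to HTTP status
    const code = payload.statusCode ?? res.status;
    throw new Error(`pdf-batch-process failed (${code}): ${payload.error}`);
  }
  return payload.data;
}
```

For example, `createBatch(fetch, session.access_token, { documents: [{ documentId: 'doc-123', extractionType: 'all' }] })` resolves to the `data` object of the response.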
Get the status of a batch processing job.
Method: GET
Path: /?batchId={batchId}
Query Parameters:
batchId (required): Batch job ID
Response:
{
  success: true,
  data: {
    batchId: string,
    status: 'queued' | 'processing' | 'completed' | 'failed' | 'partial',
    totalDocuments: number,
    processedDocuments: number,
    failedDocuments: number,
    createdAt: string,
    updatedAt: string,
    completedAt?: string,
    results: Array<{
      documentId: string,
      status: 'pending' | 'processing' | 'completed' | 'failed',
      extractionId?: string,
      error?: string,
      processingTime?: number
    }>
  }
}
Example:
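// `session` is the signed-in user's Supabase Auth session
const API_BASE = 'https://bgbavxtjlbvgplozizxu.supabase.co/functions/v1';
const response = await fetch(
  `${API_BASE}/pdf-batch-process?batchId=batch-123`,
  {
    headers: {
      'Authorization': `Bearer ${session.access_token}`
    }
  }
);
const result = await response.json();
if (!result.success) throw new Error(result.error);
console.log(result.data.status, result.data.processedDocuments);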
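Because POST returns while the batch is still `queued`, a client usually polls this endpoint. A minimal polling sketch: `waitForBatch` is a hypothetical helper, the 2-second interval and 150-attempt cap are illustrative rather than documented limits, and `sleep` is passed in so tests can skip real delays. `'cancelled'` is not in the GET status enum, but it is what DELETE reports, so it is treated as terminal defensively.

```typescript
// 'cancelled' is not in the GET status enum, but DELETE reports it, so it is
// included defensively as a terminal state.
type BatchStatus =
  | 'queued' | 'processing' | 'completed' | 'failed' | 'partial' | 'cancelled';

interface BatchState {
  batchId: string;
  status: BatchStatus;
  totalDocuments: number;
  processedDocuments: number;
  failedDocuments: number;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<{ json(): Promise<unknown> }>;

const API_BASE = 'https://bgbavxtjlbvgplozizxu.supabase.co/functions/v1';
const TERMINAL: BatchStatus[] = ['completed', 'failed', 'partial', 'cancelled'];

// Hypothetical helper: polls GET /?batchId=... until a terminal status.
// intervalMs and maxAttempts are illustrative, not documented limits.
async function waitForBatch(
  fetchFn: FetchLike,
  accessToken: string,
  batchId: string,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 2000,
  maxAttempts = 150
): Promise<BatchState> {
  const url = `${API_BASE}/pdf-batch-process?batchId=${encodeURIComponent(batchId)}`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(url, {
      method: 'GET',
      headers: { Authorization: `Bearer ${accessToken}` },
    });
    const payload = (await res.json()) as
      | { success: true; data: BatchState }
      | { success: false; error: string };
    if (!payload.success) throw new Error(payload.error);
    if (TERMINAL.indexOf(payload.data.status) !== -1) return payload.data;
    await sleep(intervalMs);
  }
  throw new Error(`batch ${batchId} not finished after ${maxAttempts} polls`);
}
```

In a browser or Node 18+, it can be called as `waitForBatch(fetch, session.access_token, 'batch-123', (ms) => new Promise((r) => setTimeout(r, ms)))`. If `notifyOnComplete` or `webhookUrl` is set, polling can be replaced by waiting for the completion notification.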
Cancel a running or queued batch job.
Method: DELETE
Path: /?batchId={batchId}
Query Parameters:
batchId (required): Batch job ID
Response:
{
  success: true,
  message: 'Batch job cancelled successfully',
  data: {
    batchId: string,
    status: 'cancelled',
    processedDocuments: number,
    cancelledDocuments: number
  }
}
Example:
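// `session` is the signed-in user's Supabase Auth session
const API_BASE = 'https://bgbavxtjlbvgplozizxu.supabase.co/functions/v1';
const response = await fetch(
  `${API_BASE}/pdf-batch-process?batchId=batch-123`,
  {
    method: 'DELETE',
    headers: {
      'Authorization': `Bearer ${session.access_token}`
    }
  }
);
const result = await response.json();
if (!result.success) throw new Error(result.error);
console.log(result.data.cancelledDocuments);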
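A typed wrapper around the DELETE call, sketched under the same envelope assumption as the other examples; `cancelBatch` is a hypothetical name. Encoding the ID with `encodeURIComponent` keeps IDs that contain reserved characters intact in the query string.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<{ json(): Promise<unknown> }>;

interface CancelResult {
  batchId: string;
  status: 'cancelled';
  processedDocuments: number;
  cancelledDocuments: number;
}

const API_BASE = 'https://bgbavxtjlbvgplozizxu.supabase.co/functions/v1';

// Hypothetical helper: cancels a queued or running batch and returns `data`.
async function cancelBatch(
  fetchFn: FetchLike,
  accessToken: string,
  batchId: string
): Promise<CancelResult> {
  const url = `${API_BASE}/pdf-batch-process?batchId=${encodeURIComponent(batchId)}`;
  const res = await fetchFn(url, {
    method: 'DELETE',
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  const payload = (await res.json()) as
    | { success: true; message: string; data: CancelResult }
    | { success: false; error: string; statusCode?: number };
  if (!payload.success) {
    // A 404 here means the batch ID is unknown (see Common Error Codes).
    throw new Error(`cancel failed (${payload.statusCode ?? 'no status'}): ${payload.error}`);
  }
  return payload.data;
}
```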
Extraction Types:
| Type | Description |
|---|---|
| markdown | Extract text content as markdown |
| tables | Extract tables from the PDF |
| images | Extract images from the PDF |
| all | Extract everything (markdown + tables + images) |
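Because `all` is shorthand for the other three types, a client that needs to know which outputs a batch will produce can expand it. A small sketch: `outputsFor` and `batchOutputs` are hypothetical helpers, and how these types interact with the `includeImages` option is not documented, so that option is ignored here.

```typescript
type ExtractionType = 'markdown' | 'tables' | 'images' | 'all';
type Output = 'markdown' | 'tables' | 'images';

// 'all' expands to markdown + tables + images, per the table above.
function outputsFor(type: ExtractionType): Output[] {
  return type === 'all' ? ['markdown', 'tables', 'images'] : [type];
}

// Distinct outputs requested across a batch, in first-seen order.
function batchOutputs(docs: Array<{ extractionType: ExtractionType }>): Output[] {
  const seen: Output[] = [];
  for (const doc of docs) {
    for (const output of outputsFor(doc.extractionType)) {
      if (seen.indexOf(output) === -1) seen.push(output);
    }
  }
  return seen;
}
```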
Priority Levels:
| Priority | Description | Processing Order |
|---|---|---|
| high | Urgent processing | Processed first |
| normal | Standard processing | Default queue |
| low | Background processing | Processed last |
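The Processing Order column can be modelled as a stable sort. This is only a client-side illustration: the server's scheduler, and how it combines priority with `maxConcurrent`, is not described here, and treating an omitted `priority` as `normal` is an assumption based on the "Default queue" entry. `processingOrder` and `RANK` are hypothetical names.

```typescript
type Priority = 'low' | 'normal' | 'high';

interface QueuedDocument {
  documentId: string;
  priority?: Priority;
}

// Lower rank is processed earlier, following the Processing Order column.
const RANK: Record<Priority, number> = { high: 0, normal: 1, low: 2 };

// Client-side model only. An omitted priority is assumed to mean 'normal';
// the index tie-break keeps submission order within a priority level.
function processingOrder(docs: QueuedDocument[]): QueuedDocument[] {
  return docs
    .map((doc, index) => ({ doc, index }))
    .sort((a, b) => {
      const byRank = RANK[a.doc.priority ?? 'normal'] - RANK[b.doc.priority ?? 'normal'];
      return byRank !== 0 ? byRank : a.index - b.index;
    })
    .map((entry) => entry.doc);
}
```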
Status Flow:
queued → processing → completed
processing → failed
processing → partial (some docs failed)
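The status flow can be encoded as a transition table, which is useful for asserting on status changes observed while polling. A sketch that reads the `failed` and `partial` arrows as branching from `processing`; `canTransition`, `isTerminal`, and `NEXT` are hypothetical names, and `cancelled` (reported by DELETE) is left out because the flow does not show where it attaches.

```typescript
type BatchStatus = 'queued' | 'processing' | 'completed' | 'failed' | 'partial';

// Allowed next states for each status, per the status flow.
const NEXT: Record<BatchStatus, BatchStatus[]> = {
  queued: ['processing'],
  processing: ['completed', 'failed', 'partial'],
  completed: [],
  failed: [],
  partial: [],
};

function canTransition(from: BatchStatus, to: BatchStatus): boolean {
  return NEXT[from].indexOf(to) !== -1;
}

// A status with no outgoing edges is final.
function isTerminal(status: BatchStatus): boolean {
  return NEXT[status].length === 0;
}
```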
Options:
includeImages: Include image extraction in processing (default: true)
includeMetadata: Include document metadata in results (default: true)
chunkSize: Size of text chunks for processing (default: 1000)
overlapSize: Overlap between consecutive chunks (default: 200)
outputFormat: Output format for results, json or markdown (default: json)
maxConcurrent: Maximum number of documents processed concurrently (default: 5)
notifyOnComplete: Send a notification when the batch completes (default: false)
webhookUrl: Webhook URL to call when the batch completes
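The documented defaults can be applied on the client when a caller needs the effective settings, for example to log them. A sketch; `resolveOptions`, `pick`, and `DEFAULTS` are hypothetical names. Keys set to `undefined` fall back to the default, which matches what the server receives, because `JSON.stringify` drops `undefined` values.

```typescript
interface BatchOptions {
  includeImages?: boolean;
  includeMetadata?: boolean;
  chunkSize?: number;
  overlapSize?: number;
  outputFormat?: 'json' | 'markdown';
  maxConcurrent?: number;
  notifyOnComplete?: boolean;
  webhookUrl?: string;
}

type DefaultedOptions = Required<Omit<BatchOptions, 'webhookUrl'>>;

// Defaults as documented; webhookUrl has none.
const DEFAULTS: DefaultedOptions = {
  includeImages: true,
  includeMetadata: true,
  chunkSize: 1000,
  overlapSize: 200,
  outputFormat: 'json',
  maxConcurrent: 5,
  notifyOnComplete: false,
};

// Use the caller's value unless it is missing or undefined.
function pick<T>(value: T | undefined, fallback: T): T {
  return value === undefined ? fallback : value;
}

// Hypothetical helper returning the effective options for a request.
function resolveOptions(options: BatchOptions = {}): DefaultedOptions & { webhookUrl?: string } {
  return {
    includeImages: pick(options.includeImages, DEFAULTS.includeImages),
    includeMetadata: pick(options.includeMetadata, DEFAULTS.includeMetadata),
    chunkSize: pick(options.chunkSize, DEFAULTS.chunkSize),
    overlapSize: pick(options.overlapSize, DEFAULTS.overlapSize),
    outputFormat: pick(options.outputFormat, DEFAULTS.outputFormat),
    maxConcurrent: pick(options.maxConcurrent, DEFAULTS.maxConcurrent),
    notifyOnComplete: pick(options.notifyOnComplete, DEFAULTS.notifyOnComplete),
    webhookUrl: options.webhookUrl,
  };
}
```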
If webhookUrl is provided, the service sends a POST request to that URL when the batch completes, with this body:
{
  batchId: string,
  status: 'completed' | 'failed' | 'partial',
  totalDocuments: number,
  processedDocuments: number,
  failedDocuments: number,
  completedAt: string,
  results: Array<{
    documentId: string,
    status: string,
    extractionId?: string,
    error?: string
  }>
}
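On the receiving side, a webhook handler should check the body before trusting it. A sketch of a type guard for the payload above; `isWebhookPayload` and `failedDocumentIds` are hypothetical names. This page documents no signature or shared secret for webhook requests, so the guard checks shape only and does not prove the request came from the service.

```typescript
interface WebhookPayload {
  batchId: string;
  status: 'completed' | 'failed' | 'partial';
  totalDocuments: number;
  processedDocuments: number;
  failedDocuments: number;
  completedAt: string;
  results: Array<{ documentId: string; status: string; extractionId?: string; error?: string }>;
}

// Shape check only; per-result validation is limited to documentId.
function isWebhookPayload(body: unknown): body is WebhookPayload {
  if (typeof body !== 'object' || body === null) return false;
  const b = body as Record<string, unknown>;
  const results = b.results;
  const statusOk = b.status === 'completed' || b.status === 'failed' || b.status === 'partial';
  return (
    typeof b.batchId === 'string' &&
    statusOk &&
    typeof b.totalDocuments === 'number' &&
    typeof b.processedDocuments === 'number' &&
    typeof b.failedDocuments === 'number' &&
    typeof b.completedAt === 'string' &&
    Array.isArray(results) &&
    results.every(
      (r) => typeof r === 'object' && r !== null && typeof (r as Record<string, unknown>).documentId === 'string'
    )
  );
}

// Documents whose per-document status is 'failed', e.g. to resubmit them.
function failedDocumentIds(payload: WebhookPayload): string[] {
  return payload.results.filter((r) => r.status === 'failed').map((r) => r.documentId);
}
```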
Error Response:
{
  success: false,
  error: string,
  statusCode?: number
}
Common Error Codes:
400 - Bad request (invalid documents array, missing documentId)
401 - Unauthorized
404 - Batch job not found
500 - Internal server error
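Because `statusCode` is optional in the error envelope, a client can wrap errors in a typed class that falls back to the HTTP status. A sketch; `BatchApiError`, `toBatchApiError`, `isRetryable`, and `ERROR_MEANINGS` are hypothetical, and treating only 5xx codes as retryable is an assumption (the 4xx codes listed point to a problem with the request, the token, or the batch ID).

```typescript
interface ErrorEnvelope {
  success: false;
  error: string;
  statusCode?: number;
}

// Short meanings of the documented error codes.
const ERROR_MEANINGS: Record<number, string> = {
  400: 'Bad request',
  401: 'Unauthorized',
  404: 'Batch job not found',
  500: 'Internal server error',
};

class BatchApiError extends Error {
  readonly statusCode: number;
  readonly detail: string;

  constructor(statusCode: number, detail: string) {
    super(`${statusCode} ${ERROR_MEANINGS[statusCode] ?? 'Unexpected error'}: ${detail}`);
    this.name = 'BatchApiError';
    this.statusCode = statusCode;
    this.detail = detail;
  }
}

// Prefer the envelope's statusCode, else the HTTP status of the response.
function toBatchApiError(body: ErrorEnvelope, httpStatus: number): BatchApiError {
  return new BatchApiError(body.statusCode ?? httpStatus, body.error);
}

// Assumption: only server-side (5xx) failures are worth retrying.
function isRetryable(err: BatchApiError): boolean {
  return err.statusCode >= 500;
}
```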