Complete reference for the duplicate detection and product merging system.
The Duplicate Material Detection & Merging system helps maintain data quality by identifying and consolidating duplicate products in the knowledge base.
CRITICAL: Duplicates are ONLY detected when products are from the SAME factory/manufacturer.
Duplicates are defined by factory/manufacturer match, NOT visual similarity:
Three-Layer Matching (After Factory Verification):
Layer 1: Factory Match (REQUIRED)
Metadata keys checked: factory, manufacturer, factory_group, brand, company
Layer 2: Name Similarity (50% weight)
Layer 3: Description & Metadata (30% + 20% weight)
Overall Score Calculation: overall_score = (name_sim × 0.50) + (desc_sim × 0.30) + (meta_sim × 0.20)
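As a worked illustration of the formula above, the sketch below computes the overall score from three component similarities. The function name is hypothetical and the input values are made up for the example; only the weights (0.50, 0.30, 0.20) come from this reference.

```python
def overall_score(name_sim: float, desc_sim: float, meta_sim: float) -> float:
    """Weighted sum of the three component similarities (each assumed in [0, 1])."""
    return (name_sim * 0.50) + (desc_sim * 0.30) + (meta_sim * 0.20)


# Example: strong name match, moderate description match, weaker metadata match.
score = overall_score(name_sim=0.9, desc_sim=0.6, meta_sim=0.5)
print(round(score, 2))  # 0.45 + 0.18 + 0.10 = 0.73
```

Because the name component carries half of the weight, two products with identical names already reach 0.50 before description and metadata are considered.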
Confidence Levels:
Tracks all merge operations with full audit trail and undo capability.
Fields: id, workspace_id, merged_at, merged_by, source_product_ids, source_product_names, target_product_id, target_product_name, similarity_score, merge_reason, merge_strategy ('manual', 'auto', 'suggested'), source_products_snapshot, target_product_before_merge, target_product_after_merge, is_undone, undone_at, undone_by.
Stores pre-computed duplicate pairs for quick lookup.
Fields: id, workspace_id, product_id_1, product_id_2, overall_similarity_score, name_similarity, description_similarity, metadata_similarity, similarity_breakdown, is_duplicate, confidence_level ('high', 'medium', 'low'), status ('pending', 'reviewed', 'merged', 'dismissed'), reviewed_by, reviewed_at.
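For client code that reads these cache records, a typed container can make the field list easier to work with. The following is a minimal sketch only: the class name, the Python types, the default values, and the validation in `__post_init__` are assumptions for illustration, not the actual table schema. The field names and the allowed values for confidence_level and status are taken from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

CONFIDENCE_LEVELS = ("high", "medium", "low")
STATUSES = ("pending", "reviewed", "merged", "dismissed")


@dataclass
class DuplicateCacheEntry:
    # Identifier and timestamp types are assumed to be strings here.
    id: str
    workspace_id: str
    product_id_1: str
    product_id_2: str
    overall_similarity_score: float
    name_similarity: float
    description_similarity: float
    metadata_similarity: float
    is_duplicate: bool
    confidence_level: str
    status: str = "pending"  # assumed default for a new detection
    similarity_breakdown: Dict[str, Any] = field(default_factory=dict)
    reviewed_by: Optional[str] = None
    reviewed_at: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented enumerations.
        if self.confidence_level not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence_level: {self.confidence_level!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
```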
Endpoint: POST /api/duplicates/detect
Find potential duplicates for a specific product.
The request takes product_id, workspace_id, and an optional similarity_threshold. The response includes a list of matching products with their overall_similarity, name_similarity, description_similarity, metadata_similarity, and confidence_level.
CRITICAL: Returns empty list if product has no factory metadata.
Endpoint: POST /api/duplicates/batch-detect
Scan entire workspace for duplicate products.
The request takes workspace_id, similarity_threshold, and limit. The response includes all detected duplicate pairs, each showing both product IDs, names, shared factory, overall similarity, and confidence level.
Endpoint: GET /api/duplicates/cached
Retrieve cached duplicate detections.
Query Parameters:
workspace_id (required): Workspace to query
status (optional): Filter by status ('pending', 'reviewed', 'merged', 'dismissed')
min_similarity (optional): Minimum similarity score (default: 0.60)
Endpoint: POST /api/duplicates/update-status
Update the status of a cached duplicate detection.
The request takes cache_id, status, and user_id.
Valid Statuses:
pending - Not yet reviewed
reviewed - Admin has reviewed
merged - Products have been merged
dismissed - Not actually duplicates
Endpoint: POST /api/duplicates/merge
Merge duplicate products into a single product.
The request takes target_product_id, source_product_ids, workspace_id, user_id, merge_strategy, and merge_reason.
Merge Process:
Data Merge Strategy:
Endpoint: POST /api/duplicates/undo-merge
Undo a product merge operation.
The request takes history_id and user_id.
Undo Process:
Endpoint: GET /api/duplicates/merge-history
Retrieve merge history for a workspace.
Query Parameters:
workspace_id (required): Workspace to query
limit (optional): Maximum results (default: 50)
The response includes a list of merge records with merged_at, merged_by, source_product_names, target_product_name, similarity_score, merge_strategy, and is_undone.
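The merge-history query can be assembled with the standard library. This is a sketch under assumptions: the helper name and the base URL are hypothetical. The path and parameter names come from this reference, and limit is left out when not supplied so that the documented default applies.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def merge_history_url(base_url: str, workspace_id: str,
                      limit: Optional[int] = None) -> str:
    """Build the GET URL for /api/duplicates/merge-history."""
    params: Dict[str, object] = {"workspace_id": workspace_id}
    if limit is not None:
        params["limit"] = limit  # omitted -> server-side default
    return f"{base_url}/api/duplicates/merge-history?{urlencode(params)}"


# "https://example.invalid" is a placeholder host, not a real deployment.
print(merge_history_url("https://example.invalid", "ws-1", limit=10))
```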
Factory information is extracted from product metadata in priority order: factory (Primary), manufacturer (Secondary), factory_group (Tertiary), brand (Fallback), company (Last resort). The extracted value is normalized to lowercase.
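The priority-order lookup described above can be sketched as follows. The function name is hypothetical, and skipping empty or non-string values and trimming whitespace are assumptions. This reference only states the key order and the lowercase normalization.

```python
from typing import Any, Mapping, Optional

# Metadata keys in the documented priority order.
FACTORY_KEYS = ("factory", "manufacturer", "factory_group", "brand", "company")


def extract_factory(metadata: Mapping[str, Any]) -> Optional[str]:
    """Return the first usable factory value, lowercased, or None if absent."""
    for key in FACTORY_KEYS:
        value = metadata.get(key)
        # Assumption: blank strings and non-string values count as missing.
        if isinstance(value, str) and value.strip():
            return value.strip().lower()
    return None


# "manufacturer" outranks "brand", so it wins when "factory" is missing.
print(extract_factory({"brand": "Acme", "manufacturer": "ACME Tiles"}))  # acme tiles
```

A product for which this lookup returns None has no factory metadata, which is the case where detection returns an empty list.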
Name Similarity: Uses sequence matching with normalization — lowercase, trimmed comparison of product names.
Description Similarity: Text similarity using word overlap — counts common words divided by the maximum word count of either description.
Metadata Similarity: Property comparison (excluding factory keys) — counts matching metadata properties divided by total properties.
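The three similarity measures above can be approximated with the sketch below. Several details are assumptions, not confirmed behavior: `difflib.SequenceMatcher` stands in for "sequence matching", descriptions are compared as sets of unique lowercase words, and "total properties" is taken to mean the union of non-factory keys across both products. All function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Any, Mapping

FACTORY_KEYS = {"factory", "manufacturer", "factory_group", "brand", "company"}


def name_similarity(a: str, b: str) -> float:
    """Sequence-match ratio on lowercased, trimmed product names."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def description_similarity(a: str, b: str) -> float:
    """Common words divided by the larger of the two word counts."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / max(len(words_a), len(words_b))


def metadata_similarity(a: Mapping[str, Any], b: Mapping[str, Any]) -> float:
    """Matching non-factory properties divided by all non-factory properties."""
    keys = (set(a) | set(b)) - FACTORY_KEYS
    if not keys:
        return 0.0
    matching = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matching / len(keys)


print(description_similarity("matte oak floor tile", "glossy oak floor tile"))  # 0.75
```

Factory keys are excluded from the metadata comparison because the factory match is already a precondition, so counting them again would inflate the score.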
Send a POST request to /api/duplicates/detect with the product_id, workspace_id, and similarity_threshold in the request body, including your authorization token.
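A minimal sketch of that request using only the Python standard library. The host, the helper name, and the Bearer authorization scheme are assumptions. The endpoint path and body fields come from this reference. The request object is built but not sent.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://example.invalid"  # hypothetical host; use your deployment


def build_detect_request(product_id: str, workspace_id: str, token: str,
                         similarity_threshold: Optional[float] = None
                         ) -> urllib.request.Request:
    """Prepare a POST to /api/duplicates/detect without sending it."""
    body = {"product_id": product_id, "workspace_id": workspace_id}
    if similarity_threshold is not None:
        # Optional field: omitted when the caller does not set it.
        body["similarity_threshold"] = similarity_threshold
    return urllib.request.Request(
        url=f"{BASE_URL}/api/duplicates/detect",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_detect_request("prod-123", "ws-1", "YOUR_TOKEN", 0.8)
# Passing `request` to urllib.request.urlopen would send it.
```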
Send a POST request to /api/duplicates/merge specifying the target_product_id, source_product_ids array, workspace_id, user_id, merge_strategy, and merge_reason.
Send a POST request to /api/duplicates/undo-merge with the history_id and user_id.
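The merge and undo request bodies can be prepared as plain dictionaries before posting. This sketch is illustrative: the helper names are hypothetical, and the client-side checks (non-empty sources, target not among sources) are assumptions added for safety, not documented server rules. The field names and the three merge_strategy values come from this reference.

```python
from typing import Any, Dict, List

MERGE_STRATEGIES = {"manual", "auto", "suggested"}


def build_merge_payload(target_product_id: str, source_product_ids: List[str],
                        workspace_id: str, user_id: str,
                        merge_strategy: str, merge_reason: str) -> Dict[str, Any]:
    """Body for POST /api/duplicates/merge."""
    if merge_strategy not in MERGE_STRATEGIES:
        raise ValueError(f"unknown merge_strategy: {merge_strategy!r}")
    if not source_product_ids:
        raise ValueError("at least one source product is required")
    if target_product_id in source_product_ids:
        # Assumed sanity check: a product should not be merged into itself.
        raise ValueError("target product must not appear in source_product_ids")
    return {
        "target_product_id": target_product_id,
        "source_product_ids": list(source_product_ids),
        "workspace_id": workspace_id,
        "user_id": user_id,
        "merge_strategy": merge_strategy,
        "merge_reason": merge_reason,
    }


def build_undo_payload(history_id: str, user_id: str) -> Dict[str, str]:
    """Body for POST /api/duplicates/undo-merge."""
    return {"history_id": history_id, "user_id": user_id}
```

The history_id passed to the undo helper identifies the merge record to reverse, so a client would typically keep it from the merge history after a merge.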
When integrating with the platform:
PDF Processing Pipeline
Call /api/duplicates/detect during product creation
Admin Dashboard
Batch Operations
Use /api/duplicates/batch-detect for workspace cleanup
Possible Causes:
Solution:
Possible Causes:
Solution:
Possible Causes:
Solution:
Last Updated: November 9, 2025
Status: Production Ready
API Version: 1.0