Training with Existing Materials
This guide explains how to train the system with existing materials, focusing on dataset import capabilities and model training.
Dataset Import Options
Our system supports multiple methods for importing existing material data:
1. Hugging Face Datasets
The system can import datasets directly from Hugging Face:
Example dataset: Materials in Context (MINC-2500), https://huggingface.co/datasets/mcimpoi/minc-2500_split_1
Import Process:
- Navigate to the Admin Panel → Datasets section
- Select "Import External Dataset"
- Enter the Hugging Face dataset ID: mcimpoi/minc-2500_split_1
- Configure import options:
  - Map dataset categories to system material types
  - Set maximum images per class (recommended: 250-500)
  - Enable/disable metadata import
  - Select specific classes to import (optional)
- Start the import process
The system will:
- Download dataset samples from Hugging Face
- Extract category and image information
- Map external dataset fields to internal metadata fields
- Create properly categorized material samples
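Under the hood, this import path can be pictured with the Hugging Face datasets library. The sketch below assumes the standard datasets API and the usual image/label column layout for image-classification datasets (column names can differ per dataset); it is a minimal illustration, not the importer's actual code.

```python
# Sketch: pull a capped number of samples per class from a Hugging Face dataset.
# Assumes the standard `datasets` API with `image`/`label` columns.
from datasets import load_dataset

dataset = load_dataset("mcimpoi/minc-2500_split_1", split="train")
label_names = dataset.features["label"].names  # category names to map to material types

MAX_PER_CLASS = 500   # matches the recommended 250-500 images per class
counts: dict[str, int] = {}

for sample in dataset:
    category = label_names[sample["label"]]
    if counts.get(category, 0) >= MAX_PER_CLASS:
        continue
    counts[category] = counts.get(category, 0) + 1
    # sample["image"] is a PIL image; hand it to the material import pipeline here
```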
2. CSV/Structured Dataset Import
For CSV or structured datasets with mapped fields:
- Prepare a CSV with columns mapping to system fields
- Include material type, properties, and image file paths
- Upload through the Admin Panel → Datasets → Import CSV
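For reference, a compatible CSV might look like the following; the column names are illustrative and can be matched to internal fields through the field mapping described below.

```csv
material_type,name,finish,color,image_path
wood,oak_veneer_01,matte,light brown,images/wood/oak_veneer_01.jpg
metal,brushed_steel_02,brushed,silver,images/metal/brushed_steel_02.jpg
fabric,wool_weave_03,textured,grey,images/fabric/wool_weave_03.jpg
```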
3. Local Directory Import
For datasets stored in local directories:
- Organize materials by category in subdirectories
- Specify the root directory path during import
- The system will analyze the structure and suggest mappings
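For example, a category-per-subdirectory layout the importer can analyze might look like this (directory and file names are illustrative):

```text
/materials/
├── wood/
│   ├── oak_001.jpg
│   └── walnut_002.jpg
├── metal/
│   ├── steel_001.jpg
│   └── copper_002.jpg
└── fabric/
    ├── linen_001.jpg
    └── wool_002.jpg
```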
Field Mapping System
The system includes a flexible field mapping capability:
- Maps external dataset fields to internal metadata fields
- Provides predefined mappings for common material datasets
- Allows custom mapping configuration
- Handles automatic property extraction
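Conceptually, a field mapping is a lookup from external dataset fields to internal metadata fields. The snippet below is a hypothetical illustration of that idea; the internal field names are not the system's actual schema.

```python
# Hypothetical field mapping: external dataset fields -> internal metadata fields.
# The internal field names shown here are illustrative only.
FIELD_MAPPING = {
    "label": "material_type",        # dataset class -> material type
    "image": "sample_image",         # image payload -> stored sample image
    "file_name": "source_filename",  # provenance, kept for traceability
}

def map_record(external_record: dict) -> dict:
    """Translate one external dataset record into internal metadata fields."""
    return {
        internal: external_record[external]
        for external, internal in FIELD_MAPPING.items()
        if external in external_record
    }
```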
Training Models with Imported Materials
After importing materials, you can train recognition models:
- Navigate to Admin Panel → Models → Training
- Select the imported dataset from available datasets
- Configure training parameters:
  - Base model (ResNet, MobileNet, etc.)
  - Batch size and learning rate
  - Number of epochs
  - Transfer learning settings
- Start training
- Monitor progress in real-time
- Evaluate model performance with validation metrics
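As a point of reference, a training run's parameters could be summarized as follows. The values are illustrative defaults rather than required settings, and the parameter names are not the system's exact configuration schema.

```python
# Illustrative training parameters (names and values are examples only).
training_config = {
    "base_model": "MobileNetV2",      # or "ResNet50", "EfficientNetB0", ...
    "batch_size": 32,
    "learning_rate": 1e-4,            # stable default for transfer learning
    "epochs": 15,                     # 10-20 is typical for MINC-2500 (see below)
    "transfer_learning": {
        "freeze_base": True,          # train only the new classification head first
        "fine_tune_from_layer": 100,  # optionally unfreeze deeper layers later
    },
}
```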
Detailed Training Process
The training process involves several steps (a transfer-learning code sketch follows the list):
1. Model Initialization
- Base models are loaded dynamically from ML framework libraries (not stored in our repo)
- TensorFlow or PyTorch pre-trained architectures (MobileNetV2, ResNet, EfficientNet)
- Classification layers are replaced with custom layers for material recognition
2. Transfer Learning Optimization
- Initial layers of base models are frozen to preserve general features
- Only the top classification layers are trained initially
- Sparse categorical cross-entropy loss is used for classification tasks
- Adam optimizer with carefully tuned learning rate (typically 0.0001)
- Later training phases gradually unfreeze more layers for fine-tuning
3. Training Enhancement Techniques
- Early stopping with validation loss monitoring
- Learning rate reduction on plateau
- Data augmentation specific to material properties
- Regularization to prevent overfitting (dropout, L2)
4. Model Storage and Versioning
- Trained models are saved with metadata in the output directory structure:
  /models/
  ├── <model_name>/
  │   ├── model.h5 (or .pt for PyTorch)
  │   ├── metadata.json
  │   ├── training_history.json
  │   └── hyperparameters.json
- Complete training history is preserved for analysis
- Models are versioned for tracking improvements
5. Vector Database Integration
- The trained models generate embeddings for all materials
- These embeddings (not the models themselves) are stored in the vector database
- FAISS indexing enables efficient similarity search
- Each material's embedding links to knowledge base entries through material IDs
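The sketch below illustrates steps 1-3 with TensorFlow/Keras and a MobileNetV2 base. It is a minimal example of the technique, not the system's training code; train_ds, val_ds, and num_classes are assumed to come from the imported dataset.

```python
# Minimal transfer-learning sketch (TensorFlow/Keras, MobileNetV2 base).
# Assumes `train_ds`/`val_ds` are tf.data.Dataset objects of (image, label) pairs
# and `num_classes` is the number of material categories.
import tensorflow as tf

def build_material_classifier(num_classes: int) -> tf.keras.Model:
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet"
    )
    base.trainable = False  # freeze base layers to preserve general features

    # Replace the classification head with material-specific layers
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),  # regularization against overfitting
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]

# model = build_material_classifier(num_classes)
# history = model.fit(train_ds, validation_data=val_ds, epochs=15, callbacks=callbacks)
# A later fine-tuning phase can set `base.trainable = True` for selected layers
# and re-compile with a lower learning rate.
```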
Example: Training with MINC-2500
For the MINC-2500 dataset, which contains material images across 23 categories:
1. Import the dataset using the Hugging Face importer:
- Use the dataset ID: mcimpoi/minc-2500_split_1
- The system will automatically map categories like 'wood', 'metal', 'fabric', etc.
- The system will assign appropriate material types based on content
2. Train a material recognition model:
- Use transfer learning on a pre-trained image model
- Configure 10-20 epochs for good results
- Enable data augmentation for improved generalization
- Set learning rate to ~0.0001 for stable training
- Apply sparse categorical cross-entropy loss for classification
3. Evaluate results:
- The system will display accuracy per material category
- Review performance metrics to identify areas for improvement
- Test with sample images to verify recognition quality
- Analyze confusion matrix to understand misclassifications
- Review embedding quality metrics for similarity search applications
4. Model Deployment (see the indexing sketch after this list):
- The trained model is automatically versioned and stored
- Embeddings are generated for all materials in the dataset
- Vector database is updated with new embeddings
- Recognition system starts using the new model immediately
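The embedding and indexing step can be pictured roughly as below, assuming FAISS and the trained network used as a feature extractor (for MobileNetV2, the pooled features are 1280-dimensional). This is a sketch of the general approach, not the system's vector database code; the function and variable names are illustrative.

```python
# Sketch: generate embeddings from the trained model and index them with FAISS
# for similarity search. Names and dimensions are illustrative.
import numpy as np
import faiss

def build_material_index(embedding_model, images: np.ndarray, material_ids: list):
    # `embedding_model` is the trained network with its softmax head removed,
    # e.g. the GlobalAveragePooling2D output (1280-d for MobileNetV2).
    embeddings = embedding_model.predict(images).astype("float32")
    faiss.normalize_L2(embeddings)            # cosine similarity via inner product

    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)

    # FAISS stores only vectors; keep a parallel list so result positions can be
    # resolved back to knowledge-base entries through material IDs.
    return index, list(material_ids)

def find_similar(index, material_ids, query_embedding: np.ndarray, k: int = 5):
    query = query_embedding.astype("float32").reshape(1, -1)
    faiss.normalize_L2(query)
    scores, positions = index.search(query, k)
    return [(material_ids[p], float(s)) for p, s in zip(positions[0], scores[0])]
```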
Implementation Notes
- The dataset importer supports automatic detection of dataset structure
- Field mapping can be customized for specific material types
- Metadata extracted from datasets is stored for future reference
- Training parameters are automatically optimized based on dataset characteristics
For more advanced training techniques, see the ML Training Documentation.