Material Scraping Workflow Guide

Quick Start

Step 1: Enter Your URL

Start by entering the website URL or search query you want to scrape.

Examples:

Step 2: Choose Scraping Mode

🎯 Single Page

When to use: Testing extraction on one specific page

Example: Scrape one product page to test field mappings

πŸ—ΊοΈ Sitemap

When to use: You have a sitemap.xml with product URLs

Example: https://example.com/sitemap.xml → scrapes all product URLs
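
To make the Sitemap mode concrete, here is a minimal sketch of how product URLs could be read out of a sitemap.xml file. It assumes the sitemap follows the standard sitemaps.org format; the function name and sample data are illustrative and are not part of the tool itself.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_text):
    """Return every <loc> URL found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Illustrative sitemap content, not taken from a real site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/marble-a</loc></url>
  <url><loc>https://example.com/products/granite-b</loc></url>
</urlset>"""

urls = extract_sitemap_urls(SAMPLE_SITEMAP)
print(urls)
```

Each URL returned this way would then be scraped as an individual page.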

πŸ•·οΈ Crawl

When to use: Auto-discover all pages on a website

Example: Start at homepage, find all product pages automatically

πŸ” Search

When to use: Find pages via search engines

Example: "marble suppliers Greece" → finds and scrapes relevant pages

📋 Map

When to use: Get URL list without scraping content

Example: Get all URLs from a website to review before scraping

Step 3: Configure Field Mappings

Define what data to extract from each page:

Standard Fields:

Custom Fields: You can add custom fields based on your needs.
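
As a rough illustration of what a field mapping does, the sketch below pairs each field name with a pattern and applies it to a page. The field names (name, price, finish), the regex-based approach, and the sample HTML are all assumptions made for this example; the actual tool may express mappings differently, for instance with CSS selectors or AI-based extraction.

```python
import re

# Hypothetical mapping: field name -> regex with exactly one capture group.
FIELD_MAPPINGS = {
    "name": r"<h1[^>]*>(.*?)</h1>",
    "price": r'class="price"[^>]*>\s*([^<]+?)\s*<',
    "finish": r'class="finish"[^>]*>\s*([^<]+?)\s*<',  # example of a custom field
}


def apply_mappings(html, mappings):
    """Extract one value per mapped field; fields with no match become None."""
    record = {}
    for field, pattern in mappings.items():
        match = re.search(pattern, html, flags=re.DOTALL | re.IGNORECASE)
        record[field] = match.group(1).strip() if match else None
    return record


# Illustrative page fragment, not real scraped content.
SAMPLE_HTML = '<h1>Carrara Marble</h1><span class="price"> 45.00 EUR </span>'

record = apply_mappings(SAMPLE_HTML, FIELD_MAPPINGS)
print(record)
```

A field that comes back as None is a signal that its mapping needs adjusting before a full run.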

Step 4: Preview Extraction

Before running the full scrape:

  1. The system scrapes one sample page
  2. It shows the extracted materials
  3. You review the data quality
  4. You adjust field mappings if needed
  5. You confirm to proceed
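
The review in the steps above can be thought of as a simple check on the sample record. The sketch below is one possible way to express it, assuming the extracted sample is a dictionary of field values; the function name, field names, and return shape are hypothetical.

```python
def preview_report(sample_record, required_fields):
    """Summarize one sample extraction so a reviewer can decide whether to proceed."""
    # A field counts as missing when it is absent, None, or an empty string.
    missing = [field for field in required_fields if not sample_record.get(field)]
    return {"missing": missing, "ready_to_proceed": not missing}


# Illustrative sample record with one field that failed to extract.
sample = {"name": "Carrara Marble", "price": "45.00 EUR", "image_url": None}
report = preview_report(sample, ["name", "price", "image_url"])
print(report)
```

If any field is listed as missing, adjust the corresponding mapping and preview again before confirming.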

Step 5: Run Full Scrape

Once confirmed:

Scraping Mode Comparison

Feature    | Single Page | Sitemap | Crawl   | Search   | Map
Speed      | ⚡⚡⚡         | ⚡⚡      | ⚡       | ⚡⚡       | ⚡⚡⚡
Pages      | 1           | 10-1000 | 10-1000 | 10-100   | 100-10000
Discovery  | Manual      | Sitemap | Auto    | Search   | Auto
Best For   | Testing     | Bulk    | Unknown | Research | Planning
Complexity | Simple      | Medium  | High    | Medium   | Low
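
The "Pages" row of the comparison can also be read as a quick lookup. The following sketch encodes those typical ranges and lists the modes that fit an expected page count; the mode keys and the helper name are illustrative only, and the ranges are taken directly from the table above.

```python
# Typical page ranges per mode, copied from the comparison table.
MODE_PAGE_RANGES = {
    "single_page": (1, 1),
    "sitemap": (10, 1000),
    "crawl": (10, 1000),
    "search": (10, 100),
    "map": (100, 10000),
}


def modes_for_page_count(expected_pages):
    """Return the modes whose typical page range covers the expected count."""
    return [
        mode
        for mode, (low, high) in MODE_PAGE_RANGES.items()
        if low <= expected_pages <= high
    ]


print(modes_for_page_count(50))
print(modes_for_page_count(5000))
```

Page count is only one factor; the "Discovery" and "Best For" rows should still guide the final choice.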

Configuration Tips

Firecrawl Options

Essential Settings:

Output Formats:

Performance Settings:

Common Workflows

Workflow 1: Test Single Product

  1. Mode: Single Page
  2. URL: One product page
  3. Preview: Review extraction
  4. Adjust: Fix field mappings
  5. Scale: Switch to Sitemap/Crawl mode

Workflow 2: Bulk Scrape E-commerce

  1. Mode: Sitemap
  2. URL: https://example.com/sitemap.xml
  3. Max Pages: 100
  4. Preview: Test on first page
  5. Run: Process all pages

Workflow 3: Discover Suppliers

  1. Mode: Search
  2. Query: "ceramic tile suppliers Spain"
  3. Max Results: 20
  4. Preview: Review found pages
  5. Run: Scrape all results

Workflow 4: Map Then Scrape

  1. Mode: Map
  2. URL: https://example.com
  3. Get: All URLs
  4. Review: Filter product URLs
  5. Switch: Use Sitemap mode with filtered URLs
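
The review step in this workflow usually comes down to keeping only the URLs that look like product pages. Here is a minimal sketch of such a filter, assuming product pages live under a /products/ path; that prefix, the function name, and the sample URLs are assumptions and will differ from site to site.

```python
from urllib.parse import urlparse


def filter_product_urls(urls, path_prefix="/products/"):
    """Keep unique URLs whose path starts with the given prefix, in original order."""
    seen = set()
    kept = []
    for url in urls:
        path = urlparse(url).path
        if path.startswith(path_prefix) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


# Illustrative output of a Map run, not real data.
mapped_urls = [
    "https://example.com/",
    "https://example.com/products/marble-a",
    "https://example.com/about",
    "https://example.com/products/granite-b",
    "https://example.com/products/marble-a",  # duplicate
]

product_urls = filter_product_urls(mapped_urls)
print(product_urls)
```

The filtered list is what you would carry into the next scraping run.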

Troubleshooting

No Materials Found

Missing Images

Timeout Errors

Rate Limit Errors
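
One common, general-purpose way to cope with rate limits is to retry with exponentially increasing waits. The sketch below shows that technique in isolation; it is not the tool's built-in behavior, and RateLimitError and call_with_backoff are hypothetical names introduced for this example.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the scraping service reports a rate limit."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Small random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults, the waits are roughly 1, 2, 4, and 8 seconds before the final attempt. Lowering the number of concurrent requests or the page limit is often a simpler first step.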

Best Practices

Before Scraping:

  1. ✅ Test with single page first
  2. ✅ Review website's robots.txt
  3. ✅ Set reasonable page limits
  4. ✅ Configure field mappings
  5. ✅ Preview before full scrape
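
For the robots.txt check in the list above, Python's standard library includes a parser. The sketch below tests whether a URL is allowed for a given user agent; the user agent string "MaterialScraper" and the sample rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="MaterialScraper"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative robots.txt content, not taken from a real site.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /admin/
"""

print(is_allowed(SAMPLE_ROBOTS, "https://example.com/products/marble-a"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/admin/panel"))
```

In practice you would download the site's robots.txt first and pass its text to the helper.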

During Scraping:

  1. ✅ Monitor progress
  2. ✅ Check for errors
  3. ✅ Review extracted data
  4. ✅ Adjust if needed

After Scraping:

  1. ✅ Verify data quality
  2. ✅ Check image URLs
  3. ✅ Review embeddings
  4. ✅ Test search functionality
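
Checking image URLs can start with a simple structural test before any network request is made. The sketch below flags values that are not absolute http(s) URLs, which is one plausible cause of broken images; the helper name and sample values are illustrative.

```python
from urllib.parse import urlparse


def invalid_image_urls(urls):
    """Return the URLs that are not absolute http(s) links."""
    invalid = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            invalid.append(url)
    return invalid


# Illustrative values as they might appear in extracted records.
image_urls = [
    "https://example.com/images/marble-a.jpg",
    "/images/granite-b.jpg",           # relative path, needs the site's base URL
    "data:image/png;base64,AAAA",      # inline placeholder, not a fetchable link
]

print(invalid_image_urls(image_urls))
```

This does not confirm that an image actually loads; it only catches malformed or relative links early.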

Next Steps

After successful scraping:

  1. Materials Created: View in Materials section
  2. Embeddings Generated: Ready for AI search
  3. Chunks Created: Optimized for AI processing
  4. Search Enabled: Find materials semantically

Support

For issues or questions: