Sources

A source is a website or URL that feeds data into an aggregator. Each source has its own extraction configuration and schedule.

What is a Source?

A source represents a single data feed. For example, if you’re building a job board aggregator, each company’s career page would be a separate source:

https://jobs.lever.co/acme → Source 1
https://boards.greenhouse.io/acme → Source 2
https://weworkremotely.com/categories/devops → Source 3

All three sources feed into the same aggregator and produce data that follows the same schema.

Source Configuration

Each source has:

Property	Description
URL	The target page to scrape
Extractor Config	Rules for extracting data (CSS selectors, field mappings)
Schedule	How often to run (hourly, daily, weekly, or manual)
Schema Version	Which schema version this source uses
Status	Active, paused, or needs correction
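
As a rough illustration of the properties above, a source could be modeled like this. This is a minimal sketch, not the product's actual data model: the class names, field names, the integer schema version, and the default values are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Schedule(Enum):
    HOURLY = "hourly"
    DAILY = "daily"
    WEEKLY = "weekly"
    MANUAL = "manual"


class Status(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    NEEDS_CORRECTION = "needs_correction"


@dataclass
class Source:
    """Hypothetical model of one source; names are illustrative only."""
    url: str                               # the target page to scrape
    extractor_config: dict                 # CSS selectors and field mappings
    schedule: Schedule = Schedule.MANUAL   # assumed default
    schema_version: int = 1                # assumed to be an integer
    status: Status = Status.ACTIVE         # assumed default


# The three job board sources from the example, all feeding one aggregator.
sources = [
    Source("https://jobs.lever.co/acme", {}, Schedule.DAILY),
    Source("https://boards.greenhouse.io/acme", {}, Schedule.DAILY),
    Source("https://weworkremotely.com/categories/devops", {}, Schedule.HOURLY),
]
```

The extractor configs are left empty here; each source would carry its own rules because each site has different markup.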

Adding a Source

When you add a source to an aggregator:

1. Paste the target URL.
2. Describe what to extract, or write the extractor config manually.
3. From your description, the AI generates an extractor config mapped to your schema.
4. Preview the extracted data.
5. Run a test flight to verify the results.
6. Choose a schedule and save.

Extractor Configuration

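The extractor config defines how to pull data from the page’s HTML. In the example below, container is the CSS selector that matches each repeating item, and each entry under fields names an output field, the CSS selector to read it from, and whether to take the element’s text or one of its attributes: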

{
  "container": ".job-listing",
  "fields": {
    "title": {
      "selector": "h2.job-title",
      "type": "text"
    },
    "company": {
      "selector": ".company-name",
      "type": "text"
    },
    "url": {
      "selector": "a.job-link",
      "type": "attribute",
      "attribute": "href"
    }
  }
}
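
To make the config concrete, here is a minimal sketch of how a config of this shape could be applied to a page. It is not the product's extraction engine. It uses only the Python standard library, assumes well-formed markup, and supports only the simple tag.class and .class selector forms that appear in the example. The function names and the sample HTML are made up for illustration.

```python
import xml.etree.ElementTree as ET

CONFIG = {
    "container": ".job-listing",
    "fields": {
        "title": {"selector": "h2.job-title", "type": "text"},
        "company": {"selector": ".company-name", "type": "text"},
        "url": {"selector": "a.job-link", "type": "attribute", "attribute": "href"},
    },
}

# Hypothetical page fragment; real pages are rarely this tidy.
SAMPLE_HTML = """
<div>
  <div class="job-listing">
    <h2 class="job-title">DevOps Engineer</h2>
    <span class="company-name">Acme</span>
    <a class="job-link" href="/jobs/1">Apply</a>
  </div>
  <div class="job-listing">
    <h2 class="job-title">Site Reliability Engineer</h2>
    <span class="company-name">Acme</span>
    <a class="job-link" href="/jobs/2">Apply</a>
  </div>
</div>
"""


def matches(element, selector):
    """Match only 'tag.class' or '.class' selectors (a deliberate simplification)."""
    tag, _, cls = selector.partition(".")
    classes = element.get("class", "").split()
    return (not tag or element.tag == tag) and (not cls or cls in classes)


def extract(html, config):
    """Return one record per container element, with one key per configured field."""
    root = ET.fromstring(html.strip())
    records = []
    for container in root.iter():
        if not matches(container, config["container"]):
            continue
        record = {}
        for name, rule in config["fields"].items():
            # First descendant matching the field's selector, if any.
            node = next(
                (el for el in container.iter() if matches(el, rule["selector"])),
                None,
            )
            if node is None:
                record[name] = None
            elif rule["type"] == "attribute":
                record[name] = node.get(rule["attribute"])
            else:  # "text"
                record[name] = "".join(node.itertext()).strip()
        records.append(record)
    return records


records = extract(SAMPLE_HTML, CONFIG)
```

With the sample fragment, records holds two dictionaries, each with title, company, and url keys. A production extractor would need a real HTML parser and full CSS selector support.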

Source Status

Status	Meaning
Active	Running normally on schedule
Paused	Manually paused or limit reached
Needs Correction	Failed multiple times, requires attention

When a source fails 3 consecutive times, it’s marked as “Needs Correction” and you’ll be notified.
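
The failure rule can be sketched as a small counter. This is an illustration under stated assumptions rather than the service's real logic: it assumes a successful run resets the count (which is what "consecutive" implies) and that the notification is sent once, at the moment the status changes. The class and method names are hypothetical.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3  # from the text: 3 consecutive failures


@dataclass
class SourceHealth:
    """Hypothetical tracker for one source's run history."""
    status: str = "active"
    consecutive_failures: int = 0

    def record_run(self, succeeded: bool) -> bool:
        """Update state after a run; return True if a notification is due."""
        if succeeded:
            # Assumption: any success resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURE_THRESHOLD and self.status == "active":
            self.status = "needs_correction"
            return True
        return False


health = SourceHealth()
# Two failures, a success that resets the streak, then three failures in a row.
notifications = [health.record_run(ok) for ok in (False, False, True, False, False, False)]
```

Only the final run in this sequence triggers a notification, because the success in the middle breaks the first streak.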

Limits

The number of sources you can create depends on your plan:

Plan	Sources
Free	3
Starter	25
Pro	100
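
A limit check against the table above might look like the following. The limits come from the table; the function name and the lowercase plan keys are assumptions made for this sketch.

```python
# Source limits per plan, taken from the table above.
PLAN_SOURCE_LIMITS = {"free": 3, "starter": 25, "pro": 100}


def can_add_source(plan: str, current_sources: int) -> bool:
    """Return True if an account on `plan` may create one more source."""
    limit = PLAN_SOURCE_LIMITS[plan.lower()]
    return current_sources < limit
```

For example, can_add_source("Free", 2) allows a third source, while can_add_source("Free", 3) does not.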

Related

Aggregators - The container for sources
Schemas - How extracted data is structured