Sources

A Source is a website or URL that feeds data into an aggregator. Each source has its own extraction configuration and schedule.

A source represents a single data feed. For example, if you’re building a job board aggregator, each career page or job listing page you track would be a separate source:

  • https://jobs.lever.co/acme → Source 1
  • https://boards.greenhouse.io/acme → Source 2
  • https://weworkremotely.com/categories/devops → Source 3

All three sources feed into the same aggregator and produce data in the same schema format.

Each source has:

Property          Description
URL               The target page to scrape
Extractor Config  Rules for extracting data (CSS selectors, field mappings)
Schedule          How often to run (hourly, daily, weekly, or manual)
Schema Version    Which schema version this source uses
Status            Active, paused, or needs correction
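
As a rough illustration, the properties above could be modeled as a small record type. This is only a sketch: the class names, field names, defaults, and the integer schema version are assumptions made for the example, not the product's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Schedule(Enum):
    """How often a source runs (values taken from the table above)."""
    HOURLY = "hourly"
    DAILY = "daily"
    WEEKLY = "weekly"
    MANUAL = "manual"


class Status(Enum):
    """Lifecycle state of a source (identifier spelling is an assumption)."""
    ACTIVE = "active"
    PAUSED = "paused"
    NEEDS_CORRECTION = "needs_correction"


@dataclass
class Source:
    """Hypothetical in-memory representation of one source."""
    url: str                               # target page to scrape
    extractor_config: dict                 # selectors and field mappings
    schedule: Schedule = Schedule.MANUAL   # assumed default
    schema_version: int = 1                # assumed to be an integer
    status: Status = Status.ACTIVE         # assumed default


source = Source(
    url="https://jobs.lever.co/acme",
    extractor_config={"container": ".job-listing", "fields": {}},
    schedule=Schedule.DAILY,
)
```

Using enums for the schedule and status keeps invalid values (for example a misspelled schedule) from being stored silently.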

When you add a source to an aggregator:

  1. Paste the target URL
  2. Describe what to extract (or write the extractor config manually)
  3. The AI generates an extractor config mapped to your schema
  4. Preview the extracted data
  5. Run a test flight to verify
  6. Choose a schedule and save

The extractor config defines how to pull data from the HTML. Example:

{
  "container": ".job-listing",
  "fields": {
    "title": {
      "selector": "h2.job-title",
      "type": "text"
    },
    "company": {
      "selector": ".company-name",
      "type": "text"
    },
    "url": {
      "selector": "a.job-link",
      "type": "attribute",
      "attribute": "href"
    }
  }
}
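
To make the config above more concrete, here is a minimal sketch of how such a config might be applied to a page. It assumes that "container" matches one element per record and that each field selector is resolved inside that container; the real extractor may behave differently. To stay within the standard library, it parses a well-formed sample snippet with `xml.etree.ElementTree` and supports only `tag`, `.class`, and `tag.class` selectors. The `extract` function, the helper names, and the sample markup are all made up for the example.

```python
import xml.etree.ElementTree as ET

CONFIG = {
    "container": ".job-listing",
    "fields": {
        "title": {"selector": "h2.job-title", "type": "text"},
        "company": {"selector": ".company-name", "type": "text"},
        "url": {"selector": "a.job-link", "type": "attribute", "attribute": "href"},
    },
}

# Made-up, well-formed sample markup; real pages usually need an HTML parser.
SAMPLE_HTML = """<div>
  <div class="job-listing">
    <h2 class="job-title">DevOps Engineer</h2>
    <span class="company-name">Acme</span>
    <a class="job-link" href="/jobs/123">Apply</a>
  </div>
  <div class="job-listing">
    <h2 class="job-title">Site Reliability Engineer</h2>
    <span class="company-name">Acme</span>
    <a class="job-link" href="/jobs/456">Apply</a>
  </div>
</div>"""


def matches(element, selector):
    """Match a simplified selector: 'tag', '.class', or 'tag.class'."""
    tag, _, css_class = selector.partition(".")
    if tag and element.tag != tag:
        return False
    if css_class and css_class not in element.get("class", "").split():
        return False
    return True


def select(root, selector):
    """Return root and all descendants that match the selector."""
    return [el for el in root.iter() if matches(el, selector)]


def extract(html, config):
    """Build one record per container element, one key per configured field."""
    root = ET.fromstring(html)
    records = []
    for container in select(root, config["container"]):
        record = {}
        for name, rule in config["fields"].items():
            found = select(container, rule["selector"])
            if not found:
                record[name] = None  # field missing in this container
            elif rule["type"] == "attribute":
                record[name] = found[0].get(rule["attribute"])
            else:
                record[name] = "".join(found[0].itertext()).strip()
        records.append(record)
    return records


records = extract(SAMPLE_HTML, CONFIG)
# records[0] == {"title": "DevOps Engineer", "company": "Acme", "url": "/jobs/123"}
```

A production scraper would normally use a tolerant HTML parser with full CSS selector support; the simplified matcher here only exists to keep the example self-contained.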

A source’s status is one of the following:

Status            Meaning
Active            Running normally on schedule
Paused            Manually paused or limit reached
Needs Correction  Failed multiple times, requires attention

When a source fails 3 consecutive times, it’s marked as “needs correction” and you’ll be notified.
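
The failure rule above can be sketched as a small state update. This is an illustration only: it assumes that a successful run resets the failure counter (which is what "consecutive" suggests) and leaves the status unchanged; the function name and status strings are hypothetical.

```python
FAILURE_THRESHOLD = 3  # consecutive failures before a source is flagged


def next_state(status, consecutive_failures, run_succeeded):
    """Return the (status, consecutive_failures) pair after one run."""
    if run_succeeded:
        # Assumption: any success resets the streak and keeps the status.
        return status, 0
    consecutive_failures += 1
    if consecutive_failures >= FAILURE_THRESHOLD:
        return "needs_correction", consecutive_failures
    return status, consecutive_failures


# Two failures, a success, then three failures in a row.
status, failures = "active", 0
for ok in [False, False, True, False, False, False]:
    status, failures = next_state(status, failures, ok)
# status is now "needs_correction" and failures is 3
```

Note that the first two failures do not count toward the threshold once a run succeeds in between.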

The number of sources you can create depends on your plan:

Plan     Sources
Free     3
Starter  25
Pro      100
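
For illustration, a client-side check against these limits might look like the sketch below. The plan identifiers and the `can_add_source` helper are assumptions for the example; only the numbers come from the table above.

```python
# Source limits per plan, copied from the table above.
PLAN_SOURCE_LIMITS = {"free": 3, "starter": 25, "pro": 100}


def can_add_source(plan, current_source_count):
    """Return True if one more source fits within the plan's limit."""
    return current_source_count < PLAN_SOURCE_LIMITS[plan]


allowed = can_add_source("free", 2)   # True: a third source still fits
blocked = can_add_source("free", 3)   # False: the Free limit is reached
```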