Article Schema

The Article schema is designed for aggregating news articles, blog posts, and other written content.

Fields

Field	Type	Required	Identity	Description
`title`	string	Yes	Yes (1)	Article title
`source`	string	Yes	Yes (2)	Publication or website name
`url`	string	No	No	Link to the full article
`published_at`	string	No	No	Publication date, e.g. `2026-01-20`
`summary`	string	No	No	Article summary or excerpt
`author`	string	No	No	Author name
`tags`	string[]	No	No	Topics or categories

Identity Fields

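The identity is computed from `title` and `source`, in that order. Two items with the same title and source are treated as the same article, so a story that surfaces on several aggregator sites is deduplicated as long as each copy is attributed to the original publication. `url` is not an identity field, so differing links alone do not produce separate items.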
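
As a rough illustration of identity-based deduplication, the sketch below derives a key from `title` and `source` and keeps the first item seen per key. The normalization (trim and lowercase), the separator, the SHA-256 hash, and the names `identity_key` and `deduplicate` are all assumptions made for this example; the schema only specifies which fields are used and in what order.

```python
import hashlib

# Identity fields in the order given by the schema: title first, then source.
IDENTITY_FIELDS = ("title", "source")


def identity_key(item: dict) -> str:
    """Return a stable key built from title + source, in that order."""
    # Assumption: values are trimmed and lowercased before hashing.
    parts = [str(item[field]).strip().lower() for field in IDENTITY_FIELDS]
    # Assumption: a unit-separator character joins the parts so that
    # ("ab", "c") and ("a", "bc") cannot produce the same key.
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(items: list[dict]) -> list[dict]:
    """Keep the first item seen for each identity key."""
    seen: set[str] = set()
    unique = []
    for item in items:
        key = identity_key(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

With this sketch, two items that share a title and source but carry different `url` values collapse into one, while the same title under a different `source` stays separate.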

Example Item

{
  "title": "The Future of Serverless Computing",
  "source": "TechCrunch",
  "url": "https://techcrunch.com/2026/01/20/serverless-future",
  "published_at": "2026-01-20",
  "summary": "A deep dive into where serverless is heading in 2026...",
  "author": "Jane Smith",
  "tags": ["serverless", "cloud", "infrastructure"]
}
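
An item like the one above can be checked against the field table before it is stored. The following is a minimal sketch, not the schema's own validator: the function name `validate_article`, the error message wording, and the rule that required fields must be non-empty are assumptions for illustration.

```python
# Field names and types taken from the Fields table.
REQUIRED_FIELDS = ("title", "source")
OPTIONAL_STRING_FIELDS = ("url", "published_at", "summary", "author")


def validate_article(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item looks valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = item.get(field)
        # Assumption: a required field must be a non-empty string.
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field}: required non-empty string")
    for field in OPTIONAL_STRING_FIELDS:
        # Optional fields may be absent, but must be strings when present.
        if field in item and not isinstance(item[field], str):
            errors.append(f"{field}: expected string")
    if "tags" in item:
        tags = item["tags"]
        # tags is typed string[] in the table.
        if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
            errors.append("tags: expected list of strings")
    return errors
```

Calling `validate_article` on the example item returns an empty list; dropping `source` or passing `tags` as a single string would each add one entry.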

Use Cases

Niche news aggregators
Industry-specific content feeds
Research and monitoring tools
Content curation platforms

Extractor Example

{
  "container": "article.post",
  "fields": {
    "title": {
      "selector": "h1.article-title",
      "type": "text"
    },
    "source": {
      "selector": ".publication-name",
      "type": "text"
    },
    "url": {
      "selector": "a.read-more",
      "type": "attribute",
      "attribute": "href"
    },
    "published_at": {
      "selector": "time",
      "type": "attribute",
      "attribute": "datetime"
    },
    "summary": {
      "selector": ".excerpt",
      "type": "text"
    },
    "author": {
      "selector": ".author-name",
      "type": "text"
    }
  }
}
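
Before running an extractor, it can help to confirm that its `fields` map covers the schema's required fields. The sketch below is one hypothetical way to do that; `check_extractor` and the shape of its report are invented for this example, and the rule that a field of type `attribute` needs an `attribute` key is inferred from the config above rather than stated by the schema.

```python
# Field names taken from the Fields table.
SCHEMA_FIELDS = {"title", "source", "url", "published_at", "summary", "author", "tags"}
REQUIRED_FIELDS = {"title", "source"}


def check_extractor(config: dict) -> dict:
    """Compare an extractor config's field map against the Article schema."""
    fields = config.get("fields", {})
    mapped = set(fields)
    problems = []
    for name, rule in fields.items():
        # Assumption: "attribute" rules must name the attribute to read.
        if rule.get("type") == "attribute" and "attribute" not in rule:
            problems.append(f"{name}: type 'attribute' needs an 'attribute' key")
    return {
        "missing_required": sorted(REQUIRED_FIELDS - mapped),
        "unknown": sorted(mapped - SCHEMA_FIELDS),
        "unmapped_optional": sorted(SCHEMA_FIELDS - REQUIRED_FIELDS - mapped),
        "problems": problems,
    }
```

Applied to the extractor example, this reports no missing required fields and lists `tags` as the only optional field without a selector.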