Article Schema
The Article schema is designed for aggregating news articles, blog posts, and other written content.
Fields
| Field | Type | Required | Identity | Description |
|---|---|---|---|---|
| title | string | Yes | Yes (1) | Article title |
| source | string | Yes | Yes (2) | Publication or website name |
| url | string | No | No | Link to the full article |
| published_at | string | No | No | Publication date |
| summary | string | No | No | Article summary or excerpt |
| author | string | No | No | Author name |
| tags | string[] | No | No | Topics or categories |
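The table can be turned into a small validation check. The following is a minimal sketch, not part of the schema itself: the names `ARTICLE_FIELDS` and `validate_article` are hypothetical, and it only checks presence and basic types as listed in the table.

```python
from typing import Any

# Field name -> (expected Python type, required), transcribed from the table.
# ARTICLE_FIELDS and validate_article are illustrative names, not a real API.
ARTICLE_FIELDS: dict[str, tuple[type, bool]] = {
    "title": (str, True),
    "source": (str, True),
    "url": (str, False),
    "published_at": (str, False),
    "summary": (str, False),
    "author": (str, False),
    "tags": (list, False),
}


def validate_article(item: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the item fits the table."""
    problems = []
    for name, (expected, required) in ARTICLE_FIELDS.items():
        if name not in item:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        value = item[name]
        if not isinstance(value, expected):
            problems.append(f"{name}: expected {expected.__name__}")
        elif name == "tags" and not all(isinstance(t, str) for t in value):
            # string[] means every entry of the list must itself be a string.
            problems.append("tags: every entry must be a string")
    return problems
```

Calling `validate_article({"title": "A"})` reports the missing `source`, since both identity fields are required.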
Identity Fields
The identity is computed from title + source, in that order. As a result, an article that appears on multiple aggregator sites is deduplicated against its original source; url and the other non-identity fields do not affect the identity.
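To make the ordering concrete, here is a minimal sketch of how such an identity key might be derived. This page does not say how the identity is actually encoded or hashed, so the function name `article_identity`, the unit-separator join, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def article_identity(item: dict) -> str:
    """Derive a stable key from the identity fields: title first, then source."""
    # Assumption: join with the ASCII unit separator so that
    # ("ab", "c") and ("a", "bc") do not collapse into the same string.
    raw = "\x1f".join([item["title"], item["source"]])
    # Assumption: SHA-256 hex digest; the real system may encode this differently.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Under this sketch, two items with the same title and source but different url values produce the same key, which is the deduplication behavior described above.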
Example Item
    {
      "title": "The Future of Serverless Computing",
      "source": "TechCrunch",
      "url": "https://techcrunch.com/2026/01/20/serverless-future",
      "published_at": "2026-01-20",
      "summary": "A deep dive into where serverless is heading in 2026...",
      "author": "Jane Smith",
      "tags": ["serverless", "cloud", "infrastructure"]
    }
Use Cases
- Niche news aggregators
- Industry-specific content feeds
- Research and monitoring tools
- Content curation platforms
Extractor Example
    {
      "container": "article.post",
      "fields": {
        "title": { "selector": "h1.article-title", "type": "text" },
        "source": { "selector": ".publication-name", "type": "text" },
        "url": { "selector": "a.read-more", "type": "attribute", "attribute": "href" },
        "published_at": { "selector": "time", "type": "attribute", "attribute": "datetime" },
        "summary": { "selector": ".excerpt", "type": "text" },
        "author": { "selector": ".author-name", "type": "text" }
      }
    }
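Before running an extractor like the one above, it can help to check that it covers the schema's required fields. The sketch below is an assumption-laden illustration: `check_extractor` is a hypothetical helper, it treats `container` as mandatory, and it flags any rule type other than `text` and `attribute` only because those are the two types shown on this page.

```python
REQUIRED_FIELDS = ("title", "source")  # the "Required" column of the schema


def check_extractor(config: dict) -> list[str]:
    """Return a list of problems found in an extractor config (hypothetical helper)."""
    problems = []
    # Assumption: every extractor needs a container selector.
    if not config.get("container"):
        problems.append("missing container selector")
    fields = config.get("fields", {})
    for name in REQUIRED_FIELDS:
        if name not in fields:
            problems.append(f"no rule for required field: {name}")
    for name, rule in fields.items():
        if not rule.get("selector"):
            problems.append(f"{name}: missing selector")
        kind = rule.get("type")
        # Assumption: only the two rule types shown in the example are valid.
        if kind not in ("text", "attribute"):
            problems.append(f"{name}: unknown type {kind!r}")
        elif kind == "attribute" and not rule.get("attribute"):
            problems.append(f"{name}: type 'attribute' needs an attribute name")
    return problems
```

The example extractor passes this check; removing its `source` rule would produce `no rule for required field: source`.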