Sacra aggregates news from across the web and processes it into a clean, company-attributed feed. Raw articles go through three stages before they surface in the API: entity identification, clustering, and summarization.

Entity identification

Private companies don’t have tickers. Matching a news article to the right company requires understanding the text — not just looking for an exact string match. Sacra uses AI to read each article and identify which companies are mentioned. This handles the common cases that string matching gets wrong: name variations (“Stripe” vs. “Stripe Inc.”), abbreviations, informal references, and articles that discuss a company without ever using its exact registered name. The result is that each news item is reliably attributed to the companies it actually covers, and you can query news by company domain without worrying about whether the source spelled the name correctly.

Clustering

News from different outlets frequently covers the same underlying event — a funding round, a product launch, a leadership change. Without deduplication, a single story can generate dozens of redundant items. Sacra clusters articles that cover the same event into a single news item. Clustering is based on semantic similarity: articles are grouped when they refer to the same facts, even if the wording differs across sources. Each cluster is treated as one news event, regardless of how many outlets picked it up. This means the news feed you get from the API reflects distinct events, not raw article volume.
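To make the grouping idea concrete, here is a toy sketch of threshold-based clustering. It is not Sacra's method: it substitutes simple word-overlap (Jaccard) similarity on headlines for real semantic similarity, and the function names, the 0.5 threshold, and the ExampleCo headlines are all hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity: shared tokens divided by all distinct tokens."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_headlines(headlines: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: put each headline into the first cluster whose
    first headline is similar enough, otherwise start a new cluster.
    The threshold is an arbitrary illustrative value."""
    clusters: List[List[str]] = []
    representatives: List[Set[str]] = []  # token set of each cluster's first headline
    for headline in headlines:
        current = tokens(headline)
        for index, representative in enumerate(representatives):
            if jaccard(current, representative) >= threshold:
                clusters[index].append(headline)
                break
        else:
            clusters.append([headline])
            representatives.append(current)
    return clusters


# Hypothetical headlines about a made-up company.
SAMPLE_HEADLINES = [
    "ExampleCo raises $50M Series B",
    "ExampleCo raises $50M in Series B round",
    "ExampleCo names new chief financial officer",
]
events = cluster_headlines(SAMPLE_HEADLINES)
```

With these inputs the two funding headlines fall into one cluster and the leadership headline forms its own, so three articles collapse into two events. A production system would compare meaning rather than shared words, which is what lets it group articles whose wording differs substantially.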

AI-generated headlines and summaries

Once articles are clustered into an event, Sacra uses AI to write a single headline and summary for the cluster. The headline and summary are generated from the full set of source articles, so they reflect the most complete picture of the event rather than any one outlet’s framing. Each news item in the API includes:
| Field | Description |
| --- | --- |
| headline | Full AI-generated title for the event |
| short_headline | Condensed version for compact display |
| description | Short paragraph synthesizing what happened across all source articles |
| release_date | When the event was first reported |
| articles | The source articles that were clustered together |
Each article in the articles array carries its original headline, link, date_published, publication, and thumbnail. A publication_score is also included to indicate source quality, which you can use to sort or filter the underlying sources. The source articles are always available so you can trace any claim back to its origin.
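As an illustration of working with these fields, the sketch below models a news item and ranks its source articles by publication_score. The field names come from the description above; everything else is an assumption: the Python types, the treatment of publication_score as a number where higher means better, the optional thumbnail, and the sample payload, which is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Article:
    headline: str
    link: str
    date_published: str
    publication: str
    thumbnail: Optional[str] = None
    # Assumed to be numeric with higher meaning better; the docs only say
    # it indicates source quality.
    publication_score: Optional[float] = None


@dataclass
class NewsItem:
    headline: str
    short_headline: str
    description: str
    release_date: str
    articles: List[Article] = field(default_factory=list)


def parse_news_item(raw: Dict[str, Any]) -> NewsItem:
    """Build a NewsItem from a decoded JSON object (shape assumed)."""
    articles = [
        Article(
            headline=a["headline"],
            link=a["link"],
            date_published=a["date_published"],
            publication=a["publication"],
            thumbnail=a.get("thumbnail"),
            publication_score=a.get("publication_score"),
        )
        for a in raw.get("articles", [])
    ]
    return NewsItem(
        headline=raw["headline"],
        short_headline=raw["short_headline"],
        description=raw["description"],
        release_date=raw["release_date"],
        articles=articles,
    )


def top_sources(item: NewsItem, min_score: float = 0.0) -> List[Article]:
    """Return scored articles at or above min_score, best first."""
    scored = [
        a for a in item.articles
        if a.publication_score is not None and a.publication_score >= min_score
    ]
    return sorted(scored, key=lambda a: a.publication_score, reverse=True)


# Invented payload for illustration only; not real API output.
SAMPLE_ITEM = {
    "headline": "ExampleCo raises $50M Series B",
    "short_headline": "ExampleCo raises $50M",
    "description": "ExampleCo announced a $50M Series B round.",
    "release_date": "2024-01-15",
    "articles": [
        {"headline": "ExampleCo lands $50M", "link": "https://news.example/a",
         "date_published": "2024-01-15", "publication": "Outlet A",
         "publication_score": 0.4},
        {"headline": "ExampleCo closes Series B", "link": "https://news.example/b",
         "date_published": "2024-01-15", "publication": "Outlet B",
         "publication_score": 0.9},
    ],
}
item = parse_news_item(SAMPLE_ITEM)
best_first = top_sources(item)
```

Keeping the full articles list on the item, rather than only the top source, preserves the ability to trace a claim in the summary back to each outlet that reported it.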

Filtering news

The news endpoint accepts several filters:
  • company_domain — fetch news for a single company by domain (e.g. stripe.com)
  • company_domains — comma-separated list for up to 200 companies in a single GET request, or up to 1,000 via the POST batch endpoint
  • news_type — major (default) returns only significant events; all includes minor mentions
  • release_date_start / release_date_end — filter by when the event was first reported
  • updated_at_gte / updated_at_lte — filter by when the news item was last updated; recommended for polling workflows where you want only what’s changed since your last sync
News items can be updated after their initial release as new articles are added to a cluster, so use updated_at_gte for incremental syncs rather than re-fetching by release date. See Sync Sacra Research & Data for a complete polling guide.
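The filters above can be combined into query parameters for an incremental sync, as in the following sketch. It only builds the parameters and sends no request, because the endpoint URL and authentication are not covered here. The parameter names and the 200-domain GET limit come from the list above; the helper names, the ISO 8601 timestamp format, and the placeholder domains are assumptions.

```python
from typing import Dict, Iterator, List, Optional, Sequence

MAX_DOMAINS_PER_GET = 200  # GET limit stated in the filter list above
NEWS_TYPES = ("major", "all")


def chunk_domains(
    domains: Sequence[str], size: int = MAX_DOMAINS_PER_GET
) -> Iterator[List[str]]:
    """Split a long domain list into chunks that fit one GET request each."""
    for start in range(0, len(domains), size):
        yield list(domains[start:start + size])


def build_news_params(
    domains: Sequence[str],
    news_type: str = "major",
    updated_at_gte: Optional[str] = None,
    release_date_start: Optional[str] = None,
    release_date_end: Optional[str] = None,
) -> Dict[str, str]:
    """Build query parameters for one request (hypothetical helper)."""
    if not domains:
        raise ValueError("at least one domain is required")
    if len(domains) > MAX_DOMAINS_PER_GET:
        raise ValueError("too many domains for a single GET request")
    if news_type not in NEWS_TYPES:
        raise ValueError(f"news_type must be one of {NEWS_TYPES}")

    params: Dict[str, str] = {"news_type": news_type}
    if len(domains) == 1:
        params["company_domain"] = domains[0]
    else:
        params["company_domains"] = ",".join(domains)

    # Timestamp strings are passed through unchanged; their exact format
    # is an assumption here.
    optional = {
        "updated_at_gte": updated_at_gte,
        "release_date_start": release_date_start,
        "release_date_end": release_date_end,
    }
    for name, value in optional.items():
        if value is not None:
            params[name] = value
    return params


def plan_sync(domains: Sequence[str], last_sync: Optional[str]) -> List[Dict[str, str]]:
    """One parameter set per chunk, each asking only for items updated
    since the previous sync (or everything when last_sync is None)."""
    return [
        build_news_params(chunk, updated_at_gte=last_sync)
        for chunk in chunk_domains(domains)
    ]


# Placeholder domains; 450 of them need three GET requests.
watchlist = [f"company{i}.example" for i in range(450)]
requests_to_make = plan_sync(watchlist, last_sync="2024-01-01T00:00:00Z")
```

A caller would record the time each sync starts and pass it as last_sync on the next run, so items whose clusters gained new articles in between are picked up again. For watchlists well beyond 200 domains, the POST batch endpoint mentioned above may be a better fit than many GET requests.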