Webdatacommons
webdatacommons.org
About Web Data Commons
Web Data Commons extracts structured data from the Common Crawl, the largest publicly available web corpus, and provides the extracted data for public download to help researchers and companies use the information available on the web.
Access structured data extracted from the Common Crawl for research and analysis.
What You Can Do
- Download structured data sets
- Explore schema.org class-specific subsets
- Utilize benchmarks for entity matching methods
- Access data for performance evaluation
Frequently Asked Questions
What is the Web Data Commons project?
The Web Data Commons project extracts structured data from the Common Crawl and makes it available for public download.
How can I use the data provided by Web Data Commons?
The data can be used for research and analysis in various fields, including data science and web development.
Is the data from Web Data Commons free to access?
Yes, all data provided by Web Data Commons is available for free public download.
What types of data sets are available?
The project offers data sets extracted from several markup formats, including RDFa, Microdata, Microformats, and embedded JSON-LD, among others.
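To make the idea of "structured data embedded in web pages" concrete, here is a minimal sketch of how JSON-LD markup could be pulled out of a single HTML page using only the Python standard library. It is an illustration under stated assumptions, not the extraction pipeline Web Data Commons itself uses; the names `JsonLdExtractor`, `extract_jsonld`, and `SAMPLE_HTML` are hypothetical and exist only for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False   # True while inside a JSON-LD script element
        self._buffer = []         # text chunks of the current script element
        self.items = []           # successfully parsed JSON-LD objects

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    """Return a list of JSON-LD objects found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page, made up for this illustration.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
</script>
</head><body><p>Hello</p></body></html>"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_HTML):
        print(item["@type"], item["name"])
```

The sketch handles only JSON-LD; RDFa, Microdata, and Microformats are expressed as HTML attributes and would need a different parsing approach.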
How often is the data updated?
Data sets are updated regularly, with new releases corresponding to the latest Common Crawl extractions.