COVID-19 Open Research Dataset (CORD-19) | Allen Institute for AI
The source code outlining how this product gathers, transforms, revises and publishes its datasets is available at <https://github.com/rearc-data/covid-19-open-research>.
## Product Description
In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.
This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.
### Data Sources
This resource includes the `metadata.csv` file released weekly by the Allen Institute for AI, which documents COVID-19 updates and new research published in peer-reviewed publications. The columns of the dataset are:
`cord_uid, sha, source_x, title, doi, pmcid, pubmed_id, license, abstract, publish_time, authors, journal, microsoft_academic_paper_id, who_covidence, has_pdf_parse, has_pmc_xml_parse, full_text_file, url`
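As a rough illustration of working with these columns, the sketch below reads a `metadata.csv`-style stream with Python's standard `csv` module, checks that the listed columns are present, and keeps the rows that appear to have a full-text parse. The helper names (`read_metadata`, `with_full_text`, `sample_csv`) and the sample rows are hypothetical, and the code assumes the `has_pdf_parse` and `has_pmc_xml_parse` flags are stored as the strings `True` and `False`, which the description above does not state.

```python
import csv
import io

# Column names as listed in the product description.
EXPECTED_COLUMNS = [
    "cord_uid", "sha", "source_x", "title", "doi", "pmcid", "pubmed_id",
    "license", "abstract", "publish_time", "authors", "journal",
    "microsoft_academic_paper_id", "who_covidence", "has_pdf_parse",
    "has_pmc_xml_parse", "full_text_file", "url",
]


def read_metadata(stream):
    """Read a metadata.csv-style text stream into a list of row dicts."""
    reader = csv.DictReader(stream)
    found = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in found]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def is_true(value):
    # Assumption: flags are stored as the text "True" / "False".
    return (value or "").strip().lower() == "true"


def with_full_text(rows):
    """Keep rows flagged as having a PDF parse or a PMC XML parse."""
    return [
        row for row in rows
        if is_true(row["has_pdf_parse"]) or is_true(row["has_pmc_xml_parse"])
    ]


def sample_csv():
    """Build a tiny in-memory CSV with made-up placeholder rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=EXPECTED_COLUMNS, restval="")
    writer.writeheader()
    writer.writerow({"cord_uid": "demo0001", "title": "Placeholder paper A",
                     "has_pdf_parse": "True", "has_pmc_xml_parse": "False"})
    writer.writerow({"cord_uid": "demo0002", "title": "Placeholder paper B",
                     "has_pdf_parse": "False", "has_pmc_xml_parse": "False"})
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    rows = read_metadata(sample_csv())
    print([row["cord_uid"] for row in with_full_text(rows)])
```

To run this against the real file, pass an opened file instead of the sample stream, for example `open("metadata.csv", newline="", encoding="utf-8")`, and adjust `is_true` if the flags turn out to be encoded differently.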
To explore additional COVID-19 resources distributed by the Allen Institute for AI, see the [CORD-19 download page](https://www.semanticscholar.org/cord19/download).
## More Information
* Source: [Allen Institute for AI](https://allenai.org/)
* [Dataset License](https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/2020-03-13/COVID.DATA.LIC.AGMT.pdf)
* [COVID-19 Open Research Dataset Homepage](https://pages.semanticscholar.org/coronavirus-research)
* Frequency: Weekly
* Format: CSV
## Contact Details
* If you find any issues with or have enhancement ideas for this product, open up a GitHub [issue](https://github.com/rearc-data/covid-19-open-research/issues) and we will gladly take a look at it. Better yet, submit a pull request. Any contributions you make are greatly appreciated ❤️.
* If you are looking for specific open datasets that are not currently available on AWS Data Exchange (ADX), please submit a request on our [project board](https://github.com/rearc-data/covid-datasets-aws-data-exchange/projects/1).
* If you have questions about the source data, please contact feedback@semanticscholar.org.
* If you have any other questions or feedback, send us an email at data@rearc.io.
## About Rearc
Rearc is a cloud, software and services company. We believe that empowering engineers drives innovation. Cloud-native architectures, modern software and data practices, and the ability to safely experiment can enable engineers to realize their full potential. We have partnered with several enterprises and startups to help them achieve agility. Our approach is simple — empower engineers with the best tools possible to make an impact within their industry.