This is a guide to UMN library and open access resources that are available in some form for text mining purposes.

Collaborative Archive & Data Research Environment (CADRE)

CADRE is a cloud-based platform that provides access to standardized versions of the Web of Science and Microsoft Academic Graph bibliometric datasets. You can query, analyze, visualize, and store data in CADRE entirely from your browser. You do not need any coding experience to query or analyze data on CADRE. The platform is currently in beta but is already being used by researchers.

Getting started

Visit the CADRE Gateway and select Log In. You will be redirected to a CILogon portal, where you can select "University of Minnesota" and log in with your UMN email address. You can begin working on the platform immediately.

The informational videos on CADRE’s Resources page include extensive walkthroughs of CADRE’s features and demos for running and analyzing your first queries.

CADRE Datasets

Because the University of Minnesota is a partner in the CADRE project, its researchers can access the following datasets via CADRE:

  • Web of Science: a leading commercial dataset that includes 73 million papers and 1.7 billion citations.
  • Microsoft Academic Graph: an open bibliometric dataset that holds 250 million documents and 2.4 billion citations.
  • U.S. Patent and Trademark Office: an open government dataset that includes 9 million patent application documents (available in graph database spring 2021, available in raw format now).

Microsoft Academic Graph includes a broad spectrum of internet research documents for all sciences, while Web of Science indexes selected journals that cover “core” and “emerging” sciences. The CADRE team updates all of CADRE’s datasets as new releases become available, so researchers are working with the latest data release.

Working with data

Queries, analysis, & visualization

CADRE’s Gateway contains the tools you need to query, analyze, and publish your research. Everything created in CADRE can be reproduced by other researchers. CADRE’s Gateway includes:

  • Query Builder: The user-friendly online Query Builder allows you to easily query big bibliometric datasets. Researchers can also use the Query Builder to return institutional addresses as a column in the query output. Beginning March 5, 2021, users will be able to choose the citing direction of a network query.
  • Jupyter Notebooks: Proficient coders can take advantage of the Jupyter Notebook feature to build custom data-analysis and visualization tools.
  • Marketplace: After users create data-analysis tools in Jupyter Notebooks, they can publish them to the Marketplace for other users to apply to their own research. The Marketplace also allows you to publish and reproduce queries, derived data, and workflows.
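
As an illustration of the kind of custom analysis the Jupyter Notebooks feature allows, here is a minimal sketch that counts papers per publication year from a query output saved as CSV. It is not taken from CADRE's documentation: the column names (`paper_id`, `year`, `title`), the sample rows, and the function name `papers_per_year` are all hypothetical, and the actual columns in your query output depend on the fields you select in the Query Builder.

```python
import csv
import io
from collections import Counter


def papers_per_year(csv_file):
    """Count rows per publication year in a query-output CSV.

    Assumes (hypothetically) that the file has a header row with a
    'year' column. Rows with a missing or non-numeric year are skipped.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        year = (row.get("year") or "").strip()
        if year.isdigit():
            counts[int(year)] += 1
    # Return the counts ordered by year for easy reading or plotting.
    return dict(sorted(counts.items()))


# Made-up sample standing in for a real query output file.
SAMPLE = """paper_id,year,title
p1,2019,Example A
p2,2020,Example B
p3,2020,Example C
p4,,Missing year
"""

if __name__ == "__main__":
    print(papers_per_year(io.StringIO(SAMPLE)))
```

To run this against a real file, you could replace `io.StringIO(SAMPLE)` with `open("my_query_output.csv", newline="")`, where the file name is again a placeholder for wherever your query output is stored.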

Storage & preservation

Users can store their query outputs, data-analysis tools, and research results in the CADRE cloud. Users will soon have the ability to attach DOIs to their reproducible packages (spring 2021). CADRE will provide three tiers of DOI allocation:

  1. Packages with no DOI or metadata (discoverable only by users with a CADRE account)
  2. Packages with temporary DOIs and metadata (discoverable only by users with a CADRE account)
  3. Permanently archived packages with DOIs and metadata (discoverable by anyone)
     

How researchers are using CADRE

MCAP: Mapping collaborations and partnerships in SDG research
Researchers used CADRE’s datasets to study research output and patterns of global collaboration that support the United Nations’ Sustainable Development Goals (SDGs). Read the paper the team published about the research it conducted on CADRE.

The Global network of air links and scientific collaboration – a Quasi-experimental analysis
The research team is determining how the introduction and availability of long-distance flights impacted international scientific collaboration by measuring collaboration through co-authorship and co-affiliation on CADRE’s datasets.

Study of pandemic publishing
How Scholarly Literature is Affected by COVID-19 Pandemic: Researchers are studying the quality of COVID-19 related scholarly works by using CADRE’s datasets to identify signs of incoherency, irreproducibility, and haste.

Last Updated: Sep 20, 2021 2:03 PM