This is a guide to UMN library and open access resources that are available in some form for text mining purposes.
Text analysis tools
Lightweight tools
- Google Ngram Viewer: An easy-to-use tool to map trends in the Google Books corpus.
- HathiTrust Research Center Analytics: Analyze text from the HathiTrust Digital Library online using pre-defined algorithms. (Registration with a .edu email address required.)
- Lexos: A web-based tool from Wheaton College to help you explore your favorite corpus of digitized texts.
- MediaCloud: Three online tools (Explorer, Topic Mapper, and Source Manager) to analyze stories and topics across print, broadcast, and digital news collections. Developed by the MIT Center for Civic Media and the Berkman Klein Center for Internet & Society at Harvard. Data is also available via API and as open source software.
- OverviewDocs: Search, visualize, and review your documents using this free online tool (account required).
- Social Media Macroscope: [Note: this resource is currently unavailable. LATIS is exploring whether it can provide a local instance of this tool for UMN researchers.] Free social media data, analytics, and visualization tools for researchers at all levels of expertise. Includes SMILE, an open-source social media analytics tool to collect and analyze data from Twitter and Reddit; and BAE, a Brand Analytics Environment to gain insight into how individuals and groups may interact with brands and various organizations.
- Voyant: A free web-based environment for analyzing digital texts.
Programming & software
- NVivo: Qualitative analysis software that you can use to analyze text documents such as interviews, survey responses, and more.
- R: A free software environment for statistical computing and graphics. See "Introduction to the tm Package: Text Mining in R" to learn more about data import, corpus handling, preprocessing, metadata management, and creation of term-document matrices.
- Python: A free, open-source, general-purpose programming language that often serves as a foundation for text analysis projects.
- Watson Natural Language Understanding API: Analyze semantic features of text input, including categories, concepts, emotion, entities, keywords, metadata, relations, semantic roles, and sentiment.
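
The preprocessing and term-document matrix steps mentioned for the tm package can be illustrated with a minimal Python sketch that uses only the standard library. This is not the tm package's API or any library's method; the toy corpus `DOCS`, the stop list `STOPWORDS`, and the helper names `tokenize` and `term_document_matrix` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical toy corpus; replace with your own documents.
DOCS = {
    "doc1": "The whale swam. The whale dived!",
    "doc2": "A ship chased the whale.",
}
STOPWORDS = {"a", "the"}  # tiny illustrative stop list, not a standard one


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_document_matrix(docs):
    """Return (terms, doc_ids, matrix); matrix[i][j] counts term i in document j."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    terms = sorted(set().union(*counts.values()))
    doc_ids = list(docs)
    matrix = [[counts[d][t] for d in doc_ids] for t in terms]
    return terms, doc_ids, matrix


terms, doc_ids, matrix = term_document_matrix(DOCS)
for term, row in zip(terms, matrix):
    print(f"{term:8s} {row}")
```

Each row is a term and each column a document, so the row for "whale" shows how often that word appears in each text. Real projects would typically add stemming, a fuller stop list, and a sparse matrix representation.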
Text data: Wrangling & cleaning
- DocumentCloud: A tool for journalists to search and analyze collections of public documents. Also available to annotate your own Open Calais documents.
- Open Calais: Processes the text you submit and returns entities, topic codes, events, relations, and SocialTags. (Thomson Reuters)
- OpenRefine: OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.
- Rayyan: Free web app to help authors create systematic reviews, collaborate on them, maintain them over time, and get suggestions for article inclusion.
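
To give a sense of what "cleaning messy data" can involve, here is a minimal Python sketch of key-collision clustering: each value is reduced to a normalized key, and values sharing a key are flagged as likely variants of the same entry. It is loosely modeled on the "fingerprint" idea described in OpenRefine's clustering documentation, but it is a simplified approximation and not OpenRefine's own code. The function names `fingerprint` and `cluster` and the sample `names` list are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified key: trim, lowercase, strip accents and punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop combining marks so that accented letters match their plain forms.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(set(vs)) > 1}


# Hypothetical sample data for illustration only.
names = ["Brontë, Charlotte", "Charlotte Bronte", "charlotte  bronte.", "Austen, Jane"]
print(cluster(names))
```

The three spellings of the first author collapse to the same key, while the lone "Austen, Jane" entry is left out of the result. A human reviewer would still decide whether the flagged values should actually be merged.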
Resources for learning
- Programming Historian: Introductory lessons on distant reading for academics, including lessons on Stylometry with Python, Basic Text Processing with R, and Getting started with Topic Modeling and Mallet.
- Natural Language Processing with Python (ebook, ISBN 9780596516499): This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation.
- Text Analysis with R for Students of Literature: Written with students and scholars of literature in mind, but applicable to other humanists and social scientists wishing to extend their methodological tool kit to include quantitative and computational approaches to the study of text.
Last Updated: Jan 7, 2025 3:13 PM
URL: https://libguides.umn.edu/text-mining