Digital Scholarship: Text Analysis

An introduction to the concepts and uses of text analysis

Resources for Texts

There are numerous sources of texts online. Listed here are a few large or popular collections that are commonly used for text mining. If you have a unique text that you would like to digitize and use in text analysis, please contact the Digital Scholarship team at KSL at freedmancenter@case.edu.

HathiTrust Digital Library - HathiTrust's collections include over 16 million volumes that span the history of printed text, primarily in English but also in over 400 other languages

Internet Archive eBooks and Texts - Over 20 million freely downloadable books and texts

Project Gutenberg - A library of over 60,000 free e-books, most of which are older works in the public domain

University of Oxford Text Archive - Thousands of full-text literary and linguistic sources in more than 25 languages

corpus.byu.edu - A number of large corpora compiled by Prof. Mark Davies (Linguistics) of Brigham Young University

Caselaw Access Project - Harvard's downloadable database of 360 years of United States caselaw

Chronicling America: Historic American Newspapers - Access to information about historic U.S. newspapers and millions of digitized newspaper pages