This guide is an in-depth listing of Chinese Studies resources available to students and faculty at the University of Minnesota.
Linguistic Tools
- CKIP Lab: For modern Chinese as written in Taiwan, in traditional characters, CKIP may be a good Python option. Its online demo lets you paste in text and get back a tokenized version with part-of-speech (POS) tags, named-entity recognition (NER), and other annotations.
- GuwenBERT 古文预训练语言模型: GuwenBERT is a RoBERTa model trained on Classical Chinese text. Pre-trained language models have become a basic technology in natural language processing, but while many modern Chinese BERT models are available for download, comparable models for Classical Chinese have been lacking. GuwenBERT was released to support research that combines Classical Chinese and natural language processing.
- jieba “结巴”中文分词: "Jieba" (Chinese for "to stutter") Chinese text segmentation aims to be the best Python module for Chinese word segmentation. For modern PRC-style Chinese in simplified characters, jieba is probably the most commonly used Python package. Reviews and comparisons (in Chinese) of several alternatives are available at https://blog.csdn.net/shuihupo/article/details/81540433 and https://www.52nlp.cn/五款中文分词工具线上pk-jieba-snownlp-pkuseg-thulac-hanlp.
- Open Chinese Convert 開放中文轉換: Open Chinese Convert (OpenCC) is an open-source project for converting between Traditional Chinese, Simplified Chinese, and Japanese Kanji (Shinjitai). It supports character-level and phrase-level conversion, character-variant conversion, and conversion of regional idioms among Mainland China, Taiwan, and Hong Kong. It is not a translation tool (for example, between Mandarin and Cantonese).
- Stanford Parser: A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes but commonly work rather well; their development was one of the biggest breakthroughs in natural language processing in the 1990s. An online demo of the Stanford Parser is available.
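Two of the tools above rest on dictionary lookup over unsegmented text: jieba splits a string of characters into words, and OpenCC's phrase-level conversion matches multi-character phrases before falling back to single characters. The Python sketch below illustrates both ideas with greedy forward maximum matching. It is a toy under stated assumptions, not either project's code: jieba itself builds a directed acyclic graph of candidate dictionary words and picks the most probable path by word frequency (with an HMM for unknown words), and the word list and conversion tables here are hypothetical examples.

```python
from collections.abc import Container

# Toy illustration of greedy longest-match (forward maximum matching).
# Hypothetical data throughout; this is not jieba's or OpenCC's implementation.


def longest_match(text: str, start: int, entries: Container[str], max_len: int) -> str:
    """Return the longest entry beginning at text[start], else the single character there."""
    for length in range(min(max_len, len(text) - start), 1, -1):
        candidate = text[start:start + length]
        if candidate in entries:
            return candidate
    # No multi-character match: fall back to one character.
    return text[start]


def segment(text: str, words: set[str]) -> list[str]:
    """Split text into words, always taking the longest dictionary word first."""
    max_len = max((len(w) for w in words), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        token = longest_match(text, i, words, max_len)
        tokens.append(token)
        i += len(token)
    return tokens


def convert(text: str, phrases: dict[str, str], chars: dict[str, str]) -> str:
    """Convert text using a phrase table first, then a per-character table."""
    max_len = max((len(p) for p in phrases), default=1)
    out: list[str] = []
    i = 0
    while i < len(text):
        piece = longest_match(text, i, phrases, max_len)
        if piece in phrases:
            out.append(phrases[piece])  # phrase-table hit
        else:
            out.append(chars.get(piece, piece))  # character-level fallback; unknown chars pass through
        i += len(piece)
    return "".join(out)


if __name__ == "__main__":
    words = {"研究", "研究生", "生命", "起源"}
    print(segment("研究生命起源", words))  # ['研究生', '命', '起源']

    chars = {"头": "頭", "发": "發"}
    print(convert("头发", {"头发": "頭髮"}, chars))  # 頭髮 (phrase table used)
    print(convert("头发", {}, chars))  # 頭發 (character-level only, wrong 發)
```

The first call shows the weakness of purely greedy matching: 研究生命起源 ("study the origin of life") comes out as 研究生 / 命 / 起源 ("graduate student / life / origin"), the kind of ambiguity that probabilistic segmenters try to resolve. The last two calls show why phrase-level conversion matters: simplified 发 corresponds to both 發 and 髮 in traditional characters, so converting 头发 ("hair") one character at a time yields the incorrect 頭發.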
Traditional Chinese Linguistics Resources
- Bamboo and Silk 简帛: Bamboo and Silk is a peer-reviewed academic journal in English published by Brill, with its editorial office at the Center of Bamboo and Silk Manuscripts of Wuhan University. The journal focuses on excavated bamboo and silk documents from China's pre-Qin, Qin, Han, and Wei-Jin periods, with a main emphasis on paleography and textual editing, as well as related research on society, politics, the economy, the legal system, ideology, culture, and linguistic habits, among other topics.
- Chinese Etymology 字源: A website run by Richard Sears (also known as "Uncle Hanzi") that attempts to explain the original form and original logic of each Chinese character.
- Shuo wen jie zi 说文解字: A digital version of Xu Shen's Shuo wen jie zi.
Reference Resources
- Chinese-English Dictionary Online: This is the first Chinese-English dictionary devoted solely to the premodern language. A practical lexicon of more than 8,000 characters arranged alphabetically by Pinyin romanization, it is meant to facilitate reading and translating historical, literary, and religious texts dating from approximately 500 BCE to 1000 CE.
- Chinese-English Dictionary: This dictionary provides a searchable interface to the CEDICT dictionary originally compiled by Paul Denisowski. You can search by Chinese (in GB, Big5, or Unicode encoding), pinyin, or English. Results show the Chinese word, its pinyin, and its English definition. Chinese characters can optionally be displayed as GIF images, and clicking the pinyin plays its pronunciation.
- Han Dian 汉典: A large online Chinese dictionary.
- zi.tools 字統网: Search the origin, meaning, and different pronunciations of a Chinese character.
- 古音小镜: A website sharing materials and tools for historical linguistics, especially Old Chinese phonology. It also offers many other reference materials, such as 汉语大词典, 经籍籑诂, and 甲骨字形库.
- 小学堂: A tool for searching the history of a Chinese character.
- 搜文解字: An online tool for searching the pronunciation, meaning, and usage of a Chinese character or phrase; it also provides related background on Chinese language and literature.
Last Updated: Mar 20, 2025 5:52 PM
URL: https://libguides.umn.edu/china_advanced