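Hello OP. Here is some information that may be useful, for your reference:
1:
The Chinese (simplified) portion of the Google Ngram Exports.
A Google gem you can't stop playing with: Ngram - Zhihu
2: Word-frequency myths
Frequency obtained from a corpus vs. improved frequency vs. word prevalence vs. the frequency of entries in an input-method dictionary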
We present word prevalence data for 61,858 English words. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people. Word prevalence data are useful for gauging the difficulty of words and, as such, for matching stimulus materials in experimental conditions or selecting stimulus materials for vocabulary tests. Word prevalence also predicts word processing times, over and above the effects of word frequency, word length, similarity to other words, and age of acquisition, in line with previous findings in the Dutch language.
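To make the contrast between corpus frequency and word prevalence concrete, here is a minimal sketch on made-up toy data. Corpus frequency counts how often a word occurs in running text, here normalized to occurrences per million tokens. Prevalence, as described in the abstract above, reflects how many people know the word, here approximated as the share of respondents who said they know it. The function names `per_million` and `prevalence` and the yes/no response format are hypothetical illustrations, not the method or data format of the cited study.

```python
from collections import Counter


def per_million(tokens):
    """Corpus frequency: occurrences of each word per million tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def prevalence(responses):
    """Hypothetical prevalence measure: the proportion of respondents who
    said they know each word.

    responses maps word -> list of booleans, one per respondent who saw
    the word (True = "I know this word"). Words with no responses are skipped.
    """
    return {word: sum(answers) / len(answers)
            for word, answers in responses.items() if answers}


if __name__ == "__main__":
    # Toy corpus and toy survey answers, invented for illustration only.
    corpus = "the cat sat on the mat the end".split()
    survey = {
        "cat": [True, True, True, False],
        "ubiquitous": [True, False, False, False],
    }
    print(per_million(corpus)["the"])
    print(prevalence(survey))
```

The two numbers answer different questions: a word can be rare in a corpus yet known to almost everyone, which is why prevalence can add predictive power beyond frequency.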