OP, have you lost motivation? Was it something I said? Honestly, the segmentation edition feels quite good, and its input accuracy is very high. The other night I even had a dream that version 7.0 was released, with the segmentation edition's dictionary at 4 million entries, hitting the 32-bit Rime limit, wow.
I'm working on the algorithmic problem of automatic word segmentation.
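Since the post above mentions working on an automatic segmentation algorithm, here is a minimal sketch of forward maximum matching (FMM), one classic greedy baseline for Chinese segmentation. The function name and the tiny demo dictionary are hypothetical, purely for illustration; it is not the OP's actual method.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position, take the
    longest dictionary word; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words

# Hypothetical mini-dictionary for demonstration only.
demo_dict = {"研究", "自动", "分词", "算法", "问题"}
print(fmm_segment("研究自动分词算法", demo_dict))
```

FMM is simple but has known failure modes on ambiguous strings, which is presumably why dictionary quality and word frequency (discussed below) matter so much for a real input-method lexicon.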
Supporting you as always.
Don't lose heart, master.
Persistence is victory.
Hello, OP. Some information that may be useful, for your reference:
1:
The Chinese (simplified) portion of the Google Ngram Exports.
"A Google gadget you can't stop playing with: Ngram" on Zhihu.
2: The myth of word frequency
Corpus-derived frequency vs. an improved measure (word prevalence) vs. the frequencies assigned to entries in an input-method dictionary:
We present word prevalence data for 61,858 English words. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people. Word prevalence data are useful for gauging the difficulty of words and, as such, for matching stimulus materials in experimental conditions or selecting stimulus materials for vocabulary tests. Word prevalence also predicts word processing times, over and above the effects of word frequency, word length, similarity to other words, and age of acquisition, in line with previous findings in the Dutch language.
The Chinese data in that work is insufficient; it's fairly simplistic.
Google's Chinese segmentation was probably built with human review at Google.
Once the segmentation technique is worked out, the strongest segmentation edition will likely make its debut.
I replied on the Kafan forum: the new version 7.0 is great.