Please help translate this~~~

Source: 百度知道 (Baidu Zhidao)  Editor: UC知道  Time: 2024/05/18 02:01:42
文本的分词,统计搜索是实施文本挖掘的前提。词语统计主要步骤是把字符串分解为独立的词、标点符号处理、排除一些无意义词、单词后缀 、字母大小写的一些处理,建立一个包括词和词的行列信息的数据集合。关键词的搜索则在此集合上进行搜索。为提高效率,数据的存储读取建立等用的是Map,Vector,list等容器类型。

Word segmentation and statistical search of the text are prerequisites for text mining. The main steps of word statistics are splitting strings into individual words, handling punctuation, excluding meaningless words, handling word suffixes and letter case, and building a data set that holds each word together with its line and column information. Keyword searches are then run against this data set. To improve efficiency, container types such as Map, Vector and List are used to build, store and read the data.
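
To make the steps in the passage concrete, here is a minimal C++ sketch of such a word index, assuming the Map and Vector containers mentioned are the standard library ones. The names `build_index`, `search` and `strip_suffix` are hypothetical, the suffix rule is a deliberately crude stand-in for a real stemmer, and only ASCII text is handled. It is one possible reading of the description, not the original author's implementation.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (line, column) of one word occurrence, both 1-based.
using Position = std::pair<std::size_t, std::size_t>;
// Word -> every position where it occurs in the text.
using WordIndex = std::map<std::string, std::vector<Position>>;

// Hypothetical, crude suffix handling: drop a trailing "s" from words
// longer than three letters. A real system would use a proper stemmer.
std::string strip_suffix(const std::string& word) {
    if (word.size() > 3 && word.back() == 's') {
        return word.substr(0, word.size() - 1);
    }
    return word;
}

// Split the text into words, lowercase them, skip punctuation and stop
// words, normalise suffixes, and record where each word occurs.
WordIndex build_index(const std::string& text,
                      const std::set<std::string>& stop_words) {
    WordIndex index;
    std::istringstream lines(text);
    std::string line;
    std::size_t line_no = 0;
    while (std::getline(lines, line)) {
        ++line_no;
        std::string word;
        std::size_t start = 0;
        // Run one step past the end so the last word on the line is flushed.
        for (std::size_t i = 0; i <= line.size(); ++i) {
            const bool in_word = i < line.size() &&
                std::isalnum(static_cast<unsigned char>(line[i]));
            if (in_word) {
                if (word.empty()) start = i;
                word += static_cast<char>(
                    std::tolower(static_cast<unsigned char>(line[i])));
            } else if (!word.empty()) {
                if (stop_words.count(word) == 0) {
                    index[strip_suffix(word)].emplace_back(line_no, start + 1);
                }
                word.clear();
            }
        }
    }
    return index;
}

// Keyword search runs against the index, not against the raw text.
std::vector<Position> search(const WordIndex& index,
                             const std::string& keyword) {
    std::string key;
    for (unsigned char c : keyword) {
        key += static_cast<char>(std::tolower(c));
    }
    const auto it = index.find(strip_suffix(key));
    return it == index.end() ? std::vector<Position>{} : it->second;
}
```

Calling `build_index(text, stop_words)` once and then `search(index, "cat")` for each keyword returns the line and column of every match. The keyword is normalised the same way as the indexed words, so "Cats" and "cat" find the same entries.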