Please help me translate the abstract of "Text Clustering"

Source: Baidu Zhidao  Editor: UC Zhidao  Time: 2024/06/03 09:43:12
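With the rapid development of the Internet, online electronic resources are becoming ever richer, and digging the information one needs out of such a dazzling array of resources has become very difficult. Classifying these resources provides a navigation mechanism for information retrieval (IR). Text mining techniques (such as text classification and text clustering) have therefore received great attention in the Web field.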
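Text clustering, in turn, is an unsupervised machine learning method. Because clustering needs no training process and no manual labeling of text categories in advance, it offers flexibility and a high degree of automated processing. It has become an important means of effectively organizing, summarizing and navigating text information, and is attracting the attention of more and more researchers.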
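This paper introduces the basic concepts and the general process of text clustering, and implements the three most basic clustering algorithms: the K-means clustering algorithm, the PAM clustering algorithm and a simple agglomerative clustering algorithm. The paper tests them on 1360 texts in 8 categories, and compares and analyzes the strengths and weaknesses of the three algorithms.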

The first paragraph has already been translated; please help me translate the second and third paragraphs.
With the rapid development of the Internet, there are so many resources online that we can hardly find the information we want. Document classification provides a navigation mechanism for information retrieval (IR). Text mining techniques such as text categorization and text clustering therefore play an increasingly important role in the Web field.

Paragraph 2:
As an unsupervised machine learning method, text clustering requires neither a training process nor manual labeling of text categories in advance, which gives it flexibility and a high degree of automation. It has become an important means of effectively organizing, summarizing and navigating text information, and is drawing the attention of more and more researchers.
Paragraph 3:
This paper introduces the basic concepts and general process of text clustering and implements three basic clustering algorithms: the K-means clustering algorithm, the PAM clustering algorithm and a simple agglomerative clustering algorithm. The three algorithms are tested on 1360 documents in eight categories, and their advantages and disadvantages are compared and analyzed.
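The third paragraph only names the three algorithms. For readers who want to see what they do, here is a minimal Python sketch, not the paper author's implementation: each document becomes a term-frequency vector, cosine distance compares documents, and K-means, PAM (k-medoids) and an agglomerative method each split the documents into k groups. Several choices are assumptions not stated in the abstract: whitespace tokenization, deterministic farthest-first seeding (used for K-means and in place of PAM's BUILD phase), and average linkage for the agglomerative method. The function names and the six toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    # Bag-of-words term-frequency vector; whitespace tokenization is an assumption.
    return Counter(text.lower().split())


def cosine_distance(a, b):
    # 1 - cosine similarity of two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)


def farthest_first(vectors, k):
    # Deterministic seeding (assumption): start at document 0, then repeatedly
    # add the document farthest from its nearest already-chosen seed.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(len(vectors)), key=lambda i: min(
            cosine_distance(vectors[i], vectors[s]) for s in seeds)))
    return seeds


def kmeans(vectors, k, max_iter=20):
    centers = [vectors[s] for s in farthest_first(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every document joins its nearest center.
        new = [min(range(k), key=lambda c: cosine_distance(v, centers[c])) for v in vectors]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        # Update step: each center becomes the mean vector of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = sum(members, Counter())
                centers[c] = Counter({t: x / len(members) for t, x in total.items()})
    return labels


def pam(vectors, k):
    n = len(vectors)
    d = [[cosine_distance(a, b) for b in vectors] for a in vectors]

    def cost(meds):
        # Sum of distances from every document to its nearest medoid.
        return sum(min(d[i][m] for m in meds) for i in range(n))

    medoids = farthest_first(vectors, k)  # stands in for PAM's BUILD phase
    current = cost(medoids)
    while True:
        # SWAP phase: apply the single medoid/non-medoid swap that lowers cost most.
        best_trial, best_cost = None, current
        for slot in range(k):
            for o in range(n):
                if o not in medoids:
                    trial = medoids[:slot] + [o] + medoids[slot + 1:]
                    trial_cost = cost(trial)
                    if trial_cost < best_cost:
                        best_trial, best_cost = trial, trial_cost
        if best_trial is None:  # no improving swap left: assign to nearest medoid
            return [min(range(k), key=lambda m: d[i][medoids[m]]) for i in range(n)]
        medoids, current = best_trial, best_cost


def agglomerative(vectors, k):
    d = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]  # one cluster per document
    while len(clusters) > k:
        # Merge the pair with the smallest average pairwise distance (average link).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                dist /= len(clusters[a]) * len(clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(vectors)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


if __name__ == "__main__":
    docs = ["stock market shares trading", "market shares stock prices",
            "stock prices trading market", "football match goal team",
            "team goal football league", "league match team football"]
    vecs = [tf_vector(doc) for doc in docs]
    for algorithm in (kmeans, pam, agglomerative):
        # Expected for all three: [0, 0, 0, 1, 1, 1] (finance docs vs. football docs)
        print(algorithm.__name__, algorithm(vecs, 2))
```

K-means moves each center to the mean of its members, while PAM restricts centers to actual documents (medoids), which makes it less sensitive to outliers at the cost of evaluating every possible swap. The agglomerative method needs no seeds at all, but it compares every pair of clusters before each merge.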

Text clustering, by contrast, is an unsupervised machine learning method; because clustering requires no training process and no manual annotation of the texts in advance