
Source: Baidu Zhidao  Editor: UC Zhidao  Time: 2024/05/18 02:52:02
The position of a word within its paragraph can be another identifying feature of keywords, since we
also expect keywords to appear at the beginning and end of a paragraph. Furthermore, the position of
a word within its sentence may be an identifying feature. For instance, important terms tend to appear
at the beginning or end of a sentence in English, whereas in Turkish they are placed before the last
word. Therefore, if we use a learning method to extract keywords, we can apply it independently of the
structure or type of the language, because the positional preferences are learned from the data
rather than fixed in advance.
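The positional features discussed above can be computed roughly as follows. This is a minimal sketch under assumptions the text does not fix: paragraphs are separated by blank lines, sentences end at `.`, `!` or `?`, tokens are lowercased `\w+` runs, the distance to the beginning of the paragraph is counted in words, and the relative positions in the text and in the sentence are a word's index divided by the number of words in the text or sentence. The names `Occurrence` and `positional_features` are hypothetical.

```python
import re
from dataclasses import dataclass

WORD_RE = re.compile(r"\w+")      # assumed tokenization: runs of word characters
SENT_RE = re.compile(r"[^.!?]+")  # assumed sentence split: text between . ! ?

@dataclass
class Occurrence:
    word: str
    d: int     # number of words before this one in its paragraph
    pt: float  # index in the whole text / number of words in the text
    ps: float  # index in the sentence / number of words in the sentence

def positional_features(text: str) -> list[Occurrence]:
    """Return one Occurrence per word token, in reading order."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    total = sum(len(WORD_RE.findall(p)) for p in paragraphs)
    out: list[Occurrence] = []
    text_index = 0
    for para in paragraphs:
        para_index = 0  # reset at each paragraph start
        for sent in SENT_RE.findall(para):
            words = [w.lower() for w in WORD_RE.findall(sent)]
            for i, w in enumerate(words):
                out.append(Occurrence(w, para_index, text_index / total, i / len(words)))
                para_index += 1
                text_index += 1
    return out
```

For instance, on a two-paragraph input the first word of the second paragraph gets `d = 0` again, while `pt` keeps increasing across the whole text.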
Naive Bayes assumes that the feature values are conditionally independent given the class (key or
not key). Under this assumption, we can use Bayes' theorem [5] to compute the probability that a word
is a key given its TFxIDF score (T), its distance to the beginning of the paragraph (D), its relative
position in the whole text (PT), and its relative position in the sentence that contains it (PS):
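$$P(\mathrm{key} \mid T, D, PT, PS) = \frac{P(\mathrm{key})\, P(T \mid \mathrm{key})\, P(D \mid \mathrm{key})\, P(PT \mid \mathrm{key})\, P(PS \mid \mathrm{key})}{P(T, D, PT, PS)}$$

where P(key) denotes the prior probability that a word is a key (assumed to be equal for all words in
our problem), P(T|key) denotes the probability of having TFxIDF score T given that the word is a key,
P(D|key) denotes the probability of having distance D to the beginning of the paragraph given that the
word is a key, P(PT|key) denotes the probability of having relative position PT in the text given that
the word is a key, P(PS|key) denotes the probability of having relative position PS in the sentence
given that the word is a key, and P(T, D, PT, PS) denotes the probability that a word has TFxIDF score
T, distance D, position PT in the text, and position PS in the sentence.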
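The Naive Bayes computation described above can be sketched as follows. The text does not say how the conditional probabilities are estimated, so this is a minimal illustration under our own assumptions: each feature is discretized into integer bins (`bin_value` uses equal-width bins), P(key) and each P(feature | key) are estimated by counting over labeled training words with Laplace smoothing (`alpha`), and the evidence P(T, D, PT, PS) is obtained under the same independence assumption by summing the numerator over both classes (key and not key). `NaiveBayesKeyScorer`, `bin_value` and `alpha` are hypothetical names, and the training data must contain both keys and non-keys.

```python
import math
from collections import Counter

Features = tuple[int, int, int, int]  # discretized (T, D, PT, PS) bins

def bin_value(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Equal-width binning of x into 0..n_bins-1 (an assumed discretization)."""
    if hi <= lo:
        return 0
    idx = int((x - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

class NaiveBayesKeyScorer:
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts: Counter = Counter()  # number of training words per class
        # counts[c][j][v]: how often feature j took bin v among words of class c
        self.counts = {c: [Counter() for _ in range(4)] for c in (True, False)}
        self.values = [set() for _ in range(4)]  # bins observed per feature

    def fit(self, samples: list[tuple[Features, bool]]) -> "NaiveBayesKeyScorer":
        for feats, is_key in samples:
            self.class_counts[is_key] += 1
            for j, v in enumerate(feats):
                self.counts[is_key][j][v] += 1
                self.values[j].add(v)
        return self

    def _log_joint(self, feats: Features, c: bool) -> float:
        # log P(c) + sum_j log P(f_j | c), each factor Laplace-smoothed
        n = sum(self.class_counts.values())
        lp = math.log(self.class_counts[c] / n)
        for j, v in enumerate(feats):
            num = self.counts[c][j][v] + self.alpha
            den = self.class_counts[c] + self.alpha * len(self.values[j])
            lp += math.log(num / den)
        return lp

    def p_key(self, feats: Features) -> float:
        """P(key | T, D, PT, PS), normalizing over the key and not-key classes."""
        a = self._log_joint(feats, True)
        b = self._log_joint(feats, False)
        m = max(a, b)  # subtract the max before exp to avoid underflow
        return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))
```

Working in log space keeps the product of many small probabilities from underflowing; candidate words can then be ranked by `p_key`.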