Offering 300 points for a translation of some Lucene material; please don't use translation software, thanks

Source: Baidu Zhidao   Editor: UC Zhidao   Time: 2024/05/04 17:17:21
FUTURE WORK
While we have not matched the benchmark set by the original Lucene implementation, we have demonstrated another approach that may be useful in some special cases. In particular, the modified version is better suited to smaller sets of short documents, while the tries implementation works well for sets of long documents. However, more experimentation is needed to evaluate performance and space consumption.
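The text does not describe the tries implementation further. As a minimal sketch only, assuming a character trie whose term-ending nodes hold postings (document id → term frequency), a trie-based term dictionary could look like the Python below. `TrieNode` and `TrieIndex` are hypothetical names, not Lucene APIs.

```python
from collections import defaultdict


class TrieNode:
    """One node per character; `postings` is set only where a term ends."""
    __slots__ = ("children", "postings")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.postings = None  # doc_id -> term frequency


class TrieIndex:
    def __init__(self):
        self.root = TrieNode()

    def add(self, doc_id, tokens):
        """Insert every token of one document, counting term frequencies."""
        for token in tokens:
            node = self.root
            for ch in token:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = TrieNode()
                node = child
            if node.postings is None:
                node.postings = defaultdict(int)
            node.postings[doc_id] += 1

    def lookup(self, term):
        """Return {doc_id: tf} for `term`, or {} if it was never indexed."""
        node = self.root
        for ch in term:
            node = node.children.get(ch)
            if node is None:
                return {}
        return dict(node.postings) if node.postings else {}


idx = TrieIndex()
idx.add(1, "the quick brown fox".split())
idx.add(2, "the lazy dog the end".split())
print(idx.lookup("the"))  # {1: 1, 2: 2}
print(idx.lookup("th"))   # {} (a prefix, not an indexed term)
```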
Also, besides incorporating the Porter stemmer into the tries, chaining the bottom nodes to avoid traversal overhead, and deploying top-down compressed tries, we could combine the computation of the IDFs and vector lengths into a single iteration before the final write to disk, so that we avoid iterating through all tokens twice (once all documents have been indexed).
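A hedged sketch of two of these ideas together: the Python below chains every term-ending (bottom) node into a linked list as it is created, then walks that chain once to produce both each term's IDF and each document's vector length. Assumptions not taken from the text: IDF is ln(N / df), a term's weight in a document is tf × IDF, and the vector length is the Euclidean norm of those weights. All class and method names are hypothetical.

```python
import math


class Node:
    __slots__ = ("children", "postings", "term", "next_term")

    def __init__(self):
        self.children = {}     # char -> Node
        self.postings = None   # doc_id -> tf; set only where a term ends
        self.term = None       # the term spelled by the path to this node
        self.next_term = None  # next term-ending node in the "bottom chain"


class ChainedTrie:
    def __init__(self):
        self.root = Node()
        self.first_term = None  # head of the chain of term-ending nodes
        self.doc_ids = set()

    def add(self, doc_id, tokens):
        self.doc_ids.add(doc_id)
        for token in tokens:
            node = self.root
            for ch in token:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = Node()
                node = child
            if node.postings is None:
                # First sighting of this term: link its node into the chain so
                # later passes can visit every term without walking the trie.
                node.postings = {}
                node.term = token
                node.next_term = self.first_term
                self.first_term = node
            node.postings[doc_id] = node.postings.get(doc_id, 0) + 1

    def idf_and_lengths(self):
        """Single walk over the chain: per-term IDF and per-document length."""
        n_docs = len(self.doc_ids)
        idf = {}
        sq_len = dict.fromkeys(self.doc_ids, 0.0)
        node = self.first_term
        while node is not None:
            # Assumed IDF variant: ln(N / df), where df = number of postings.
            w = math.log(n_docs / len(node.postings))
            idf[node.term] = w
            # A document's squared length is a sum over its terms, so this
            # term's contribution can be added as soon as its IDF is known.
            for doc_id, tf in node.postings.items():
                sq_len[doc_id] += (tf * w) ** 2
            node = node.next_term
        return idf, {d: math.sqrt(s) for d, s in sq_len.items()}


trie = ChainedTrie()
trie.add(1, ["apple", "banana", "apple"])
trie.add(2, ["banana", "cherry"])
idf, lengths = trie.idf_and_lengths()
print(idf["banana"])  # 0.0 (the term occurs in every document)
```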
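Finally, pre-allocating the memory for the tries to a statistical depth of d (making the structure static) and using the same tries structure for all documents would be a good idea, as we could then get rid of the execution time spent on successive run-time memory allocations. In that case, we would only need to be able to distinguish the strings of the current document from those of previous documents.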
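One possible reading of this idea, sketched in Python under stated assumptions: nodes live in flat arrays allocated once, with a hypothetical `capacity` standing in for the size implied by the statistical depth d; tokens are assumed to be lowercase a-z; and each node carries a stamp of the document that last counted it. The stamp is how strings of the current document are told apart from those of earlier documents without clearing the structure in between. `PooledTrie` and its methods are hypothetical names.

```python
ALPHABET = 26  # assumption: tokens are lowercase ASCII a-z only


class PooledTrie:
    """Trie whose node storage is allocated once and reused for every document.

    `capacity` is a hypothetical stand-in for the size implied by the
    statistical depth d; the value used below is purely illustrative.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.child = [-1] * (capacity * ALPHABET)  # flat child table, -1 = no child
        self.stamp = [-1] * capacity               # document that last counted node
        self.count = [0] * capacity                # term frequency in that document
        self.used = 1                              # node 0 is the root
        self.doc = -1

    def start_document(self, doc_id):
        # Nothing is cleared: a count is only valid where stamp == current doc.
        self.doc = doc_id

    def _slot(self, node, ch):
        return node * ALPHABET + (ord(ch) - ord("a"))

    def add(self, token):
        node = 0
        for ch in token:
            slot = self._slot(node, ch)
            if self.child[slot] == -1:
                if self.used == self.capacity:
                    raise RuntimeError("preallocated node pool exhausted")
                self.child[slot] = self.used  # take the next free pooled node
                self.used += 1
            node = self.child[slot]
        if self.stamp[node] != self.doc:
            # First occurrence in the current document: the stale count from an
            # earlier document is overwritten, not freed or reallocated.
            self.stamp[node] = self.doc
            self.count[node] = 0
        self.count[node] += 1

    def tf(self, token):
        """Frequency of `token` in the current document (0 if not present)."""
        node = 0
        for ch in token:
            node = self.child[self._slot(node, ch)]
            if node == -1:
                return 0
        return self.count[node] if self.stamp[node] == self.doc else 0


trie = PooledTrie(capacity=64)
trie.start_document(1)
for w in "to be or not to be".split():
    trie.add(w)
print(trie.tf("to"))  # 2
trie.start_document(2)
trie.add("to")
print(trie.tf("to"), trie.tf("be"))  # 1 0
```

Reusing nodes means a term already seen in an earlier document costs no new allocation; the trade-off is that the pool must be sized in advance, and this sketch simply raises an error when it runs out.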

Future Work
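Although we did not reach the benchmark set by the original Lucene implementation, we have shown another approach that may be useful in certain special cases. In particular, the modified version is better suited to smaller sets of short documents, while the tries implementation works well for long document sets. However, more experiments are needed to evaluate performance and space consumption.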
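In addition, besides incorporating the Porter stemmer into the tries, chaining the bottom nodes to avoid traversal overhead, and deploying top-down compressed tries, we could also compute the IDFs and the vector lengths in a single iteration before the final write to disk, to avoid iterating through all tokens twice (after all documents have been indexed).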
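Finally, pre-allocating memory to a statistical depth d (making the structure static) and using the same tries structure for all documents would be a good idea, since we could then eliminate the execution time required for successive run-time memory allocations. In that case, we would only need to be able to distinguish the strings of the current document from those of previous documents.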