
Splitting a Sentence into Words in Java: Java Word Segmentation

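word is a Chinese word segmentation component implemented in Java. It provides several dictionary-based segmentation algorithms and uses an n-gram model to resolve ambiguity. It can accurately recognize English words, numbers, and quantity expressions such as dates and times, and it can recognize out-of-vocabulary words such as person names, place names, and organization names. It also ships plugins for Lucene, Solr, and ElasticSearch.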
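To make "dictionary-based segmentation" concrete, here is a minimal sketch of forward maximum matching, one classic dictionary-based strategy. It is not the word library's implementation: the class name, the toy dictionary, and the maximum word length are all assumptions chosen for illustration, and the n-gram disambiguation step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ForwardMaxMatchSketch {

    // Hypothetical toy dictionary; a real segmenter loads a large lexicon.
    static final Set<String> DICT =
            Set.of("子查询", "查询", "返回", "结果", "字段", "组合", "一个", "索引");

    // Longest dictionary entry above is 3 characters.
    static final int MAX_WORD_LEN = 3;

    static List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Start with the longest candidate that fits, then shrink it
            // until it is a dictionary entry or a single character.
            int end = Math.min(text.length(), i + MAX_WORD_LEN);
            while (end > i + 1 && !DICT.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints: [子查询, 中, 的, 返回, 结果, 字段, 组合, 是, 一个, 索引]
        System.out.println(segment("子查询中的返回结果字段组合是一个索引"));
    }
}
```

Greedy matching like this always prefers the longest entry at the current position, which is why a second step such as n-gram scoring is useful when two segmentations are both plausible.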

Add the dependency (version 1.3):

<dependency>
    <groupId>org.apdplat</groupId>
    <artifactId>word</artifactId>
    <version>1.3</version>
</dependency>


Test:

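import java.util.List;

import com.alibaba.fastjson.JSON; // fastjson, a separate dependency from word
import org.apdplat.word.WordSegmenter;
import org.apdplat.word.segmentation.Word;

public class WordFilter {

    public static void automaticSelection(String title) {
        // Segment with stop words removed
        List<Word> list = WordSegmenter.seg(title);
        System.out.println(JSON.toJSONString(list));

        // Segment with stop words kept
        List<Word> lists = WordSegmenter.segWithStopWords(title);
        System.out.println(JSON.toJSONString(lists));
    }

    public static void main(String[] args) {
        WordFilter.automaticSelection("子查询中的返回结果字段组合是一个索引");
    }
}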
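The two calls above differ only in whether stop words are dropped: `WordSegmenter.seg` removes them and `WordSegmenter.segWithStopWords` keeps them. As a rough illustration of that difference, the sketch below filters an already segmented token list against a stop-word set. The class name and the three-entry stop-word set are hypothetical, and this is not how the library itself is implemented.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class StopWordFilterSketch {

    // Hypothetical stop-word set for illustration only.
    static final Set<String> STOP_WORDS = Set.of("中", "的", "是");

    // Returns the tokens that are not in the stop-word set, in order.
    static List<String> removeStopWords(List<String> tokens) {
        return tokens.stream()
                .filter(token -> !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("子查询", "中", "的", "返回", "结果");
        // Prints: [子查询, 返回, 结果]
        System.out.println(removeStopWords(tokens));
    }
}
```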


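Output (the sample below shows a single list, and its tokens, 我 / 叫 / 李太白 / ..., indicate that it was produced from a different input sentence than the one passed in main):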

[{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"我"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"叫"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"李太白"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"我"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"是"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"一个"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"诗人"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"我"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"生活"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"在"},{"acronymPinYin":"","antonym":[],"frequency":0,"fullPinYin":"","synonym":[],"text":"唐朝"}]


