
Elasticsearch in Practice (11): Prefix and Fuzzy (Partial) Match Search


Elasticsearch in Practice: Prefix and Fuzzy (Partial) Match Search with prefix/wildcard/regexp

Contents
1. Partial-match scenario
1.1 Preparing the data
2. Implementing partial-match search
2.1 Prefix search (prefix)
2.2 Wildcard search (wildcard)
2.3 Regular-expression search (regexp)

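1. Partial-match scenario

Scenario: all of our earlier searches matched whole terms exactly. Suppose the content field contains the word elasticsearch. A search for elastic finds nothing, because the analyzed content holds the term elasticsearch but no term elastic.

# match query for elastic against a field that contains elasticsearch: no data is returned
GET /testcopy/_search
{
  "query": {
    "match": {
      "content": "elastic"
    }
  }
}

So how can this kind of search be implemented?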
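To make the reason for the empty result concrete, here is a minimal Python sketch of a toy inverted index. It is not how Elasticsearch is implemented: the `analyze` function is only a rough stand-in for the standard analyzer, and `inverted_index` and `match` are hypothetical names introduced for illustration. It uses three of the sample content values from this article.

```python
import re

# Three of the sample "content" values from this article.
docs = {
    1: "i like to write best elasticsearch article",
    2: "i think java is the best programming language",
    4: "elasticsearch and hadoop are all very good solution, i am a beginner",
}

def analyze(text):
    # Assumption: a crude stand-in for the standard analyzer
    # (lowercase, then split on anything that is not a letter or digit).
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

# Build a toy inverted index: term -> set of document ids.
inverted_index = {}
for doc_id, content in docs.items():
    for term in analyze(content):
        inverted_index.setdefault(term, set()).add(doc_id)

def match(query):
    # A match-style lookup: each analyzed query term must equal a whole indexed term.
    hits = set()
    for term in analyze(query):
        hits |= inverted_index.get(term, set())
    return sorted(hits)

print(match("elasticsearch"))  # [1, 4]
print(match("elastic"))        # []
```

The lookup for elastic comes back empty because the index has an entry for the whole term elasticsearch and no entry for any shorter piece of it.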

1.1 Preparing the data

POST /testcopy/_bulk
{"index":{"_id": 1}}
{"empId" : "111","name" : "员工1","age" : 20,"sex" : "男","mobile" : "19000001111","salary":1333,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"光谷大道","address":"湖北省武汉市洪山区光谷大厦","content" : "i like to write best elasticsearch article"}
{"index":{"_id": 2}}
{"empId" : "222","name" : "员工2","age" : 25,"sex" : "男","mobile" : "19000002222","salary":15963,"deptName" : "销售部","provice" : "湖北省","city":"武汉","area":"江汉区","address" : "湖北省武汉市江汉路","content" : "i think java is the best programming language"}
{"index":{"_id": 3}}
{"empId" : "333","name" : "员工3","age" : 30,"sex" : "男","mobile" : "19000003333","salary":20000,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"经济技术开发区","address" : "湖北省武汉市经济开发区","content" : "i am only an elasticsearch beginner"}
{"index":{"_id": 4}}
{"empId" : "444","name" : "员工4","age" : 20,"sex" : "女","mobile" : "19000004444","salary":5600,"deptName" : "销售部","provice" : "湖北省","city":"武汉","area":"沌口开发区","address" : "湖北省武汉市沌口开发区","content" : "elasticsearch and hadoop are all very good solution, i am a beginner"}
{"index":{"_id": 5}}
{"empId" : "555","name" : "员工5","age" : 20,"sex" : "男","mobile" : "19000005555","salary":9665,"deptName" : "测试部","provice" : "湖北省","city":"高新开发区","area":"武汉","address" : "湖北省武汉市东湖隧道","content" : "spark is best big data solution based on scala ,an programming language similar to java"}
{"index":{"_id": 6}}
{"empId" : "666","name" : "员工6","age" : 30,"sex" : "女","mobile" : "19000006666","salary":30000,"deptName" : "技术部","provice" : "武汉市","city":"湖北省","area":"江汉区","address" : "湖北省武汉市江汉路","content" : "i like java developer"}
{"index":{"_id": 7}}
{"empId" : "777","name" : "员工7","age" : 60,"sex" : "女","mobile" : "19000007777","salary":52130,"deptName" : "测试部","provice" : "湖北省","city":"黄冈市","area":"边城区","address" : "湖北省黄冈市边城区","content" : "i like elasticsearch developer"}
{"index":{"_id": 8}}
{"empId" : "888","name" : "员工8","age" : 19,"sex" : "女","mobile" : "19000008888","salary":60000,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"汉阳区","address" : "湖北省武汉市江汉大学","content" : "i like spark language"}
{"index":{"_id": 9}}
{"empId" : "999","name" : "员工9","age" : 40,"sex" : "男","mobile" : "19000009999","salary":23000,"deptName" : "销售部","provice" : "河南省","city":"郑州市","area":"二七区","address" : "河南省郑州市郑州大学","content" : "i like java developer"}
{"index":{"_id": 10}}
{"empId" : "101010","name" : "张湖北","age" : 35,"sex" : "男","mobile" : "19000001010","salary":18000,"deptName" : "测试部","provice" : "湖北省","city":"武汉","area":"高新开发区","address" : "湖北省武汉市东湖高新","content" : "i like java developer i also like elasticsearch"}
{"index":{"_id": 11}}
{"empId" : "111111","name" : "王河南","age" : 61,"sex" : "男","mobile" : "19000001011","salary":10000,"deptName" : "销售部","provice" : "河南省","city":"开封市","area":"金明区","address" : "河南省开封市河南大学","content" : "i am not like java "}
{"index":{"_id": 12}}
{"empId" : "121212","name" : "张大学","age" : 26,"sex" : "女","mobile" : "19000001012","salary":1321,"deptName" : "测试部","provice" : "河南省","city":"开封市","area":"金明区","address" : "河南省开封市河南大学","content" : "i am java developer thing java is good"}
{"index":{"_id": 13}}
{"empId" : "131313","name" : "李江汉","age" : 36,"sex" : "男","mobile" : "19000001013","salary":1125,"deptName" : "销售部","provice" : "河南省","city":"郑州市","area":"二七区","address" : "河南省郑州市二七区","content" : "i like java and java is very best i like it do you like java "}
{"index":{"_id": 14}}
{"empId" : "141414","name" : "王技术","age" : 45,"sex" : "女","mobile" : "19000001014","salary":6222,"deptName" : "测试部","provice" : "河南省","city":"郑州市","area":"金水区","address" : "河南省郑州市金水区","content" : "i like c++"}
{"index":{"_id": 15}}
{"empId" : "151515","name" : "张测试","age" : 18,"sex" : "男","mobile" : "19000001015","salary":20000,"deptName" : "技术部","provice" : "河南省","city":"郑州市","area":"高新开发区","address" : "河南省郑州高新开发区","content" : "i think spark is good"}

2. Implementing partial-match search

2.1 Prefix search (prefix)

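prefix is a term-level query: the search string is not analyzed and is compared, case-sensitively by default, with the terms stored in the index. On a keyword field the whole field value is stored as a single unanalyzed term, so the query matches documents whose entire value starts with the given prefix. That is why the examples here run against content.keyword. We do not recommend prefix search in general, because it is inefficient: the shorter the prefix, the more terms have to be examined.

Take the sentence "elasticsearch and hadoop are all very good solution, i am a beginner" and try a prefix search for it.

# Prefix search on the keyword sub-field for values that start with elas
GET /testcopy/_search
{
  "query": {
    "prefix": {
      "content.keyword": {
        "value": "elas"
      }
    }
  }
}

Result: the query returns the document whose text starts with elas, "elasticsearch and hadoop are all very good solution, i am a beginner".

Now switch to uppercase to see whether the query is case-sensitive.

GET /testcopy/_search
{
  "query": {
    "prefix": {
      "content.keyword": {
        "value": "ELas"
      }
    }
  }
}

With the uppercase prefix no results are returned, so prefix search on this keyword field is case-sensitive.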
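The two outcomes above can be reproduced with a small Python sketch that treats each keyword value as one unanalyzed string. This is only an illustration of the matching rule, not Elasticsearch code: `prefix_search` and its `case_insensitive` argument are hypothetical names chosen for this sketch. Recent Elasticsearch versions document a similarly named `case_insensitive` option for the prefix query, so check the reference for your version before relying on it.

```python
# Whole "content" values, as a keyword sub-field would store them (one term per value).
keyword_values = {
    3: "i am only an elasticsearch beginner",
    4: "elasticsearch and hadoop are all very good solution, i am a beginner",
    7: "i like elasticsearch developer",
}

def prefix_search(prefix, case_insensitive=False):
    # Hypothetical helper: compares the raw, unanalyzed prefix with each whole value.
    hits = []
    for doc_id, value in keyword_values.items():
        if case_insensitive:
            matched = value.lower().startswith(prefix.lower())
        else:
            matched = value.startswith(prefix)
        if matched:
            hits.append(doc_id)
    return hits

print(prefix_search("elas"))                         # [4]
print(prefix_search("ELas"))                         # []
print(prefix_search("ELas", case_insensitive=True))  # [4]
```

Documents 3 and 7 contain the word elasticsearch but are not returned, because on a keyword field the prefix is compared with the start of the whole value, not with each word inside it.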

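2.2 Wildcard search (wildcard)

ES also supports wildcard search: ? stands for exactly one arbitrary character and * stands for zero or more arbitrary characters. A wildcard query can run against an analyzed text field, where the pattern is compared with the individual terms in the inverted index, or against a keyword field, where it is compared with the whole value. Like prefix, it may have to scan all the terms of the field, so it is slow and not recommended in production.

Wildcard search on content.keyword

# Wildcard e?asticsea* against the keyword field, to match the value that begins with elasticsearch
GET /testcopy/_search
{
  "query": {
    "wildcard": {
      "content.keyword": {
        "value": "e?asticsea*"
      }
    }
  }
}

Result: the query correctly returns the document.

The query above used wildcard against the keyword field. Next, try the pattern "*o*o*" against the analyzed content field to see whether wildcards also match terms in the inverted index.

# Wildcard *o*o* against the terms of the content field in the inverted index
GET /testcopy/_search
{
  "query": {
    "wildcard": {
      "content": {
        "value": "*o*o*"
      }
    }
  }
}

The result shows that wildcards also work against the analyzed terms of the inverted index.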
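The difference between running a wildcard against a keyword field and against an analyzed field can be sketched in Python with `fnmatch.fnmatchcase`, which understands `?` and `*` in the same way. This is an approximation under stated assumptions: `tokens` is a rough stand-in for the standard analyzer, `wildcard_keyword` and `wildcard_text` are hypothetical helpers, and `fnmatch` also accepts `[...]` character sets, which is not part of this comparison.

```python
import re
from fnmatch import fnmatchcase

# Three of the sample "content" values from this article.
docs = {
    4: "elasticsearch and hadoop are all very good solution, i am a beginner",
    7: "i like elasticsearch developer",
    15: "i think spark is good",
}

def tokens(text):
    # Assumption: crude stand-in for the standard analyzer.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def wildcard_keyword(pattern):
    # keyword field: the pattern must match the whole stored value.
    return [i for i, v in docs.items() if fnmatchcase(v, pattern)]

def wildcard_text(pattern):
    # analyzed field: the pattern must match at least one individual term.
    return [i for i, v in docs.items()
            if any(fnmatchcase(t, pattern) for t in tokens(v))]

print(wildcard_keyword("e?asticsea*"))  # [4]
print(wildcard_text("e?asticsea*"))     # [4, 7]
print(wildcard_text("*o*o*"))           # [4, 15]
```

In this sketch the same pattern e?asticsea* finds document 7 only on the analyzed field, because there the term elasticsearch is matched on its own instead of as part of the whole sentence.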

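2.3 Regular-expression search (regexp)

ES also supports regular-expression search. For example, [A-z] matches one character in the range from A to z (this covers every ASCII letter, plus the few punctuation characters that sit between Z and a, so use [a-zA-Z] for letters only), . matches any single character, and + means the preceding expression occurs one or more times. The pattern has to match an entire term, not just part of it. A regexp query may also have to scan all the terms of the field, so it is slow and not recommended in production.

# Regexp .*op[A-z]{0,1}: match documents that contain a term ending in op plus zero or one further character
GET /testcopy/_search
{
  "query": {
    "regexp": {
      "content": ".*op[A-z]{0,1}"
    }
  }
}

Result: the query correctly finds hadoop, and only one document comes back, "elasticsearch and hadoop are all very good solution, i am a beginner".

If we relax {0,1} to {0,3} and check whether other documents match, both hadoop and developer satisfy the regular expression, and the query result shows that the documents containing developer are returned as well.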
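The effect of widening the repetition range can be checked with a short Python sketch. It uses Python's `re.fullmatch` to imitate the rule that the pattern must cover a whole term. Python's regex dialect is not the same as the one Elasticsearch uses, so treat this as an approximation for these simple patterns only. `regexp_terms` is a hypothetical helper, and the term op_x is made up purely to show what [A-z] lets through.

```python
import re

# A few terms from the sample data, plus one made-up term (op_x) for the [A-z] check.
terms = ["hadoop", "developer", "good", "op_x"]

def regexp_terms(pattern):
    # fullmatch imitates the anchoring: the pattern must cover the entire term.
    return [t for t in terms if re.fullmatch(pattern, t)]

print(regexp_terms(r".*op[A-z]{0,1}"))     # ['hadoop']
print(regexp_terms(r".*op[A-z]{0,3}"))     # ['hadoop', 'developer', 'op_x']
print(regexp_terms(r".*op[a-zA-Z]{0,3}"))  # ['hadoop', 'developer']
```

developer needs two characters after op, which is why it only appears once the upper bound is raised above 1. The underscore in op_x falls inside the A to z range, which is the reason [a-zA-Z] is the safer way to say "any letter".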


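At this point we have covered partial-match search with prefix, wildcard, and regular-expression queries. None of them is recommended for production use, because each may have to scan a large part of the field's terms in the index, which makes them slow.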


