Elasticsearch presentation

Indexing a restaurant directory
● Title
● Description
● Price
● Address
● Type

Creating an index without a mapping
PUT restaurant
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}

Indexing without a mapping
PUT restaurant/restaurant/1
{
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}

The risk of indexing without a mapping
PUT restaurant/restaurant/2
{
"title": "Pizza de l'ormeau",
"description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse [title]",
"caused_by": {
"type": "number_format_exception",
"reason": "For input string: \"Pizza de l'ormeau\""
}
},
"status": 400
}

Inferred mapping
GET /restaurant/_mapping
{
"restaurant": {
"mappings": {
"restaurant": {
"properties": {
"adresse": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "long"
},
"title": {
"type": "long"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
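The failure above can be reproduced with a toy model of dynamic mapping. This is only a sketch, assuming a simplified rule (the first value seen for a field fixes its type); `infer_type` and `index_document` are hypothetical helpers, not part of any Elasticsearch client.

```python
def infer_type(value):
    """Guess a field type from a JSON value (simplified assumption)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def index_document(mapping, doc):
    """Add `doc` to a toy index: new fields extend `mapping`,
    known "long" fields must receive a value parseable as an integer."""
    for field, value in doc.items():
        expected = mapping.setdefault(field, infer_type(value))
        if expected == "long":
            try:
                int(value)
            except (TypeError, ValueError):
                raise ValueError(f"failed to parse [{field}]")


mapping = {}
index_document(mapping, {"title": 42, "price": 42})
print(mapping)  # {'title': 'long', 'price': 'long'}

try:
    index_document(mapping, {"title": "Pizza de l'ormeau", "price": 10})
except ValueError as err:
    print(err)  # failed to parse [title]
```

Because the first document stored a number in "title", the field is locked to "long" and the second document is rejected, which is why an explicit mapping is preferable.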

Creating a mapping
PUT :url/restaurant
{
"settings": {
"index": {"number_of_shards": 3, "number_of_replicas": 2}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"description": {"type": "text"},
"price": {"type": "integer"},
"adresse": {"type": "text"},
"type": { "type": "keyword"}
}
}
}
}

Indexing a few restaurants
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}

Basic search
GET :url/restaurant/_search
{
"query": {
"match": {
"description": "asiatique"
}
}
}
{
"hits": {
"total": 1,
"max_score": 0.6395861,
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}

Where our mapping falls short
GET :url/restaurant/_search
{
"query": {
"match": {
"description": "asiatiques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}

What is an analyzer?
● Turns a character string into tokens
○ E.g. “Le chat est rouge” -> [“le”, “chat”, “est”, “rouge”]
● The tokens are used to build an inverted index

What is an inverted index?
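An inverted index maps each token to the documents that contain it. The sketch below is a minimal illustration, assuming a toy analyzer that only lowercases and splits on whitespace; `analyze` and `build_inverted_index` are hypothetical names, and the first document text is shortened for the example.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


docs = {
    1: "Un restaurant gastronomique",
    3: "Restaurant asiatique très copieux",
}
index = build_inverted_index(docs)
print(index["restaurant"])          # [1, 3]
print(index["asiatique"])           # [3]
print(index.get("asiatiques", []))  # []
```

A lookup is an exact match on the token, so the plural "asiatiques" finds nothing unless the analyzer reduces both forms to the same token.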

Explanation: the default analyzer
GET /_analyze
{
"analyzer": "standard",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [{
"token": "un",
"start_offset": 0, "end_offset": 2,
"type": "<ALPHANUM>", "position": 0
},{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiatique",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "très",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieux",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}

Explanation: the “french” analyzer
GET /_analyze
{
"analyzer": "french",
"text": "Un restaurant asiatique très copieux"
}
{
"tokens": [
{
"token": "restaurant",
"start_offset": 3, "end_offset": 13,
"type": "<ALPHANUM>", "position": 1
},{
"token": "asiat",
"start_offset": 14, "end_offset": 23,
"type": "<ALPHANUM>", "position": 2
},{
"token": "trè",
"start_offset": 24, "end_offset": 28,
"type": "<ALPHANUM>", "position": 3
},{
"token": "copieu",
"start_offset": 29, "end_offset": 36,
"type": "<ALPHANUM>", "position": 4
}
]
}

Anatomy of an analyzer
Elasticsearch breaks analysis down into three steps:
● Character filtering (e.g. stripping HTML tags)
● Splitting into tokens
● Token filtering:
○ Removing tokens (stopwords such as “un”, “le”, “la”)
○ Transforming tokens (lemmatization...)
○ Adding tokens (synonyms)

Anatomy of the french analyzer
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "elision",
"articles_case": true,
"articles": [
"l", "m", "t", "qu", "n", "s", "j", "d", "c",
"jusqu", "quoiqu", "lorsqu", "puisqu"
]
}, {
"type": "stop", "stopwords": "_french_"
}, {
"type": "stemmer", "language": "french"
}
],
"text": "ce n'est qu'un restaurant asiatique très copieux"
}
Input text: “ce n’est qu’un restaurant asiatique très copieux”
standard tokenizer -> [“ce”, “n’est”, “qu’un”, “restaurant”, “asiatique”, “très”, “copieux”]
elision -> [“ce”, “est”, “un”, “restaurant”, “asiatique”, “très”, “copieux”]
stopwords -> [“restaurant”, “asiatique”, “très”, “copieux”]
french stemming -> [“restaurant”, “asiat”, “trè”, “copieu”]
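The first three stages of this chain can be imitated in a few lines. This is a rough sketch rather than the actual Lucene implementation: the stopword set is a hypothetical three-word subset chosen for this sentence, and the stemming stage is left out because a French stemmer is too involved for a short example.

```python
import re

ARTICLES = {"l", "m", "t", "qu", "n", "s", "j", "d", "c",
            "jusqu", "quoiqu", "lorsqu", "puisqu"}
# Hypothetical subset; the real _french_ stopword list is much longer.
STOPWORDS = {"ce", "est", "un"}


def tokenize(text):
    """Split on anything that is not a word character or an apostrophe."""
    return re.findall(r"[\w']+", text)


def elision(tokens):
    """Strip a leading elided article such as n' or qu'."""
    out = []
    for token in tokens:
        prefix, sep, rest = token.partition("'")
        out.append(rest if sep and rest and prefix in ARTICLES else token)
    return out


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


text = "ce n'est qu'un restaurant asiatique très copieux"
tokens = tokenize(text)
print(tokens)    # ['ce', "n'est", "qu'un", 'restaurant', 'asiatique', 'très', 'copieux']
tokens = elision(tokens)
print(tokens)    # ['ce', 'est', 'un', 'restaurant', 'asiatique', 'très', 'copieux']
tokens = remove_stopwords(tokens)
print(tokens)    # ['restaurant', 'asiatique', 'très', 'copieux']
```

Order matters: elision has to run before the stopword filter, otherwise "n'est" and "qu'un" would never be recognised as "est" and "un".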

Specifying the analyzer in the mapping
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"type": { "type": "keyword"}
}
}
}
}

Typo-tolerant search
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}

One solution: the ngram token filter
GET /_analyze
{
"tokenizer": "standard",
"filter": [
{
"type": "ngram",
"min_gram": 3,
"max_gram": 7
}
],
"text": "asiatuque"
}
[
"asi",
"asia",
"asiat",
"asiatu",
"asiatuq",
"sia",
"siat",
"siatu",
"siatuq",
"siatuqu",
"iat",
"iatu",
"iatuq",
"iatuqu",
"iatuque",
"atu",
"atuq",
"atuqu",
"atuque",
"tuq",
"tuqu",
"tuque",
"uqu",
"uque",
"que"
]
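The list above can be regenerated with a small function. This is a sketch of the idea behind the filter, not the filter itself; `ngrams` is a hypothetical helper that emits every substring whose length lies between `min_gram` and `max_gram`.

```python
def ngrams(token, min_gram=3, max_gram=7):
    """Return the substrings of `token` with a length in [min_gram, max_gram],
    ordered by start position and then by length."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


grams = ngrams("asiatuque")
print(len(grams))  # 25
print(grams[:5])   # ['asi', 'asia', 'asiat', 'asiatu', 'asiatuq']
```

A misspelled word still shares most of its ngrams with the correct spelling, which is what makes the match possible.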

Creating a custom analyzer that uses the ngram filter
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "ngram_analyzer"},
"description": {"type": "text", "analyzer": "ngram_analyzer"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "ngram_analyzer"},
"type": {"type": "keyword"}
}
}
}
}

GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "asiatuques"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.60128295,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}, {
"_score": 0.46237043,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"

Noise introduced by ngrams
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": "gastronomique"
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.6277555,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
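The unrelated hit comes from ngrams that the query shares with ordinary words of the description. A minimal sketch, assuming the same 3 to 7 character ngrams as the custom filter (`ngrams` is a hypothetical helper):

```python
def ngrams(token, min_gram=3, max_gram=7):
    """Set of substrings of `token` with a length in [min_gram, max_gram]."""
    return {
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    }


shared = ngrams("gastronomique") & ngrams("asiatique")
print(sorted(shared))  # ['iqu', 'ique', 'que']
```

The common suffix "ique" alone is enough for the description containing "asiatique" to match a search for "gastronomique", hence the noise.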

Specifying several analyzers for one field
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
"analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {
"type": "text", "analyzer": "french",
"fields": {
"ngram": { "type": "text", "analyzer": "ngram_analyzer"}
}
},
"price": {"type": "integer"},

Using several fields in a search
GET /restaurant/restaurant/_search
{
"query": {
"multi_match": {
"query": "gastronomique",
"fields": [
"description^4",
"description.ngram"
]
}
}
}
{
"hits": {
"hits": [
{
"_score": 2.0649285,
"_source": {
"title": 42,
"description": "Un restaurant gastronomique où tout plat coûte 42 euros",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},
{
"_score": 0.56373334,
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux pour un prix contenu",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
},
{
"_index": "restaurant",

To ignore or not to ignore stopwords, that is the question
POST :url/restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}

Stopwords are not necessarily meaningless
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": 42,
"description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)",
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
},{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}

Modifying the french analyzer to keep stopwords
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision",
"articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"}
},
"analyzer": {
"custom_french": {
"tokenizer": "standard",
"filter": [
"french_elision",
"lowercase",
"french_stemmer"
]
}
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": "pas cher"
}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Chez l'oncle chan",
"description": "Restaurant asiatique très copieux et pas cher",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
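The difference between the two result sets can be illustrated with a toy phrase matcher. It is a simplification: the stopword set is a hypothetical subset, and token positions are not tracked the way Elasticsearch tracks them. The point is only that once "pas" is dropped, the phrase "pas cher" degenerates into the single word "cher".

```python
# Hypothetical subset of a French stopword list.
STOPWORDS = {"pas", "et", "un"}


def analyze(text, keep_stopwords):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    tokens = text.lower().split()
    if keep_stopwords:
        return tokens
    return [t for t in tokens if t not in STOPWORDS]


def phrase_match(doc, phrase, keep_stopwords):
    """True if the analyzed phrase appears as consecutive tokens in the doc."""
    d = analyze(doc, keep_stopwords)
    p = analyze(phrase, keep_stopwords)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


gastro = "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)"
chan = "Restaurant asiatique très copieux et pas cher"

print(phrase_match(gastro, "pas cher", keep_stopwords=False))  # True
print(phrase_match(gastro, "pas cher", keep_stopwords=True))   # False
print(phrase_match(chan, "pas cher", keep_stopwords=True))     # True
```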

Searching with stopwords without hurting performance
GET /restaurant/restaurant/_search
{
"query": {
"match": {
"description": {
"query": "restaurant pas cher",
"cutoff_frequency": 0.01
}
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": [
{"term": {"description": "restaurant"}},
{"term": {"description": "cher"}}]
}
},
"should": [
{"match": {
"description": "pas"
}}
]
}
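The rewrite shown above relies on splitting the query terms into rare and frequent ones. The sketch below illustrates that split under stated assumptions: the document frequencies are invented for the example, and a term is treated as frequent when it appears in more than `cutoff_frequency` of the documents.

```python
def split_by_cutoff(terms, doc_freq, total_docs, cutoff_frequency):
    """Split `terms` into (low_frequency, high_frequency) lists based on the
    share of documents that contain each term."""
    low, high = [], []
    for term in terms:
        ratio = doc_freq.get(term, 0) / total_docs
        if ratio > cutoff_frequency:
            high.append(term)
        else:
            low.append(term)
    return low, high


# Hypothetical document frequencies in an index of 1000 documents.
doc_freq = {"restaurant": 5, "pas": 400, "cher": 8}
low, high = split_by_cutoff(["restaurant", "pas", "cher"], doc_freq, 1000, 0.01)
print(low)   # ['restaurant', 'cher']
print(high)  # ['pas']
```

The rare terms select the documents, and the frequent term only adds to the score of documents that were already selected, which mirrors the must/should structure of the second request.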

Customizing the scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"functions": [{
"filter": { "terms": { "type": ["asiatique", "italien"]}},
"weight": 2
}]
}
}
}

Customizing the scoring
GET /restaurant/restaurant/_search
{
"query": {
"function_score": {
"query": {
"match": {
"adresse": "toulouse"
}
},
"script_score": {
"script": {
"lang": "painless",
"inline": "_score * ( 1 + 10/doc['price'].value)"
}
}
}
}
}
{
"hits": {
"hits": [
{
"_score": 0.53484553,
"_source": {
"title": "Pizza de l'ormeau",
"price": 10,
"adresse": "1 place de l'ormeau, 31400 TOULOUSE",
"type": "italien"
}
}, {
"_score": 0.26742277,
"_source": {
"title": 42,
"price": 42,
"adresse": "10 rue de l'industrie, 31000 TOULOUSE",
"type": "gastronomie"
}
}, {
"_score": 0.26742277,
"_source": {
"title": "Chez l'oncle chan",
"price": 14,
"adresse": "13 route de labège, 31400 TOULOUSE",
"type": "asiatique"
}
}
]
}
}
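The scores in this response are consistent with an integer division inside the script. The sketch below reproduces them under two assumptions: the query itself gives every document the same base score of 0.26742277 (the value seen for the hits whose factor is 1), and dividing two integers in Painless truncates as it does in Java.

```python
def script_score(score, price):
    """Python equivalent of `_score * (1 + 10 / price)` when both operands of
    the division are integers: `//` truncates like the script does here."""
    return score * (1 + 10 // price)


base = 0.26742277  # assumed common query score, taken from the response
for title, price in [("Pizza de l'ormeau", 10), ("42", 42), ("Chez l'oncle chan", 14)]:
    print(title, script_score(base, price))
```

Only a price of 10 or less changes the score, because 10 // 14 and 10 // 42 are both 0. Using a floating-point constant in the script would give a graded boost instead.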

How to index multilingual documents
Three cases:
● Fields containing several languages (e.g. {"message": "warning | attention | cuidado"})
○ Ngrams
○ Analyze the same field several times, with one analyzer per language
● One field per language:
○ Easy, because a different analyzer can be specified for each language
○ Take care not to end up with a sparse index
● One version of the document per language (preferred)
○ One index per language
○ Above all, do not use one type per language within the same index (it skews the term statistics)

Handling synonyms
PUT /restaurant
{
"settings": {
"analysis": {
"filter": {
"french_elision": {
"type": "elision", "articles_case": true,
"articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
},
"french_stemmer": {"type": "stemmer", "language": "light_french"},
"french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]}
},
"analyzer": {
"french_with_synonym": {
"tokenizer": "standard",
"filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"},
"price": {"type": "integer"},
"adresse": {"type": "text", "analyzer": "french"},
"coord": {"type": "geo_point"},

Handling synonyms
GET /restaurant/restaurant/_search
{
"query": {
"match": {"description": "sous-marins"}
}
}
{
"hits": {
"hits": [
{
"_source": {
"title": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5577519,1.4625753"
}
}
]
}
}

Geolocated data
PUT /restaurant
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"description": {"type": "text", "analyzer": "french"},
"price": {"type": "integer"},
"adresse": {"type": "text","analyzer": "french"},
"coord": {"type": "geo_point"},
"type": { "type": "keyword"}
}
}
}
}

Geolocated data
POST restaurant/restaurant/_bulk
{"index": {"_id": 1}}
{"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"}
{"index": {"_id": 2}}
{"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"}
{"index": {"_id": 3}}
{"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautetes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"}
{"index": {"_id": 4}}
{"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"}
{"index": {"_id": 5}}
{"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"}
{"index": {"_id": 6}}
{"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}

Filtering and sorting on geolocated data
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"filter": [
{"term": {"type":"français"}},
{"geo_distance": {
"distance": "1km",
"coord": {"lat": 43.5739329, "lon": 1.4893669}
}}
]
}
},
"sort": [{
"_geo_distance": {
"coord": {"lat": 43.5739329, "lon": 1.4893669},
"unit": "km"
}
}]
}
{
"hits": {
"hits": [
{
"_source": {
"title": "bistronomique",
"description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents",
"price": 17,
"adresse": "73 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.57417,1.4905748"
},
"sort": [0.10081529266640063]
},{
"_source": {
"title": "L'évidence",
"description": "restaurant copieux et pas cher, cependant c'est pas bon",
"price": 12,
"adresse": "38 route de revel, 31400 TOULOUSE",
"type": "français",
"coord": "43.5770109,1.4846573"
},
"sort": [0.510960087579506]
},{
"_source": {
"title": "Chez Ingalls",
"description": "Contemporain et rustique, ce restaurant avec cheminée sert
savoyardes et des grillades",
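The "sort" values returned with each hit are distances in kilometres from the query point. They can be approximated with the haversine formula; this is a sketch, assuming a spherical Earth with a mean radius of 6371.0088 km, and not necessarily the exact computation Elasticsearch performs.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


origin = (43.5739329, 1.4893669)
print(haversine_km(*origin, 43.57417, 1.4905748))    # close to 0.10 km
print(haversine_km(*origin, 43.5770109, 1.4846573))  # close to 0.51 km
```

Both values agree with the sort keys above to within a few metres, and both are under the 1km limit of the filter.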

Explaining the bool query
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must": {"match": {"description": "sandwitch"}},
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}}
],
"must_not": [
{"match_phrase": {
"description": "pas bon"
}}
],
"filter": [
{"range": {"price": {
"lte": "20"
}}}
]
}
}
}

Explaining the bool query
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"should" : [
{"match": {"description": "bon"}},
{"match": {"description": "excellent"}},
{"match": {"description": "service rapide"}}
],
"minimum_should_match": 2
}
}
}
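The effect of requiring two "should" clauses can be sketched with plain sets. This is a simplification that ignores stemming and scoring: each clause is modelled as a set of terms that matches when at least one term is present, and the third document is a hypothetical example added for contrast.

```python
def tokens(text):
    """Lowercase, treat commas as spaces, split on whitespace."""
    return set(text.lower().replace(",", " ").split())


def bool_should(doc, should_clauses, minimum_should_match):
    """True if at least `minimum_should_match` clauses match the document."""
    doc_tokens = tokens(doc)
    matched = sum(1 for clause in should_clauses if clause & doc_tokens)
    return matched >= minimum_should_match


clauses = [{"bon"}, {"excellent"}, {"service", "rapide"}]
docs = [
    "service très rapide, rapport qualité/prix médiocre",       # 1 clause
    "restaurant copieux et pas cher, cependant c'est pas bon",  # 1 clause
    "bon restaurant, service rapide",                           # hypothetical, 2 clauses
]
for doc in docs:
    print(bool_should(doc, clauses, 2), "-", doc)
```

Only the last document satisfies two clauses, so it is the only one returned.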

Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "-\"pas bon\" +(pizzi~2 | sandwitch)"
}
}
}
GET /restaurant/restaurant/_search
{
"query": {
"bool": {
"must_not": {
"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"type": "phrase",
"query": "pas bon"
}
},
"should": [
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"fuzziness": 2,
"max_expansions": 50,
"query": "pizzi"
}
},
{"multi_match": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "sandwitch"
}

Aliases: giving yourself some room for manoeuvre
PUT /restaurant_v1/
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text"},
"lat": {"type": "double"},
"lon": {"type": "double"}
}
}
}
}
POST /_aliases
{
"actions": [
{"add": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v1", "alias": "restaurant_write"}}
]
}

Aliases, pipelines and reindexing
PUT /restaurant_v2
{
"mappings": {
"restaurant": {
"properties": {
"title": {"type": "text", "analyzer": "french"},
"position": {"type": "geo_point"}
}
}
}
}
PUT /_ingest/pipeline/fixing_position
{
"description": "move lat lon into position parameter",
"processors": [
{"rename": {"field": "lat", "target_field": "position.lat"}},
{"rename": {"field": "lon", "target_field": "position.lon"}}
]
}
POST /_aliases
{
"actions": [
{"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}},
{"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_search"}},
{"add": {"index": "restaurant_v2", "alias": "restaurant_write"}}
]
}
POST /_reindex
{
"source": {"index": "restaurant_v1"},
"dest": {"index": "restaurant_v2", "pipeline": "fixing_position"}
}

Analyzing firefighter intervention data from 2005 to 2014
PUT /pompier
{
"mappings": {
"interventions": {
"properties": {
"date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
"type_incident": { "type": "keyword" },
"description_groupe": { "type": "keyword" },
"caserne": { "type": "integer"},
"ville": { "type": "keyword"},
"arrondissement": { "type": "keyword"},
"division": {"type": "integer"},
"position": {"type": "geo_point"},
"nombre_unites": {"type": "integer"}
}
}
}
}

Listing the different incident types
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"type_incident": {
"terms": {"field": "type_incident", "size": 100}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{"key": "Premier répondant", "doc_count": 437891},
{"key": "Appel de Cie de détection", "doc_count": 76157},
{"key": "Alarme privé ou locale", "doc_count": 60879},
{"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734},
{"key": "10-22 sans feu", "doc_count": 29283},
{"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663},
{"key": "Inondation", "doc_count": 26801},
{"key": "Problèmes électriques", "doc_count": 23495},
{"key": "Aliments surchauffés", "doc_count": 23428},
{"key": "Odeur suspecte - gaz", "doc_count": 21158},
{"key": "Déchets en feu", "doc_count": 18007},
{"key": "Ascenseur", "doc_count": 12703},
{"key": "Feu de champ *", "doc_count": 11518},
{"key": "Structure dangereuse", "doc_count": 9958},
{"key": "10-22 avec feu", "doc_count": 9876},
{"key": "Alarme vérification", "doc_count": 8328},
{"key": "Aide à un citoyen", "doc_count": 7722},
{"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351},
{"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232},
{"key": "Feu de véhicule extérieur", "doc_count": 5943},
{"key": "Fausse alerte 10-19", "doc_count": 4680},
{"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494},
{"key": "Assistance serv. muni.", "doc_count": 3431},
{"key": "Avertisseur de CO", "doc_count": 2542},
{"key": "Fuite gaz naturel 10-22", "doc_count": 1928},
{"key": "Matières dangereuses / 10-22", "doc_count": 1905},
{"key": "Feu de bâtiment", "doc_count": 1880},
{"key": "Senteur de feu à l'extérieur", "doc_count": 1566},
{"key": "Surchauffe - véhicule", "doc_count": 1499},
{"key": "Feu / Agravation possible", "doc_count": 1281},
{"key": "Fuite gaz naturel 10-09", "doc_count": 1257},
{"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015},
{"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},

Nested aggregations
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"ville": {
"terms": {"field": "ville"},
"aggs": {
"arrondissement": {
"terms": {"field": "arrondissement"}
}
}
}
}
}
{
"aggregations": {"ville": {"buckets": [
{
"key": "Montréal", "doc_count": 768955,
"arrondissement": {"buckets": [
{"key": "Ville-Marie", "doc_count": 83010},
{"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272},
{"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933},
{"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951},
{"key": "Rosemont / Petite-Patrie", "doc_count": 59213},
{"key": "Ahuntsic / Cartierville", "doc_count": 57721},
{"key": "Plateau Mont-Royal", "doc_count": 53344},
{"key": "Montréal-Nord", "doc_count": 40757},
{"key": "Sud-Ouest", "doc_count": 39936},
{"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139}
]}
}, {
"key": "Dollard-des-Ormeaux", "doc_count": 17961,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13452},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477},
{"key": "Pierrefonds / Senneville", "doc_count": 10},
{"key": "Dorval / Ile Dorval", "doc_count": 8},
{"key": "Pointe-Claire", "doc_count": 8},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6}
]}
}, {
"key": "Pointe-Claire", "doc_count": 17925,
"arrondissement": {"buckets": [
{"key": "Indéterminé", "doc_count": 13126},
{"key": "Pointe-Claire", "doc_count": 4766},
{"key": "Dorval / Ile Dorval", "doc_count": 12},
{"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7},
{"key": "Kirkland", "doc_count": 7},
{"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5},
{"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1},
{"key": "St-Laurent", "doc_count": 1}

Computing averages and sorting aggregation buckets
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"avg_nombre_unites_general": {
"avg": {"field": "nombre_unites"}
},
"type_incident": {
"terms": {
"field": "type_incident",
"size": 5,
"order" : {"avg_nombre_unites": "desc"}
},
"aggs": {
"avg_nombre_unites": {
"avg": {"field": "nombre_unites"}
}
}
}
}
}
{
"aggregations": {
"type_incident": {
"buckets": [
{
"key": "Feu / 5e Alerte", "doc_count": 162,
"avg_nombre_unites": {"value": 70.9074074074074}
}, {
"key": "Feu / 4e Alerte", "doc_count": 100,
"avg_nombre_unites": {"value": 49.36}
}, {
"key": "Troisième alerte/autre que BAT", "doc_count": 1,
"avg_nombre_unites": {"value": 43.0}
}, {
"key": "Feu / 3e Alerte", "doc_count": 173,
"avg_nombre_unites": {"value": 41.445086705202314}
}, {
"key": "Deuxième alerte/autre que BAT", "doc_count": 8,
"avg_nombre_unites": {"value": 37.5}
}
]
},
"avg_nombre_unites_general": {"value": 2.1374461758713728}
}
}

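What this aggregation computes can be written out by hand: group the documents by incident type, average the number of units per group, and keep the groups with the highest average. The sketch below is illustrative only; `top_buckets_by_avg` is a hypothetical helper and the five documents are invented.

```python
from collections import defaultdict


def top_buckets_by_avg(docs, key_field, value_field, size):
    """Group `docs` by `key_field`, average `value_field` per group and
    return the `size` groups with the highest average."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc[value_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    buckets.sort(key=lambda bucket: bucket["avg"], reverse=True)
    return buckets[:size]


# Invented sample documents, for illustration only.
docs = [
    {"type_incident": "Feu / 5e Alerte", "nombre_unites": 70},
    {"type_incident": "Feu / 5e Alerte", "nombre_unites": 72},
    {"type_incident": "Inondation", "nombre_unites": 1},
    {"type_incident": "Inondation", "nombre_unites": 2},
    {"type_incident": "Ascenseur", "nombre_unites": 1},
]
for bucket in top_buckets_by_avg(docs, "type_incident", "nombre_unites", size=2):
    print(bucket)
```

Ordering by the sub-aggregation rather than by doc_count is what brings rare but heavily staffed incident types to the top.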
Percentile
GET /pompier/interventions/_search
{
"size": 0,
"aggs": {
"unites_percentile": {
"percentiles": {
"field": "nombre_unites",
"percents": [25, 50, 75, 100]
}
}
}
}
{
"aggregations": {
"unites_percentile": {
"values": {
"25.0": 1.0,
"50.0": 1.0,
"75.0": 3.0,
"100.0": 275.0
}
}
}
}
Histogram
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"unites_histogram": {
"histogram": {
"field": "nombre_unites",
"order": {"_key": "asc"},
"interval": 1
},
"aggs": {
"ville": {
"terms": {"field": "ville", "size": 1}
}
}
}
}
}
{
"aggregations": {
"unites_histogram": {
"buckets": [
{
"key": 1.0, "doc_count": 23507,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]}
},{
"key": 2.0, "doc_count": 1550,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]}
},{
"key": 3.0, "doc_count": 563,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]}
},{
"key": 4.0, "doc_count": 449,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]}
},{
"key": 5.0, "doc_count": 310,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]}
},{
"key": 6.0, "doc_count": 215,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]}
},{
"key": 7.0, "doc_count": 136,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]}
},{
"key": 8.0, "doc_count": 35,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]}
},{
"key": 9.0, "doc_count": 10,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 10.0, "doc_count": 11,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]}
},{
"key": 11.0, "doc_count": 2,
"ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}

“Significant terms”
GET /pompier/interventions/_search
{
"size": 0,
"query": {
"term": {"type_incident": "Inondation"}
},
"aggs": {
"ville": {
"significant_terms": {"field": "ville", "size": 5, "percentage": {}}
}
}
}
{
"aggregations": {
"ville": {
"doc_count": 26801,
"buckets": [
{
"key": "Ile-Bizard",
"score": 0.10029498525073746,
"doc_count": 68, "bg_count": 678
},
{
"key": "Montréal-Nord",
"score": 0.0826544804291675,
"doc_count": 416, "bg_count": 5033
},
{
"key": "Roxboro",
"score": 0.08181818181818182,
"doc_count": 27, "bg_count": 330
},
{
"key": "Côte St-Luc",
"score": 0.07654825526563974,
"doc_count": 487, "bg_count": 6362
},
{
"key": "Saint-Laurent",
"score": 0.07317073170731707,
"doc_count": 465, "bg_count": 6355
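With the "percentage" heuristic, the score of a bucket appears to be the share of the documents containing the term that also match the query, that is doc_count divided by bg_count. The sketch below checks this reading against the figures in the response; `percentage_score` is a hypothetical helper.

```python
def percentage_score(doc_count, bg_count):
    """Share of the background documents for a term that fall in the
    foreground (query) set."""
    return doc_count / bg_count


# (key, doc_count, bg_count) taken from the response above
buckets = [
    ("Ile-Bizard", 68, 678),
    ("Montréal-Nord", 416, 5033),
    ("Roxboro", 27, 330),
]
for key, doc_count, bg_count in buckets:
    print(key, percentage_score(doc_count, bg_count))
```

About one intervention in ten at Ile-Bizard is a flood, which is what makes that town stand out even though its absolute counts are small.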

Aggregations and geolocated data
GET :url/pompier/interventions/_search
{
"size": 0,
"query": {
"regexp": {"type_incident": "Feu.*"}
},
"aggs": {
"distance_from_here": {
"geo_distance": {
"field": "position",
"unit": "km",
"origin": {
"lat": 45.495902,
"lon": -73.554263
},
"ranges": [
{ "to": 2},
{"from":2, "to": 4},
{"from":4, "to": 6},
{"from": 6, "to": 8},
{"from": 8}]
}
}
}
}
{
"aggregations": {
"distance_from_here": {
"buckets": [
{
"key": "*-2.0",
"from": 0.0,
"to": 2.0,
"doc_count": 80
},
{
"key": "2.0-4.0",
"from": 2.0,
"to": 4.0,
"doc_count": 266
},
{
"key": "4.0-6.0",
"from": 4.0,
"to": 6.0,
"doc_count": 320
},
{
"key": "6.0-8.0",
"from": 6.0,
"to": 8.0,
"doc_count": 326
},
{
"key": "8.0-*",
"from": 8.0,
"doc_count": 1720
}
]
}
}
}

Any questions?

Offering advanced search to your users
GET /restaurant/restaurant/_search
{
"query": {
"simple_query_string": {
"fields": ["description", "title^2", "adresse", "type"],
"query": "\"service rapide\"~2"
}
}
}
"hits": {
"hits": [
{
"_source": {
"title": "Un fastfood très connu",
"description": "service très rapide, rapport qualité/prix médiocre",
"price": 8,
"adresse": "210 route de narbonne, 31520 RAMONVILLE",
"type": "fastfood",
"coord": "43.5536343,1.476165"
}
},{
"_source": {
"title": "Subway",
"description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch",
"price": 8,
"adresse": "211 route de narbonne, 31520
GET /restaurant/restaurant/_search
{
"query": {
"match_phrase": {
"description": {
"slop": 2,
"query": "service rapide"
}
}
}
}
Model Your Application Domain, Not Your JSON Structures
 
Common MongoDB Use Cases
Common MongoDB Use Cases Common MongoDB Use Cases
Common MongoDB Use Cases
 
RxJS Evolved
RxJS EvolvedRxJS Evolved
RxJS Evolved
 

Más de LINAGORA

Personal branding : e-recrutement et réseaux sociaux professionnels
Personal branding : e-recrutement et réseaux sociaux professionnels Personal branding : e-recrutement et réseaux sociaux professionnels
Personal branding : e-recrutement et réseaux sociaux professionnels LINAGORA
 
Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !LINAGORA
 
ChatBots et intelligence artificielle arrivent dans les banques
ChatBots et intelligence artificielle arrivent dans les banques ChatBots et intelligence artificielle arrivent dans les banques
ChatBots et intelligence artificielle arrivent dans les banques LINAGORA
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupLINAGORA
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS MeetupLINAGORA
 
Call a C API from Python becomes more enjoyable with CFFI
Call a C API from Python becomes more enjoyable with CFFICall a C API from Python becomes more enjoyable with CFFI
Call a C API from Python becomes more enjoyable with CFFILINAGORA
 
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)LINAGORA
 
Angular v2 et plus : le futur du développement d'applications en entreprise
Angular v2 et plus : le futur du développement d'applications en entrepriseAngular v2 et plus : le futur du développement d'applications en entreprise
Angular v2 et plus : le futur du développement d'applications en entrepriseLINAGORA
 
Angular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraAngular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraLINAGORA
 
Industrialisez le développement et la maintenance de vos sites avec Drupal
Industrialisez le développement et la maintenance de vos sites avec DrupalIndustrialisez le développement et la maintenance de vos sites avec Drupal
Industrialisez le développement et la maintenance de vos sites avec DrupalLINAGORA
 
CapDémat Evolution plateforme de GRU pour collectivités
CapDémat Evolution plateforme de GRU pour collectivitésCapDémat Evolution plateforme de GRU pour collectivités
CapDémat Evolution plateforme de GRU pour collectivitésLINAGORA
 
Présentation du marché P2I UGAP « Support sur Logiciels Libres »
Présentation du marché P2I UGAP « Support sur Logiciels Libres »Présentation du marché P2I UGAP « Support sur Logiciels Libres »
Présentation du marché P2I UGAP « Support sur Logiciels Libres »LINAGORA
 
Offre de demat d'Adullact projet
Offre de demat d'Adullact projet Offre de demat d'Adullact projet
Offre de demat d'Adullact projet LINAGORA
 
La dématérialisation du conseil minicipal
La dématérialisation du conseil minicipalLa dématérialisation du conseil minicipal
La dématérialisation du conseil minicipalLINAGORA
 
Open stack @ sierra wireless
Open stack @ sierra wirelessOpen stack @ sierra wireless
Open stack @ sierra wirelessLINAGORA
 
OpenStack - open source au service du Cloud
OpenStack - open source au service du CloudOpenStack - open source au service du Cloud
OpenStack - open source au service du CloudLINAGORA
 
Architecture d'annuaire hautement disponible avec OpenLDAP
Architecture d'annuaire hautement disponible avec OpenLDAPArchitecture d'annuaire hautement disponible avec OpenLDAP
Architecture d'annuaire hautement disponible avec OpenLDAPLINAGORA
 
Présentation offre LINID
Présentation offre LINIDPrésentation offre LINID
Présentation offre LINIDLINAGORA
 
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...LINAGORA
 
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...LINAGORA
 

Más de LINAGORA (20)

Personal branding : e-recrutement et réseaux sociaux professionnels
Personal branding : e-recrutement et réseaux sociaux professionnels Personal branding : e-recrutement et réseaux sociaux professionnels
Personal branding : e-recrutement et réseaux sociaux professionnels
 
Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !
 
ChatBots et intelligence artificielle arrivent dans les banques
ChatBots et intelligence artificielle arrivent dans les banques ChatBots et intelligence artificielle arrivent dans les banques
ChatBots et intelligence artificielle arrivent dans les banques
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - Meetup
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS Meetup
 
Call a C API from Python becomes more enjoyable with CFFI
Call a C API from Python becomes more enjoyable with CFFICall a C API from Python becomes more enjoyable with CFFI
Call a C API from Python becomes more enjoyable with CFFI
 
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
[UDS] Cloud Computing "pour les nuls" (Exemple avec LinShare)
 
Angular v2 et plus : le futur du développement d'applications en entreprise
Angular v2 et plus : le futur du développement d'applications en entrepriseAngular v2 et plus : le futur du développement d'applications en entreprise
Angular v2 et plus : le futur du développement d'applications en entreprise
 
Angular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - LinagoraAngular (v2 and up) - Morning to understand - Linagora
Angular (v2 and up) - Morning to understand - Linagora
 
Industrialisez le développement et la maintenance de vos sites avec Drupal
Industrialisez le développement et la maintenance de vos sites avec DrupalIndustrialisez le développement et la maintenance de vos sites avec Drupal
Industrialisez le développement et la maintenance de vos sites avec Drupal
 
CapDémat Evolution plateforme de GRU pour collectivités
CapDémat Evolution plateforme de GRU pour collectivitésCapDémat Evolution plateforme de GRU pour collectivités
CapDémat Evolution plateforme de GRU pour collectivités
 
Présentation du marché P2I UGAP « Support sur Logiciels Libres »
Présentation du marché P2I UGAP « Support sur Logiciels Libres »Présentation du marché P2I UGAP « Support sur Logiciels Libres »
Présentation du marché P2I UGAP « Support sur Logiciels Libres »
 
Offre de demat d'Adullact projet
Offre de demat d'Adullact projet Offre de demat d'Adullact projet
Offre de demat d'Adullact projet
 
La dématérialisation du conseil minicipal
La dématérialisation du conseil minicipalLa dématérialisation du conseil minicipal
La dématérialisation du conseil minicipal
 
Open stack @ sierra wireless
Open stack @ sierra wirelessOpen stack @ sierra wireless
Open stack @ sierra wireless
 
OpenStack - open source au service du Cloud
OpenStack - open source au service du CloudOpenStack - open source au service du Cloud
OpenStack - open source au service du Cloud
 
Architecture d'annuaire hautement disponible avec OpenLDAP
Architecture d'annuaire hautement disponible avec OpenLDAPArchitecture d'annuaire hautement disponible avec OpenLDAP
Architecture d'annuaire hautement disponible avec OpenLDAP
 
Présentation offre LINID
Présentation offre LINIDPrésentation offre LINID
Présentation offre LINID
 
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
Matinée pour conmrendre consacrée à LinID.org, gestion, fédération et contrôl...
 
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
Matinée pour conmrendre consacrée à LinShare.org, application de partage de f...
 

Último

Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 

Último (20)

Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 

How to get your Elasticsearch mappings just right - LINAGORA

  • 2. Indexing a restaurant directory
    ● Title
    ● Description
    ● Price
    ● Address
    ● Type
  • 3. Creating an index without a mapping
    PUT restaurant
    {
      "settings": {
        "index": {"number_of_shards": 3, "number_of_replicas": 2}
      }
    }
  • 4. Indexing without a mapping
    PUT restaurant/restaurant/1
    {
      "title": 42,
      "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
      "price": 42,
      "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
      "type": "gastronomie"
    }
  • 5. The risk of indexing without a mapping
    PUT restaurant/restaurant/2
    {
      "title": "Pizza de l'ormeau",
      "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés",
      "price": 10,
      "adresse": "1 place de l'ormeau, 31400 TOULOUSE",
      "type": "italien"
    }
    Response:
    {
      "error": {
        "root_cause": [
          {"type": "mapper_parsing_exception", "reason": "failed to parse [title]"}
        ],
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [title]",
        "caused_by": {
          "type": "number_format_exception",
          "reason": "For input string: \"Pizza de l'ormeau\""
        }
      },
      "status": 400
    }
  • 6. Inferred mapping
    GET /restaurant/_mapping
    {
      "restaurant": {
        "mappings": {
          "restaurant": {
            "properties": {
              "adresse": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
              },
              "description": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
              },
              "price": {"type": "long"},
              "title": {"type": "long"},
              "type": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
              }
            }
          }
        }
      }
    }
  • 7. Creating a mapping
    PUT :url/restaurant
    {
      "settings": {
        "index": {"number_of_shards": 3, "number_of_replicas": 2}
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text"},
            "description": {"type": "text"},
            "price": {"type": "integer"},
            "adresse": {"type": "text"},
            "type": {"type": "keyword"}
          }
        }
      }
    }
  • 8. Indexing a few restaurants
    POST :url/restaurant/restaurant/_bulk
    {"index": {"_id": 1}}
    {"title": 42, "description": "Un restaurant gastronomique où tout plat coûte 42 euros", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
    {"index": {"_id": 2}}
    {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
    {"index": {"_id": 3}}
    {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 9. Basic search
    GET :url/restaurant/_search
    {
      "query": {
        "match": {"description": "asiatique"}
      }
    }
    Response:
    {
      "hits": {
        "total": 1,
        "max_score": 0.6395861,
        "hits": [
          {
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux pour un prix contenu",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          }
        ]
      }
    }
  • 10. Where our mapping falls short
    GET :url/restaurant/_search
    {
      "query": {
        "match": {"description": "asiatiques"}
      }
    }
    Response:
    {
      "hits": {"total": 0, "max_score": null, "hits": []}
    }
  • 11. What is an analyzer?
    ● It turns a string into tokens
      ○ e.g. "Le chat est rouge" -> ["le", "chat", "est", "rouge"]
    ● The tokens are used to build an inverted index
  • 12. What is an inverted index?
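The slide above carries only a title, so here is a minimal Python sketch of the idea, not Elasticsearch's actual implementation. The `tokenize` helper is a hypothetical stand-in for an analyzer (lowercase, split on non-word characters); the two descriptions come from the bulk request on slide 8. It also shows why the plural "asiatiques" finds nothing on slide 10: that exact token was never put in the index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


docs = {
    1: "Un restaurant gastronomique où tout plat coûte 42 euros",
    3: "Restaurant asiatique très copieux",
}
index = build_inverted_index(docs)

print(sorted(index["restaurant"]))  # both documents contain this token
print(index.get("asiatique"))       # only document 3
print(index.get("asiatiques"))      # None: the plural form was never indexed
```

A search is then a dictionary lookup on the analyzed query terms, which is why the query must be analyzed the same way as the indexed text.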
  • 13. Explanation: the default analyzer
    GET /_analyze
    {
      "analyzer": "standard",
      "text": "Un restaurant asiatique très copieux"
    }
    Response:
    {
      "tokens": [
        {"token": "un", "start_offset": 0, "end_offset": 2, "type": "<ALPHANUM>", "position": 0},
        {"token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1},
        {"token": "asiatique", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2},
        {"token": "très", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3},
        {"token": "copieux", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4}
      ]
    }
  • 14. Explanation: the "french" analyzer
    GET /_analyze
    {
      "analyzer": "french",
      "text": "Un restaurant asiatique très copieux"
    }
    Response:
    {
      "tokens": [
        {"token": "restaurant", "start_offset": 3, "end_offset": 13, "type": "<ALPHANUM>", "position": 1},
        {"token": "asiat", "start_offset": 14, "end_offset": 23, "type": "<ALPHANUM>", "position": 2},
        {"token": "trè", "start_offset": 24, "end_offset": 28, "type": "<ALPHANUM>", "position": 3},
        {"token": "copieu", "start_offset": 29, "end_offset": 36, "type": "<ALPHANUM>", "position": 4}
      ]
    }
  • 15. Anatomy of an analyzer
    Elasticsearch splits analysis into three stages:
    ● Character filtering (e.g. stripping HTML tags)
    ● Splitting into tokens
    ● Token filtering:
      ○ Removing tokens (stopwords such as "un", "le", "la")
      ○ Transforming tokens (stemming, lemmatization...)
      ○ Adding tokens (synonyms)
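The three stages listed above can be sketched in plain Python. This is an illustration under stated assumptions, not how Elasticsearch is implemented: the function names, the tiny stopword set, and the `resto` -> `restaurant` synonym are all hypothetical, and lowercasing stands in for the "transformation" step.

```python
import re

STOPWORDS = {"un", "le", "la"}        # the three examples from the slide, not a full list
SYNONYMS = {"resto": "restaurant"}    # hypothetical synonym, for illustration only


def char_filter(text):
    """Stage 1: character filtering, here stripping HTML tags."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    """Stage 2: split the filtered text into tokens."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Stage 3: transform, remove, and add tokens."""
    out = []
    for token in tokens:
        token = token.lower()             # transformation
        if token in STOPWORDS:            # removal
            continue
        out.append(token)
        if token in SYNONYMS:             # addition
            out.append(SYNONYMS[token])
    return out


def analyze(text):
    return token_filters(tokenizer(char_filter(text)))


result = analyze("<b>Un</b> resto asiatique")
print(result)  # ['resto', 'restaurant', 'asiatique']
```

The order of the stages matters: stopword removal here runs after lowercasing, otherwise "Un" with a capital letter would slip through.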
  • 16. Anatomy of the french analyzer
    GET /_analyze
    {
      "tokenizer": "standard",
      "filter": [
        {
          "type": "elision",
          "articles_case": true,
          "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
        },
        {"type": "stop", "stopwords": "_french_"},
        {"type": "stemmer", "language": "french"}
      ],
      "text": "ce n'est qu'un restaurant asiatique très copieux"
    }
    Token stream after each stage:
    input:              "ce n'est qu'un restaurant asiatique très copieux"
    standard tokenizer: ["ce", "n'est", "qu'un", "restaurant", "asiatique", "très", "copieux"]
    elision:            ["ce", "est", "un", "restaurant", "asiatique", "très", "copieux"]
    stopwords:          ["restaurant", "asiatique", "très", "copieux"]
    french stemming:    ["restaurant", "asiat", "trè", "copieu"]
  • 17. Specifying the analyzer in the mapping
    {
      "settings": {
        "index": {"number_of_shards": 3, "number_of_replicas": 2}
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text", "analyzer": "french"},
            "description": {"type": "text", "analyzer": "french"},
            "price": {"type": "integer"},
            "adresse": {"type": "text", "analyzer": "french"},
            "type": {"type": "keyword"}
          }
        }
      }
    }
  • 18. Typo-tolerant search
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match": {"description": "asiatuques"}
      }
    }
    Response:
    {
      "hits": {"total": 0, "max_score": null, "hits": []}
    }
  • 19. One solution: the ngram token filter
    GET /_analyze
    {
      "tokenizer": "standard",
      "filter": [
        {"type": "ngram", "min_gram": 3, "max_gram": 7}
      ],
      "text": "asiatuque"
    }
    Resulting tokens:
    [
      "asi", "asia", "asiat", "asiatu", "asiatuq",
      "sia", "siat", "siatu", "siatuq", "siatuqu",
      "iat", "iatu", "iatuq", "iatuqu", "iatuque",
      "atu", "atuq", "atuqu", "atuque",
      "tuq", "tuqu", "tuque",
      "uqu", "uque",
      "que"
    ]
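As a rough illustration of what the ngram filter produces, the sketch below enumerates every substring of 3 to 7 characters in plain Python. The `ngrams` helper is hypothetical and is not Elasticsearch's implementation. Comparing the misspelled "asiatuque" with the correct "asiatique" shows the grams the two words share, which is what lets a query with a typo still reach the indexed document.

```python
def ngrams(token, min_gram=3, max_gram=7):
    """List every substring of length min_gram..max_gram, ordered by start offset then size."""
    out = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                out.append(token[start:start + size])
    return out


typo = ngrams("asiatuque")      # the misspelled query term
indexed = ngrams("asiatique")   # the term as it appears in the description
shared = sorted(set(typo) & set(indexed))

print(len(typo))   # 25 grams, as in the token list on the slide
print(shared)      # ['asi', 'asia', 'asiat', 'iat', 'que', 'sia', 'siat']
```

Short grams such as "que" are also shared with unrelated words ending in "-que", which is the source of the noise shown a few slides later.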
  • 20. Creating a custom analyzer to use the ngram filter
    PUT /restaurant
    {
      "settings": {
        "analysis": {
          "filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
          "analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
        }
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text", "analyzer": "ngram_analyzer"},
            "description": {"type": "text", "analyzer": "ngram_analyzer"},
            "price": {"type": "integer"},
            "adresse": {"type": "text", "analyzer": "ngram_analyzer"},
            "type": {"type": "keyword"}
          }
        }
      }
    }
  • 21. GET /restaurant/restaurant/_search
    {
      "query": {
        "match": {"description": "asiatuques"}
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_score": 0.60128295,
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux pour un prix contenu",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          },
          {
            "_score": 0.46237043,
            "_source": {
              "title": 42,
              "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
              "price": 42,
              "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
              "type": "gastronomie"
  • 22. Noise introduced by ngrams
    GET /restaurant/restaurant/_search
    {
      "query": {
        "match": {"description": "gastronomique"}
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_score": 0.6277555,
            "_source": {
              "title": 42,
              "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
              "price": 42,
              "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
              "type": "gastronomie"
            }
          },
          {
            "_score": 0.56373334,
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux pour un prix contenu",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          },
  • 23. Specifying several analyzers for one field
    PUT /restaurant
    {
      "settings": {
        "analysis": {
          "filter": {"custom_ngram": {"type": "ngram", "min_gram": 3, "max_gram": 7}},
          "analyzer": {"ngram_analyzer": {"tokenizer": "standard", "filter": ["asciifolding", "custom_ngram"]}}
        }
      },
      "mappings": {
        "restaurant": {
          "properties": {
            "title": {"type": "text", "analyzer": "french"},
            "description": {
              "type": "text",
              "analyzer": "french",
              "fields": {
                "ngram": {"type": "text", "analyzer": "ngram_analyzer"}
              }
            },
            "price": {"type": "integer"},
  • 24. Searching across several fields
    GET /restaurant/restaurant/_search
    {
      "query": {
        "multi_match": {
          "query": "gastronomique",
          "fields": ["description^4", "description.ngram"]
        }
      }
    }
    Response:
    {
      "hits": {
        "hits": [
          {
            "_score": 2.0649285,
            "_source": {
              "title": 42,
              "description": "Un restaurant gastronomique où tout plat coûte 42 euros",
              "price": 42,
              "adresse": "10 rue de l'industrie, 31000 TOULOUSE",
              "type": "gastronomie"
            }
          },
          {
            "_score": 0.56373334,
            "_source": {
              "title": "Chez l'oncle chan",
              "description": "Restaurant asiatique très copieux pour un prix contenu",
              "price": 14,
              "adresse": "13 route de labège, 31400 TOULOUSE",
              "type": "asiatique"
            }
          },
          {
            "_index": "restaurant",
  • 25. To ignore or not to ignore stopwords, that is the question
    POST :url/restaurant/restaurant/_bulk
    {"index": {"_id": 1}}
    {"title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie"}
    {"index": {"_id": 2}}
    {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien"}
    {"index": {"_id": 3}}
    {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique"}
  • 26. Stopwords are not necessarily meaningless GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": "pas cher" } } } { "hits": { "hits": [ { "_source": { "title": 42, "description": "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)", "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } },{ "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } }
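The result above can be reproduced with a small Python sketch. It assumes that dropping stopwords reduces the phrase "pas cher" to the single token "cher"; the STOPWORDS set is a tiny illustrative subset, not the analyzer's real list, and the sketch ignores stemming and the position gaps a real analyzer keeps:

```python
import re

# Tiny illustrative subset of French stopwords (assumption, not the analyzer's real list).
STOPWORDS = {"pas", "et", "un", "ou"}


def tokenize(text, drop_stopwords):
    """Lowercase the text, split it into word tokens, optionally drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def phrase_match(query, document, drop_stopwords):
    """True if the query tokens appear as a contiguous run in the document tokens."""
    q = tokenize(query, drop_stopwords)
    d = tokenize(document, drop_stopwords)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


expensive = "Un restaurant gastronomique donc cher ou tout plat coûte cher (42 euros)"
cheap = "Restaurant asiatique très copieux et pas cher"

for drop in (True, False):
    print(drop, phrase_match("pas cher", expensive, drop), phrase_match("pas cher", cheap, drop))
```

With stopwords dropped, both descriptions match; with stopwords kept, only the restaurant that really is "pas cher" matches.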
  • 27. Modifying the french analyzer to keep stopwords PUT /restaurant { "settings": { "analysis": { "filter": { "french_elision": { "type": "elision", "articles_case": true, "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"] }, "french_stemmer": {"type": "stemmer", "language": "light_french"} }, "analyzer": { "custom_french": { "tokenizer": "standard", "filter": [ "french_elision", "lowercase", "french_stemmer" ] }
  • 28. GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": "pas cher" } } } { "hits": { "hits": [ { "_source": { "title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux et pas cher", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } } ] } }
  • 29. Searching with stopwords without hurting performance GET /restaurant/restaurant/_search { "query": { "match": { "description": { "query": "restaurant pas cher", "cutoff_frequency": 0.01 } } } } GET /restaurant/restaurant/_search { "query": { "bool": { "must": { "bool": { "should": [ {"term": {"description": "restaurant"}}, {"term": {"description": "cher"}}] } }, "should": [ {"match": { "description": "pas" }} ] }
  • 30. Customizing the scoring GET /restaurant/restaurant/_search { "query": { "function_score": { "query": { "match": { "adresse": "toulouse" } }, "functions": [{ "filter": { "terms": { "type": ["asiatique", "italien"]}}, "weight": 2 }] } } }
  • 31. Customizing the scoring GET /restaurant/restaurant/_search { "query": { "function_score": { "query": { "match": { "adresse": "toulouse" } }, "script_score": { "script": { "lang": "painless", "inline": "_score * ( 1 + 10/doc['price'].value)" } } } } } { "hits": { "hits": [ { "_score": 0.53484553, "_source": { "title": "Pizza de l'ormeau", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien" } }, { "_score": 0.26742277, "_source": { "title": 42, "price": 42, "adresse": "10 rue de l'industrie, 31000 TOULOUSE", "type": "gastronomie" } }, { "_score": 0.26742277, "_source": { "title": "Chez l'oncle chan", "price": 14, "adresse": "13 route de labège, 31400 TOULOUSE", "type": "asiatique" } } ] } }
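In the response above, the pizzeria (price 10) scores exactly twice as much as the other two restaurants (prices 14 and 42), which tie. This is consistent with the script's division being an integer division. The Python sketch below illustrates that reading; it assumes Painless truncates integer division as Java does, and both helper functions are hypothetical names used only for illustration:

```python
def integer_division_factor(price):
    """Multiplier applied to _score if 10/price is a truncating integer division (assumption)."""
    return 1 + 10 // price


def float_division_factor(price):
    """Multiplier obtained if the script divided floating-point values instead."""
    return 1 + 10.0 / price


for price in (10, 14, 42):
    print(price, integer_division_factor(price), round(float_division_factor(price), 3))
```

Under this reading, only prices of 10 or less receive a boost. Writing the constant as a floating-point literal in the script would be one way to get a boost that decreases smoothly with the price.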
  • 32. How to index multilingual documents Three cases: ● Fields containing several languages (e.g. {"message": "warning | attention | cuidado"}) ○ Ngram ○ Analyze the same field several times, with one analyzer per language ● One field per language: ○ Easy, because a different analyzer can be specified for each language ○ Take care not to end up with a sparse index ● One version of the document per language (preferred) ○ One index per language ○ Above all, do not use one type per language within the same index (statistics problem)
  • 33. Handling synonyms PUT /restaurant { "settings": { "analysis": { "filter": { "french_elision": { "type": "elision", "articles_case": true, "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"] }, "french_stemmer": {"type": "stemmer", "language": "light_french"}, "french_synonym": {"type": "synonym", "synonyms": ["sou marin => sandwitch", "formul, menu"]} }, "analyzer": { "french_with_synonym": { "tokenizer": "standard", "filter": ["french_elision", "lowercase", "french_stemmer", "french_synonym"] } } } }, "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": { "type": "text", "analyzer": "french", "search_analyzer": "french_with_synonym"}, "price": {"type": "integer"}, "adresse": {"type": "text", "analyzer": "french"}, "coord": {"type": "geo_point"},
  • 34. Handling synonyms GET /restaurant/restaurant/_search { "query": { "match": {"description": "sous-marins"} } } { "hits": { "hits": [ { "_source": { "title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753" } } ] } }
  • 35. Geolocated data PUT /restaurant { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "description": {"type": "text", "analyzer": "french" }, "price": {"type": "integer"}, "adresse": {"type": "text","analyzer": "french"}, "coord": {"type": "geo_point"}, "type": { "type": "keyword"} } } } }
  • 36. Geolocated data POST restaurant/restaurant/_bulk {"index": {"_id": 1}} {"title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748"} {"index": {"_id": 2}} {"title": "Pizza de l'ormeau", "description": "Dans cette pizzeria on trouve des pizzas très bonnes et très variés", "price": 10, "adresse": "1 place de l'ormeau, 31400 TOULOUSE", "type": "italien", "coord": "43.579225,1.4835248"} {"index": {"_id": 3}} {"title": "Chez l'oncle chan", "description": "Restaurant asiatique très copieux pour un prix contenu", "price": 14, "adresse": "18 rue des cosmonautetes, 31400 TOULOUSE", "type": "asiatique", "coord": "43.5612759,1.4936073"} {"index": {"_id": 4}} {"title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165"} {"index": {"_id": 5}} {"title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5577519,1.4625753"} {"index": {"_id": 6}} {"title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573"}
  • 37. Filtering and sorting on geolocated data GET /restaurant/restaurant/_search { "query": { "bool": { "filter": [ {"term": {"type":"français"}}, {"geo_distance": { "distance": "1km", "coord": {"lat": 43.5739329, "lon": 1.4893669} }} ] } }, "sort": [{ "_geo_distance": { "coord": {"lat": 43.5739329, "lon": 1.4893669}, "unit": "km" } }] } { "hits": { "hits": [ { "_source": { "title": "bistronomique", "description": "Un restaurant bon mais un petit peu cher, les desserts sont excellents", "price": 17, "adresse": "73 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.57417,1.4905748" }, "sort": [0.10081529266640063] },{ "_source": { "title": "L'évidence", "description": "restaurant copieux et pas cher, cependant c'est pas bon", "price": 12, "adresse": "38 route de revel, 31400 TOULOUSE", "type": "français", "coord": "43.5770109,1.4846573" }, "sort": [0.510960087579506] },{ "_source": { "title": "Chez Ingalls", "description": "Contemporain et rustique, ce restaurant avec cheminée sert savoyardes et des grillades",
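The sort values returned above are distances in kilometres from the reference point. As a sanity check, the Python sketch below recomputes them with the haversine formula; the Earth radius constant and the use of haversine are assumptions made for illustration, and Elasticsearch's own distance computation may differ slightly:

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, assumed for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


origin = (43.5739329, 1.4893669)       # reference point used in the query
bistronomique = (43.57417, 1.4905748)  # "coord" of the first hit
evidence = (43.5770109, 1.4846573)     # "coord" of the second hit

print(haversine_km(*origin, *bistronomique))
print(haversine_km(*origin, *evidence))
```

Both values land close to the sort values in the response (about 0.10 km and 0.51 km), and both fall within the 1km filter.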
  • 38. The bool query explained GET /restaurant/restaurant/_search { "query": { "bool": { "must": {"match": {"description": "sandwitch"}}, "should" : [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}} ], "must_not": [ {"match_phrase": { "description": "pas bon" }} ], "filter": [ {"range": {"price": { "lte": 20 }}} ] } } }
  • 39. The bool query explained GET /restaurant/restaurant/_search { "query": { "bool": { "should" : [ {"match": {"description": "bon"}}, {"match": {"description": "excellent"}}, {"match": {"description": "service rapide"}} ], "minimum_should_match": 2 } } }
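The effect of requiring at least two matching should clauses can be sketched in Python. This is a rough model, not Elasticsearch's implementation: each clause is a list of words and matches when any of its words is present, and a crude "strip a trailing s" rule stands in for the French stemmer (both are assumptions made for illustration):

```python
import re


def crude_stem(token):
    """Very rough stand-in for a stemmer: drop a trailing 's' on longer words."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def analyzed_tokens(text):
    """Lowercase, split into word tokens, and apply the crude stemmer."""
    return {crude_stem(t) for t in re.findall(r"\w+", text.lower())}


def bool_should(description, clauses, minimum_should_match):
    """True if at least `minimum_should_match` clauses have a word in the description."""
    tokens = analyzed_tokens(description)
    matched = sum(1 for words in clauses if any(w in tokens for w in words))
    return matched >= minimum_should_match


clauses = [["bon"], ["excellent"], ["service", "rapide"]]
bistro = "Un restaurant bon mais un petit peu cher, les desserts sont excellents"
fastfood = "service très rapide, rapport qualité/prix médiocre"

print(bool_should(bistro, clauses, 2), bool_should(fastfood, clauses, 2))
```

The first description satisfies two clauses ("bon" and "excellent") and is kept; the second satisfies only the "service rapide" clause and is dropped.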
  • 40. Offering advanced search to your users GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "-\"pas bon\" +(pizzi~2 | sandwitch)" } } } GET /restaurant/restaurant/_search { "query": { "bool": { "must_not": { "multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "type": "phrase", "query": "pas bon" } }, "should": [ {"multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "fuzziness": 2, "max_expansions": 50, "query": "pizzi" } }, {"multi_match": { "fields": [ "description", "title^2", "adresse", "type"], "query": "sandwitch" }
  • 41. Aliases: giving yourself room to maneuver PUT /restaurant_v1/ { "mappings": { "restaurant": { "properties": { "title": {"type": "text"}, "lat": {"type": "double"}, "lon": {"type": "double"} } } } } POST /_aliases { "actions": [ {"add": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v1", "alias": "restaurant_write"}} ] }
  • 42. Aliases, pipelines and reindexing PUT /restaurant_v2 { "mappings": { "restaurant": { "properties": { "title": {"type": "text", "analyzer": "french"}, "position": {"type": "geo_point"} } } } } PUT /_ingest/pipeline/fixing_position { "description": "move lat lon into position parameter", "processors": [ {"rename": {"field": "lat", "target_field": "position.lat"}}, {"rename": {"field": "lon", "target_field": "position.lon"}} ] } POST /_aliases { "actions": [ {"remove": {"index": "restaurant_v1", "alias": "restaurant_search"}}, {"remove": {"index": "restaurant_v1", "alias": "restaurant_write"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_search"}}, {"add": {"index": "restaurant_v2", "alias": "restaurant_write"}} ] } POST /_reindex { "source": {"index": "restaurant_v1"}, "dest": {"index": "restaurant_v2", "pipeline": "fixing_position"} }
  • 43. Analyzing firefighter intervention data from 2005 to 2014 PUT /pompier { "mappings": { "interventions": { "properties": { "date": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss"}, "type_incident": { "type": "keyword" }, "description_groupe": { "type": "keyword" }, "caserne": { "type": "integer"}, "ville": { "type": "keyword"}, "arrondissement": { "type": "keyword"}, "division": {"type": "integer"}, "position": {"type": "geo_point"}, "nombre_unites": {"type": "integer"} } } } }
  • 44. Viewing the different incident types GET /pompier/interventions/_search { "size": 0, "aggs": { "type_incident": { "terms": {"field": "type_incident", "size": 100} } } } { "aggregations": { "type_incident": { "buckets": [ {"key": "Premier répondant", "doc_count": 437891}, {"key": "Appel de Cie de détection", "doc_count": 76157}, {"key": "Alarme privé ou locale", "doc_count": 60879}, {"key": "Ac.véh./1R/s.v./ext/29B/D", "doc_count": 41734}, {"key": "10-22 sans feu", "doc_count": 29283}, {"key": "Acc. sans victime sfeu - ext.", "doc_count": 27663}, {"key": "Inondation", "doc_count": 26801}, {"key": "Problèmes électriques", "doc_count": 23495}, {"key": "Aliments surchauffés", "doc_count": 23428}, {"key": "Odeur suspecte - gaz", "doc_count": 21158}, {"key": "Déchets en feu", "doc_count": 18007}, {"key": "Ascenseur", "doc_count": 12703}, {"key": "Feu de champ *", "doc_count": 11518}, {"key": "Structure dangereuse", "doc_count": 9958}, {"key": "10-22 avec feu", "doc_count": 9876}, {"key": "Alarme vérification", "doc_count": 8328}, {"key": "Aide à un citoyen", "doc_count": 7722}, {"key": "Fuite ext.:hydrocar. liq. div.", "doc_count": 7351}, {"key": "Ac.véh./1R/s.v./V.R./29B/D", "doc_count": 6232}, {"key": "Feu de véhicule extérieur", "doc_count": 5943}, {"key": "Fausse alerte 10-19", "doc_count": 4680}, {"key": "Acc. sans victime sfeu - v.r", "doc_count": 3494}, {"key": "Assistance serv. muni.", "doc_count": 3431}, {"key": "Avertisseur de CO", "doc_count": 2542}, {"key": "Fuite gaz naturel 10-22", "doc_count": 1928}, {"key": "Matières dangereuses / 10-22", "doc_count": 1905}, {"key": "Feu de bâtiment", "doc_count": 1880}, {"key": "Senteur de feu à l'extérieur", "doc_count": 1566}, {"key": "Surchauffe - véhicule", "doc_count": 1499}, {"key": "Feu / Agravation possible", "doc_count": 1281}, {"key": "Fuite gaz naturel 10-09", "doc_count": 1257}, {"key": "Acc.véh/1rép/vict/ext 29D04", "doc_count": 1015}, {"key": "Acc. véh victime sfeu - (ext.)", "doc_count": 971},
  • 45. Nested aggregations GET /pompier/interventions/_search { "size": 0, "aggs": { "ville": { "terms": {"field": "ville"}, "aggs": { "arrondissement": { "terms": {"field": "arrondissement"} } } } } } { "aggregations": {"ville": {"buckets": [ { "key": "Montréal", "doc_count": 768955, "arrondissement": {"buckets": [ {"key": "Ville-Marie", "doc_count": 83010}, {"key": "Mercier / Hochelaga-Maisonneuve", "doc_count": 67272}, {"key": "Côte-des-Neiges / Notre-Dame-de-Grâce", "doc_count": 65933}, {"key": "Villeray / St-Michel / Parc Extension", "doc_count": 60951}, {"key": "Rosemont / Petite-Patrie", "doc_count": 59213}, {"key": "Ahuntsic / Cartierville", "doc_count": 57721}, {"key": "Plateau Mont-Royal", "doc_count": 53344}, {"key": "Montréal-Nord", "doc_count": 40757}, {"key": "Sud-Ouest", "doc_count": 39936}, {"key": "Rivière-des-Prairies / Pointe-aux-Trembles", "doc_count": 38139} ]} }, { "key": "Dollard-des-Ormeaux", "doc_count": 17961, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13452}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 4477}, {"key": "Pierrefonds / Senneville", "doc_count": 10}, {"key": "Dorval / Ile Dorval", "doc_count": 8}, {"key": "Pointe-Claire", "doc_count": 8}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 6} ]} }, { "key": "Pointe-Claire", "doc_count": 17925, "arrondissement": {"buckets": [ {"key": "Indéterminé", "doc_count": 13126}, {"key": "Pointe-Claire", "doc_count": 4766}, {"key": "Dorval / Ile Dorval", "doc_count": 12}, {"key": "Dollard-des-Ormeaux / Roxboro", "doc_count": 7}, {"key": "Kirkland", "doc_count": 7}, {"key": "Beaconsfield / Baie d'Urfé", "doc_count": 5}, {"key": "Ile-Bizard / Ste-Geneviève / Ste-A-de-B", "doc_count": 1}, {"key": "St-Laurent", "doc_count": 1}
  • 46. Computing averages and sorting aggregations GET /pompier/interventions/_search { "size": 0, "aggs": { "avg_nombre_unites_general": { "avg": {"field": "nombre_unites"} }, "type_incident": { "terms": { "field": "type_incident", "size": 5, "order" : {"avg_nombre_unites": "desc"} }, "aggs": { "avg_nombre_unites": { "avg": {"field": "nombre_unites"} } } } } } { "aggregations": { "type_incident": { "buckets": [ { "key": "Feu / 5e Alerte", "doc_count": 162, "avg_nombre_unites": {"value": 70.9074074074074} }, { "key": "Feu / 4e Alerte", "doc_count": 100, "avg_nombre_unites": {"value": 49.36} }, { "key": "Troisième alerte/autre que BAT", "doc_count": 1, "avg_nombre_unites": {"value": 43.0} }, { "key": "Feu / 3e Alerte", "doc_count": 173, "avg_nombre_unites": {"value": 41.445086705202314} }, { "key": "Deuxième alerte/autre que BAT", "doc_count": 8, "avg_nombre_unites": {"value": 37.5} } ] }, "avg_nombre_unites_general": {"value": 2.1374461758713728} } }
  • 47. Percentiles GET /pompier/interventions/_search { "size": 0, "aggs": { "unites_percentile": { "percentiles": { "field": "nombre_unites", "percents": [25, 50, 75, 100] } } } } { "aggregations": { "unites_percentile": { "values": { "25.0": 1.0, "50.0": 1.0, "75.0": 3.0, "100.0": 275.0 } } } }
  • 48. Histogram GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "unites_histogram": { "histogram": { "field": "nombre_unites", "order": {"_key": "asc"}, "interval": 1 }, "aggs": { "ville": { "terms": {"field": "ville", "size": 1} } } } } } { "aggregations": { "unites_histogram": { "buckets": [ { "key": 1.0, "doc_count": 23507, "ville": {"buckets": [{"key": "Montréal", "doc_count": 19417}]} },{ "key": 2.0, "doc_count": 1550, "ville": {"buckets": [{"key": "Montréal", "doc_count": 1229}]} },{ "key": 3.0, "doc_count": 563, "ville": {"buckets": [{"key": "Montréal", "doc_count": 404}]} },{ "key": 4.0, "doc_count": 449, "ville": {"buckets": [{"key": "Montréal", "doc_count": 334}]} },{ "key": 5.0, "doc_count": 310, "ville": {"buckets": [{"key": "Montréal", "doc_count": 253}]} },{ "key": 6.0, "doc_count": 215, "ville": {"buckets": [{"key": "Montréal", "doc_count": 173}]} },{ "key": 7.0, "doc_count": 136, "ville": {"buckets": [{"key": "Montréal", "doc_count": 112}]} },{ "key": 8.0, "doc_count": 35, "ville": {"buckets": [{"key": "Montréal", "doc_count": 30}]} },{ "key": 9.0, "doc_count": 10, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 10.0, "doc_count": 11, "ville": {"buckets": [{"key": "Montréal", "doc_count": 8}]} },{ "key": 11.0, "doc_count": 2, "ville": {"buckets": [{"key": "Montréal", "doc_count": 2}]}
  • 49. Significant terms GET /pompier/interventions/_search { "size": 0, "query": { "term": {"type_incident": "Inondation"} }, "aggs": { "ville": { "significant_terms": {"field": "ville", "size": 5, "percentage": {}} } } } { "aggregations": { "ville": { "doc_count": 26801, "buckets": [ { "key": "Ile-Bizard", "score": 0.10029498525073746, "doc_count": 68, "bg_count": 678 }, { "key": "Montréal-Nord", "score": 0.0826544804291675, "doc_count": 416, "bg_count": 5033 }, { "key": "Roxboro", "score": 0.08181818181818182, "doc_count": 27, "bg_count": 330 }, { "key": "Côte St-Luc", "score": 0.07654825526563974, "doc_count": 487, "bg_count": 6362 }, { "key": "Saint-Laurent", "score": 0.07317073170731707, "doc_count": 465, "bg_count": 6355
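With the percentage heuristic, the scores in the response above can be checked by hand: each one appears to be the bucket's doc_count (flood interventions in that city) divided by its bg_count (all interventions in that city). A short Python sketch using three of the buckets shown; percentage_score is a hypothetical helper name:

```python
import math


def percentage_score(doc_count, bg_count):
    """Share of a term's documents that fall in the foreground (filtered) set."""
    return doc_count / bg_count


# (key, doc_count, bg_count, score) copied from the response above.
buckets = [
    ("Ile-Bizard", 68, 678, 0.10029498525073746),
    ("Montréal-Nord", 416, 5033, 0.0826544804291675),
    ("Roxboro", 27, 330, 0.08181818181818182),
]

for key, doc_count, bg_count, reported in buckets:
    computed = percentage_score(doc_count, bg_count)
    print(key, computed, math.isclose(computed, reported, rel_tol=1e-9))
```

Read this way, about 10% of all interventions in Ile-Bizard are floods, which is why it ranks first despite its small absolute count.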
  • 50. Aggregations and geolocated data GET :url/pompier/interventions/_search { "size": 0, "query": { "regexp": {"type_incident": "Feu.*"} }, "aggs": { "distance_from_here": { "geo_distance": { "field": "position", "unit": "km", "origin": { "lat": 45.495902, "lon": -73.554263 }, "ranges": [ { "to": 2}, {"from":2, "to": 4}, {"from":4, "to": 6}, {"from": 6, "to": 8}, {"from": 8}] } } } } { "aggregations": { "distance_from_here": { "buckets": [ { "key": "*-2.0", "from": 0.0, "to": 2.0, "doc_count": 80 }, { "key": "2.0-4.0", "from": 2.0, "to": 4.0, "doc_count": 266 }, { "key": "4.0-6.0", "from": 4.0, "to": 6.0, "doc_count": 320 }, { "key": "6.0-8.0", "from": 6.0, "to": 8.0, "doc_count": 326 }, { "key": "8.0-*", "from": 8.0, "doc_count": 1720 } ] } } }
  • 51. Any questions?
  • 52. Offering advanced search to your users GET /restaurant/restaurant/_search { "query": { "simple_query_string": { "fields": ["description", "title^2", "adresse", "type"], "query": "\"service rapide\"~2" } } } { "hits": { "hits": [ { "_source": { "title": "Un fastfood très connu", "description": "service très rapide, rapport qualité/prix médiocre", "price": 8, "adresse": "210 route de narbonne, 31520 RAMONVILLE", "type": "fastfood", "coord": "43.5536343,1.476165" } },{ "_source": { "title": "Subway", "description": "service très rapide, rapport qualité/prix médiocre mais on peut choisir la composition de son sandwitch", "price": 8, "adresse": "211 route de narbonne, 31520 … GET /restaurant/restaurant/_search { "query": { "match_phrase": { "description": { "slop": 2, "query": "service rapide" } } } }
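The slop parameter lets a phrase match even when other words sit between its terms. The Python sketch below is a deliberately simplified model for a two-word phrase: it only counts the words in between, keeps the original order, and uses plain word splitting instead of the real analyzer. The function names are hypothetical:

```python
import re


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def two_term_phrase_match(first, second, text, slop=0):
    """True if `second` follows `first` with at most `slop` words in between."""
    toks = tokens(text)
    firsts = [i for i, t in enumerate(toks) if t == first]
    seconds = [i for i, t in enumerate(toks) if t == second]
    return any(0 <= j - i - 1 <= slop for i in firsts for j in seconds)


description = "service très rapide, rapport qualité/prix médiocre"
print(two_term_phrase_match("service", "rapide", description, slop=0))
print(two_term_phrase_match("service", "rapide", description, slop=2))
```

With a slop of 0 the phrase "service rapide" does not match because "très" sits between the two words; a slop of 2 tolerates that gap.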