Percolate queries are also known as Persistent queries, Prospective search, document routing, search in reverse, and inverse search.
The traditional way of conducting searches involves storing documents and performing search queries against them. However, there are cases where we want to apply a query to a newly incoming document to signal a match. Some scenarios where this is desired include monitoring systems that collect data and notify users about specific events, such as reaching a certain threshold for a metric or a particular value appearing in the monitored data. Another example is news aggregation, where users may want to be notified only about certain categories or topics, or even specific “keywords.”
In these situations, traditional search is not the best fit, as it assumes the desired search is performed over the entire collection. This process gets multiplied by the number of users, resulting in many queries running over the entire collection, which can cause significant additional load. The alternative approach described in this section involves storing the queries instead and testing them against an incoming new document or a batch of documents.
Google Alerts, AlertHN, Bloomberg Terminal, and other systems that allow users to subscribe to specific content utilize similar technology.
- See percolate for information on creating a PQ table.
- See Adding rules to a percolate table to learn how to add percolate rules (also known as PQ rules). Here’s a quick example:
The key thing to remember about percolate queries is that your search queries are already in the table. What you need to provide are documents to check if any of them match any of the stored rules.
You can perform a percolate query via SQL or JSON interfaces, as well as using programming language clients. The SQL approach offers more flexibility, while the HTTP method is simpler and provides most of what you need. The table below can help you understand the differences.
Desired Behavior | SQL | HTTP | PHP |
---|---|---|---|
Provide a single document | CALL PQ('tbl', '{doc1}') |
query.percolate.document{doc1} |
$client->pq()->search([$percolate]) |
Provide a single document (alternative) | CALL PQ('tbl', 'doc1', 0 as docs_json) |
- | |
Provide multiple documents | CALL PQ('tbl', ('doc1', 'doc2'), 0 as docs_json) |
query.percolate.documents[{doc1}, {doc2}] |
$client->pq()->search([$percolate]) |
Provide multiple documents (alternative) | CALL PQ('tbl', ('{doc1}', '{doc2}')) |
- | - |
Provide multiple documents (alternative) | CALL PQ('tbl', '[{doc1}, {doc2}]') |
- | - |
Return matching document ids | 0/1 as docs (disabled by default) | Enabled by default | Enabled by default |
Use document’s own id to show in the result | ‘id field’ as docs_id (disabled by default) | Not available | Not available |
Consider input documents are JSON | 1 as docs_json (1 by default) | Enabled by default | Enabled by default |
Consider input documents are plain text | 0 as docs_json (1 by default) | Not available | Not available |
Sparsed distribution mode | default | default | default |
Sharded distribution mode | sharded as mode | Not available | Not available |
Return all info about matching query | 1 as query (0 by default) | Enabled by default | Enabled by default |
Skip invalid JSON | 1 as skip_bad_json (0 by default) | Not available | Not available |
Extended info in SHOW META | 1 as verbose (0 by default) | Not available | Not available |
Define the number which will be added to document ids if no docs_id fields provided (mostly relevant in distributed PQ modes) | 1 as shift (0 by default) | Not available | Not available |
To demonstrate how this works, here are a few examples. Let’s create a PQ table with two fields:
and three rules in it:
@title bag
@title shoes
. Filters: color='red'
@title shoes
. Filters: color IN('blue', 'green')
CREATE TABLE products(title text, color string) type='pq';
INSERT INTO products(query) values('@title bag');
INSERT INTO products(query,filters) values('@title shoes', 'color=\'red\'');
INSERT INTO products(query,filters) values('@title shoes', 'color in (\'blue\', \'green\')');
select * from products;
+---------------------+--------------+------+---------------------------+
id | query | tags | filters |
| +---------------------+--------------+------+---------------------------+
1657852401006149635 | @title shoes | | color IN ('blue, 'green') |
| | 1657852401006149636 | @title shoes | | color='red' |
| 1657852401006149637 | @title bag | | |
+---------------------+--------------+------+---------------------------+
PUT /pq/products/doc/
{
"query": {
"match": {
"title": "bag"
}
},
"filters": ""
}
PUT /pq/products/doc/
{
"query": {
"match": {
"title": "shoes"
}
},
"filters": "color='red'"
}
PUT /pq/products/doc/
{
"query": {
"match": {
"title": "shoes"
}
},
"filters": "color IN ('blue', 'green')"
}
{
"index": "products",
"type": "doc",
"_id": "1657852401006149661",
"result": "created"
}
{
"index": "products",
"type": "doc",
"_id": "1657852401006149662",
"result": "created"
}
{
"index": "products",
"type": "doc",
"_id": "1657852401006149663",
"result": "created"
}
$index = [
'index' => 'products',
'body' => [
'columns' => [
'title' => ['type' => 'text'],
'color' => ['type' => 'string']
],
'settings' => [
'type' => 'pq'
]
]
];
$client->indices()->create($index);
$query = [
'index' => 'products',
'body' => [ 'query'=>['match'=>['title'=>'bag']]]
];
$client->pq()->doc($query);
$query = [
'index' => 'products',
'body' => [ 'query'=>['match'=>['title'=>'shoes']],'filters'=>"color='red'"]
];
$client->pq()->doc($query);
$query = [
'index' => 'products',
'body' => [ 'query'=>['match'=>['title'=>'shoes']],'filters'=>"color IN ('blue', 'green')"]
];
$client->pq()->doc($query);
Array(
[index] => products
[type] => doc
[_id] => 1657852401006149661
[result] => created
)
Array(
[index] => products
[type] => doc
[_id] => 1657852401006149662
[result] => created
)
Array(
[index] => products
[type] => doc
[_id] => 1657852401006149663
[result] => created
)
Python
'create table products(title text, color string) type=\'pq\'')
utilsApi.sql("index" : "products", "doc" : {"query" : "@title bag" }})
indexApi.insert({"index" : "products", "doc" : {"query" : "@title shoes", "filters": "color='red'" }})
indexApi.insert({"index" : "products", "doc" : {"query" : "@title shoes","filters": "color IN ('blue', 'green')" }}) indexApi.insert({
'created': True,
{'found': None,
'id': 0,
'index': 'products',
'result': 'created'}
'created': True,
{'found': None,
'id': 0,
'index': 'products',
'result': 'created'}
'created': True,
{'found': None,
'id': 0,
'index': 'products',
'result': 'created'}
javascript
= await utilsApi.sql('create table products(title text, color string) type=\'pq\'');
res = indexApi.insert({"index" : "products", "doc" : {"query" : "@title bag" }});
res = indexApi.insert({"index" : "products", "doc" : {"query" : "@title shoes", "filters": "color='red'" }});
res = indexApi.insert({"index" : "products", "doc" : {"query" : "@title shoes","filters": "color IN ('blue', 'green')" }}); res
"_index":"products","_id":0,"created":true,"result":"created"}
"_index":"products","_id":0,"created":true,"result":"created"}
{"_index":"products","_id":0,"created":true,"result":"created"} {
java
sql("create table products(title text, color string) type='pq'");
utilsApi.new HashMap<String,Object>(){{
doc = put("query", "@title bag");
}};new InsertDocumentRequest();
newdoc = index("products").setDoc(doc);
newdoc.insert(newdoc);
indexApi.
new HashMap<String,Object>(){{
doc = put("query", "@title shoes");
put("filters", "color='red'");
}};new InsertDocumentRequest();
newdoc = index("products").setDoc(doc);
newdoc.insert(newdoc);
indexApi.
new HashMap<String,Object>(){{
doc = put("query", "@title shoes");
put("filters", "color IN ('blue', 'green')");
}};new InsertDocumentRequest();
newdoc = index("products").setDoc(doc);
newdoc.insert(newdoc); indexApi.
0, error=, warning=}
{total=class SuccessResponse {
index: products0
id: true
created:
result: creatednull
found:
}class SuccessResponse {
index: products0
id: true
created:
result: creatednull
found:
}class SuccessResponse {
index: products0
id: true
created:
result: creatednull
found: }
The first document doesn’t match any rules. It could match the first two, but they require additional filters.
The second document matches one rule. Note that CALL PQ by default expects a document to be a JSON, but if you use 0 as docs_json
, you can pass a plain string instead.
SQL:
CALL PQ('products', 'Beautiful shoes', 0 as docs_json);
CALL PQ('products', 'What a nice bag', 0 as docs_json);
CALL PQ('products', '{"title": "What a nice bag"}');
+---------------------+
id |
| +---------------------+
1657852401006149637 |
| +---------------------+
+---------------------+
id |
| +---------------------+
1657852401006149637 |
| +---------------------+
JSON:
POST /pq/products/_search
{
"query": {
"percolate": {
"document": {
"title": "What a nice bag"
}
}
}
}
{
"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "products",
"_type": "doc",
"_id": "1657852401006149644",
"_score": "1",
"_source": {
"query": {
"ql": "@title bag"
}
},
"fields": {
"_percolator_document_slot": [
1
]
}
}
]
}
}
PHP:
$percolate = [
'index' => 'products',
'body' => [
'query' => [
'percolate' => [
'document' => [
'title' => 'What a nice bag'
]
]
]
]
];
$client->pq()->search($percolate);
Array
(
[took] => 0
[timed_out] =>
[hits] => Array
(
[total] => 1
[max_score] => 1
[hits] => Array
(
[0] => Array
(
[_index] => products
[_type] => doc
[_id] => 1657852401006149644
[_score] => 1
[_source] => Array
(
[query] => Array
(
[match] => Array
(
[title] => bag
)
)
)
[fields] => Array
(
[_percolator_document_slot] => Array
(
[0] => 1
)
)
)
)
)
)
Python
'products',{"query":{"percolate":{"document":{"title":"What a nice bag"}}}}) searchApi.percolate(
'hits': {'hits': [{u'_id': u'2811025403043381480',
{u'_index': u'products',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'@title bag'}},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [1]}}],
'total': 1},
'profile': None,
'timed_out': False,
'took': 0}
javascript
= await searchApi.percolate('products',{"query":{"percolate":{"document":{"title":"What a nice bag"}}}}); res
{"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"hits": [
{"_index": "products",
"_type": "doc",
"_id": "2811045522851233808",
"_score": "1",
"_source": {
"query": {
"ql": "@title bag"
},
}"fields": {
"_percolator_document_slot": [
1
]
}
}
]
} }
java
new PercolateRequest();
PercolateRequest percolateRequest = new HashMap<String,Object>(){{
query = put("percolate",new HashMap<String,Object >(){{
put("document", new HashMap<String,Object >(){{
put("title","what a nice bag");
}});
}});
}};query(query);
percolateRequest.percolate("test_pq",percolateRequest); searchApi.
class SearchResponse {
0
took: false
timedOut: class SearchResponseHits {
hits: 1
total: 1
maxScore: 2811045522851234109, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[1]}}]
hits: [{_index=products, _type=doc, _id=null
aggregations:
}null
profile: }
SQL:
CALL PQ('products', '{"title": "What a nice bag"}', 1 as query);
+---------------------+------------+------+---------+
id | query | tags | filters |
| +---------------------+------------+------+---------+
1657852401006149637 | @title bag | | |
| +---------------------+------------+------+---------+
JSON:
POST /pq/products/_search
{
"query": {
"percolate": {
"document": {
"title": "What a nice bag"
}
}
}
}
{
"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "products",
"_type": "doc",
"_id": "1657852401006149644",
"_score": "1",
"_source": {
"query": {
"ql": "@title bag"
}
},
"fields": {
"_percolator_document_slot": [
1
]
}
}
]
}
}
PHP:
$percolate = [
'index' => 'products',
'body' => [
'query' => [
'percolate' => [
'document' => [
'title' => 'What a nice bag'
]
]
]
]
];
$client->pq()->search($percolate);
Array
(
[took] => 0
[timed_out] =>
[hits] => Array
(
[total] => 1
[max_score] => 1
[hits] => Array
(
[0] => Array
(
[_index] => products
[_type] => doc
[_id] => 1657852401006149644
[_score] => 1
[_source] => Array
(
[query] => Array
(
[match] => Array
(
[title] => bag
)
)
)
[fields] => Array
(
[_percolator_document_slot] => Array
(
[0] => 1
)
)
)
)
)
)
Python
'products',{"query":{"percolate":{"document":{"title":"What a nice bag"}}}}) searchApi.percolate(
'hits': {'hits': [{u'_id': u'2811025403043381480',
{u'_index': u'products',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'@title bag'}},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [1]}}],
'total': 1},
'profile': None,
'timed_out': False,
'took': 0}
javascript
= await searchApi.percolate('products',{"query":{"percolate":{"document":{"title":"What a nice bag"}}}}); res
{"took": 0,
"timed_out": false,
"hits": {
"total": 1,
"hits": [
{"_index": "products",
"_type": "doc",
"_id": "2811045522851233808",
"_score": "1",
"_source": {
"query": {
"ql": "@title bag"
},
}"fields": {
"_percolator_document_slot": [
1
]
}
}
]
} }
java
new PercolateRequest();
PercolateRequest percolateRequest = new HashMap<String,Object>(){{
query = put("percolate",new HashMap<String,Object >(){{
put("document", new HashMap<String,Object >(){{
put("title","what a nice bag");
}});
}});
}};query(query);
percolateRequest.percolate("test_pq",percolateRequest); searchApi.
class SearchResponse {
0
took: false
timedOut: class SearchResponseHits {
hits: 1
total: 1
maxScore: 2811045522851234109, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[1]}}]
hits: [{_index=products, _type=doc, _id=null
aggregations:
}null
profile: }
Note that with CALL PQ
, you can provide multiple documents in different ways:
('doc1', 'doc2')
. This requires 0 as docs_json
('{doc1}', '{doc2}')
'[{doc1}, {doc2}]'
SQL:
CALL PQ('products', ('nice pair of shoes', 'beautiful bag'), 1 as query, 0 as docs_json);
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "red"}', '{"title": "beautiful bag"}'), 1 as query);
CALL PQ('products', '[{"title": "nice pair of shoes", "color": "blue"}, {"title": "beautiful bag"}]', 1 as query);
+---------------------+------------+------+---------+
id | query | tags | filters |
| +---------------------+------------+------+---------+
1657852401006149637 | @title bag | | |
| +---------------------+------------+------+---------+
+---------------------+--------------+------+-------------+
id | query | tags | filters |
| +---------------------+--------------+------+-------------+
1657852401006149636 | @title shoes | | color='red' |
| 1657852401006149637 | @title bag | | |
| +---------------------+--------------+------+-------------+
+---------------------+--------------+------+---------------------------+
id | query | tags | filters |
| +---------------------+--------------+------+---------------------------+
1657852401006149635 | @title shoes | | color IN ('blue, 'green') |
| | 1657852401006149637 | @title bag | | |
+---------------------+--------------+------+---------------------------+
JSON:
POST /pq/products/_search
{
"query": {
"percolate": {
"documents": [
{"title": "nice pair of shoes", "color": "blue"},
{"title": "beautiful bag"}
]
}
}
}
{
"took": 0,
"timed_out": false,
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "products",
"_type": "doc",
"_id": "1657852401006149644",
"_score": "1",
"_source": {
"query": {
"ql": "@title bag"
}
},
"fields": {
"_percolator_document_slot": [
2
]
}
},
{
"_index": "products",
"_type": "doc",
"_id": "1657852401006149646",
"_score": "1",
"_source": {
"query": {
"ql": "@title shoes"
}
},
"fields": {
"_percolator_document_slot": [
1
]
}
}
]
}
}
PHP:
$percolate = [
'index' => 'products',
'body' => [
'query' => [
'percolate' => [
'documents' => [
['title' => 'nice pair of shoes','color'=>'blue'],
['title' => 'beautiful bag']
]
]
]
]
];
$client->pq()->search($percolate);
Array
(
[took] => 23
[timed_out] =>
[hits] => Array
(
[total] => 2
[max_score] => 1
[hits] => Array
(
[0] => Array
(
[_index] => products
[_type] => doc
[_id] => 2810781492890828819
[_score] => 1
[_source] => Array
(
[query] => Array
(
[match] => Array
(
[title] => bag
)
)
)
[fields] => Array
(
[_percolator_document_slot] => Array
(
[0] => 2
)
)
)
[1] => Array
(
[_index] => products
[_type] => doc
[_id] => 2810781492890828821
[_score] => 1
[_source] => Array
(
[query] => Array
(
[match] => Array
(
[title] => shoes
)
)
)
[fields] => Array
(
[_percolator_document_slot] => Array
(
[0] => 1
)
)
)
)
)
)
Python
'products',{"query":{"percolate":{"documents":[{"title":"nice pair of shoes","color":"blue"},{"title":"beautiful bag"}]}}}) searchApi.percolate(
'hits': {'hits': [{u'_id': u'2811025403043381494',
{u'_index': u'products',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'@title bag'}},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [2]}},
u'_id': u'2811025403043381496',
{u'_index': u'products',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'@title shoes'}},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [1]}}],
'total': 2},
'profile': None,
'timed_out': False,
'took': 0}
javascript
= await searchApi.percolate('products',{"query":{"percolate":{"documents":[{"title":"nice pair of shoes","color":"blue"},{"title":"beautiful bag"}]}}}); res
{"took": 6,
"timed_out": false,
"hits": {
"total": 2,
"hits": [
{"_index": "products",
"_type": "doc",
"_id": "2811045522851233808",
"_score": "1",
"_source": {
"query": {
"ql": "@title bag"
},
}"fields": {
"_percolator_document_slot": [
2
]
},
}
{"_index": "products",
"_type": "doc",
"_id": "2811045522851233810",
"_score": "1",
"_source": {
"query": {
"ql": "@title shoes"
},
}"fields": {
"_percolator_document_slot": [
1
]
}
}
]
} }
java
new PercolateRequest();
percolateRequest = new HashMap<String,Object>(){{
query = put("percolate",new HashMap<String,Object >(){{
put("documents", new ArrayList<Object>(){{
add(new HashMap<String,Object >(){{
put("title","nice pair of shoes");
put("color","blue");
}});add(new HashMap<String,Object >(){{
put("title","beautiful bag");
}});
}});
}});
}};query(query);
percolateRequest.percolate("products",percolateRequest); searchApi.
class SearchResponse {
0
took: false
timedOut: class SearchResponseHits {
hits: 2
total: 1
maxScore: 2811045522851234133, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[2]}}, {_index=products, _type=doc, _id=2811045522851234135, _score=1, _source={query={ql=@title shoes}}, fields={_percolator_document_slot=[1]}}]
hits: [{_index=products, _type=doc, _id=null
aggregations:
}null
profile: }
Using the option 1 as docs
allows you to see which documents of the provided ones match which rules. SQL:
CALL PQ('products', '[{"title": "nice pair of shoes", "color": "blue"}, {"title": "beautiful bag"}]', 1 as query, 1 as docs);
+---------------------+-----------+--------------+------+---------------------------+
id | documents | query | tags | filters |
| +---------------------+-----------+--------------+------+---------------------------+
1657852401006149635 | 1 | @title shoes | | color IN ('blue, 'green') |
| | 1657852401006149637 | 2 | @title bag | | |
+---------------------+-----------+--------------+------+---------------------------+
JSON:
POST /pq/products/_search
{
"query": {
"percolate": {
"documents": [
{"title": "nice pair of shoes", "color": "blue"},
{"title": "beautiful bag"}
]
}
}
}
{
"took": 0,
"timed_out": false,
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "products",
"_type": "doc",
"_id": "1657852401006149644",
"_score": "1",
"_source": {
"query": {
"ql": "@title bag"
}
},
"fields": {
"_percolator_document_slot": [
2
]
}
},
{
"_index": "products",
"_type": "doc",
"_id": "1657852401006149646",
"_score": "1",
"_source": {
"query": {
"ql": "@title shoes"
}
},
"fields": {
"_percolator_document_slot": [
1
]
}
}
]
}
}
PHP:
$percolate = [
'index' => 'products',
'body' => [
'query' => [
'percolate' => [
'documents' => [
['title' => 'nice pair of shoes','color'=>'blue'],
['title' => 'beautiful bag']
]
]
]
]
];
$client->pq()->search($percolate);
Array
(
[took] => 23
[timed_out] =>
[hits] => Array
(
[total] => 2
[max_score] => 1
[hits] => Array
(
[0] => Array
(
[_index] => products
[_type] => doc
[_id] => 2810781492890828819
[_score] => 1
[_source] => Array
(
[query] => Array
(
[match] => Array
(
[title] => bag
)
)
)
[fields] => Array
(
[_percolator_document_slot] => Array
(
[0] => 2
)
)
)
[1] => Array
(
[_index] => products
[_type] => doc
[_id] => 2810781492890828821
[_score] => 1
[_source] => Array
(
[query] => Array
(
[match] => Array
(
[title] => shoes
)
)
)
[fields] => Array
(
[_percolator_document_slot] => Array
(
[0] => 1
)
)
)
)
)
)
Python
'products',{"query":{"percolate":{"documents":[{"title":"nice pair of shoes","color":"blue"},{"title":"beautiful bag"}]}}}) searchApi.percolate(
'hits': {'hits': [{u'_id': u'2811025403043381494',
{u'_index': u'products',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'@title bag'}},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [2]}},
u'_id': u'2811025403043381496',
{u'_index': u'products',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'@title shoes'}},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [1]}}],
'total': 2},
'profile': None,
'timed_out': False,
'took': 0}
javascript
= await searchApi.percolate('products',{"query":{"percolate":{"documents":[{"title":"nice pair of shoes","color":"blue"},{"title":"beautiful bag"}]}}}); res
{"took": 6,
"timed_out": false,
"hits": {
"total": 2,
"hits": [
{"_index": "products",
"_type": "doc",
"_id": "2811045522851233808",
"_score": "1",
"_source": {
"query": {
"ql": "@title bag"
},
}"fields": {
"_percolator_document_slot": [
2
]
},
}
{"_index": "products",
"_type": "doc",
"_id": "2811045522851233810",
"_score": "1",
"_source": {
"query": {
"ql": "@title shoes"
},
}"fields": {
"_percolator_document_slot": [
1
]
}
}
]
} }
java
new PercolateRequest();
percolateRequest = new HashMap<String,Object>(){{
query = put("percolate",new HashMap<String,Object >(){{
put("documents", new ArrayList<Object>(){{
add(new HashMap<String,Object >(){{
put("title","nice pair of shoes");
put("color","blue");
}});add(new HashMap<String,Object >(){{
put("title","beautiful bag");
}});
}});
}});
}};query(query);
percolateRequest.percolate("products",percolateRequest); searchApi.
class SearchResponse {
0
took: false
timedOut: class SearchResponseHits {
hits: 2
total: 1
maxScore: 2811045522851234133, _score=1, _source={query={ql=@title bag}}, fields={_percolator_document_slot=[2]}}, {_index=products, _type=doc, _id=2811045522851234135, _score=1, _source={query={ql=@title shoes}}, fields={_percolator_document_slot=[1]}}]
hits: [{_index=products, _type=doc, _id=null
aggregations:
}null
profile: }
By default, matching document ids correspond to their relative numbers in the list you provide. However, in some cases, each document already has its own id. For this case, there’s an option 'id field name' as docs_id
for CALL PQ
.
Note that if the id cannot be found by the provided field name, the PQ rule will not be shown in the results.
This option is only available for CALL PQ
via SQL.
CALL PQ('products', '[{"id": 123, "title": "nice pair of shoes", "color": "blue"}, {"id": 456, "title": "beautiful bag"}]', 1 as query, 'id' as docs_id, 1 as docs);
+---------------------+-----------+--------------+------+---------------------------+
id | documents | query | tags | filters |
| +---------------------+-----------+--------------+------+---------------------------+
1657852401006149664 | 456 | @title bag | | |
| 1657852401006149666 | 123 | @title shoes | | color IN ('blue, 'green') |
| +---------------------+-----------+--------------+------+---------------------------+
When using CALL PQ with separate JSONs, you can use the option 1 as skip_bad_json to skip any invalid JSONs in the input. In the example below, the 2nd query fails due to an invalid JSON, but the 3rd query avoids the error by using 1 as skip_bad_json. Keep in mind that this option is not available when sending JSON queries over HTTP, as the whole JSON query must be valid in that case.
SQL:
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'));
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag}'));
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag}'), 1 as skip_bad_json);
+---------------------+
id |
| +---------------------+
1657852401006149635 |
| 1657852401006149637 |
| +---------------------+
1064 (42000): Bad JSON objects in strings: 2
ERROR
+---------------------+
id |
| +---------------------+
1657852401006149635 |
| +---------------------+
Percolate queries are designed with high throughput and large data volumes in mind. To optimize performance for lower latency and higher throughput, consider the following.
There are two modes of distribution for a percolate table and how a percolate query can work against it:
Assume you have table pq_d2
defined as:
table pq_d2
{
type = distributed
agent = 127.0.0.1:6712:pq
agent = 127.0.0.1:6712:ptitle
}
Each of ‘pq’ and ‘ptitle’ contains:
SELECT * FROM pq;
+------+-------------+------+-------------------+
id | query | tags | filters |
| +------+-------------+------+-------------------+
1 | filter test | | gid>=10 |
| 2 | angry | | gid>=10 OR gid<=3 |
| +------+-------------+------+-------------------+
2 rows in set (0.01 sec)
POST /pq/pq/_search
{
"took":0,
"timed_out":false,
"hits":{
"total":2,
"hits":[
{
"_id":"1",
"_score":1,
"_source":{
"query":{ "ql":"filter test" },
"tags":"",
"filters":"gid>=10"
}
},
{
"_id":"2",
"_score":1,
"_source":{
"query":{"ql":"angry"},
"tags":"",
"filters":"gid>=10 OR gid<=3"
}
}
]
}
}
$params = [
'index' => 'pq',
'body' => [
]
];
$response = $client->pq()->search($params);
(
[took] => 0
[timed_out] =>
[hits] =>
(
[total] => 2
[hits] =>
(
[0] =>
(
[_id] => 1
[_score] => 1
[_source] =>
(
[query] =>
(
[ql] => filter test
)
[tags] =>
[filters] => gid>=10
)
),
[1] =>
(
[_id] => 1
[_score] => 1
[_source] =>
(
[query] =>
(
[ql] => angry
)
[tags] =>
[filters] => gid>=10 OR gid<=3
)
)
)
)
)
Python
"index":"pq","query":{"match_all":{}}}) searchApi.search({
'hits': {'hits': [{u'_id': u'2811025403043381501',
{u'_score': 1,
u'_source': {u'filters': u"gid>=10",
u'query': u'filter test',
u'tags': u''}},
u'_id': u'2811025403043381502',
{u'_score': 1,
u'_source': {u'filters': u"gid>=10 OR gid<=3",
u'query': u'angry',
u'tags': u''}}],
'total': 2},
'profile': None,
'timed_out': False,
'took': 0}
javascript
= await searchApi.search({"index":"pq","query":{"match_all":{}}}); res
'hits': {'hits': [{u'_id': u'2811025403043381501',
{'_score': 1,
u'_source': {u'filters': u"gid>=10",
u'query': u'filter test',
u'tags': u''}},
u'_id': u'2811025403043381502',
{u'_score': 1,
u'_source': {u'filters': u"gid>=10 OR gid<=3",
u'query': u'angry',
u'tags': u''}}],
u'total': 2},
'profile': None,
'timed_out': False,
'took': 0}
javascript
= await searchApi.search({"index":"pq","query":{"match_all":{}}}); res
"hits": {"hits": [{"_id": "2811025403043381501",
{"_score": 1,
"_source": {"filters": u"gid>=10",
"query": "filter test",
"tags": ""}},
"_id": "2811025403043381502",
{"_score": 1,
"_source": {"filters": u"gid>=10 OR gid<=3",
"query": "angry",
"tags": ""}}],
"total": 2},
"timed_out": false,
"took": 0}
java
Map<String,Object> query = new HashMap<String,Object>();
put("match_all",null);
query.new SearchRequest();
SearchRequest searchRequest = setIndex("pq");
searchRequest.setQuery(query);
searchRequest.search(searchRequest); SearchResponse searchResponse = searchApi.
class SearchResponse {
0
took: false
timedOut: class SearchResponseHits {
hits: 2
total: null
maxScore: 2811045522851233962, _score=1, _source={filters=gid>=10, query=filter test, tags=}}, {_id=2811045522851233951, _score=1, _source={filters=gid>=10 OR gid<=3, query=angry,tags=}}]
hits: [{_id=null
aggregations:
}null
profile: }
And you execute CALL PQ
on the distributed table with a couple of documents.
CALL PQ ('pq_d2', ('{"title":"angry test", "gid":3 }', '{"title":"filter test doc2", "gid":13}'), 1 AS docs);
+------+-----------+
id | documents |
| +------+-----------+
1 | 2 |
| 2 | 1 |
| +------+-----------+
POST /pq/pq/_search -d '
"query":
{
"percolate":
{
"documents" : [
{ "title": "angry test", "gid": 3 },
{ "title": "filter test doc2", "gid": 13 }
]
}
}
'
{
"took":0,
"timed_out":false,
"hits":{
"total":2,"hits":[
{
"_id":"2",
"_score":1,
"_source":{
"query":{"title":"angry"},
"tags":"",
"filters":"gid>=10 OR gid<=3"
}
}
{
"_id":"1",
"_score":1,
"_source":{
"query":{"ql":"filter test"},
"tags":"",
"filters":"gid>=10"
}
},
]
}
}
$params = [
'index' => 'pq',
'body' => [
'query' => [
'percolate' => [
'documents' => [
[
'title'=>'angry test',
'gid' => 3
],
[
'title'=>'filter test doc2',
'gid' => 13
],
]
]
]
]
];
$response = $client->pq()->search($params);
(
[took] => 0
[timed_out] =>
[hits] =>
(
[total] => 2
[hits] =>
(
[0] =>
(
[_index] => pq
[_type] => doc
[_id] => 2
[_score] => 1
[_source] =>
(
[query] =>
(
[ql] => angry
)
[tags] =>
[filters] => gid>=10 OR gid<=3
),
[fields] =>
(
[_percolator_document_slot] =>
(
[0] => 1
)
)
),
[1] =>
(
[_index] => pq
[_id] => 1
[_score] => 1
[_source] =>
(
[query] =>
(
[ql] => filter test
)
[tags] =>
[filters] => gid>=10
)
[fields] =>
(
[_percolator_document_slot] =>
(
[0] => 0
)
)
)
)
)
)
Python
'pq',{"percolate":{"documents":[{"title":"angry test","gid":3},{"title":"filter test doc2","gid":13}]}}) searchApi.percolate(
'hits': {'hits': [{u'_id': u'2811025403043381480',
{u'_index': u'pq',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'angry'},u'tags':u'',u'filters':u"gid>=10 OR gid<=3"},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [1]}},
u'_id': u'2811025403043381501',
{u'_index': u'pq',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'filter test'},u'tags':u'',u'filters':u"gid>=10"},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [1]}}],
'total': 2},
'profile': None,
'timed_out': False,
'took': 0}
javascript
= await searchApi.percolate('pq',{"percolate":{"documents":[{"title":"angry test","gid":3},{"title":"filter test doc2","gid":13}]}}); res
'hits': {'hits': [{u'_id': u'2811025403043381480',
{'_index': u'pq',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'angry'},u'tags':u'',u'filters':u"gid>=10 OR gid<=3"},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [1]}},
u'_id': u'2811025403043381501',
{u'_index': u'pq',
u'_score': u'1',
u'_source': {u'query': {u'ql': u'filter test'},u'tags':u'',u'filters':u"gid>=10"},
u'_type': u'doc',
u'fields': {u'_percolator_document_slot': [1]}}],
u'total': 2},
'profile': None,
'timed_out': False,
'took': 0}
java
new PercolateRequest();
percolateRequest = new HashMap<String,Object>(){{
query = put("percolate",new HashMap<String,Object >(){{
put("documents", new ArrayList<Object>(){{
add(new HashMap<String,Object >(){{
put("title","angry test");
put("gid",3);
}});add(new HashMap<String,Object >(){{
put("title","filter test doc2");
put("gid",13);
}});
}});
}});
}};query(query);
percolateRequest.percolate("pq",percolateRequest); searchApi.
class SearchResponse {
10
took: false
timedOut: class SearchResponseHits {
hits: 2
total: 1
maxScore: 2811045522851234165, _score=1, _source={query={ql=@title angry}}, fields={_percolator_document_slot=[1]}}, {_index=pq, _type=doc, _id=2811045522851234166, _score=1, _source={query={ql=@title filter test doc2}}, fields={_percolator_document_slot=[2]}}]
hits: [{_index=pq, _type=doc, _id=null
aggregations:
}null
profile: }
In the previous example, we used the default sparse mode. To demonstrate the sharded mode, let’s create a distributed PQ table consisting of 2 local PQ tables and add 2 documents to “products1” and 1 document to “products2”:
create table products1(title text, color string) type='pq';
create table products2(title text, color string) type='pq';
create table products_distributed type='distributed' local='products1' local='products2';
INSERT INTO products1(query) values('@title bag');
INSERT INTO products1(query,filters) values('@title shoes', 'color=\'red\'');
INSERT INTO products2(query,filters) values('@title shoes', 'color in (\'blue\', \'green\')');
Now, if you add 'sharded' as mode
to CALL PQ
, it will send the documents to all the agent’s tables (in this case, just local tables, but they can be remote to utilize external hardware). This mode is not available via the JSON interface.
SQL:
CALL PQ('products_distributed', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'), 'sharded' as mode, 1 as query);
+---------------------+--------------+------+---------------------------+
id | query | tags | filters |
| +---------------------+--------------+------+---------------------------+
1657852401006149639 | @title bag | | |
| 1657852401006149643 | @title shoes | | color IN ('blue, 'green') |
| +---------------------+--------------+------+---------------------------+
Note that the syntax of agent mirrors in the configuration (when several hosts are assigned to one agent
line, separated with |
) has nothing to do with the CALL PQ
query mode. Each agent
always represents one node, regardless of the number of HA mirrors specified for that agent.
In some cases, you might want to get more details about the performance of a percolate query. For that purpose, there is the option 1 as verbose
, which is only available via SQL and allows you to save more performance metrics. You can see them using the SHOW META
query, which you can run after CALL PQ
. See SHOW META for more info.
1 as verbose:
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'), 1 as verbose); show meta;
+---------------------+
id |
| +---------------------+
1657852401006149644 |
| 1657852401006149646 |
| +---------------------+
+-------------------------+-----------+
Value |
| Name | +-------------------------+-----------+
0.000 sec |
| Total | 0.000 sec |
| Setup | 2 |
| Queries matched | 0 |
| Queries failed | 2 |
| Document matched | 3 |
| Total queries stored | only queries | 3 |
| Term Fast rejected queries | 0 |
| Time per query | 27, 10 |
| Time of matched queries | 37 |
| +-------------------------+-----------+
0 as verbose (default):
CALL PQ('products', ('{"title": "nice pair of shoes", "color": "blue"}', '{"title": "beautiful bag"}'), 0 as verbose); show meta;
+---------------------+
id |
| +---------------------+
1657852401006149644 |
| 1657852401006149646 |
| +---------------------+
+-----------------------+-----------+
Value |
| Name | +-----------------------+-----------+
0.000 sec |
| Total | 2 |
| Queries matched | 0 |
| Queries failed | 2 |
| Document matched | 3 |
| Total queries stored | only queries | 3 |
| Term Fast rejected queries | 0 |
| +-----------------------+-----------+