Managing a child document
In the previous recipe we saw how to manage relations between objects with the nested object type. The disadvantage of nested objects is their dependence on their parent: if you need to change the value of a nested object, you need to reindex the parent (this can cause a performance overhead if the nested objects change frequently). To solve this problem, ElasticSearch allows you to define child documents.
Getting ready
You need a working ElasticSearch cluster.
How to do it...
We can modify the order example by indexing the items as separate child documents.
We need to extract the item object and create a new document type, item, with the _parent property set.
{
  "order": {
    "properties": {
      "id": {"type": "string", "store": "yes", "index": "not_analyzed"},
      "date": {"type": "date", "store": "no", "index": "not_analyzed"},
      "customer_id": {"type": "string", "store": "yes", "index": "not_analyzed"},
      "sent": {"type": "boolean", "store": "no", "index": "not_analyzed"}
    }
  },
  "item": {
    "_parent": {"type": "order"},
    "type": "object",
    "properties": {
      "name": {"type": "string", "store": "no", "index": "analyzed"},
      "quantity": {"type": "integer", "store": "no", "index": "not_analyzed"},
      "vat": {"type": "double", "store": "no", "index": "not_analyzed"}
    }
  }
}
The above mapping is similar to the one in the previous recipe. The item object is extracted from the order (in the previous example it was nested) and added as a new mapping. The only differences are that "type": "nested" has become "type": "object" (which can be omitted) and that the new special field _parent has been added to define the parent/child relation.
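The transformation described above (pull the nested field out of the parent and turn it into its own type with a _parent entry) can be sketched in Python as a plain dictionary rewrite. This is a minimal sketch, not an ElasticSearch API: the helper name split_nested_mapping is hypothetical, the nested field is assumed to be called item as in the previous recipe, and the input mapping is trimmed to a few fields for brevity.

```python
import copy
import json


def split_nested_mapping(mapping, parent_type, nested_field, child_type):
    """Hypothetical helper: move a nested field out of the parent mapping
    into a separate child type whose _parent points back at the parent."""
    result = copy.deepcopy(mapping)  # leave the input mapping untouched
    parent_props = result[parent_type]["properties"]
    nested = parent_props.pop(nested_field)  # remove the nested object from the parent
    result[child_type] = {
        "_parent": {"type": parent_type},  # declares the parent/child relation
        "type": "object",                  # was "nested"; may be omitted
        "properties": nested["properties"],
    }
    return result


# Assumed shape of the previous recipe's mapping, trimmed to a few fields.
nested_mapping = {
    "order": {
        "properties": {
            "id": {"type": "string", "store": "yes", "index": "not_analyzed"},
            "item": {
                "type": "nested",
                "properties": {
                    "name": {"type": "string", "store": "no", "index": "analyzed"},
                    "quantity": {"type": "integer", "store": "no", "index": "not_analyzed"},
                },
            },
        }
    }
}

child_mapping = split_nested_mapping(nested_mapping, "order", "item", "item")
print(json.dumps(child_mapping, indent=2))
```

The order type keeps only its own fields, while item becomes a root type of its own, matching the mapping shown at the start of the recipe.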
How it works...
The child object is a standard root object (document) with an extra _parent property defined.
The type property of _parent refers to the type of the parent document.
The child document must be indexed in the same shard as its parent, so an extra parameter must be passed when indexing it: the parent ID. (We'll see how to do it in the next chapter.)
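Passing the parent ID works because the shard is chosen from a routing value: a parent document is routed by its own ID, a child document by its parent's ID, so both hash to the same shard. The Python sketch below illustrates that idea under simplifying assumptions: CRC32 is only a stand-in (ElasticSearch uses its own hash function), and the shard count and document IDs are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # hypothetical index setting


def shard_for(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    # Stand-in hash: ElasticSearch uses its own hash function, not CRC32.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def routing_key(doc_id, parent_id=None):
    # A parent is routed by its own ID; a child is routed by its parent's ID.
    return parent_id if parent_id is not None else doc_id


order_shard = shard_for(routing_key("order1"))
item_shard = shard_for(routing_key("item42", parent_id="order1"))
print(order_shard, item_shard)
```

Because the child's routing value is the parent's ID, the two shard numbers are always equal, whatever hash function is used.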
Changing the values of a child document doesn't require reindexing the parent document, so child documents are fast to index, reindex (update), and delete.
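To make the update cost concrete, the following Python sketch only builds (and does not send) the REST request that indexes or reindexes a single item: the body contains just that item, never the whole order. It assumes the ElasticSearch 1.x-era form of the index API, PUT /index/type/id?parent=ID; check the exact parameter against your ElasticSearch version. The index name myindex, the helper child_index_request, and the field values are hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def child_index_request(index, child_type, child_id, parent_id, source):
    """Hypothetical helper: build (method, path, body) for indexing a child."""
    path = "/{}/{}/{}?{}".format(
        quote(index), quote(child_type), quote(child_id),
        urlencode({"parent": parent_id}),  # parent ID selects the shard
    )
    return "PUT", path, json.dumps(source)


# Updating one item re-sends only that item, not the whole order.
method, path, body = child_index_request(
    "myindex", "item", "item42", "order1",
    {"name": "tshirt", "quantity": 3, "vat": 20.0},
)
print(method, path)
print(body)
```

With a nested mapping, the same change would instead require sending the complete order, with all of its items, again.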
There's more...
In ElasticSearch we have different ways to manage relations between objects:
- Embedding with type=object: This is explicitly managed by ElasticSearch, which considers the embedded object part of the main document. It's fast, but you need to reindex the main document to change a value of the embedded object.
- Nesting with type=nested: This allows more accurate searching and filtering of the parent by using a nested query on the children. Everything else works as for embedded objects, except for queries.
- External child documents: In this case the children are external documents, with a _parent property that binds them to the parent. They must be indexed in the same shard as the parent. The join with the parent is a bit slower than with nested objects: nested objects are stored in the same data block as the parent in the Lucene index and are loaded with it, whereas child documents require more read operations.
Choosing how to model the relation between objects depends on your application scenario.
There is also another approach, decoupling the join relation, although it performs poorly on big data sets. The join query is done in two steps: first you collect the IDs of the children/other documents, and then you search for them in a field of their parent.
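The two-step join can be sketched in Python with in-memory lists standing in for two separate search requests. This is a hypothetical model: it assumes each order stores the IDs of its items in an item_ids field, and the search functions only imitate what the two queries would return.

```python
# Hypothetical in-memory stand-ins for the two indexed document types.
orders = [
    {"id": "order1", "item_ids": ["i1"]},
    {"id": "order2", "item_ids": ["i2"]},
    {"id": "order3", "item_ids": ["i3", "i2"]},
]
items = [
    {"id": "i1", "name": "tshirt"},
    {"id": "i2", "name": "shoes"},
    {"id": "i3", "name": "tshirt"},
]


def search_items(name):
    # Step 1: first query, collecting only the IDs of the matching documents
    return [item["id"] for item in items if item["name"] == name]


def search_orders_by_item_ids(item_ids):
    # Step 2: second query, like a terms filter on the parent's item_ids field
    wanted = set(item_ids)
    return [o["id"] for o in orders if wanted.intersection(o["item_ids"])]


ids = search_items("tshirt")
print(search_orders_by_item_ids(ids))  # ['order1', 'order3']
```

The weakness shows in step 2: the ID list passed to the second query grows with the number of matching children, which is why this approach scales badly on large data sets.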
See also
- The Using has_child query/filter, Using top_children query, and Using has_parent query/filter recipes in Chapter 5, Search, Queries, and Filters, for more details on child/parent queries