Join query in Elasticsearch

It depends on what you mean by JOIN. Elasticsearch is not a relational database that supports JOINs between tables; it is a text search engine that manages documents within indexes.

On the other hand, you can search within the same index across multiple types using a field that is common to every type.

For example, taking your data, I can create an index with two types and index their data as follows:

curl -XPOST localhost:9200/product -d '{
    "settings" : {
        "number_of_shards" : 5
    }
}'

curl -XPOST localhost:9200/product/type1/_mapping -d '{
        "type1" : {
            "properties" : {
                "product_id" : { "type" : "string" },
                "price" : { "type" : "integer" },
                "stock" : { "type" : "integer" }
            }
        }   
}'              

curl -XPOST localhost:9200/product/type2/_mapping -d '{
        "type2" : {
            "properties" : {
                "product_id" : { "type" : "string" },
                "category" : { "type" : "string" },
                "manufacturer" : { "type" : "string" }
            }
        }
}'  

curl -XPOST localhost:9200/product/type1/1 -d '{
        product_id: "1111", 
        price: "23",
        stock: "100"
}'

curl -XPOST localhost:9200/product/type2/1 -d '{
        product_id: "1111",
        category: "iPhone case",
        manufacturer: "Belkin"
}'

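I have effectively created one index called product with two types, type1 and type2. (Multiple mapping types per index are only possible in Elasticsearch versions before 6.0; types were removed entirely in 8.0.) Now I can run the following query, and it returns both documents: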

curl -XGET 'http://localhost:9200/product/_search?pretty=1' -d '{
    "query": {
        "query_string" : {
            "query" : "product_id:1111"
        }
    }
}'

{
  "took" : 95,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.5945348,
    "hits" : [ {
      "_index" : "product",
      "_type" : "type1",
      "_id" : "1",
      "_score" : 0.5945348, "_source" : {
    product_id: "1111",
    price: "23",
    stock: "100"
}
    }, {
      "_index" : "product",
      "_type" : "type2",
      "_id" : "1",
      "_score" : 0.5945348, "_source" : {
    product_id: "1111",
    category: "iPhone case",
    manufacturer: "Belkin"
}
    } ]
  }
}

This works because Elasticsearch searches over all documents within that index regardless of their type. It is still different from a JOIN, in the sense that Elasticsearch does not compute a Cartesian product of the documents belonging to each type.
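The search returns the two documents as separate hits. Combining them into one record per product is therefore up to the application. Here is a minimal Python sketch of that merge, assuming the response has already been parsed into a dict (for example with `json.loads`). `join_hits_by_field` is a hypothetical helper, not part of any Elasticsearch client.

```python
from collections import defaultdict


def join_hits_by_field(response, field):
    """Merge the _source of every hit that shares the same value of `field`.

    Hypothetical helper illustrating an application-side join.
    """
    merged = defaultdict(dict)
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        # Keys present in several hits are overwritten by the later hit.
        merged[source[field]].update(source)
    return dict(merged)


# The two hits from the response above, trimmed to what the join needs.
# price and stock are strings because they were indexed as strings.
response = {
    "hits": {
        "hits": [
            {"_type": "type1",
             "_source": {"product_id": "1111", "price": "23", "stock": "100"}},
            {"_type": "type2",
             "_source": {"product_id": "1111", "category": "iPhone case",
                         "manufacturer": "Belkin"}},
        ]
    }
}

products = join_hits_by_field(response, "product_id")
print(products["1111"])
# {'product_id': '1111', 'price': '23', 'stock': '100', 'category': 'iPhone case', 'manufacturer': 'Belkin'}
```

This merge only covers the hits actually returned. If the search is paginated, the application has to collect all pages before joining.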

Hope that helps.


isaac.hazan's answer works quite well, but I would like to add a few points that helped me with this kind of situation:

I landed on this page while trying to solve a similar problem: I had to exclude multiple records from one index based on documents in another index. The lack of relationships is one of the main downsides of non-relational, document-oriented databases.
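An exclusion like this can be done as an application-side join in two steps. First, query the other index. Second, feed the values it returns into a `must_not` clause of the search on the first index. Here is a minimal Python sketch of the second step, assuming both indexes share a `product_id` field. The sample hits, including the "2222" value, are made up, and `build_exclusion_query` is a hypothetical helper. Sending the requests (with curl or a client library) is left out.

```python
import json


def build_exclusion_query(excluded_hits, field):
    """Build a search body that skips every document whose `field` value
    appears in `excluded_hits` (hits already fetched from the other index)."""
    # Deduplicate so each value is sent only once.
    excluded_values = sorted({hit["_source"][field] for hit in excluded_hits})
    return {
        "query": {
            "bool": {
                "must": [{"match_all": {}}],
                "must_not": [{"terms": {field: excluded_values}}],
            }
        }
    }


# Step 1 (not shown): search the second index and collect its hits.
excluded_hits = [
    {"_source": {"product_id": "1111"}},
    {"_source": {"product_id": "2222"}},
    {"_source": {"product_id": "1111"}},
]

# Step 2: send this body to the first index's _search endpoint.
body = build_exclusion_query(excluded_hits, "product_id")
print(json.dumps(body))
```

The application has to run two round trips, and the whole list of values travels in the second request. That is fine for modest lists, but it grows with the number of excluded documents.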

The Elasticsearch documentation page on Handling Relationships explains this in detail.

Four common techniques are used to manage relational data in Elasticsearch:

  • Application-side joins
  • Data denormalization
  • Nested objects
  • Parent/child relationships

Often the final solution will require a mixture of a few of these techniques.
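Data denormalization, for example, means storing price, stock, category and manufacturer in one document per product. A single query can then filter on all of them. Here is a minimal Python sketch that builds a request body for the `_bulk` API (newline-delimited JSON) from the two record sets. The following are all hypothetical: the index name `products`, the input lists, and `denormalized_bulk_body`. The action lines use the typeless form, which assumes Elasticsearch 7 or later.

```python
import json


def denormalized_bulk_body(stock_records, catalog_records, index):
    """Combine records sharing a product_id into one document each and
    return a newline-delimited body for the _bulk API."""
    catalog_by_id = {rec["product_id"]: rec for rec in catalog_records}
    lines = []
    for rec in stock_records:
        doc = dict(rec)
        # Copy category/manufacturer onto the price/stock record, if present.
        doc.update(catalog_by_id.get(rec["product_id"], {}))
        # Using product_id as _id makes re-indexing overwrite the old copy.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["product_id"]}}))
        lines.append(json.dumps(doc))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


stock = [{"product_id": "1111", "price": 23, "stock": 100}]
catalog = [{"product_id": "1111", "category": "iPhone case", "manufacturer": "Belkin"}]

body = denormalized_bulk_body(stock, catalog, "products")
print(body, end="")
```

The trade-off is that an update to, say, a manufacturer name must be rewritten into every product document that carries it.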

I have mostly used nested objects and application-side joins. While using the same field name across types can solve the problem for the moment, I think it is better to rethink your data model and create the mapping best suited to your application.

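For instance, you might want to list all products with a price greater than x, or all products that are no longer in stock. With the two-type layout above, such a query only matches the type1 documents, so each product's category and manufacturer (stored in type2) have to be fetched separately. Scenarios like these are easier to handle with one of the techniques mentioned above.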
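With denormalized product documents (price, stock, category and manufacturer together), each of these listings becomes a single query. Here is a minimal Python sketch that builds the search body. It assumes such an index exists and that `price` and `stock` are mapped as numbers. `product_query` is a hypothetical helper, and the body would be sent to that index's `_search` endpoint.

```python
import json


def product_query(price_above=None, out_of_stock=False, category=None):
    """Build a search body for a hypothetical denormalized product index."""
    must, filters = [], []
    if category is not None:
        # Full-text match, since category was mapped as an analyzed string.
        must.append({"match": {"category": category}})
    if price_above is not None:
        # "gt" means strictly greater than.
        filters.append({"range": {"price": {"gt": price_above}}})
    if out_of_stock:
        filters.append({"range": {"stock": {"lte": 0}}})
    bool_query = {}
    if must:
        bool_query["must"] = must
    if filters:
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}


# iPhone cases priced above 20, and everything that is out of stock.
expensive_cases = product_query(price_above=20, category="iPhone case")
sold_out = product_query(out_of_stock=True)
print(json.dumps(expensive_cases, indent=2))
print(json.dumps(sold_out, indent=2))
```

Numeric conditions go in the `filter` clause, which does not affect scoring and can be cached. The category match stays in `must` so it still contributes to relevance.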