What's the $unwind operator in MongoDB?

First off, welcome to MongoDB!

The thing to remember is that MongoDB employs an "NoSQL" approach to data storage, so perish the thoughts of selects, joins, etc. from your mind. The way that it stores your data is in the form of documents and collections, which allows for a dynamic means of adding and obtaining the data from your storage locations.

That being said, in order to understand the concept behind the $unwind parameter, you first must understand what the use case that you are trying to quote is saying. The example document from mongodb.org is as follows:

{
 title : "this is my title" ,
 author : "bob" ,
 posted : new Date () ,
 pageViews : 5 ,
 tags : [ "fun" , "good" , "fun" ] ,
 comments : [
             { author :"joe" , text : "this is cool" } ,
             { author :"sam" , text : "this is bad" }
 ],
 other : { foo : 5 }
}

Notice how tags is actually an array of 3 items, in this case being "fun", "good" and "fun".

What $unwind does is allow you to peel off a document for each element and returns that resulting document. To think of this in a classical approach, it would be the equivilent of "for each item in the tags array, return a document with only that item".

Thus, the result of running the following:

db.article.aggregate(
    { $project : {
        author : 1 ,
        title : 1 ,
        tags : 1
    }},
    { $unwind : "$tags" }
);

would return the following documents:

{
     "result" : [
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "fun"
             },
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "good"
             },
             {
                     "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                     "title" : "this is my title",
                     "author" : "bob",
                     "tags" : "fun"
             }
     ],
     "OK" : 1
}

Notice that the only thing changing in the result array is what is being returned in the tags value. If you need an additional reference on how this works, I've included a link here. Hopefully this helps, and good luck with your foray into one of the best NoSQL systems that I have come across thus far.


$unwind duplicates each document in the pipeline, once per array element.

So if your input pipeline contains one article doc with two elements in tags, {$unwind: '$tags'} would transform the pipeline to be two article docs that are the same except for the tags field. In the first doc, tags would contain the first element from the original doc's array, and in the second doc, tags would contain the second element.


Let's understand it by an example

This is how the company document looks like:

original document

The $unwind allows us to take documents as input that have an array valued field and produces output documents, such that there's one output document for each element in the array. source

The $unwind stage

So let's go back to our companies examples, and take a look at the use of unwind stages. This query:


db.companies.aggregate([
    { $match: {"funding_rounds.investments.financial_org.permalink": "greylock" } },
    { $project: {
        _id: 0,
        name: 1,
        amount: "$funding_rounds.raised_amount",
        year: "$funding_rounds.funded_year"
    } }
])

produces documents that have arrays for both amount and year.

project output

Because we're accessing the raised amount and the funded year for every element within the funding rounds array. To fix this, we can include an unwind stage before our project stage in this aggregation pipeline, and parameterize this by saying that we want to unwind the funding rounds array:


db.companies.aggregate([
    { $match: {"funding_rounds.investments.financial_org.permalink": "greylock" } },
    { $unwind: "$funding_rounds" },
    { $project: {
        _id: 0,
        name: 1,
        amount: "$funding_rounds.raised_amount",
        year: "$funding_rounds.funded_year"
    } }
])

unwind has the effect of outputting to the next stage more documents than it receives as input

If we look at the funding_rounds array, we know that for each funding_rounds, there is a raised_amount and a funded_year field. So, unwind will for each one of the documents that are elements of the funding_rounds array produce an output document. Now, in this example, our values are strings. But, regardless of the type of value for the elements in an array, unwind will produce an output document for each one of these values, such that the field in question will have just that element. In the case of funding_rounds, that element will be one of these documents as the value for funding_rounds for every document that gets passed on to our project stage. The result, then of having run this, is that now we get an amount and a year. One for each funding round for every company in our collection. What this means is that our match produced many company documents and each one of those company documents results in many documents. One for each funding round within every company document. unwind performs this operation using the documents handed to it from the match stage. And all of these documents for every company are then passed to the project stage.

unwind output

So, all documents where the funder was Greylock (as in the query example) will be split into a number of documents, equal to the number of funding rounds for every company that matches the filter $match: {"funding_rounds.investments.financial_org.permalink": "greylock" }. And each one those resulting documents will then be passed along to our project. Now, unwind produces an exact copy for every one of the documents that it receives as input. All fields have the same key and value, with one exception, and that is that the funding_rounds field rather than being an array of funding_rounds documents, instead has a value that is a single document, which is an individual funding round. So, a company that has 4 funding rounds will result in unwind creating 4 documents. Where every field is an exact copy, except for the funding_rounds field, which will instead of being an array for each of those copies will instead be an individual element from the funding_rounds array from the company document that unwind is currently processing. So, unwind has the effect of outputting to the next stage more documents than it receives as input. What that means is that our project stage now gets a funding_rounds field that again, is not an array, but is instead a nested document that has a raised_amount and a funded_year field. So, project will receive multiple documents for each company matching the filter and can therefore process each of the documents individually and identify an individual amount and year for each funding round for each company.