Is shortening MongoDB property names worthwhile?

Keep the priority for meaningful names above the priority for short names unless your own situation and testing provides a specific reason to alter those priorities.

As mentioned in the comments of SERVER-863, if you're using MongoDB 3.0+ with the WiredTiger storage option with snappy compression enabled, long field names become even less of an issue as the compression effectively takes care of the shortening for you.

I performed a little benchmark, I uploaded 252 rows of data from an Excel into two collections testShortNames and testLongNames as follows:

Long Names:

{
    "_id": ObjectId("6007a81ea42c4818e5408e9c"),
    "countryNameMaster": "Andorra",
    "countryCapitalNameMaster": "Andorra la Vella",
    "areaInSquareKilometers": 468,
    "countryPopulationNumber": NumberInt("77006"),
    "continentAbbreviationCode": "EU",
    "currencyNameMaster": "Euro"
}

Short Names:

{
    "_id": ObjectId("6007a81fa42c4818e5408e9d"),
    "name": "Andorra",
    "capital": "Andorra la Vella",
    "area": 468,
    "pop": NumberInt("77006"),
    "continent": "EU",
    "currency": "Euro"
}

I then got the stats for each, saved in disk files, then did a "diff" on the two files:

pprint.pprint(db.command("collstats", dbCollectionNameLongNames))

The image below shows two variables of interest: size and storageSize. My reading showed that storageSize is the amount of disk space used after compression, and basically size is the uncompressed size. So we see the storageSize is identical. Apparently the Wired Tiger engine compresses fieldnames quite well. enter image description here

I then ran a program to retrieve all data from each collection, and checked the response time.

Even though it was a sub-second query, the long names consistently took about 7 times longer. It of course will take longer to send the longer names across from the database server to the client program.

-------LongNames-------
Server Start DateTime=2021-01-20 08:44:38
Server End   DateTime=2021-01-20 08:44:39
StartTimeMs= 606964546  EndTimeM= 606965328
ElapsedTime MilliSeconds= 782
-------ShortNames-------
Server Start DateTime=2021-01-20 08:44:39
Server End   DateTime=2021-01-20 08:44:39
StartTimeMs= 606965328  EndTimeM= 606965421
ElapsedTime MilliSeconds= 93

In Python, I just did the following (I had to actually loop through the items to force the reads, otherwise the query returns only the cursor):

results = dbCollectionLongNames.find(query)
for result in results:
    pass

To quote Donald Knuth:

Premature optimization is the root of all evil (or at least most of it) in programming.

Build your application however seems most sensible, maintainable and logical. Then, if you have performance or storage issues, deal with those that have the greatest impact until either performance is satisfactory or the law of diminishing returns means there's no point in optimising further.

If you are uncertain of the impact of particular design decisions (like long property names), create a prototype to test various hypotheses (like "will shorter property names save much space"). Don't expect the outcome of testing to be conclusive, however it may teach you things you didn't expect to learn.

Bottom line up: So keep it as compact as it still stays meaningful.

I don't think that this is every truly required to be shortened to one letter names. Anyway you should shorten them as much as possible, and you feel comfortable with it. Lets say you have a users name: {FirstName, MiddleName, LastName} you may be good to go with even name:{first, middle, last}. If you feel comfortable you may be fine with name:{f, m,l}.
You should use short names: As it will consume disk space, memory and thus may somewhat slowdown your application(less objects to hold in memory, slower lookup times due to bigger size and longer query time as seeking over data takes longer).
A good schema documentation may tell the developer that t stands for town and not for title. Depending on your stack you may even be able to hide the developer from working with these short cuts through some helper utils to map it.

Finally I would say that there's no guideline to when and how much you should shorten your schema names. It highly depends on your environment and requirements. But you're good to keep it compact if you can supply a good documentation explaining everything and/or offering utils to ease the life of developers and admins. Anyway admins are likely to interact directly with mongodb, so I guess a good documentation shouldn't be missed.

Is shortening MongoDB property names worthwhile?

Tags:

Database Schema

Mongodb

Related

Recent Posts