DynamoDB - Put item if hash (or hash and range combination) doesn't exist

You can't. All items in DynamoDB are indexed by either their hash or hash+range (depending on your table).

A summary of what is going on so far:

  • A single hash key can have multiple range keys.
  • Every item has both a hash and a range key
  • You are making a PutItem request and must provide both the hash and range
  • You are providing a ConditionExpression with attribute_not_exists on either the hash or range attribute name
  • The attribute_not_exists condition is merely checking if an attribute with that name exists, it doesn't care about the value
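A minimal sketch of such a request with the boto3 low-level DynamoDB client follows. It assumes a table whose key attributes are literally named `hash` and `range`, and `put_if_absent` is a hypothetical helper, not part of boto3. The `#k` placeholder in `ExpressionAttributeNames` avoids clashes with DynamoDB reserved words.

```python
def put_if_absent(client, table_name, item, key_attr="hash"):
    """Write `item` only if no item with the same primary key exists.

    `client` is assumed to be a boto3 DynamoDB client, i.e. boto3.client("dynamodb").
    Returns True if the item was written, False if the condition failed.
    """
    try:
        client.put_item(
            TableName=table_name,
            Item=item,  # must contain both key attributes (hash and range)
            # Evaluated against the item with the SAME primary key as `item`,
            # so naming either key attribute gives the same result.
            ConditionExpression="attribute_not_exists(#k)",
            ExpressionAttributeNames={"#k": key_attr},
        )
        return True
    except client.exceptions.ConditionalCheckFailedException:
        # An item with this hash+range key already exists; nothing was written.
        return False


# Usage (needs AWS credentials and an existing table; names are hypothetical):
# import boto3
# client = boto3.client("dynamodb")
# put_if_absent(client, "my-table", {"hash": {"S": "A"}, "range": {"N": "3"}})
```

Catching `ConditionalCheckFailedException` turns the failed condition into a plain `False`, while any other error still propagates.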

Let's walk through an example, starting with a hash+range key table that contains this data:

  1. hash=A,range=1
  2. hash=A,range=2

There are four possible cases:

  1. If you try to put an item with hash=A,range=3 and attribute_not_exists(hash), the PutItem will succeed because attribute_not_exists(hash) evaluates to true. No item exists with key hash=A,range=3, so there is no hash attribute for the condition to find.

  2. If you try to put an item with hash=A,range=3 and attribute_not_exists(range), the PutItem will succeed because attribute_not_exists(range) evaluates to true. No item exists with key hash=A,range=3, so there is no range attribute for the condition to find.

  3. If you try to put an item with hash=A,range=1 and attribute_not_exists(hash), the PutItem will fail because attribute_not_exists(hash) evaluates to false. An item with key hash=A,range=1 exists, and it has a hash attribute.

  4. If you try to put an item with hash=A,range=1 and attribute_not_exists(range), the PutItem will fail because attribute_not_exists(range) evaluates to false. An item with key hash=A,range=1 exists, and it has a range attribute.
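The four cases can be reproduced with a small in-memory model. This is a hypothetical simulation of the semantics described above, not DynamoDB code: the table is a dict keyed by (hash, range), and the condition is evaluated only against the item stored under the new item's full key.

```python
def attribute_not_exists(existing, attr):
    # No item under this key means there is no attribute, so the condition holds.
    return existing is None or attr not in existing


def conditional_put(table, item, attr):
    """Simulate PutItem with ConditionExpression attribute_not_exists(attr)."""
    key = (item["hash"], item["range"])
    existing = table.get(key)  # only the item with the SAME full key is checked
    if not attribute_not_exists(existing, attr):
        return False  # ConditionalCheckFailed
    table[key] = item
    return True


def fresh_table():
    return {
        ("A", 1): {"hash": "A", "range": 1},
        ("A", 2): {"hash": "A", "range": 2},
    }


# Cases 1 and 2: hash=A,range=3 does not exist yet, so both conditions pass.
print(conditional_put(fresh_table(), {"hash": "A", "range": 3}, "hash"))   # True
print(conditional_put(fresh_table(), {"hash": "A", "range": 3}, "range"))  # True
# Cases 3 and 4: hash=A,range=1 exists, so both conditions fail.
print(conditional_put(fresh_table(), {"hash": "A", "range": 1}, "hash"))   # False
print(conditional_put(fresh_table(), {"hash": "A", "range": 1}, "range"))  # False
```

Note that case 1 succeeds even though other items with hash=A are already in the table: the condition never looks at them.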

This means that one of two things will happen:

  1. The hash+range pair exists in the database.
    • attribute_not_exists(hash) must be false
    • attribute_not_exists(range) must be false
  2. The hash+range pair does not exist in the database.
    • attribute_not_exists(hash) must be true
    • attribute_not_exists(range) must be true

In both cases, you get the same result regardless of whether you put the condition on the hash or the range key. The hash+range key identifies a single item in the entire table, and your condition is evaluated against that item.

You are effectively performing a "put this item if an item with this hash+range key does not already exist" operation.


For Googlers:

  • (a) attribute_not_exists checks whether an item with the same primary key as the to-be-inserted item exists
  • (b) Additionally, it checks whether the attribute exists on that item; the value does not matter
  • If you only want to prevent overwriting, use attribute_not_exists with a key attribute (the partition key or the range key). Every existing item must have its key attributes, so check (b) adds nothing and only check (a) decides the result
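Checks (a) and (b) can be sketched as a tiny in-memory model. This is a hypothetical simulation, not DynamoDB code: a condition is written as a Python predicate over the attributes of the item that has the new item's key (an empty dict when no such item exists), and the `pk` and `version` attribute names are made up. The second predicate illustrates that other conditions are looked up against the same item.

```python
def put_with_condition(table, item, key_attr, condition):
    """Simulate PutItem: evaluate `condition` on the same-key item only."""
    key = item[key_attr]
    existing = table.get(key, {})  # check (a): find the item with this key, if any
    if not condition(existing):    # check (b): inspect that item's attributes
        return False  # ConditionalCheckFailed
    table[key] = item
    return True


def not_exists_pk(old):
    # attribute_not_exists(pk): true only when no item with this key exists.
    return "pk" not in old


def version_is_1(old):
    # version = 1 (optimistic locking style): false when the item is absent.
    return old.get("version") == 1


table = {}
print(put_with_condition(table, {"pk": "x", "version": 1}, "pk", not_exists_pk))  # True
print(put_with_condition(table, {"pk": "x", "version": 2}, "pk", not_exists_pk))  # False
print(put_with_condition(table, {"pk": "x", "version": 2}, "pk", version_is_1))   # True
print(put_with_condition(table, {"pk": "y", "version": 2}, "pk", version_is_1))   # False
```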

Reasoning:

  • The name attribute_not_exists suggests that it checks whether an attribute exists on an item
  • But there are multiple items in the table, which item does it check against?
  • The answer is it checks against the item with the same primary key as the one you are putting in
  • This happens for all condition expressions
  • But as always, it is not properly and fully documented
  • See the official documentation on this feature below, and judge its ambiguity for yourself

Note: To prevent a new item from replacing an existing item, use a conditional expression that contains the attribute_not_exists function with the name of the attribute being used as the partition key for the table. Since every record must contain that attribute, the attribute_not_exists function will only succeed if no matching item exists.



This explanation, taken from the Amazon AWS forum, says that the request looks up the item matching the provided hash key and then only checks whether the attribute exists in that record. It should work the same way if you have both a hash and a range key, I suppose.

Suppose a request tries to find an existing item with hash key "b825501b-60d3-4e53-b737-02645d27c2ae". If this is the first time this id is being used, there will be no existing item, "attribute_not_exists(email)" will evaluate to true, and the Put request will go through.

If this id is already in use, there will be an existing item. The condition expression will then look for an email attribute on that item: if there is one, the Put request will fail; if there is none, the Put request will go through.

Either way, it is not comparing the value of the "email" attribute, and it is not checking whether other items in the table use the same "email" value.
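A minimal sketch of that explanation, assuming a hypothetical table keyed only by an `id` hash key and modeled as a dict (the ids and example.com addresses are made up): the condition inspects only the item stored under the new item's `id`, so nothing stops two different ids from sharing an email.

```python
def put_if_no_email(table, item):
    """Simulate PutItem on an id-keyed table with attribute_not_exists(email)."""
    existing = table.get(item["id"])  # only the item with the same id is checked
    if existing is not None and "email" in existing:
        return False  # ConditionalCheckFailed
    table[item["id"]] = item
    return True


table = {}
print(put_if_no_email(table, {"id": "1", "email": "a@example.com"}))  # True: new id
print(put_if_no_email(table, {"id": "2", "email": "a@example.com"}))  # True: same email, other id
print(put_if_no_email(table, {"id": "1", "email": "b@example.com"}))  # False: id 1 has an email
table["3"] = {"id": "3"}  # an existing item without an email attribute
print(put_if_no_email(table, {"id": "3", "email": "c@example.com"}))  # True
```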

If email were the hash key, the request would try to find an existing item whose hash key is the new item's email address.

If another item with the same email value exists, it will be found. Since email is the hash key, it must be present on that existing item, so "attribute_not_exists(email)" will evaluate to false and the Put request will fail.

If the "email" value has not been used before, no existing item will be found, so "attribute_not_exists(email)" will evaluate to true and the Put request will go through.
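The email-as-hash-key variant can be sketched the same way, again as a hypothetical dict-based model rather than DynamoDB code. Because every stored item carries its key attribute, the condition reduces to "no item with this email exists", which is what enforces uniqueness.

```python
def put_if_email_absent(table, item):
    """Simulate PutItem on an email-keyed table with attribute_not_exists(email)."""
    existing = table.get(item["email"])  # lookup by the hash key, i.e. the email
    # Any stored item has its key attribute, so this is just "key not present".
    if existing is not None and "email" in existing:
        return False  # ConditionalCheckFailed
    table[item["email"]] = item
    return True


table = {}
print(put_if_email_absent(table, {"email": "a@example.com", "name": "Ann"}))  # True
print(put_if_email_absent(table, {"email": "a@example.com", "name": "Bob"}))  # False
print(put_if_email_absent(table, {"email": "b@example.com", "name": "Bob"}))  # True
```

The rejected second put leaves Ann's item untouched.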