What is an efficient way to handle inserts of unique "immutable" entities by multiple producers with optimistic concurrency approach?

Table Valued Parameters

One option is to use table valued parameters instead of individual calls to the database.

Example procedure using a table valued parameter:

create type dbo.CommonEntity_udt as table (
    CommonEntityGroupId int not null
  , Name      nvarchar(100) not null
  , primary key (CommonEntityGroupId,Name)
    );
go

create procedure dbo.CommonEntity_set (
    @CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
  set nocount on;
  set xact_abort on;
  if exists (
    select 1 
      from @CommonEntity as s
        where not exists (
          select 1 
            from dbo.CommonEntity as t
            where s.Name = t.Name
              and s.CommonEntityGroupId = t.CommonEntityGroupId
            ))
    begin;
      insert dbo.CommonEntity (CommonEntityGroupId, Name)
        select s.CommonEntityGroupId, s.Name
          from @CommonEntity as s
          where not exists (
            select 1 
              from dbo.CommonEntity as t with (updlock, holdlock)
              where s.Name = t.Name
                and s.CommonEntityGroupId = t.CommonEntityGroupId
              );
    end;
end;
go

race condition testing reference:

  • Testing an Insert for Race Conditions with Ostress.exe - Kendra Little

table valued parameter reference:

  • SQL Server 2008 Table-Valued Parameters and C# Custom Iterators: A Match Made In Heaven! - Leonard Lobel
  • Using Table Valued Parameters in Entity Framework - Ritesh Sharma
  • Using Table-Valued Parameters in SQL Server and .NET - Erland Sommarskog
  • how to use TVPs with Entity Framework 4.1 and CodeFirst
  • Maximizing Performance with Table-Valued Parameters - Dan Guzman
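
For reference, calling dbo.CommonEntity_set from plain ADO.NET looks roughly like the sketch below. The connection string and sample rows are placeholders; an EF application can send the same DataTable through a raw SqlCommand in the same way.

using System.Data;
using System.Data.SqlClient;

public static class CommonEntityClient
{
    public static void SaveCommonEntities(string connectionString)
    {
        // Shape the rows exactly like dbo.CommonEntity_udt.
        var table = new DataTable();
        table.Columns.Add("CommonEntityGroupId", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Sample name");    // placeholder data
        table.Rows.Add(1, "Another sample"); // placeholder data

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.CommonEntity_set", connection))
        {
            command.CommandType = CommandType.StoredProcedure;

            // Pass the whole set in one round trip as a table valued parameter.
            var parameter = command.Parameters.Add("@CommonEntity", SqlDbType.Structured);
            parameter.TypeName = "dbo.CommonEntity_udt";
            parameter.Value = table;

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}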

I don't recommend merge unless there is a compelling argument for it. This situation is only looking at inserting, so it seems like overkill.

Example merge version with table valued parameter:

create procedure dbo.CommonEntity_merge (
    @CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
  set nocount on;
  set xact_abort on;
  if exists (
    select 1 
      from @CommonEntity as s
        where not exists (
          select 1 
            from dbo.CommonEntity as t
            where s.Name = t.Name
              and s.CommonEntityGroupId = t.CommonEntityGroupId
            ))
    begin;
      merge dbo.CommonEntity with (holdlock) as t
      using (select CommonEntityGroupId, Name from @CommonEntity) as s
      on (t.Name = s.Name
        and s.CommonEntityGroupId = t.CommonEntityGroupId)
      when not matched by target
        then insert (CommonEntityGroupId, Name) 
        values (s.CommonEntityGroupId, s.Name);
    end;
end;
go

merge reference:

  • Use Caution with SQL Server's MERGE Statement - Aaron Bertrand
  • The Case of the Blocking Merge Statement (LCK_M_RS_U locks) - Kendra Little
  • UPSERT Race Condition With Merge - sqlteam
  • An Interesting MERGE Bug - Paul White
  • Can I optimize this merge statement - Aaron Bertrand
  • If you are using indexed views and MERGE, please read this! - Aaron Bertrand


ignore_dup_key code comment:

// Check that it was Name Index violation (perhaps make indices IGNORE_DUP_KEY)

ignore_dup_key is going to use serializable behind the scenes; it adds potentially costly overhead on non-clustered indexes, and even when the index is clustered it can have significant costs depending on the number of duplicates.

This can be handled in the stored procedures using Sam Saffron's upsert (update/insert) pattern, or one of the patterns shown here: Performance impact of different error handling techniques - Aaron Bertrand.
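
If the duplicate check has to stay in application code instead of the procedure, the "check that it was a Name index violation" comment usually translates into inspecting SqlException.Number. A minimal sketch (2601 and 2627 are SQL Server's duplicate-key error numbers; the command passed in is whatever insert you already issue):

using System.Data.SqlClient;

public static class DuplicateKeyHandling
{
    // 2601: cannot insert duplicate key row in an object with a unique index.
    // 2627: violation of a PRIMARY KEY or UNIQUE constraint.
    public static bool IsDuplicateKeyError(SqlException ex)
    {
        return ex.Number == 2601 || ex.Number == 2627;
    }

    public static void TryInsert(SqlCommand insertCommand)
    {
        try
        {
            insertCommand.ExecuteNonQuery();
        }
        catch (SqlException ex) when (IsDuplicateKeyError(ex))
        {
            // The entity already exists; read the existing row instead of failing.
        }
    }
}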



Well, choosing the approach will certainly depend on the type of functionality and the amount of data that both procedures will be handling.

If we go with the first approach, then for each SaveChanges() call Entity Framework will wrap the work in a transaction. This could reduce performance a bit when a large number of records is involved.
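
To illustrate the difference, a rough EF6 sketch (the context and entity types here are hypothetical stand-ins for your model): adding everything first and calling SaveChanges() once commits the whole batch in a single transaction, instead of one transaction per record.

using System.Collections.Generic;
using System.Data.Entity;

public class CommonEntity
{
    public int Id { get; set; }
    public int CommonEntityGroupId { get; set; }
    public string Name { get; set; }
}

public class MyDbContext : DbContext
{
    public DbSet<CommonEntity> CommonEntities { get; set; }
}

public static class BatchingExample
{
    // Slow: a separate transaction (and round trip) per record.
    public static void SavePerRecord(IEnumerable<CommonEntity> records)
    {
        using (var context = new MyDbContext())
        {
            foreach (var record in records)
            {
                context.CommonEntities.Add(record);
                context.SaveChanges(); // one transaction per call
            }
        }
    }

    // Better: add everything, then commit once.
    public static void SaveAsBatch(IEnumerable<CommonEntity> records)
    {
        using (var context = new MyDbContext())
        {
            context.CommonEntities.AddRange(records);
            context.SaveChanges(); // a single transaction for the whole batch
        }
    }
}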

If there is a considerable number of records that need to be inserted/updated, then I would surely go with the stored-procedure-based approach. With this approach, you have full control over the database, and querying for a record to check whether it exists is very easy (though some fine tuning may be required here). I don't see any challenges in implementing the same with stored procedures. With a few implementation optimizations, such as loading the data into temporary tables (not SQL temp tables, but physical tables that can be used to store data temporarily), this can be further enhanced so you have a full log of the information the stored procedure has processed.
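
As a sketch of that staging-table idea (all table and procedure names here are hypothetical): bulk load the incoming rows into a permanent staging table, then let one stored procedure do the existence checks, inserts and logging in a single set-based pass.

using System.Data;
using System.Data.SqlClient;

public static class StagingTableLoader
{
    public static void LoadAndProcess(string connectionString, DataTable rows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // 1. Bulk load the incoming rows into a physical staging table.
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.CommonEntity_Staging"; // hypothetical
                bulkCopy.WriteToServer(rows);
            }

            // 2. Process the staged rows (check existence, insert, log) in one call.
            using (var command = new SqlCommand("dbo.CommonEntity_ProcessStaging", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.ExecuteNonQuery();
            }
        }
    }
}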


Based on your last key point, another solution is to move your "Creation" logic to a central application server/service (see Update 2) that has a queue users can use to "add" records.

Since most of your records already exist, if you use some sort of caching you should be able to make this quite efficient.

Now, about the number of records.
You have to keep in mind that EF was not designed to support "bulk" operations; therefore, creating thousands of records will be (really, really) slow.

I have used 2 solutions that help you add a huge number of records very fast:

1) EntityFramework.BulkInsert
2) SqlBulkCopy

Both are extremely easy to use.

Also, I hope you've already seen Fastest Way of Inserting in Entity Framework.

Update

Below is another solution that I've used twice recently.
Instead of saving your record when a user performs a "Save", schedule the save to happen X seconds later.
If in the meantime someone else tries to save the same record, just "slide" the scheduled date.

Below you can see sample code that tries to save the same record 10 times (at the same time), but the actual save only happens once.


using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

namespace ConsoleApplicationScheduler
{
    class Program
    {
        static void Main(string[] args)
        {
            ConcurrentSaveService service = new ConcurrentSaveService();
            int entity = 1;
            for (int i = 0; i < 10; i++)
            {
                //Save the same record 10 times (this could be concurrent)
                service.BeginSave(entity);
            }

            Console.ReadLine();
        }
    }

    public class ConcurrentSaveService
    {
        private static readonly ConcurrentDictionary<int, DateTime> _trackedSubjectsDictionary = new ConcurrentDictionary<int, DateTime>();

        private readonly int _delayInSeconds;

        public ConcurrentSaveService()
        {
            _delayInSeconds = 5;
        }

        public async void BeginSave(int key)
        {
            Console.WriteLine("Started Saving");
            DateTime existingTaskDate;
            _trackedSubjectsDictionary.TryGetValue(key, out existingTaskDate);

            // Schedule (or "slide") the save to happen _delayInSeconds from now.
            DateTime scheduledDate = DateTime.Now.AddSeconds(_delayInSeconds);
            _trackedSubjectsDictionary.AddOrUpdate(key, scheduledDate, (i, d) => scheduledDate);

            // A pending save already owns this key; it will pick up the new date.
            if (existingTaskDate > DateTime.Now)
                return;

            do
            {
                await Task.Delay(TimeSpan.FromSeconds(_delayInSeconds));

                DateTime loadedScheduledDate;
                _trackedSubjectsDictionary.TryGetValue(key, out loadedScheduledDate);

                // The date was slid forward by a newer call; keep waiting.
                if (loadedScheduledDate > DateTime.Now)
                    continue;

                // The entry is gone; another caller already handled the save.
                if (loadedScheduledDate == DateTime.MinValue)
                    break;

                // Claim the entry so the actual save happens exactly once.
                _trackedSubjectsDictionary.TryRemove(key, out loadedScheduledDate);

                if (loadedScheduledDate > DateTime.MinValue)
                {
                    //DoWork
                    Console.WriteLine("Update/Insert record:" + key);
                }

                break;
            } while (true);

            Console.WriteLine("Finished Saving");
        }
    }
}

Update 2

Since you can control the "creation" process in your WebAPI app, you should be able to avoid duplicates using some sort of cache, like in the following pseudocode:

using System.Collections.Concurrent;
using System.Web.Http;

namespace WebApplication2.Controllers
{
    public class ValuesController : ApiController
    {
        static object _lock = new object();
        static ConcurrentDictionary<string, object> cache = new ConcurrentDictionary<string, object>();
        public object Post(InputModel value)
        {
            // ConcurrentDictionary's indexer throws for a missing key, so use TryGetValue.
            object existing;
            if (cache.TryGetValue(value.Name, out existing))
                return existing;//Your saved record

            lock (_lock)
            {
                if (cache.TryGetValue(value.Name, out existing))
                    return existing;//Your saved record

                object newRecord = new object();//Save your Object

                cache.AddOrUpdate(value.Name, newRecord, (s, o) => newRecord);

                return newRecord;
            }
        }
    }

    public class InputModel
    {
        public string Name;
    }
}