
Improving Bulk Insert Performance with Entity Framework

For the past few years, Entity Framework (EF) has steadily improved in terms of performance and works well for most scenarios. But it has its shortcomings with bulk insert operations, where performance starts to drop. In this article, I'll dig into how to boost your inserts, which can improve the performance of your application.

There are many ways to perform bulk inserts. Here, I'll show three approaches that are common with EF. I'm excluding third-party libraries/solutions, but will provide links to some good ones. Using a simple example, I'll show the difference between:

  • Single row insert with commit
  • Committing all records at once
  • Batch insert

Let’s assume you need to load the database with a large amount of data during the initial testing phase of an application. To keep things as simple as possible, I'll build a domain model with only scalar (no navigation) properties using EF code first.

// The model
public sealed class Product
{
    public long Id { get; set; }
    public string Name { get; set; }
    public decimal UnitPrice { get; set; }
    public int? UnitInStock { get; set; }
}

// The entity configuration
public sealed class ProductMap : EntityTypeConfiguration<Product>
{
    public ProductMap()
    {
        ToTable("Products");
        HasKey(x => x.Id);
        Property(x => x.Id).HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
        Property(x => x.Name).IsRequired().HasMaxLength(100);
        Property(x => x.UnitPrice).IsRequired().HasPrecision(18, 2);
        Property(x => x.UnitInStock).IsOptional();
    }
}

// The data context
public sealed class DataContext : DbContext
{
    public DataContext() : base("MyConnectionString")
    {
    }

    public IDbSet<Product> Products { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Configurations.Add(new ProductMap());
    }
}

The scenarios below consist of inserting 100,000 records and measuring the time each method takes to complete the insert. So, without further ado, let's get to the code!
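A quick note before we start: the snippets that follow use a small Util helper to generate random product data. Its implementation isn't shown here, so below is a minimal, hypothetical sketch of what such a helper might look like (the timings themselves can be reproduced by wrapping each scenario in a System.Diagnostics.Stopwatch):

// Hypothetical sketch of the Util helper used in the snippets below
public static class Util
{
    private static readonly Random Rng = new Random();
    private const string Letters = "abcdefghijklmnopqrstuvwxyz";

    // Returns a random lowercase string of the given length
    public static string GenerateName(int length)
    {
        var chars = new char[length];
        for (var i = 0; i < length; i++)
        {
            chars[i] = Letters[Rng.Next(Letters.Length)];
        }
        return new string(chars);
    }

    // Returns a string of random digits of the given length, e.g. "42"
    public static string GenerateNumber(int length)
    {
        var digits = new char[length];
        for (var i = 0; i < length; i++)
        {
            digits[i] = (char)('0' + Rng.Next(10));
        }
        return new string(digits);
    }
}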

Single Row Insert

In this scenario, each record is added to the context and a call to SaveChanges() is made.

using (var scope = new TransactionScope())
{
    using (var context = new DataContext())
    {
        for (var i = 0; i < 100000; i++)
        {
            var product = new Product
            {
                Name = Util.GenerateName(20),
                UnitPrice = decimal.Parse(Util.GenerateNumber(3)),
                UnitInStock = int.Parse(Util.GenerateNumber(2))
            };
            
            context.Products.Add(product);
            context.SaveChanges();
        }
    }
    
    scope.Complete();
}

The result here is atrocious. I waited, waited and waited… it took around 4 hours to complete. As you might guess, this method is considered bad practice, but I wanted to show the result so that you can avoid it. Calling SaveChanges() for every single record slows the bulk insert to a crawl. So avoid it!
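If you want to see for yourself what's happening, EF6 can log every command it sends to the database through the built-in Database.Log hook. A minimal sketch:

using (var context = new DataContext())
{
    // Print every SQL command EF sends to the database
    context.Database.Log = Console.Write;

    context.Products.Add(new Product { Name = "Test", UnitPrice = 9.99m, UnitInStock = 5 });
    context.SaveChanges();
}

Each SaveChanges() call issues its own round trip to the database - a single parameterized INSERT here - which is exactly what hurts when it's repeated 100,000 times.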

Inserting All Records at Once

This method is quite similar to the one above, with a slight modification: we call SaveChanges() only after all records have been added to the context.

using (var scope = new TransactionScope())
{
    using (var context = new DataContext())
    {
        for (var i = 0; i < 100000; i++)
        {
            var product = new Product
            {
                Name = Util.GenerateName(20),
                UnitPrice = decimal.Parse(Util.GenerateNumber(3)),
                UnitInStock = int.Parse(Util.GenerateNumber(2))
            };

            context.Products.Add(product);
        }

        // A single call commits everything in one go
        context.SaveChanges();
    }

    scope.Complete();
}

Doing this took around 1 hour to complete, so you gain a solid 3 hours, but there's one caveat. We're adding all the records to the same context instance, which means EF is tracking 100,000 entities. The first few records are added quickly, but by the time you approach 100,000, EF is detecting changes over an ever larger object graph and each addition gets slower. There are two common ways to fix this problem:

  • Turning off AutoDetectChangesEnabled
  • Re-creating the context (which I’ll discuss in the next section)

Turning Off "AutoDetectChangesEnabled"

You can turn off this flag by doing this:

using (var context = new DataContext())
{
    // Disable automatic change detection for the duration of the import
    context.Configuration.AutoDetectChangesEnabled = false;

    for (var i = 0; i < 100000; i++)
    {
        var product = new Product { Name = Util.GenerateName(20), UnitPrice = decimal.Parse(Util.GenerateNumber(3)), UnitInStock = int.Parse(Util.GenerateNumber(2)) };
        context.Products.Add(product);
    }

    context.SaveChanges();
}

This gives you a tremendous improvement: the same 100,000 records are inserted in roughly 29 seconds. You can read more about how AutoDetectChanges affects performance in the EF documentation. Be cautious when turning it off, though, as EF will no longer detect changes to your entities automatically - especially if you have navigation properties. For specific situations such as data imports, I recommend turning it off: the time savings are substantial.
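Note that even with the flag off, you can still trigger change detection manually at the exact points where you need it, by calling DetectChanges() on the change tracker. A small sketch:

using (var context = new DataContext())
{
    context.Configuration.AutoDetectChangesEnabled = false;

    var product = context.Products.First();
    product.UnitPrice = 9.99m; // EF no longer notices this modification automatically

    // Run change detection explicitly before saving
    context.ChangeTracker.DetectChanges();
    context.SaveChanges();
}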

Batch Insert

For our final method, the idea here is to batch several inserts, call SaveChanges() and re-create the context. I’ve also made an extension method for convenience.

// The extension method. 'count' is the 1-based number of records added so far,
// so the caller should pass i + 1 (otherwise the very first record triggers a flush).
public static class DataContextExtensions
{
    public static DataContext BulkInsert<T>(this DataContext context, T entity, int count, int batchSize) where T : class
    {
        context.Set<T>().Add(entity);

        // Once a full batch has been added, flush it and start over with a fresh context
        if (count % batchSize == 0)
        {
            context.SaveChanges();
            context.Dispose();
            context = new DataContext();

            // This is optional
            context.Configuration.AutoDetectChangesEnabled = false;
        }

        return context;
    }
}

// Usage
using (var scope = new TransactionScope())
{
    var context = new DataContext();

    try
    {
        // This is optional
        context.Configuration.AutoDetectChangesEnabled = false;

        for (var i = 0; i < 100000; i++)
        {
            var product = new Product
            {
                Name = Util.GenerateName(20),
                UnitPrice = decimal.Parse(Util.GenerateNumber(3)),
                UnitInStock = int.Parse(Util.GenerateNumber(2))
            };

            // Insert in batches of 100
            context = context.BulkInsert(product, i + 1, 100);
        }

        context.SaveChanges(); // save any remaining items
    }
    finally
    {
        // BulkInsert may have swapped in a new context, so dispose the last one here
        context.Dispose();
    }

    scope.Complete();
}

Result: 32 seconds with AutoDetectChangesEnabled on, 25 seconds with it off.

As you can see, this method takes the least time to complete the process. It's important to call SaveChanges() after every x items (the batch size) to keep the context as small as possible. You can change this number, say to 1,000, and see if it performs better for you. Another important step is disposing of the context, which clears out all tracked entities: SaveChanges() doesn't detach anything. Entities that have already been committed stay attached to the context in an “Unchanged” state. Finally, turning AutoDetectChangesEnabled off is not strictly necessary here, as the extra gain is small (32 seconds versus 25, close to the second method with the flag off); leaving it on means you keep all the tracking benefits when working with more complex entities.
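You can verify that last point yourself: after SaveChanges(), the entities are still sitting in the change tracker. A quick sketch:

using (var context = new DataContext())
{
    context.Products.Add(new Product { Name = "Test", UnitPrice = 1.50m, UnitInStock = 1 });
    context.SaveChanges();

    // The saved entity is still attached, now in the Unchanged state
    foreach (var entry in context.ChangeTracker.Entries<Product>())
    {
        Console.WriteLine(entry.State); // Unchanged
    }
}

This is why re-creating the context after every batch matters: it keeps the set of tracked entities small.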

Third Party Libraries

EntityFramework.BulkInsert - Open Source

EntityFramework.Utilities - Open Source

.NET Bulk Operations - Free Trial, Licensed

Conclusion

Entity Framework has come a long way in terms of performance and stability and works well in most scenarios. It has its pitfalls, like most ORMs do, but there is a myriad of libraries and solutions out there to solve such problems.

Image credits: The Entity Framework logo from CodePlex
