Multi-threaded batch inserts to one table without waits

Multi-threaded parallel inserts of multiple rows into one table, without locks causing waits? I was convinced that this task could be accomplished without thinking too deeply about its implementation. But I was wrong, even though there are lots of posts on the internet describing it as an easy task.

I simulated this scenario on my laptop with SQL Server 2017 Developer Edition.

I have one table and I would like to insert data into it from two sessions in parallel. See the structure of the heap table below.

CREATE TABLE dbo.ConcurentTable (b INT)

Execute a long-running insert into the table, which should cause a wait for another insert running in the next query window.

BEGIN TRAN
INSERT INTO dbo.ConcurentTable
SELECT TOP 20000000 a.object_id
FROM sys.objects a WITH (NOLOCK)
JOIN sys.objects b WITH (NOLOCK) ON 1=1
JOIN sys.objects c WITH (NOLOCK) ON c.object_id=a.object_id

COMMIT

And here is the short insert.

BEGIN TRAN
INSERT INTO dbo.ConcurentTable (b)
SELECT TOP 100000 a.object_id
FROM sys.objects a WITH (NOLOCK)
JOIN sys.objects b WITH (NOLOCK) ON b.object_id=a.object_id
COMMIT

Now run the long-running query in the first query window and then the short one in the second window.

In my environment I can see that the second query waits until the first one finishes.

What is the reason? Let's look at the lock type first. Use the dynamic management views or the sp_lock procedure to get the following output.
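A minimal sketch for inspecting the locks yourself, assuming you run it in a third query window while both inserts are executing:

SELECT request_session_id,
       resource_type,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();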

Exclusive table locks

You can see that there is an exclusive lock, as expected according to the documentation, which defines that DML operations use exclusive locks to protect against multiple processes trying to change the same data: "Used for data-modification operations, such as INSERT, UPDATE, or DELETE. Ensures that multiple updates cannot be made to the same resource at the same time." See https://bit.ly/39trAiq
The lock escalated to the TABLE resource level, because that is the default behaviour of SQL Server. The table level is more convenient for SQL Server in our scenario because it spares resources. If SQL Server used a lower locking level, for example the row resource level, then for such a huge amount of inserted data there would be a great number of lower-level locks, which is very time and resource consuming, with an impact on the transaction log, etc.

I played with the number of inserted rows to get locks on lower-level resources. When I used a smaller batch for the insert (about 2,000 rows in my environment), SQL Server did not escalate to a table lock and kept the locks at row level. For smaller amounts of data, SQL Server prefers to lock lower-level resources. Now you can see RID locks (because we don't have any index on the table), which means the whole table is not locked exclusively, only the individual rows being inserted.

Note: There is a defined threshold at which SQL Server decides to escalate locks. I found this information in the SQL Server 2008 Internals book:

  • The number of locks held by a single statement on one object, or on one partition of one object, exceeds a threshold. Currently that threshold is 5,000 locks, but it might change in future service packs. The lock escalation does not occur if the locks are spread over multiple objects in the same statement—for example, 3,000 locks in one index and 3,000 in another.
  • Memory taken by lock resources exceeds 40 percent of the non-AWE (32-bit) or regular (64-bit) enabled memory and the locks configuration option is set to 0. (In this case, the lock memory is allocated dynamically as needed, so the 40 percent value is not a constant.) If the locks option is set to a nonzero value, memory reserved for locks is statically allocated when SQL Server starts. Escalation occurs when SQL Server is using more than 40 percent of the reserved lock memory for lock resources.

But as I understood from other blog posts and discussions, the details above can change from version to version. It would be better to run some tests to find the threshold in your environment. I am going to write another post about this topic.
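A rough sketch for probing the threshold in your environment: repeat the insert with a varying TOP n in one window and count its locks from another (the session id below is a placeholder for the SPID of the insert session):

SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = 55 -- replace with the insert session's SPID
GROUP BY resource_type, request_mode;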

Is there a way to manage this for huge batches of inserted data?

Let's try.

We can use a hint to force the query to take locks at a lower resource level – ROW or PAGE. For example, the ROWLOCK hint asks for row-level locks. But as stated in the documentation, SQL Server can still decide which locking level will actually be used.

INSERT INTO dbo.ConcurentTable WITH (ROWLOCK) (b)
SELECT TOP 100000 a.object_id FROM sys.objects a JOIN sys.objects b ON 1=1

To prevent such behaviour, we can disable lock escalation on our table with the following command. (There are other ways to change the lock escalation behaviour, even at the SQL Server level, but let's describe those in another post.)

ALTER TABLE dbo.ConcurentTable SET (LOCK_ESCALATION = DISABLE)
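You can verify the setting in the catalog afterwards:

SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'ConcurentTable';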

Run the queries to simulate our scenario again. From the sp_lock output you can now see exclusive locks on RID resources, since we have a heap table.

RID locks

Now the second query should succeed: the data are inserted before the first query finishes, without waiting. In my environment I occasionally still had problems with RID and PAGE locks; it seems to me that these two kinds of locks cannot guarantee concurrent batch inserts every time. So I extended the table with an additional attribute to split the inserted data and support concurrent inserts with KEY locks, see below.

Note: you can check the waiting behaviour with the NOWAIT hint, which raises an error instead of waiting when the query would otherwise block.
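For example (a sketch; the hint goes in the table-hint position and the statement fails immediately instead of waiting):

INSERT INTO dbo.ConcurentTable WITH (NOWAIT) (b)
SELECT TOP 100 object_id FROM sys.objects;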

Msg 1222, Level 16, State 45, Line 13

Lock request time out period exceeded.

Let's add an attribute to distinguish the batches of data based on the SPID, and sort the table according to it.

ALTER TABLE dbo.ConcurentTable ADD spidFilter SMALLINT NOT NULL DEFAULT @@SPID

ALTER TABLE dbo.ConcurentTable ADD CONSTRAINT chSpidFilter CHECK (spidFilter = @@SPID)

CREATE CLUSTERED INDEX CX_ConcurentTable ON dbo.ConcurentTable (spidFilter)
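One caveat: the original long insert has no column list, so after adding spidFilter it would fail on a column count mismatch. Re-run the test inserts with an explicit column list so the spidFilter default can apply; for example:

INSERT INTO dbo.ConcurentTable (b)
SELECT TOP 20000000 a.object_id
FROM sys.objects a
JOIN sys.objects b ON 1=1
JOIN sys.objects c ON c.object_id = a.object_id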

And here we have it.

The second query finished while the first "blocking" one was still running. In sp_lock we can see the KEY lock resource type being used, because of the added clustered index.

KEY locks

When I put lock escalation back on the table, I returned to the beginning – a table lock, and the second query waited until the first one was finished.
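Putting escalation back looks like this (TABLE is the default; AUTO would be the partition-aware variant):

ALTER TABLE dbo.ConcurentTable SET (LOCK_ESCALATION = TABLE)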

Conclusion

I was successful in the concurrency scenario, where two batches inserted data into the same table from multiple threads, when:

  • Lock escalation on the table was disabled
  • The query was forced to take locks on lower-level resources
  • A clustered index designed to separate the inserted rows increased the number of successful runs
  • In general, locks on lower-level resources helped to increase the concurrency of the inserts

Notes

  • I am still not sure whether the above will work in all cases
  • This forced lower-level locking approach does not block the second query, but from a resource point of view it could be a better strategy to let the first query finish and only then execute the second one (the behaviour we saw at the beginning of this post). This could be the subject of other tests I will carry out in one of my next posts
  • If you know a better way to manage this scenario, please share it

Stay tuned

Partitioning

In this post I would like to continue the partitioning series. In the previous post we created a partitioned view and looked at how it appears in the execution plan. In this post we will create a partitioned table.

For a simple partitioned table we need to do the following:

  • Create the table
  • Create the partition function
  • Create the partition scheme
  • Create the relationship between the partition scheme and the table
  • Check the created partitions
  • Verify the partitioned query in the execution plan

As in the partitioned view post, all partitions will be created within the same filegroup (without a separate disk per partition). This solution will not profit from parallel reads from partitions placed on different disks, but for a simple demonstration it will do.

Let's create the table first.

DROP TABLE IF EXISTS [dbo].[PartitionedTable]
CREATE TABLE [dbo].[PartitionedTable] (id INT NOT NULL, booking_date DATE NOT NULL, data SYSNAME)

ALTER TABLE [dbo].[PartitionedTable] ADD CONSTRAINT PK_PartitionedTable__booking_date PRIMARY KEY CLUSTERED ([id] ASC, [booking_date] ASC)

Create the partition function. It distributes data into partitions according to the following boundary ranges:

< '2019-10-01'
>= '2019-10-01' AND < '2019-11-01'
>= '2019-11-01' AND < '2019-12-01'
>= '2019-12-01'

Which partition a boundary value itself belongs to is defined by the LEFT or RIGHT keyword.

CREATE PARTITION FUNCTION PartitionedFunction (DATE)
AS RANGE RIGHT FOR VALUES ('2019-10-01', '2019-11-01', '2019-12-01');
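For comparison, a hypothetical LEFT variant (PartitionedFunctionLeft is just an illustrative name): with RANGE LEFT each boundary value belongs to the partition on its left, so '2019-11-01' would end up in the October partition instead of starting the November one.

CREATE PARTITION FUNCTION PartitionedFunctionLeft (DATE)
AS RANGE LEFT FOR VALUES ('2019-10-01', '2019-11-01', '2019-12-01');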

Create the partition scheme to specify which partition of a table or index belongs to which filegroup. In our case all partitions are mapped to the PRIMARY filegroup.

CREATE PARTITION SCHEME PartitionedScheme 
AS PARTITION PartitionedFunction ALL TO ([PRIMARY])
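A quick sanity check is the $PARTITION function, which returns the partition number a given value maps to; with our RANGE RIGHT boundaries, '2019-11-15' should map to partition 3:

SELECT $PARTITION.PartitionedFunction('2019-11-15') AS partition_number;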

Now let's create the relationship between the table and the partition scheme by recreating the clustered primary key on the scheme.

ALTER TABLE [dbo].[PartitionedTable] DROP CONSTRAINT PK_PartitionedTable__booking_date

ALTER TABLE [dbo].[PartitionedTable] ADD CONSTRAINT PK_PartitionedTable__booking_date PRIMARY KEY CLUSTERED ([booking_date] ASC) ON [PartitionedScheme]([booking_date])
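The table is still empty, so let's add a few sample rows (values chosen purely for illustration) to make the execution plan experiments below more meaningful:

INSERT INTO [dbo].[PartitionedTable] (id, booking_date, data)
VALUES (1, '2019-09-15', N'sep'),
       (2, '2019-10-15', N'oct'),
       (3, '2019-11-15', N'nov'),
       (4, '2019-12-15', N'dec');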

Here are a few queries to check our partitioned objects.

Let's check the connection between the partition scheme and the partitioned table.

SELECT * 
FROM sys.tables AS t
JOIN sys.indexes AS i ON t.[object_id] = i.[object_id] AND i.[type] IN (0,1)
JOIN sys.partition_schemes ps ON i.data_space_id = ps.data_space_id
WHERE t.name = 'PartitionedTable';

A query to check the created partitions.

SELECT t.name AS TableName,
i.name AS IndexName,
p.partition_number,
p.partition_id,
i.data_space_id,
f.function_id,
f.type_desc,
r.boundary_id,
r.value AS BoundaryValue
FROM sys.tables AS t
JOIN sys.indexes AS i ON t.object_id = i.object_id
JOIN sys.partitions AS p ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
JOIN sys.partition_functions AS f ON s.function_id = f.function_id
LEFT JOIN sys.partition_range_values AS r ON f.function_id = r.function_id AND r.boundary_id = p.partition_number
WHERE t.name = 'PartitionedTable' AND i.type <= 1
ORDER BY p.partition_number;

You can check the partitioning column as well with the following query.

SELECT
t.[object_id] AS ObjectID
, t.name AS TableName
, ic.column_id AS PartitioningColumnID
, c.name AS PartitioningColumnName
FROM sys.tables         AS t
JOIN sys.indexes         AS i ON t.[object_id] = i.[object_id] AND i.[type] <= 1 -- clustered index or a heap
JOIN sys.partition_schemes AS ps      ON ps.data_space_id = i.data_space_id
JOIN sys.index_columns AS ic ON ic.[object_id] = i.[object_id] AND ic.index_id = i.index_id AND ic.partition_ordinal >= 1 -- because 0 = non-partitioning column
JOIN sys.columns         AS c ON t.[object_id] = c.[object_id] AND ic.column_id = c.column_id
WHERE t.name = 'PartitionedTable'
;

Check the execution plan

If we run the following query, let's check what the execution plan looks like.

SELECT * FROM [dbo].[PartitionedTable] WHERE booking_date >= CAST('2019-11-01' AS DATE) AND booking_date <= CAST('2019-11-30' AS DATE)
Execution plan – Actual Partition Count

We can see Actual Partition Count = 1, which is correct since we want data for just one month = one partition in our case.

Notice the RangePartitionNew function, which contains the boundaries of all defined partitions. I would expect to see our predicate values, so why does the optimizer show this? The reason is that our query uses a simple predicate, which leads to simple parameterization.

SELECT * FROM [dbo].[PartitionedTable] WHERE booking_date >= CAST ('2019-11-01' AS DATE) AND booking_date <= CAST ('2019-11-30' AS DATE)
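One way to confirm the simple parameterization (a sketch; assuming the auto-parameterized text lands in your plan cache as a Prepared entry):

SELECT t.text, c.objtype
FROM sys.dm_exec_cached_plans c
CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
WHERE t.text LIKE '%PartitionedTable%' AND c.objtype = 'Prepared';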

Modify the query as follows:

SELECT * FROM [dbo].[PartitionedTable] WHERE booking_date >= CAST('2019-11-01' AS DATE) AND booking_date <= CAST('2019-11-30' AS DATE) AND 1<>2

This eliminates simple parameterization, and we get what we expected in the execution plan; see the plan below.

Execution plan – Partition predicate

Let's modify the query once more. Now you can see in the execution plan that the Actual Partition Count value is 2.

SELECT * FROM [dbo].[PartitionedTable] WHERE booking_date >= CAST ('2019-11-01' AS DATE) AND booking_date < CAST ('2019-12-01' AS DATE)
Execution plan – Actual Partition Count

It is important how the predicate is written, since the query above touches two partitions instead of one.

That was a very quick introduction to SQL Server partitioning.

It would be nice to look at partitioning a little deeper in one of my next posts and compare some scenarios with the partitioned view approach, or explore how the RangePartitionNew function works internally.

Stay tuned.

Insert exec

I have been thinking about the difference between the INSERT and INSERT-EXEC statements from a performance perspective, since I have seen lots of posts announcing that you should avoid the latter (INSERT-EXEC). I decided to run a few tests and look at the problem from several angles to get a better picture of how these commands behave.

Testing scenarios:

  • Execution plan
  • IO stats and time stats
  • Extended events – statement completed event – row count
  • Transaction log behaviour
  • Transaction scope

Let's create the data structures for our tests:

First, create the tables for the data inserts:

CREATE TABLE dbo.InsertTable1 (id INT, [data] VARCHAR(255))
CREATE TABLE dbo.InsertTable2 (id INT, [data] VARCHAR(255))

Create two permanent staging tables that we will fill inside the objects we create:

CREATE TABLE dbo.TempTable1 (id INT , [data] VARCHAR(255))
CREATE TABLE dbo.TempTable2 (id INT, [data] VARCHAR(255))

Create the objects we will use for testing.

The first one is a stored procedure inserting data into the InsertTable1 table – the insert statements are part of the stored procedure definition.

CREATE PROCEDURE dbo.InsertData
AS
INSERT INTO dbo.TempTable1
SELECT TOP 100000 a.object_id, REPLICATE('a',10) a
FROM sys.objects a
JOIN sys.objects b ON 1=1

INSERT INTO dbo.TempTable2
SELECT TOP 100000 a.object_id, REPLICATE('a',10) a
FROM sys.objects a
JOIN sys.objects b ON 1=1

INSERT INTO dbo.InsertTable1
SELECT * FROM dbo.TempTable1
UNION ALL
SELECT * FROM dbo.TempTable2

The second batch inserts data into the InsertTable2 table – the final insert is realized by an INSERT-EXEC statement.

INSERT INTO dbo.InsertTable2
EXEC ('INSERT INTO dbo.TempTable1 SELECT TOP 100000 a.object_id, REPLICATE(''a'',10) a
FROM sys.objects a
JOIN sys.objects b ON 1=1

INSERT INTO dbo.TempTable2
SELECT TOP 100000 a.object_id, REPLICATE(''a'',10) a
FROM sys.objects a
JOIN sys.objects b ON 1=1

SELECT * FROM dbo.TempTable1
UNION ALL
SELECT * FROM dbo.TempTable2
')

Execute the statements to fill the data:

EXEC dbo.InsertData

INSERT INTO dbo.InsertTable2 EXEC ('…') /* the dynamic SQL batch shown above */

Cleanup statements – run them before each test execution:

TRUNCATE TABLE dbo.InsertTable1
TRUNCATE TABLE dbo.InsertTable2
TRUNCATE TABLE dbo.TempTable1
TRUNCATE TABLE dbo.TempTable2
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
CHECKPOINT

Switch on IO statistics, processing time, and the actual execution plan in Management Studio, or use the following commands:

SET STATISTICS IO ON
SET STATISTICS XML ON
SET STATISTICS TIME ON

Create and run the following extended events session, writing its results to a file.

CREATE EVENT SESSION GetRowsCount ON SERVER
ADD EVENT sqlserver.transaction_log
( ACTION (sqlserver.sql_text, sqlserver.tsql_stack, sqlserver.database_id, sqlserver.username)
),
ADD EVENT sqlserver.sql_transaction
( ACTION (sqlserver.sql_text, sqlserver.tsql_stack, sqlserver.database_id, sqlserver.username)
),
ADD EVENT sqlserver.sql_batch_completed
( ACTION (sqlserver.sql_text, sqlserver.tsql_stack, sqlserver.database_id, sqlserver.username)
)
ADD TARGET package0.event_file(
SET filename = 'C:\outputfile\outputfile2.xel') -- use one output file per test (outputfile1 / outputfile2)

-- start the session before a test and stop it afterwards
ALTER EVENT SESSION GetRowsCount ON SERVER STATE = START
ALTER EVENT SESSION GetRowsCount ON SERVER STATE = STOP

-- drop the session when you are done
DROP EVENT SESSION GetRowsCount ON SERVER

Statement completed event – counters

The queries touch 400,000 rows: 100,000 for each of the inserts into the two temp tables and 200,000 for the final insert. But in the case of INSERT-EXEC we can see from the batch completed event that the query touched 600,000 rows instead of 400,000. Look at the table below to check the other counters.

SELECT CAST (EVENT_data AS XML),
       CAST (EVENT_data AS XML).value('(/event/data[@name="batch_text"])[1]','NVARCHAR(100)') AS batch_text,
       CAST (EVENT_data AS XML).value('(/event/data[@name="duration"])[1]','NVARCHAR(100)') AS duration,
       CAST (EVENT_data AS XML).value('(/event/data[@name="physical_reads"])[1]','NVARCHAR(100)') AS physical_reads,
       CAST (EVENT_data AS XML).value('(/event/data[@name="logical_reads"])[1]','NVARCHAR(100)') AS logical_reads,
       CAST (EVENT_data AS XML).value('(/event/data[@name="row_count"])[1]','NVARCHAR(100)') AS row_count
FROM sys.fn_xe_file_target_read_file('C:\outputfile\outputfile1*.xel', 'C:\outputfile\outputfile1*.xem', null, null)
WHERE object_name = 'sql_batch_completed';

 

Counter         INSERT      INSERT-EXEC
Writes          1350        2194
Duration        9149345     11984022
Row count       400000      600000
Logical reads   471973      962908
Physical reads  662         1238

As you can see, the INSERT-EXEC statement consumes more resources than a normal insert on the same set of data.

How is it possible that INSERT-EXEC generates an additional 200,000 in the row count, touching 600,000 rows in total, when our inserts wrote only 400,000 rows?

IO stats

Let's check the IO stats and the execution plan to see the difference.

Worktable in IO stats

 

In the picture above you can see that with the INSERT-EXEC statement a Worktable is created, meaning INSERT-EXEC uses tempdb to store the result set before the final insert. So there is an additional impact on tempdb, and on the tempdb transaction log too.
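One way to observe this from the session side (a sketch; the session id is a placeholder) is the task space usage DMV, where worktables count as internal objects:

SELECT session_id,
       internal_objects_alloc_page_count,
       internal_objects_dealloc_page_count
FROM sys.dm_db_task_space_usage
WHERE session_id = 55; -- the SPID running the INSERT-EXEC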

Execution plan

Execution plan – INSERT EXEC

 

The difference in the execution plan is as follows: with INSERT-EXEC you get two statements, one for the result set of the EXEC and a second one for inserting the data.

Execution plan – INSERT

Transaction log

Another perspective is the transaction scope behaviour. Let's get the data from the transaction_log event into temporary tables first.

SELECT CAST (EVENT_data AS XML).value('(/event/data[@name="log_record_size"])[1]','INT') logsize,
       CAST (EVENT_data AS XML).value('(/event/@timestamp)[1]','datetime2') timestamp,
       CAST (EVENT_data AS XML).value('(/event/data[@name="transaction_start_time"])[1]','datetime2') date_time,
       CAST (EVENT_data AS XML).value('(/event/data[@name="database_id"])[1]','INT') database_id,
       CAST (EVENT_data AS XML).value('(/event/data[@name="transaction_id"])[1]','INT') transaction_id,
       CAST (EVENT_data AS XML).value('(/event/action[@name="sql_text"])[1]','VARCHAR(1000)') sql_text,
       CAST (EVENT_data AS XML).value('(/event/data[@name="operation"])[1]','VARCHAR(1000)') operation
INTO #t1
FROM sys.fn_xe_file_target_read_file('C:\outputfile\outputfile1*.xel', 'C:\outputfile\outputfile1*.xem', null, null)
WHERE object_name = 'transaction_log';

SELECT CAST (EVENT_data AS XML).value('(/event/data[@name="log_record_size"])[1]','INT') logsize,
       CAST (EVENT_data AS XML).value('(/event/@timestamp)[1]','datetime2') timestamp,
       CAST (EVENT_data AS XML).value('(/event/data[@name="transaction_start_time"])[1]','datetime2') date_time,
       CAST (EVENT_data AS XML).value('(/event/data[@name="database_id"])[1]','INT') database_id,
       CAST (EVENT_data AS XML).value('(/event/data[@name="transaction_id"])[1]','INT') transaction_id,
       CAST (EVENT_data AS XML).value('(/event/action[@name="sql_text"])[1]','VARCHAR(1000)') sql_text,
       CAST (EVENT_data AS XML).value('(/event/data[@name="operation"])[1]','VARCHAR(1000)') operation
INTO #t2
FROM sys.fn_xe_file_target_read_file('C:\outputfile\outputfile2*.xel', 'C:\outputfile\outputfile2*.xem', null, null)
WHERE object_name = 'transaction_log';

Comparing the following query outputs, we can see that INSERT-EXEC is scoped by one transaction, as opposed to multiple individual transactions with the normal insert.

SELECT COUNT(1) row_count, SUM(logsize) logsize, transaction_id, database_id FROM #t1 WHERE operation = 'LOP_INSERT_ROWS' GROUP BY transaction_id, database_id

SELECT COUNT(1) row_count, SUM(logsize) logsize, transaction_id, database_id FROM #t2 WHERE operation = 'LOP_INSERT_ROWS' GROUP BY transaction_id, database_id
Transaction log output

Transaction scope

In the case of the INSERT-EXEC statement, all insert statements inside the EXEC should be rolled back when an error occurs, because INSERT-EXEC is scoped by one transaction. In the case of the individual transactions in the stored procedure, each insert is treated as a separate transaction, so only the insert resulting in an error will be rolled back. Let's try it.

Change the type of the value inserted into the INT column in the second insert.

INSERT INTO dbo.TempTable2
SELECT TOP 100000 REPLICATE('a',10), REPLICATE('a',10) a -- 'aaaaaaaaaa' cannot be converted to INT, so this insert fails
FROM sys.objects a
JOIN sys.objects b ON 1=1

Run the test queries again. As you can see, in the case of the INSERT-EXEC statement there are no rows inserted into the tables at all, since a rollback occurs.
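A quick check of what survived after the failing runs (a sketch):

SELECT (SELECT COUNT(*) FROM dbo.InsertTable1) AS insert_proc_rows,
       (SELECT COUNT(*) FROM dbo.InsertTable2) AS insert_exec_rows,
       (SELECT COUNT(*) FROM dbo.TempTable1) AS temp1_rows,
       (SELECT COUNT(*) FROM dbo.TempTable2) AS temp2_rows;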

Conclusion:

While the INSERT-EXEC statement still has its place in some scenarios, you should be aware of the circumstances mentioned above.

NOTE: I wanted to verify that the transaction log of tempdb was filled in connection with the worktable created by the INSERT-EXEC statement. But I cannot see any LOP_INSERT_ROWS operation in tempdb through extended events; I just see extent allocations.

SELECT * FROM #t2 WHERE database_id = 2

If you have any idea whether worktables are logged and whether it is possible to trace them, please write a comment.