Partitioning

In this post I would like to continue with partitioning series. Previous post we created partitioned view and look how it looks like in execution plan. In this post we will create partitioned table.

In case of simple partitioned table, we will need to do following:

  • Create table
  • Create partitioning function
  • Create partitioned schema
  • Relationship between partitioned schema and the table
  • Let’s check created partition
  • Let’s verify partition query in execution plan

As in the partitioned view post there will be created 3 partitions within the same filegroup (without separated disks per partitioned). This solution will not profit from parallel reading from partitions separated on different disks. For simple demonstration it will fit.

Let’s create the table first

DROP TABLE [dbo].[PartitionedTable]
CREATE TABLE [dbo].[PartitionedTable] (id INT NOT NULL , booking_date DATE NOT NULL, data  SYSNAME)

ALTER TABLE [dbo].[PartitionedTable] ADD PRIMARY KEY CLUSTERED ( [id] ASC, [booking_date] ASC)

Create partitioned function. It defines partition data according to following boundaries 

<
'2019-10-01' , >= '2019-10-01' <
'2019-11-01' , >= '2019-11-01' <
'2019-12-01', >='2019-12-01'.

The side of boundaries values is defined by LEFT or RIGHT key word.

CREATE PARTITION FUNCTION PartitionedFunction (DATE)
AS RANGE  RIGHT FOR VALUES ('2019-10-01', '2019-11-01', '2019-12-01');

Create partition scheme to specify which partition of table or index belongs to which filegroup. All partitions are defined for Primary filegroup in our case.

CREATE PARTITION SCHEME PartitionedScheme 
AS PARTITION PartitionedFunction ALL TO ([PRIMARY])

Now let’s create relationship between table and partitioned schema.

ALTER TABLE [dbo].[PartitionedTable] DROP CONSTRAINT PK_PartitionedTable__booking_date

ALTER TABLE [dbo].[PartitionedTable] ADD CONSTRAINT PK_PartitionedTable__booking_date PRIMARY KEY CLUSTERED ([booking_date] ASC) ON [PartitionedScheme]([booking_date])

Here we have few queries to check our partitioned objects.

Let’s check connection between partition schema and partitioned table.

SELECT * 
FROM sys.tables AS t
JOIN sys.indexes AS i ON t.[object_id] = i.[object_id] AND i.[type] IN (0,1)
JOIN sys.partition_schemes ps ON i.data_space_id = ps.data_space_id
WHERE t.name = 'PartitionedTable';

View to check created partitions.

SELECT t.name AS TableName,
i.name AS IndexName,
p.partition_number,
p.partition_id,
i.data_space_id,
f.function_id,
f.type_desc,
r.boundary_id,
r.value AS BoundaryValue
FROM      sys.tables AS t
JOIN      sys.indexes AS i           ON t.object_id = i.object_id
JOIN     sys.partitions  AS p      ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN      sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
JOIN      sys.partition_functions AS f ON s.function_id = f.function_id
LEFT JOIN sys.partition_range_values AS r ON f.function_id = r.function_id and r.boundary_id = p.partition_number
WHERE t.name = 'PartitionedTable' AND i.type <= 1
ORDER BY p.partition_number;

You can check partitioned column as well with following query.

SELECT
t.[object_id] AS ObjectID
, t.name AS TableName
, ic.column_id AS PartitioningColumnID
, c.name AS PartitioningColumnName
FROM sys.tables         AS t
JOIN sys.indexes         AS i ON t.[object_id] = i.[object_id] AND i.[type] <= 1 -- clustered index or a heap
JOIN sys.partition_schemes AS ps      ON ps.data_space_id = i.data_space_id
JOIN sys.index_columns AS ic ON ic.[object_id] = i.[object_id] AND ic.index_id = i.index_id AND ic.partition_ordinal >= 1 -- because 0 = non-partitioning column
JOIN sys.columns         AS c ON t.[object_id] = c.[object_id] AND ic.column_id = c.column_id
WHERE t.name = 'PartitionedTable'
;

Check Execution plan

If we run following query check execution plan how it is look like.

SELECT * FROM [dbo].[PartitionedTable] WHERE booking_date >=
CAST ('2019-11-01' AS DATE) AND booking_date <= CAST ('2019-11-30' AS DATE)
Execution plan - Actual partition count
Execution plan – Actual partition count

We can see Actual Partition Count = 1 which is ok since we would like to get data just for one moth= one partition in our case.

Let’s notice RangePartitionNew function which contains ranges from all defined partitions. I would expect to see our predicate values, so why optimizer show this? The reason is that we use simple predicate in our query which leads to simple parametrization.

SELECT * FROM [dbo].[PartitionedTable] WHERE booking_date >= CAST ('2019-11-01' AS DATE) AND booking_date <= CAST ('2019-11-30' AS DATE)

Modify the query as follows

SELECT * FROM [dbo].[PartitionedTable] WHERE booking_date >= CAST ('2019-11-01' AS DATE) AND booking_date <= CAST ('2019-11-30'  AS DATE) AND 1<>2.

It eliminates simple parametrization and we get in execution plan what we expected, see plan bellow.

Execution plan  - Partition predicate
Execution plan – Partition predicate

Let’s try to modify query now u can see in execution plan that partition count value is 2.

SELECT * FROM [dbo].[PartitionedTable] WHERE booking_date >= CAST ('2019-11-01' AS DATE) AND booking_date < CAST ('2019-12-01' AS DATE)
Execution plan - Actual Partition Count
Execution plan – Actual Partition Count

It is important how the predicate is set. Since above mentioned query touches two partitions instead of 1.

That was very quick introduction to SQL Sever partitioning.

It would be nice to look at partitioning little bit deeper in one of my next post and try to compare some scenarios with partitioned views approach or how RangePartitionNew function works internally.

Stay tuned.

Partitioned views

In this post I would like to show you how to create simple local horizontal partitioned view.  The big advantage of this technique is to split huge table into smaller parts you can work with.  There must be fulfilled some prerequisites you can find here on MSDN in detail. In this post we will try just to access data from created partitioned view. Here are important prerequisites, must be fulfilled for creating portioned view in short summary. More detail you can find in MSDN  https://bit.ly/2WTrQmO in view sectio..

  • Tables in partitioned view are joined with UNION ALL operator
  • The same table cannot be repeated multiple times
  • Columns of each table has to be in same ordinal position
  • Partitioned column has to be part of primary key
  • Partitioned column has to be constrained based on partitioning conditions (for example date period)
  • All Columns of table (partition) has to be obtained in select of partitioned view

Script to create 3 tables partitions with booking_date partition column.

CREATE TABLE [dbo].[Partition__2019_10_01__2019_11_01]
(id INT NOT NULL , booking_date DATE NOT NULL, data SYSNAME) 
ALTER TABLE [dbo].[Partition__2019_10_01__2019_11_01] WITH CHECK ADD CONSTRAINT [CH__Partition1] CHECK (([booking_date]>='2019-10-01' AND [booking_date]<'2019-11-01'))
ALTER TABLE [dbo].[Partition__2019_10_01__2019_11_01] ADD PRIMARY KEY CLUSTERED (    [id]         ASC,   [booking_date] ASC)
GO
CREATE TABLE [dbo].[Partition__2019_11_01__2019_12_01] (id INT NOT NULL, booking_date DATE NOT NULL,data SYSNAME)
ALTER TABLE [dbo].[Partition__2019_11_01__2019_12_01] WITH CHECK ADD CONSTRAINT [CH__Partition2] CHECK (([booking_date]>='2019-11-01' AND [booking_date]<'2019-12-01'))
ALTER TABLE [dbo].[Partition__2019_11_01__2019_12_01] ADD PRIMARY KEY CLUSTERED 
(   [id]         ASC,  [booking_date] ASC
)
GO
CREATE TABLE [dbo].[Partition__2019_12_01__2020_01_01](id INT NOT NULL, booking_date DATE NOT NULL, data SYSNAME)
ALTER TABLE [dbo].[Partition__2019_12_01__2020_01_01] WITH CHECK ADD  CONSTRAINT [CH__Partition3] CHECK (([booking_date]>='2019-12-01' AND [booking_date]<'2020-01-01'))
ALTER TABLE [dbo].[Partition__2019_12_01__2020_01_01] ADD PRIMARY KEY CLUSTERED (    [id]         ASC,    [booking_date] ASC)
GO
CREATE VIEW [dbo].[PartitionedView] 
AS
SELECT * FROM [dbo].[Partition__2019_10_01__2019_11_01]
UNION ALL
SELECT * FROM [dbo].[Partition__2019_11_01__2019_12_01]
UNION ALL
SELECT * FROM [dbo].[Partition__2019_12_01__2020_01_01]
GO

Let’s put some data to all these partitions. Since your partitioned view is designed for data modification. I will mention prerequisites in next post.

INSERT INTO [dbo].[PartitionedView]
SELECT 1 ,'2019-11-05','TestData1'
INSERT INTO [dbo].[PartitionedView]
SELECT 2 ,'2019-11-06','TestData2'
INSERT INTO [dbo].[PartitionedView]
SELECT 3 ,'2019-11-07','TestData2'
INSERT INTO [dbo].[PartitionedView]
SELECT 4 ,'2019-11-08','TestData2'
INSERT INTO [dbo].[PartitionedView]
SELECT 5 ,'2019-12-05','TestData3'

Here is execution plan if you select data from one month (partition). You can see that we touched only partition, the predicate corresponds to the check constraint of the selected partition. Be aware that it is important to have partition column in predicate of query to work only with requested partition.

SELECT * FROM [dbo].[PartitionedView] WHERE booking_date >= '2019-11-01' AND booking_date <'2019-12-01'
Execution plan - Partitioned view
Execution plan – Partitioned view

Let’s suppose that we would like to get concrete record based on id attribute in our case. Using following query, we get data using index seek but we touched each partitioning partitioned view. Because we didn’t use partition column, used in check constraint of partition.

SELECT * FROM [dbo].[PartitionedView] WHERE id = 4
Execution plan - Partitioned view
Execution plan – Partitioned view

To get requested record effectively let’s change above mentioned query to:

SELECT * FROM [dbo].[PartitionedView] WHERE booking_date >= '2019-11-01' AND booking_date <'2019-12-01' AND id=4
Execution plan - Partitioned view
Execution plan – Partitioned view


Next time I would like to share some other knowledges regarding partitioned views. STAY TUNED!

 

Partitioning

I would like to start some posts series on my blog regarding techniques connected with high data volume. One of these techniques is partitioning. There are lots of sources describing partitioning approach. I will concentrate on practical usage in my posts. I will start this theme with partitioning views, since they are available in earlier versions of SQL server standard editions and they help to better illustrate principles of partitioning.

Partitioning helps when you have huge table with lots of historical data. Even when this table has designed indexes in good shape, large table has large indexes which has impact on seek operations and reading is not so fast.  It could cause other problems too, for example with data modification or data maintenance. Full-scan of very large statistic created with the index could be very time consuming. 

Splitting such a large objects to smaller partitions will help with all above mentioned aspects.  Even better, each partition could be part of filegroup on separate disk storage which leads to better I/O operations when data could be read in parallel. It will brings other advantages when you dont want to touch historical data or use them just for reading – you can bring them to filegroups and change them to read-only state.

There are few approaches how partitioning can be designed. I would like to describe them in several posts. In short:

from point of view of partitioned direction.

  • Horizontal partitioning – data are partitioned based on attribute/attributes defined in table. For example, data partitioned by month in separated tables. Each partition has the same structure – columns.
  • Vertical partitioning – there is defined attribute which connects more tables with fewer columns. Structure of each partition can be different. For example, let’s have transaction info, one table partition stores about customers another one regarding region. Each partition has unique key transaction_id creating relations between such a partitions.

from point of view of partition data distribution

  • Equally distributed – for example partitions based on monthly period connected to date dimension
  • Floating partitions – partition with data for last tree months. Older data are moved to archive partition with historical data

From point of view of data source

  • Partitions created in the same database
  • Partitions created across databases on the same server
  • Distributed partitions created across servers

There are more perspectives how we could look on partitoining. Lets go trough some sammples and implementation styles to get better overview of these basic terms.

Next post will be concentrated on partitioned views so stay tuned.