Data Archival using Automated Sliding Window Partitioning - Part I
Introduction
One of the most common requirement in day-to-day production environment maintenance is data archival.As, the amount of data grows due to constant inflow of data. The overall performance of the application might also go down due to the ever increasing number of records in the underlying database tables.
This is where data archival comes into picture. Data archival simply means to move the data from the table which is being used by any downstream application to an archival table.
Ofcourse, it needs to be ensured that the data archival process is well thought off in terms of its impact on the data available for consumption for downstream applications.
It means for example, we do not want a scenario wherein a report was generated on the yearly sales data and we archived 6 months worth of sales data, thus ending up showing only 6 months sales in an yearly sales report.
For this post, we will focus on the underlying concept of the 'sliding window partitioning'. In the next part we will do a hands-on on the implementation.
What is 'Sliding Window Partitioning' ?
Sliding window partitioning is a mechanism wherein, we keep a preset number of partitions in the 'live' table and keep moving data in the 'older' partitions out into another 'staging' table from where the data can be archived or purged depending on the use case.Here, 'live' table can be thought of as a table where we have incoming data on a regular basis and due to which the size of the table gradually increase and hence the need to maintain it.
This table is horizontally partitioned let's say on a 'daily' partition basis i.e. one day's incoming data is put in one partition and next day in next partition , so on and so forth.
Note that, partitioning basic depends on the use case. We have choses 'daily' partitioning for illustration purpose of this post.
'Older' partition can be understood as the partition which holds the data which was inserted first and is a potential candidate to be moved out of the 'live' table.
'Staging' table by name itself can be self-understood, it stages the data which can be futher archived or purged.
Below, snapshot illustrates the scneario where daily some amount of records are inserted into the live table i.e. 'Table_IN'.
Make a note that on day 5, zero records were inserted.
Day 1 had 100, Day 2 had 200, Day 3 had 300, Day 4 had 400 and Day 6 had 500 records inserted respectively. Giving the table a total of 1500 records in 6 days.
Now, as we said we have 'daily' partitions we will see that all these days' data sits in their respective partitions.
Now, its time to introduce the 'sliding window'.
Lets say we want only 3 days data in Table_IN.
So, we have to implement a mechanism which will keep moving out the data which is in 'older' partition and ensure that we always have only 3 days data in our table.
Great Explanation. Thanks :)
ReplyDelete