Tips and Tricks for data import into SQL Server – Part I
In this post, we will import a .csv file into SQL Server.
While importing a .csv can be one of the easiest tasks that comes your way in the daily duties of a data engineer, sometimes it calls for a bit more thought and effort.
A similar situation came up for us: 50+ large .csv files sat in a shared location and had to be imported into SQL Server, and all of them had varying headers, which meant the destination tables needed different schemas.
There are numerous ways to achieve this; for example, you could build an SSIS package or write a PowerShell script to loop through the files and import them. However, we chose SQL Server’s BULK INSERT command to complete the task.
If you have worked with BULK INSERT, you will have noticed that the destination table must exist beforehand.
Therefore, the main challenge is that destination tables with different schemas have to be created dynamically (on the fly), according to the .csv being processed, before we can import the data.
In this post, we will focus on resolving this challenge.
To achieve this, one possible method is as follows:
Step 1: Read the first record (the header row), which contains the column names.
DECLARE @filePath   VARCHAR(1000) = ''
      , @columnList VARCHAR(MAX)  = ''
      , @tableName  VARCHAR(100)  = ''
      , @dynamicSQL NVARCHAR(MAX) = '' ;
-- Initialize the @filePath variable
SELECT @filePath = 'C:\Users\.....\SourceFile.csv' ;
-- Extract the file name. We will use the same value as the name of our destination table.
SELECT @tableName = SUBSTRING(RIGHT(@FilePath, CHARINDEX('\', REVERSE(@FilePath)) - 1), 1,
                              LEN(RIGHT(@FilePath, CHARINDEX('\', REVERSE(@FilePath)) - 1)) - 4) ;
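To see what the nested string functions do, here is a quick check against a hypothetical path: REVERSE and CHARINDEX isolate everything after the last backslash, and the outer SUBSTRING drops the four-character '.csv' extension.
DECLARE @demoPath VARCHAR(1000) = 'C:\Data\Sales2021.csv' ; -- hypothetical path for illustration
SELECT SUBSTRING(RIGHT(@demoPath, CHARINDEX('\', REVERSE(@demoPath)) - 1), 1,
                 LEN(RIGHT(@demoPath, CHARINDEX('\', REVERSE(@demoPath)) - 1)) - 4) ; -- returns 'Sales2021'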
-- We create a temporary table with a single column
IF OBJECT_ID('tempdb..#bulkInsert') IS NOT NULL
    DROP TABLE #bulkInsert ;
CREATE TABLE #bulkInsert ( Header NVARCHAR(MAX) ) ;
-- We will import only the first record from the .csv
SET @dynamicSQL = 'BULK INSERT #bulkInsert FROM ''' + @FilePath + ''' WITH (FIRSTROW = 1, DATAFILETYPE = ''char'', LASTROW = 1)' ;
EXEC sp_executesql @dynamicSQL ;
-- Store this comma-separated header record in a variable
SELECT @columnList = Header FROM #bulkInsert ;
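At this point @columnList holds the raw header line exactly as it appears in the file. As a quick illustration, assuming a hypothetical source file whose first line is Id,Name,Amount:
PRINT @columnList ; -- would print: Id,Name,Amount (hypothetical header; your columns will differ)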
Step 2: Once we have extracted the header, the next step is to create the destination table by splitting the variable that stores the header record. The REPLACE below turns every comma in the list into '] NVARCHAR(4000) ,[' (with a line break), so each header value becomes a column of type NVARCHAR(4000).
SELECT @dynamicSQL = '
IF OBJECT_ID (''[dbo].[' + @tableName + ']'',''U'') IS NOT NULL
    DROP TABLE [dbo].[' + @tableName + ']
CREATE TABLE [dbo].[' + @tableName + '] (' + CHAR(13) +
    ' [' + REPLACE(@columnList, ',', '] NVARCHAR(4000) ' + CHAR(13) + ',[') + '] NVARCHAR(4000) ' + CHAR(13) + ')' ;
EXEC sp_executesql @dynamicSQL ;
Step 3: Once the table has been created, we can use BULK INSERT again; this time we import all records, skipping the header.
SET @dynamicSQL = '
BULK INSERT [dbo].['+@tableName+']
FROM '''+@FilePath+'''
WITH
(
FIRSTROW = 2,
DATAFILETYPE = ''char'',
FIELDTERMINATOR = '','',
ROWTERMINATOR = ''\n'',
TABLOCK
);
'
EXEC sp_executesql @dynamicSQL ;
Step 4: Let’s check the data.
SET @dynamicSQL =' SELECT '+@columnList+' FROM [dbo].['+@tableName+'] ;'
EXEC sp_executesql @dynamicSQL;
Note: When working with dynamic SQL in a development environment, it often helps to print the dynamic SQL variable (e.g. PRINT @dynamicSQL ;), which makes the generated statements easy to inspect and debug.
Below are the dynamic SQL statements generated by our code above.
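As an illustration, with the hypothetical Sales2021.csv file and the Id,Name,Amount header used in the examples above, the generated statements would look like this (your file name and columns will differ):
-- Generated by Step 2 (table creation):
IF OBJECT_ID ('[dbo].[Sales2021]','U') IS NOT NULL
    DROP TABLE [dbo].[Sales2021]
CREATE TABLE [dbo].[Sales2021] (
 [Id] NVARCHAR(4000)
,[Name] NVARCHAR(4000)
,[Amount] NVARCHAR(4000)
)
-- Generated by Step 3 (data load):
BULK INSERT [dbo].[Sales2021]
FROM 'C:\Data\Sales2021.csv'
WITH
(
    FIRSTROW = 2,
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK
);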
That’s it! We have completed our task. This code can be reused while looping through many .csv files, importing data from each into destination tables created on the fly; a sketch of that pattern follows.
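As a minimal sketch of that looping pattern, assume Steps 1 through 3 above have been wrapped into a hypothetical stored procedure dbo.usp_ImportCsvFile that accepts a file path; the procedure name and file paths here are illustrative, and the list could equally be populated from a manifest table or a directory listing.
-- Hypothetical list of files to process (the paths are placeholders)
DECLARE @files TABLE ( FilePath VARCHAR(1000) ) ;
INSERT INTO @files ( FilePath )
VALUES ('C:\Data\Sales2021.csv')
     , ('C:\Data\Customers2021.csv') ;

DECLARE @currentFile VARCHAR(1000) ;
DECLARE fileCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT FilePath FROM @files ;
OPEN fileCursor ;
FETCH NEXT FROM fileCursor INTO @currentFile ;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- dbo.usp_ImportCsvFile is assumed to wrap Steps 1-3 above
    EXEC dbo.usp_ImportCsvFile @filePath = @currentFile ;
    FETCH NEXT FROM fileCursor INTO @currentFile ;
END
CLOSE fileCursor ;
DEALLOCATE fileCursor ;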