Tips and Tricks for data import into SQL Server – Part I


In this post, we will import a .csv file into SQL Server.

While importing a .csv file can be one of the easiest tasks that comes your way as part of your daily duties as a data engineer, sometimes it requires a bit more thought and effort.

A similar situation came our way recently: 50+ large .csv files sat in a shared location and had to be imported into SQL Server, and all of them had varying headers, which meant the destination tables needed different schemas.

There are numerous ways to achieve this; for example, you could build an SSIS package, or perhaps write a PowerShell script to loop through the files and import them.

However, we chose to use SQL Server’s BULK INSERT command to complete the task.
If you have worked with BULK INSERT, you will have noticed that the destination table must already exist before you can load data into it.
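For anyone new to the command, here is a minimal sketch of a plain BULK INSERT; the table and file names are hypothetical, and [dbo].[SalesData] must already exist with a schema that matches the file:

BULK INSERT [dbo].[SalesData]
FROM 'C:\Data\SalesData.csv'
WITH
  (
    FIRSTROW = 2,           -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
  );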

Therefore, the main challenge is that destination tables with different schemas have to be created dynamically (on the fly), according to the .csv file being processed, before we can import the data.

In this post, we will focus on resolving this challenge.

One way to achieve this is as follows:

Step 1: We will read the first record (the header row), which contains the column names.

DECLARE @filePath   VARCHAR(1000) = ''
       ,@columnList VARCHAR(MAX)  = ''
       ,@tableName  VARCHAR(100)  = ''
       ,@dynamicSQL NVARCHAR(MAX) = '' ;

-- Initialize the @filePath variable

SELECT @filePath = 'C:\Users\.....\SourceFile.csv' ;

-- Extract the file name (without the path and the '.csv' extension);
-- we will use it as the name of our destination table, e.g. 'SourceFile'.
SELECT @tableName = SUBSTRING(RIGHT(@filePath, CHARINDEX('\', REVERSE(@filePath)) - 1), 1, LEN(RIGHT(@filePath, CHARINDEX('\', REVERSE(@filePath)) - 1)) - 4) ;

-- We create a temporary table with a single wide column to hold the header row
IF OBJECT_ID('tempdb..#bulkInsert') IS NOT NULL
   DROP TABLE #bulkInsert ;
CREATE TABLE #bulkInsert ( Header NVARCHAR(MAX) ) ;

-- We will import only the first record from the .csv
SET @dynamicSQL = 'BULK INSERT #bulkInsert FROM ''' + @filePath + ''' WITH (FIRSTROW = 1, DATAFILETYPE = ''char'', LASTROW = 1)' ;

EXEC sp_executesql @dynamicSQL ;

-- Store the comma-separated header record in a variable
SELECT @columnList = Header FROM #bulkInsert ;
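For illustration, if the file’s header row were Id,Name,City, then @columnList would now contain the string 'Id,Name,City'. We will keep using this hypothetical header in the snippets below.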

Step 2: Once we have extracted the header, the next step is to create the table by splitting the variable that stores the header record.

-- Build the CREATE TABLE statement: the REPLACE call wraps every column
-- name in brackets and assigns each one an NVARCHAR(4000) data type.
SELECT @dynamicSQL = '
IF OBJECT_ID (''[dbo].[' + @tableName + ']'', ''U'') IS NOT NULL
   DROP TABLE [dbo].[' + @tableName + '] ;

CREATE TABLE [dbo].[' + @tableName + ']
(' + CHAR(13) +
' [' + REPLACE(@columnList, ',', '] NVARCHAR(4000) ' + CHAR(13) + ',[') + '] NVARCHAR(4000) ' + CHAR(13) + ')' ;

EXEC sp_executesql @dynamicSQL ;
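To make the string manipulation concrete, here is what the generated statement would look like for our hypothetical Id,Name,City header and a file named SourceFile.csv:

-- Generated dynamic SQL (illustrative only)
IF OBJECT_ID ('[dbo].[SourceFile]','U') IS NOT NULL
   DROP TABLE [dbo].[SourceFile] ;

CREATE TABLE [dbo].[SourceFile]
( [Id] NVARCHAR(4000)
 ,[Name] NVARCHAR(4000)
 ,[City] NVARCHAR(4000)
)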

Step 3: Once the table has been created, we can use BULK INSERT again; this time we import all the records, skipping the header row.

SET @dynamicSQL = '
BULK INSERT [dbo].[' + @tableName + ']
FROM ''' + @filePath + '''
WITH
  (
    FIRSTROW = 2,
    DATAFILETYPE = ''char'',
    FIELDTERMINATOR = '','',
    ROWTERMINATOR = ''\n'',
    TABLOCK
  );
' ;
EXEC sp_executesql @dynamicSQL ;
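Again purely for illustration, assuming the hypothetical path C:\Data\SourceFile.csv, the generated statement would be:

-- Generated dynamic SQL (illustrative only)
BULK INSERT [dbo].[SourceFile]
FROM 'C:\Data\SourceFile.csv'
WITH
  (
    FIRSTROW = 2,
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK
  );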

Step 4: Let’s check the data.
SET @dynamicSQL =' SELECT '+@columnList+' FROM [dbo].['+@tableName+'] ;'

EXEC sp_executesql @dynamicSQL;



Note:
When working with dynamic SQL in a development environment, it is often helpful to print the dynamic SQL variable before executing it, which makes the generated code much easier to debug if needed.
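For example:

-- While debugging, print the statement instead of (or before) executing it
PRINT @dynamicSQL ;
-- EXEC sp_executesql @dynamicSQL ;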

The illustrative snippets under Steps 2 and 3 above show what the generated dynamic SQL looks like for a sample file.


That’s it! We have completed our task. This code can be reused while looping through many .csv files, importing data from each into a destination table created on the fly; a minimal sketch of such a loop follows.
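Here is that sketch, assuming (hypothetically) that the file paths have already been collected into a table variable; in practice the list could come from a PowerShell script or the undocumented xp_DirTree procedure. Steps 1 to 3 above would run inside the loop for each file:

DECLARE @filePath VARCHAR(1000) ;

-- Hypothetical list of files to process
DECLARE @fileList TABLE ( FilePath VARCHAR(1000) ) ;
INSERT INTO @fileList (FilePath)
VALUES ('C:\Data\Customers.csv')
     , ('C:\Data\Orders.csv') ;

DECLARE fileCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT FilePath FROM @fileList ;

OPEN fileCursor ;
FETCH NEXT FROM fileCursor INTO @filePath ;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Steps 1-3: read the header row, create the destination table,
    -- and bulk insert the data for the current @filePath here

    FETCH NEXT FROM fileCursor INTO @filePath ;
END

CLOSE fileCursor ;
DEALLOCATE fileCursor ;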
