Import T-SQL vs SSIS for large data

Import 1,000,000 Rows with SQL Server Integration Services

  1. In this fist example, we are going to import a million rows from Excel to SQL Server.
  2. In the SSDT, create a new SSIS project.
  3. Drag and drop the Data Flow Task to the design pane.


  4. Drag and drop the Data Flow Task to the design pane


  5. Double click on the task and drag and drop the Excel source and the SQL Server Destination and join the tasks.


  6. Double click on the task and drag and drop the Excel source and the SQL Server Destination and join the tasks


  7. In the Excel Source add a new connection to the Excel file.


  8. In the Excel Source add a new connection to the Excel file


  9. Double click on the SQL Destination Task and select a SQL connection and in the use a table or view, press the new button to create a new table. Click on the Mappings page to match columns.


  10. Double click on the SQL Destination Task and select a SQL connection and in the use a table or view, press the new button to create a new table


  11. If you run the package, you will notice that only 65,535 rows were copied.


  12. If you run the package, you will notice that only 65,535 rows were copied


  13. It took just 5 seconds, but we need to import a million rows!


  14. It took just 5 seconds, but we need to import a million rows


  15. The easiest way to solve this problem is to save the Excel file as a CSV file.


  16. The easiest way to solve this problem is to save the Excel file as a CSV file


  17. Let's remove the other tasks and add the Flat File Source and a SQL Server Destination.


  18. Let's remove the other tasks and add the Flat File Source and a SQL Server Destination


  19. Press the browse button and select the CSV file.


  20. Press the browse button and select the csv file


  21. Go to the columns page to match the columns.


  22. Go to the columns page to match the columns


  23. If you run the package, you will see that the million rows were imported to SQL Server successfully.


  24. If you run the package, you will see that the million rows were imported to SQL Server successfully


  25. If we check the time in the progress, we will notice that it took 15.959 seconds.


  26. If we check the time in the progress, we will notice that it took 15 seconds almost 16 seconds

Import 1,000,000 Rows with T-SQL

  1. Let's try the same with T-SQL. We are going to import the data using Linked Servers.
  2. If you are not familiar with Linked servers, we strongly recommend you to check our tip about Linked Servers for Microsoft Excel.
  3. The following code will create the Linked Server:

  4. EXEC sp_addlinkedserver 
    @server = N'ExcelDataSource', 
    @srvproduct=N'ExcelData', 
    @provider=N'Microsoft.ACE.OLEDB.12.0', 
    @datasrc=N'C:\scripts\Excelfile.xlsx',
    @provstr=N'EXCEL 12.0' ;

  5. Try to expand the tables of the Linked Server:


  6. Try to expand the tables of the Linked Server


  7. A typical error is this: The OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "ExcelDataSource" reported and error. Access denied. Cannot obtain the required interface ("IID_IDBSchemaRowset") from OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "ExcelDataSource". (Microsoft SQL Server, Error: 7399)


  8. The OLE DB provider Microsoft.ACE.OLEDB.12.0 for linked server ExcelDataSource reported and error. Access denied. Cannot obtain the required interface


  9. One of the solutions is to install the following driver for MS Office. By default, this driver is not installed. You will need to restart the machine after installing this driver.
  10. If that does not work, run the following commands. The AllowInProcess is used to run the process as a SQL Server Process and the DynamicParameter option is used to run parameterized queries.

      USE[master]
      GO
      EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0',N'AllowInProcess',1
      GO
      EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0',N'DynamicParameters',1
      GO
     
  11. Also, make sure that the query is running under an account with access to the Excel file.
  12. Once the Linked Server is working, run the following query to create a SQL Server table named MylinkedExcel.


  13. SELECT [BusinessEntityID]
    ,[PersonType]
    ,[NameStyle]
    ,[Title]
    ,[FirstName]
    ,[MiddleName]
    ,[LastName]
    INTO MylinkedExcel
    FROM [ExcelDataSource]...[Sheet1$]

  14. If we measure the time, we will notice that it takes more than 7 minutes and 47 seconds!


  15. If we measure the time, we will notice that it takes more than 7 minutes and 47 seconds


  16. OK, now we know that using a Linked Server to import data from Excel is very slow. We are going to use the CSV we created above and then use the Bulk Insert command.
  17. To use the Bulk Insert command we will need to first create the table manually:


  18. CREATE TABLE [dbo].[excelcustomer2]( [BusinessEntityID] [varchar](50) NULL,
     [PersonType] [varchar](50) NULL,
     [NameStyle] [varchar](50) NULL,
     [Title] [varchar](50) NULL,
     [FirstName] [varchar](50) NULL,
     [MiddleName] [varchar](50) NULL,
     [LastName] [varchar](50) NULL,
     [Column 7] [varchar](50) NULL,
     [Column 8] [varchar](50) NULL,
     [Column 9] [varchar](50) NULL
    ) ON [PRIMARY]
    
    GO;
  19. This code will import the data from the CSV file to the table we just created.


  20. BULK
    INSERT excelcustomer2
    FROM 'c:\scripts\excelfile.csv'
    WITH
    (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
    )
    GO;
  21. If we check the time, we will notice that it took 31 seconds to import the data.


  22. Executing the package

To summarize, SSIS has an Excel limitation of 65535 rows. We had to work with the flat file component and convert from Excel to a CSV file to import a million rows.

On the other hand, T-SQL had several problems to access the Linked Server file from Excel, but finally when it was working it took more than 7 minutes to import the data. We then tried again with the CSV file and used the Bulk Insert command to improve the performance.
the table summarizes the results:
SolutionTime
SSIS using the Excel component5 seconds, but it is limited to only 65535 rows
SSIS converting the Excel to CSV and using the flat file component16 seconds
Linked Server to Excel7 minutes, 47 seconds (but it worked)
The Bulk Insert statement converting the Excel file to a CSV file31 seconds
As you can see, SSIS has very fast tools to import and export files. It also has the advantage that the destination table is created automatically while the Bulk Insert statement requires creating the table manually. In this example SSIS is the winner. 

Comments