Use your text / CSV files in your queries via OPENROWSET

This article has been reviewed, improved and moved to my new blog http://blog.hubacek.uk/use-your-text-csv-files-in-your-queries-via-openrowset/

On Friday I was asked by my friend whether I could help him with text file access via OPENROWSET with Ad Hoc Distributed Queries. So I stopped by to help. His first thoughts were to use MSDASQL …

select *
from openrowset('MSDASQL'
               ,'Driver={Microsoft Text Driver (*.txt; *.csv)}'
               ,'select * from C:\Temp\test1.CSV')

but he was stopped by this weird error message:

Msg 7399, Level 16, State 1, Line 1

The OLE DB provider “MSDASQL” for linked server “(null)” reported an error. The provider did not give any information about the error.

Msg 7303, Level 16, State 1, Line 1

Cannot initialize the data source object of OLE DB provider “MSDASQL” for linked server “(null)”.

Do you know why? Anyone?

I wasn’t able to find out what we were missing. I didn’t even had to try reproduce this error in my own instance it came instantly so I googled Binged (sorry! J). Among other forum posts there were ones which said that MSDASQL is not reliable at all and this error ‘just happens’.

Workaround – better solution

I thought whether I could use some kind bulk operation on this. Luckily yes! Here comes the query

SELECT  *
FROM    OPENROWSET(BULK 'c:\temp\test.txt', FORMATFILE= 'c:\temp\test.xml') AS a
WHERE   col0 LIKE N'%doe%'

C:\temp\test.txt = my file for import

C:\temp\test.xml = XML schema Format file for test.txt

test.txt content:

test_content_col0;test_contenct_col1
john doe;male
jane doe;female

You might wonder what FORMATFILE is for. It’s schema of your data for BULK operation. It has to be used when number of columns in data file differs from target table, columns are in different order, custom delimiter is used or this case – when you use BULK within OPENROWSET.

You can write your own schema ‘by hand’ based on example bellow of my dataset which has two columns named col0 and col1 both nvarchar(255):

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=";" MAX_LENGTH="510" COLLATION="Czech_100_CI_AS"/>
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="510" COLLATION="Czech_100_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="col0" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="2" NAME="col1" xsi:type="SQLNVARCHAR"/>
 </ROW>
</BCPFORMAT>

Notice MAX_LENGTH is length in Bytes.

Or you can just design (use) your (existing) table in SQL Server Management Studio

..and create schema based on the table with BCP command utility.

About BCP

Bulk Copy program (BCP) is a command line utility used to import/export data. BCP became available in Microsoft SQL Server 6.0 in which it was the tool of choice for import/export of data. With Microsoft SQL Server 7.0 came first version of Data Transformation Services (DTS) since SQL 2005 Integration Services (SSIS) which offered much more flexibility and comfort then BCP. Nowadays BCP is used for large dataset loads (by default BCP ignores constraints and does not fire triggers which makes it even more faster) or just loads which does not require additional data transformation. BCP is also able to create a Format file for BULK INSERT operations. For more information check BOL, bcp Utility article.

To create format file open cmd shell and execute following script:

bcp Test.dbo.import format nul -c -x -f import.xml -t; -T

Test.dbo.import – schema will be created based on table import at dbo schema at Test database

Format nul -f -x – specifies the XML format file, for SQL 2000 omit the –x switch and replace xml extension with fmt – the utility will create schema in native format

-c – specifies character data

Import.xml – output file (will be placed to your command prompt actual folder (by default when you open CMD c:\users\USERSHOMEFOLDER)

-t; – specifies that semicolon is used as column delimiter

-T – specifies trusted connection to SQL Server using integrated security. Optionally you cas use –U and –P switch to log in with SQL account.

Happy importing!

About these ads

About Jiří Hubáček

Experienced consultant and blogger for Microsoft SQL Server and related products. Likes cats, workouts and delicious food. Enjoys sailing, traveling, time spent with wife and friends.
This entry was posted in Microsoft SQL Server and tagged , , , , . Bookmark the permalink.

One Response to Use your text / CSV files in your queries via OPENROWSET

  1. Heather says:

    This amazing blog, “Use your text / CSV files in your queries via OPENROWSET | SQL Mate Blog” demonstrates that u actually know everything that
    you are writing about! I actually fully approve. Thanks a lot -Ute

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s