C# - How can I insert 10 million records in the shortest time possible?
I have a file which has 10 million records, like below:

line1
line2
line3
line4
.......
......
10 million lines

I want to insert these 10 million records into the database, i.e. read the file and upload it to SQL Server.
c# code
system.io.streamreader file = new system.io.streamreader(@"c:\test.txt"); while((line = file.readline()) != null) { // insertion code goes here //dal.executesql("insert table1 values("+line+")"); } file.close();
But the insertion is taking too long. How can I insert 10 million records in the shortest time possible using C#?
Update 1:

I tried BULK INSERT:

BULK INSERT dbname.dbo.datas
FROM 'F:\dt10000000\dt10000000.txt'
WITH
(
     ROWTERMINATOR = '\n'
);
My table is like below:

DATAs
(
     DatasField VARCHAR(MAX)
)
But I am getting the following error:

Msg 4866, Level 16, State 1, Line 1
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.

Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.

Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
The code below worked:

BULK INSERT dbname.dbo.datas
FROM 'F:\dt10000000\dt10000000.txt'
WITH
(
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n'
);
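As a side note, once the terminators are correct, BULK INSERT also accepts performance-related options. The following is only a sketch with assumed values (the BATCHSIZE of 500000 and TABLOCK hint are illustrative, not part of the original statement):

```sql
-- Sketch of the working statement with optional performance hints.
-- TABLOCK allows minimal logging under the SIMPLE or BULK_LOGGED recovery
-- models; BATCHSIZE commits every N rows so a failure does not roll back
-- the entire 10 million row load. Both values here are assumptions.
BULK INSERT dbname.dbo.datas
FROM 'F:\dt10000000\dt10000000.txt'
WITH
(
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 500000,
    TABLOCK
);
```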
Please do not create a DataTable to load via BulkCopy. That is an ok solution for smaller sets of data, but there is absolutely no reason to load all 10 million rows into memory before calling the database.
Your best bet (outside of BCP / BULK INSERT / OPENROWSET(BULK...)) is to stream the contents of the file into the database via a Table-Valued Parameter (TVP). By using a TVP you can open the file, read a row & send a row until done, and then close the file. This method has a memory footprint of just a single row. I wrote an article, Streaming Data Into SQL Server 2008 From an Application, which has an example of this very scenario.
A simplistic overview of the structure is as follows. I am assuming the same import table and field name as shown in the question above.
Required database objects:

-- First: you need a User-Defined Table Type
CREATE TYPE ImportStructure AS TABLE (Field VARCHAR(MAX));
GO

-- Second: use the UDTT as an input param to an import proc.
--         Hence "Table-Valued Parameter" (TVP)
CREATE PROCEDURE dbo.ImportData (
   @ImportTable dbo.ImportStructure READONLY
)
AS
SET NOCOUNT ON;

-- maybe clear out the table first?
TRUNCATE TABLE dbo.DATAs;

INSERT INTO dbo.DATAs (DatasField)
    SELECT  Field
    FROM    @ImportTable;
GO
The C# app code to make use of the above SQL objects is below. Notice how, rather than filling up an object (e.g. DataTable) and then executing the Stored Procedure, in this method it is the executing of the Stored Procedure that initiates the reading of the file contents. The input parameter of the Stored Proc isn't a variable; it is the return value of a method, GetFileContents. That method is called when the SqlCommand calls ExecuteNonQuery, which opens the file, reads a row and sends the row to SQL Server via the IEnumerable<SqlDataRecord> and yield return constructs, and then closes the file. The Stored Procedure just sees a Table Variable, @ImportTable, that it can access as soon as the data starts coming over (note: the data does persist for a short time, even if not the full contents, in tempdb).
using System.Collections;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Server;

private static IEnumerable<SqlDataRecord> GetFileContents()
{
   SqlMetaData[] _TvpSchema = new SqlMetaData[] {
      new SqlMetaData("Field", SqlDbType.VarChar, SqlMetaData.Max)
   };
   SqlDataRecord _DataRecord = new SqlDataRecord(_TvpSchema);
   StreamReader _FileReader = null;

   try
   {
      _FileReader = new StreamReader("{filePath}");

      // read a row, send a row
      while (!_FileReader.EndOfStream)
      {
         // You shouldn't need to call "_DataRecord = new SqlDataRecord" as
         // SQL Server already received the row when "yield return" was called.
         // Unlike BCP and BULK INSERT, you have the option here to create a
         // string, call ReadLine() into that string, do manipulation(s) /
         // validation(s) on the string, then pass the string into SetString()
         // or discard it if invalid.
         _DataRecord.SetString(0, _FileReader.ReadLine());
         yield return _DataRecord;
      }
   }
   finally
   {
      _FileReader.Close();
   }
}
The GetFileContents method above is used as the input parameter value for the Stored Procedure as shown below:
public static void Test()
{
   SqlConnection _Connection = new SqlConnection("{connection string}");
   SqlCommand _Command = new SqlCommand("ImportData", _Connection);
   _Command.CommandType = CommandType.StoredProcedure;

   SqlParameter _TVParam = new SqlParameter();
   _TVParam.ParameterName = "@ImportTable";
   _TVParam.TypeName = "dbo.ImportStructure";
   _TVParam.SqlDbType = SqlDbType.Structured;
   _TVParam.Value = GetFileContents(); // return value of the method is streamed data
   _Command.Parameters.Add(_TVParam);

   try
   {
      _Connection.Open();
      _Command.ExecuteNonQuery();
   }
   finally
   {
      _Connection.Close();
   }

   return;
}
Additional notes:

- With a slight modification, the above C# code can be adapted to batch the data in.
- With a minor modification, the above C# code can be adapted to send in multiple fields (the example shown in the "Streaming Data..." article linked above passes in 2 fields).
- You can also manipulate the value of each record in the SELECT statement in the proc.
- You can also filter out rows by using a WHERE condition in the proc.
- You can access the TVP Table Variable multiple times; it is READONLY but not "forward only".
- Advantages over SqlBulkCopy:
  - SqlBulkCopy is INSERT-only, whereas using a TVP allows the data to be used in any fashion: you can call MERGE; you can DELETE based on some condition; you can split the data into multiple tables; and so on.
  - Due to a TVP not being INSERT-only, you don't need a separate staging table to dump the data into.
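To illustrate the MERGE point above, a hypothetical upsert version of the import proc might look like the following (the proc name and the match on DatasField are assumptions for illustration only; a real table would match on a proper, indexable key rather than a VARCHAR(MAX) column):

```sql
-- Hypothetical sketch only: reusing the same TVP type for an upsert
-- instead of a plain INSERT. Matching on DatasField is assumed here
-- purely for illustration and is not from the article.
CREATE PROCEDURE dbo.UpsertData (
   @ImportTable dbo.ImportStructure READONLY
)
AS
SET NOCOUNT ON;

MERGE dbo.DATAs AS tgt
USING @ImportTable AS src
    ON tgt.DatasField = src.Field
WHEN NOT MATCHED BY TARGET THEN
    INSERT (DatasField) VALUES (src.Field);
GO
```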
- You can get data back from the database by calling ExecuteReader instead of ExecuteNonQuery. For example, if there were an IDENTITY field on the DATAs import table, you could add an OUTPUT clause to the INSERT to pass back INSERTED.[ID] (assuming ID is the name of the IDENTITY field). Or you can pass back the results of a completely different query, or both, since multiple result sets can be sent and accessed via Reader.NextResult(). Getting info back from the database is not possible when using SqlBulkCopy, yet there are several questions here on S.O. from people wanting to do exactly that (at least with regards to the newly created IDENTITY values).
- For more info on why this approach is sometimes faster for the overall process, even if slightly slower on getting the data from disk into SQL Server, please see this whitepaper from the SQL Server Customer Advisory Team: Maximizing Throughput with TVP
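To sketch the OUTPUT-clause point from the notes above: assuming the import table were given an IDENTITY column named [ID] (an assumption; the table in the question has only DatasField), the INSERT inside the proc could be written as:

```sql
-- Hypothetical sketch: stream back the newly generated IDENTITY values;
-- the caller would then use ExecuteReader instead of ExecuteNonQuery.
-- The [ID] IDENTITY column is an assumption, not part of the original table.
INSERT INTO dbo.DATAs (DatasField)
    OUTPUT INSERTED.[ID]
    SELECT  Field
    FROM    @ImportTable;
```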