Sunday, February 12, 2012

Best way to do batch inserts

All -
I've read up quite a bit on doing batch inserts where the scenario involves
inserting a good number of records (maybe 1000?) at a single time. I
definitely do not want to use bulk inserts, so let's rule that out
immediately. It is also my opinion that it is not efficient to pass the
"values" as a string (or whatever) and have a loop in a stored proc that
parses the string and then inserts repeatedly (if anyone disagrees with this
statement, please let me know).
Having concluded the above, it leaves me with two options:
1. Concatenate a string of INSERT/VALUES statements.
2. Concatenate a string of INSERT/SELECT/UNION ALL statements where the
SELECT/UNION ALL represents my values.
First of all, does anyone have any metrics as to which of the approaches
above performs better?
Secondly, in a multi-user, concurrent environment, which is better?
a) Putting all inserts in a single transaction
b) Putting each insert in its own transaction
I understand that there is a lot more I/O with option (b), but I'm of the
opinion that option (a) can lead to table locking, etc. Which should I worry
about more?
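To make the two options concrete, a minimal sketch (assuming a hypothetical single-column table dbo.MyTable):

-- Option (a): all inserts in one explicit transaction;
-- the log is flushed once, at COMMIT
BEGIN TRAN
INSERT INTO dbo.MyTable VALUES (1)
INSERT INTO dbo.MyTable VALUES (2)
-- ... up to ~1000 inserts ...
COMMIT TRAN

-- Option (b): each insert runs as its own autocommit transaction,
-- so every statement pays for its own log flush
INSERT INTO dbo.MyTable VALUES (1)
INSERT INTO dbo.MyTable VALUES (2)
-- ... up to ~1000 inserts ...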
For now, these are my questions. I'm not looking for the magic bullet here,
and I will do some testing myself, but I wanted to poll the minds of the
smart folk who read these posts to see if anyone had already done this or
had opinions.
Thanks.
ice

1000 rows at a time is not much of a challenge. I don't understand why
you've ruled out bulk inserts. They can be very fast. If you don't want to
insert your data directly into the target tables via bulk insert, you could
load staging tables that way and then transfer the rows over via SQL.
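For example, a rough sketch of that staging route (the table and file names here are hypothetical, and the staging table dbo.MyStaging is assumed to match the target dbo.MyTable's columns id and val):

-- bulk load the flat file into the staging table
BULK INSERT dbo.MyStaging
FROM 'C:\data\rows.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')

-- then transfer over via plain SQL
INSERT INTO dbo.MyTable (id, val)
SELECT id, val
FROM dbo.MyStaging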
As for larger inserts, it boils down to how much concurrency you need. If
you can take some downtime, then it is faster to do things in a single
chunk (some exceptions do apply). However, if you can't afford to be down
for too long, then you may have to do a bunch of mini-transactions with,
say, 100-1000 rows at a time.
HTH
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinnaclepublishing.com

Thanks for your quick reply.
I know that "Bulk Inserts" will get this done fast. But I'm looking for a
generic solution that would work for different databases without having to
use their specific implementations to solve the problem. I think using the
"staging" idea in this scenario is probably adding more work than might be
necessary. When you say "via SQL" what are you suggesting I use?
With regards to concurrency and downtime, I want to operate under assumption
that there will be no downtime. So based on that, it would seem that you
are suggesting that inserting ~1000 records in a single transaction is
better than ~1000 individual transactions. I am correct with this
deduction?
ice
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:#4LF2W9JFHA.1280@.TK2MSFTNGP09.phx.gbl...
> A 1000 rows at a time is not much of a challenge. I don't understand why
> you've ruled out bulk inserts. They can be very fast. If you don't want
to
> insert your data directly into the target tables via bulk insert, you
could
> load staging tables that way and then transfer over via SQL.
> As for larger inserts, it boils down to how much concurrency do you need?
> If you can take some downtime, then it is faster to do things in a single
> chunk (some exceptions do apply). However, if you can't afford to be down
> for too long, then you may have to do a bunch of min-transactions with,
say,
> 100 - 1000 rows at a time.
> HTH
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Columnist, SQL Server Professional
> Toronto, ON Canada
> www.pinnaclepublishing.com
> .
> "Ice" <ice@.nospam.com> wrote in message
> news:uR0lcN9JFHA.3500@.TK2MSFTNGP14.phx.gbl...
> All -
> I've read up quite a bit on simply doing batch inserts where the scenario
> involves inserting a good number of records (maybe a 1000?) at single
time.
> I definitely do not want to use bulk inserts, so let's rule that out
> immediately. It is also my opinion that is not efficient to pass the
> "values" as string (or whatever) and have a loop in a stored proc that
> parses the string an the insert repeatedly (if anyone disagrees with this
> statement, please let me know).
> Having concluded the above, it leaves me with two options:
> 1.Concate a string of insert/values statements
> 2.Concate a string of insert/select/union all statements where the
> "select/union all" represents my values.
> First of all does anyone have any metrics as to which of the approaches
> above peforms better?
> Secondly, in a multi-user, concurrent environment, which is better?
> a)Putting all inserts in a single transaction
> b)Putting each insert in a single transaction
> I understand that there is a lot more I/O with option (b), but I'm of the
> opinion that option (a) can lead to table locking etc. Which should I
worry
> about more?
> For now, these are my questions. I'm not looking for the magic bullet
here,
> I will do some testing myself, but I wanted to poll the minds of the smart
> folk who read these posts to see if anyone had already done this or had
> opinions
> Thanks.
> ice

> First of all does anyone have any metrics as to which of the approaches
> above performs better?
To add to Tom's response, I ran a quick test from Query Analyzer and found
the separate inserts outperformed the UNION ALL method when SET NOCOUNT ON
was specified.
Hope this helps.
Dan Guzman
SQL Server MVP
"Ice" <ice@.nospam.com> wrote in message
news:uR0lcN9JFHA.3500@.TK2MSFTNGP14.phx.gbl...
> All -
> I've read up quite a bit on simply doing batch inserts where the scenario
> involves inserting a good number of records (maybe a 1000?) at single
> time.
> I definitely do not want to use bulk inserts, so let's rule that out
> immediately. It is also my opinion that is not efficient to pass the
> "values" as string (or whatever) and have a loop in a stored proc that
> parses the string an the insert repeatedly (if anyone disagrees with this
> statement, please let me know).
> Having concluded the above, it leaves me with two options:
> 1.Concate a string of insert/values statements
> 2.Concate a string of insert/select/union all statements where the
> "select/union all" represents my values.
> First of all does anyone have any metrics as to which of the approaches
> above peforms better?
> Secondly, in a multi-user, concurrent environment, which is better?
> a)Putting all inserts in a single transaction
> b)Putting each insert in a single transaction
> I understand that there is a lot more I/O with option (b), but I'm of the
> opinion that option (a) can lead to table locking etc. Which should I
> worry
> about more?
> For now, these are my questions. I'm not looking for the magic bullet
> here,
> I will do some testing myself, but I wanted to poll the minds of the smart
> folk who read these posts to see if anyone had already done this or had
> opinions
> Thanks.
> ice

Doing 1000 individual transactions will take longer end-to-end; however,
each little insert takes very little time by itself. A 1000-row insert will
take longer than a single-row insert, but it should still be on the order of
a second or two. If you want to insert, say, a million rows, then you may
want to do that in 1000-row batches. Using SQL, you would be doing an
INSERT SELECT from one table into another.
How are these data arriving in your system - DTS, linked server, flat file?
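A sketch of what that batched transfer might look like (assuming a hypothetical staging table dbo.MyStaging, uniquely keyed on id, feeding dbo.MyTable):

SET NOCOUNT ON
DECLARE @done int
SET @done = 0
WHILE @done = 0
BEGIN
    BEGIN TRAN
    -- copy the next batch of rows not yet in the target;
    -- assumes id uniquely identifies a row
    INSERT INTO dbo.MyTable (id, val)
    SELECT TOP 1000 s.id, s.val
    FROM dbo.MyStaging s
    WHERE NOT EXISTS (SELECT * FROM dbo.MyTable t WHERE t.id = s.id)
    ORDER BY s.id
    -- a short batch means we have caught up
    IF @@ROWCOUNT < 1000 SET @done = 1
    COMMIT TRAN
END

Keeping each batch in its own short transaction limits how long locks are held, which is the concurrency trade-off discussed above.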
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinnaclepublishing.com

Ok, thanks for the information.
I thought about the INSERT SELECT option, but I don't think it'll get me
much.
I will do some additional testing.
"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
news:uLpaQFAKFHA.3340@.TK2MSFTNGP14.phx.gbl...
> Doing 1000 individual transaction will take longer end-to-end. However,
> each little insert is likely to take not much time by itself. A 1000 row
> insert would take longer than a single-row insert. However, a 1000 row
> insert should be of the order of a second or two. If you want to insert,
> say, a million rows, then you may want to do that in 1000 row batches.
> Using SQL, you would be doing an INSERT SELECT from one table into
another.
> How are these data arriving into your system - DTS, Linked server, flat
> file?
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Columnist, SQL Server Professional
> Toronto, ON Canada
> www.pinnaclepublishing.com
> .
> "Ice" <ice@.nospam.com> wrote in message
> news:%23MY27B%23JFHA.484@.TK2MSFTNGP15.phx.gbl...
> Thanks for your quick reply.
> I know that "Bulk Inserts" will get this done fast. But I'm looking for a
> generic solution that would work for different databases without having to
> use their specific implementations to solve the problem. I think using
the
> "staging" idea in this scenario is probably adding more work than might be
> necessary. When you say "via SQL" what are you suggesting I use?
> With regards to concurrency and downtime, I want to operate under
assumption
> that there will be no downtime. So based on that, it would seem that you
> are suggesting that inserting ~1000 records in a single transaction is
> better than ~1000 individual transactions. I am correct with this
> deduction?
> ice
> "Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
> news:#4LF2W9JFHA.1280@.TK2MSFTNGP09.phx.gbl...
why
want
> to
> could
need?
single
down
> say,
scenario
> time.
this
the
> worry
> here,
smart

Interesting. Were all the inserts in one transaction, or were you setting
"nocount" per individual insert?
ice
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
news:#orP1F#JFHA.2764@.tk2msftngp13.phx.gbl...
> To add to Tom's response, I ran a quick test from Query Analyzer and found
> the separate inserts outperformed the UNION ALL method when SET NOCOUNT ON
> was specified.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Ice" <ice@.nospam.com> wrote in message
> news:uR0lcN9JFHA.3500@.TK2MSFTNGP14.phx.gbl...
scenario
this
the
smart

The inserts were done as individual statements/transactions. The single
batch was a script of 1000 insert statements, like:
SET NOCOUNT ON
INSERT INTO MyTable VALUES(1)
INSERT INTO MyTable VALUES(2)
--etc.
The UNION ALL script was basically:
SET NOCOUNT ON
INSERT INTO MyTable
SELECT 1
UNION ALL SELECT 2
--etc.
I ran this on my home PC, so the write-caching IDE controller mitigates the
effect of multiple transactions and log I/O. I'd expect a single
transaction to be a little faster.
Hope this helps.
Dan Guzman
SQL Server MVP
"Ice" <ice@.nospam.com> wrote in message
news:%23QENHJBKFHA.1476@.TK2MSFTNGP09.phx.gbl...
> Interesting. Were all the inserts in one transaction or were you setting
> "nocount" per individual insert?
> ice
> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
> news:#orP1F#JFHA.2764@.tk2msftngp13.phx.gbl...
> scenario
> this
> the
> smart
>|||Is "SET NOCOUNT" a connection specific setting? How If i send 10 different
individual statements using ADO.NET and I set "no count" on before send the
1st of the 10, is it on for all 10? Any ideas?
ice
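For reference, SET NOCOUNT is a session-level setting: once executed on a connection it remains in effect for subsequent batches on that connection until changed (and ADO.NET connection pooling resets SET options when a pooled connection is reused). A quick sketch, again with the hypothetical dbo.MyTable:

SET NOCOUNT ON            -- suppresses "n rows affected" messages from here on
INSERT INTO dbo.MyTable VALUES (1)   -- no rows-affected message returned
GO
-- next batch on the same connection: NOCOUNT is still ON
INSERT INTO dbo.MyTable VALUES (2)
GO
SET NOCOUNT OFF           -- restore the session default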
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
news:u2c#OWBKFHA.2752@.TK2MSFTNGP10.phx.gbl...
> The inserts were done as individual statements/transactions. The single
> batch was 1000 insert statements was a script like:
> SET NOCOUNT ON
> INSERT INTO MyTable VALUES(1)
> INSERT INTO MyTable VALUES(2)
> --etc.
> The UNION ALL script as basically:
> SET NOCOUNT ON
> INSERT INTO MyTable
> SELECT 1
> UNION ALL SELECT 2
> --etc.
> I ran this on my home PC so the write caching IDE controller mitigates the
> effect of multiple transactions and log I/O. I'd expect a single
> transaction to be a little faster.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Ice" <ice@.nospam.com> wrote in message
> news:%23QENHJBKFHA.1476@.TK2MSFTNGP09.phx.gbl...
setting
approaches
that
approaches
I
bullet
had

Ignore my last post - I just found some documentation on this in the
Framework SDK.
ice
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
news:u2c#OWBKFHA.2752@.TK2MSFTNGP10.phx.gbl...
> The inserts were done as individual statements/transactions. The single
> batch was 1000 insert statements was a script like:
> SET NOCOUNT ON
> INSERT INTO MyTable VALUES(1)
> INSERT INTO MyTable VALUES(2)
> --etc.
> The UNION ALL script as basically:
> SET NOCOUNT ON
> INSERT INTO MyTable
> SELECT 1
> UNION ALL SELECT 2
> --etc.
> I ran this on my home PC so the write caching IDE controller mitigates the
> effect of multiple transactions and log I/O. I'd expect a single
> transaction to be a little faster.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Ice" <ice@.nospam.com> wrote in message
> news:%23QENHJBKFHA.1476@.TK2MSFTNGP09.phx.gbl...
setting
approaches
that
approaches
I
bullet
had
>
