Tuesday, March 27, 2012
Binding Different DataSets to One Table.
I have a situation where I am binding a dataset, Dset1, to a table. I also have a Dset2 which I am not binding. Now, as the rows get rendered from Dset1, I also want to populate a value from Dset2 (the second dataset) in a cell for a particular row.
How do I do that?
Any help appreciated.
Thanks
|||You cannot bind a data region (in this case a table) to more than one
dataset. You should re-design the report so that it has one dataset that has
a query returning both the data currently in Dset1 and in Dset2. This will
most likely mean writing a JOIN SQL query.
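For instance, if Dset1 came from an Orders table and Dset2 from a Customers table (hypothetical names, not from the original post), the two dataset queries could be combined into one:

```sql
-- Hypothetical sketch: fold the second dataset's value into the first
-- dataset's query so the report needs only one dataset.
SELECT o.OrderID,
       o.OrderDate,
       c.CustomerName      -- the value that used to come from Dset2
FROM Orders AS o
JOIN Customers AS c
  ON c.CustomerID = o.CustomerID
```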
Charles Kangai, MCDBA, MCT
"Rahul" wrote:
> I have a situation where i am binding a Dataset Dset1 to a table.
> , I also have a Dset2 which i am not binding. Now as the Rows gets
> rendered depending on Dset1 i also want to populate a value from Dset2
> (2nd Datset ) in a cell for a particular Row.
>
> How do i do that?
> Any help.
>
> Thanks
>
|||That is easier to do in a stored proc: populate two temp tables, build your
combined rows there, and THEN return one data set back to the
report server.
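A minimal sketch of the stored-proc approach just described, with hypothetical table and column names (nothing here comes from the original post):

```sql
-- Hypothetical sketch: stage each source query in a temp table,
-- then return one combined result set to the report.
CREATE PROCEDURE dbo.GetCombinedDataset
AS
BEGIN
    SELECT OrderID, OrderDate, CustomerID
    INTO #dset1
    FROM Orders

    SELECT CustomerID, CustomerName
    INTO #dset2
    FROM Customers

    SELECT d1.OrderID, d1.OrderDate, d2.CustomerName
    FROM #dset1 AS d1
    JOIN #dset2 AS d2
      ON d2.CustomerID = d1.CustomerID
END
```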
Thursday, March 22, 2012
binary checksum
Can anyone provide me with the syntax for comparing rows of two tables using binary checksum? The tables A and B have 8 & 9 columns respectively. The PK in both cases is Col1 & Col2. I want checksum on Columns 1 to 8.
Thanks
|||Try this draft:
drop table test
drop table test2
go
create table test(id int, col1 int,col2 varchar(5),col3 datetime)
create table test2(id int, col1 int,col2 varchar(5),col3 datetime)
go
insert test values(1,1,'a','02/03/2004')
insert test values(2,2,'b','02/04/2004')
insert test values(3,3,'c','02/05/2004')
insert test values(4,4,'d','02/06/2004')
insert test2 values(1,1,'a','02/03/2004')
insert test2 values(2,2,'b','02/04/2004')
insert test2 values(3,3,'f','02/05/2004')
insert test2 values(4,4,'d','02/01/2004')
go
select t.*
from test t
join test2 t2 on t2.id=t.id
where CHECKSUM(t.col2,t.col3)<>CHECKSUM(t2.col2,t2.col3)
|||Hi,
The code doesn't work for this case.
drop table test
drop table test2
go
create table test(id int, col1 int,col2 varchar(5),col3 datetime)
create table test2(id int, col1 int,col2 varchar(5),col3 datetime)
insert test values(4,4,'d','02/06/2004')
insert test values(4,4,'e','02/06/2004')
insert test2 values(4,4,'d','02/06/2004')
insert test2 values(4,4,'e','02/06/2004')
select *
from test
select *
from test2
select t.*
from test t
join test2 t2 on t2.id=t.id
where CHECKSUM(t.col2,t.col3)<>CHECKSUM(t2.col2,t2.col3)
--
All my data is like this. The rows are the same, but the checksum still selects the rows. Please help.
|||drop table test
drop table test2
go
create table test(id int, col1 int,col2 varchar(5),col3 datetime)
create table test2(id int, col1 int,col2 varchar(5),col3 datetime)
insert test values(4,4,'d','02/06/2004')
insert test values(4,4,'e','02/06/2004')
insert test2 values(4,4,'d','02/06/2004')
insert test2 values(4,4,'e','02/06/2004')
select *
from test
select *
from test2
select t.*
from test t
join test2 t2 on t2.col1=t.col1 AND t2.col2=t.col2 -- Join on PK
where BINARY_CHECKSUM(t.id,t.col3) <> BINARY_CHECKSUM(t2.id,t2.col3)
-- A much more complex, but its the way I do it.
SELECT t.*
from test t
JOIN
(
SELECT t1.col1, t1.col2, BINARY_CHECKSUM(*) as bin_ck_sum
from test t1
) AS X1 ON t.col1 = X1.col1 AND t.col2 = X1.col2
JOIN
(
SELECT t2.col1, t2.col2, BINARY_CHECKSUM(*) as bin_ck_sum
from test2 t2
) AS X2 ON t.col1 = X2.col1 AND t.col2 = X2.col2
WHERE X1.bin_ck_sum <> X2.bin_ck_sum
|||TimS, your method is the most appropriate when comparing one entire record to another, but vivek_vdc only wants to check 8 of the 9 columns in one table against the 8 columns in the other table. BINARY_CHECKSUM(*) on both tables will not do this comparison.
|||--
TimS, your method is the most appropriate when comparing one entire record to another, but vivek_vdc only wants to check on 8 of the 9 columns in one table against the 8 columns in the other table. Binary_Checksum(*) on both tables will not do this comparison.
--
I agree, but I gave him two different solutions.
The second is how I would compare tables, and it can be adapted to his problem; I have found that the second solution works best for me. There is no reason he can't replace the star with whatever columns he wishes to compare.
THE PROBLEM was that no one was joining on the PK of the tables.
Tim S
|||You have to join on the primary keys. Change the join to use the columns in your primary key!
select t.*
from test t
-- Join on PK1 & PK2 ( the Primary Key COLUMNS )
join test2 t2 on t2.PK1=t.PK1 AND t2.PK2=t.PK2
where BINARY_CHECKSUM(t.id,t.col3) <> BINARY_CHECKSUM(t2.id,t2.col3)
If you want any more help please respond to the newsgroup. Please give your create table and insert data that MATCHES your tables.
Tim S
--Original Message--
From: vivek_vdc
Sent: Monday, February 09, 2004 3:46 PM
To:
Subject: binary checksum post help
TimS - The code that you have given is assuming col1 & col2 are the PK but I have id & Col1 as PK. The code doesn't work in this case. Please advise on how I should proceed?
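Given that clarification (the PK is id and Col1), the comparison can be adapted; a sketch against the test tables used earlier in the thread:

```sql
-- Join on the actual PK (id, col1) and checksum the remaining columns.
SELECT t.*
FROM test AS t
JOIN test2 AS t2
  ON t2.id = t.id
 AND t2.col1 = t.col1
WHERE BINARY_CHECKSUM(t.col2, t.col3) <> BINARY_CHECKSUM(t2.col2, t2.col3)
```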
Wednesday, March 7, 2012
Beware the UNION ALL - it can swallow rows...
Just for all you June CTPers as a warning.. I have a UNION ALL that, at random, decides to swallow row count....
Tried deleting and recreating task
Unable to Repro
Along with the buffer leaks and the looping errors.. bring on the NEXT CTP.. ! ! !
Giri,
Can you send me a package that reproduces this issue? We're unaware of any UnionAll issues with CTP15 and I'd love to fix these.
thanks
beware of avg() for large datasets
I have a huge dataset - about 700m rows, one column of which is a tinyint. I needed to take an average of that column, so I did this:
select avg(mytinyint) from mytable
which returned:
Msg 8115, Level 16, State 2, Line 6
Arithmetic overflow error converting expression to data type int.
It turns out, if the sum() over the column is bigger than MAX_INT = 2147483647, then avg() will throw this error. This surprised me a little - I guess I was assuming sql server would use bigint internally when computing integer avg, but it uses int.
so you have to do this:
select avg(cast(mytinyint as bigint)) from mytable
What's annoying to me is, it's impossible for an average of type T to be larger or smaller than the max/min allowed value of type T. This means the SQL Server devs could have internally always used bigint for sum and count, and then returned cast(sum/count as T) from avg() without possibility of overflow (unless sum or count was bigger than the max bigint! You're on your own if that happens, I guess!).
Because of this, you have to cast your column to a bigint to make it work if your dataset is large enough, even if the column is only 1 measly byte! kind of a waste of space if you ask me. :)
here's a little example of what I'm talking about, that doesn't require you to import a 700m row dataset :)
declare @t table (id int)
insert into @t select 1 union all select 2147483647
select avg(cast(id as bigint)) from @t -- works
select avg(id) from @t -- fails
|||hmm... :rolleyes:
|||Very interesting. I will check it out tomorrow. Thanks for posting it.
Saturday, February 25, 2012
BETWEEN 20/5/2007 AND 30/5/2007 not returning rows.
SQL Server 2005
I have tried the SQL below and cannot return any rows, even though there are records that should be returned.
Code Snippet
SELECT * FROM Calls WHERE clientName IN ('Bankside Engineering Limited') AND scheduleDate BETWEEN 20/5/2007 AND 30/5/2007
In the database scheduleDate is a dateTime datatype.
In the database I have copied and pasted. 23/05/2007 00:00:00
I tried the following
Code Snippet
SELECT * FROM Calls WHERE clientName IN ('Bankside Engineering Limited') AND scheduleDate BETWEEN '20/5/2007' AND '30/5/2007'
And got an error message: The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.
Many thanks for any help with this,
Steve
The first approach doesn't work because 20/5/2007 is interpreted as an arithmetic expression (integer division) evaluating to 0, which is then converted to the datetime 1/1/1900. Try the following code:
Code Snippet
declare @dt datetime
set @dt = 20/5/2007
select @dt
You could rewrite second query with explicit convert:
Code Snippet
SELECT * FROM Calls WHERE clientName IN ('Bankside Engineering Limited')
AND scheduleDate BETWEEN CONVERT(datetime, '20/5/2007', 103) AND CONVERT(datetime, '30/5/2007', 103)
Or use
Code Snippet
SET DATEFORMAT dmy
for setting default dateformat
|||That is because the implicit conversion thinks the date is a US date. Do this:
.....BETWEEN CONVERT(datetime,'20/5/2007',103) AND CONVERT(datetime,'30/5/2007',103)
or
SET DATEFORMAT dmy
SELECT * FROM Calls WHERE clientName IN ('Bankside Engineering Limited') AND scheduleDate BETWEEN '20/5/2007' AND '30/5/2007'
|||
steve_1_rm,
Try using the ISO or ISO8601 format (see the CONVERT function in BOL) and SQL Server will interpret the datetime string correctly, no matter the language or dateformat setting.
SELECT *
FROM Calls
WHERE
clientName IN ('Bankside Engineering Limited')
AND scheduleDate BETWEEN '20070527' AND '20070530'
AMB
|||Hello, thanks for your help, but the code above did not return the rows I wanted. I have used the following, with the yyyymmdd format:
SELECT * FROM Calls WHERE [Date] BETWEEN '20070521' AND '20070531'
This is the actual value I have in the database and this is what I am trying to get out: 23/05/2007 00:00:00
Many thanks, Steve
|||What do you get when you issue
Code Snippet
SELECT CONVERT(varchar(10), [Date], 103) as [Date], *
FROM Calls
WHERE [Date] BETWEEN '20070521' AND '20070531'
between
SQL Server 2000 SP3
Field date is smalldatetime
I have 8 rows with data='06-Aug-2003'
SELECT * FROM TEST
where data between '06-Aug-2003' and '06-Aug-2003'
Results: 0 Rows
SELECT * FROM TEST
where data>='06-Aug-2003' and data<='06-Aug-2003'
Results: 0 Rows
What can I do? (I do not want to use '06-Aug-2003' and '07-Aug-2003')
|||What is the time portion of the field?
What does this give you
SELECT * FROM TEST
where data between '20030806' and '20030807'
OR
SELECT * FROM TEST
where CONVERT(char(8),data,112) = '20030806'
--
Allan Mitchell (Microsoft SQL Server MVP)
MCSE,MCDBA
www.SQLDTS.com
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org
"Dragos Hilbert" <dragos_hilbert@.yahoo.co.uk> wrote in message
news:eMh$KHAXDHA.652@.TK2MSFTNGP10.phx.gbl...
> SQL Server2000 SP3
> Field date is smalldatetime
> I have 8 rows with data='06-Aug-2003'
> SELECT * FROM TEST
> where data between '06-Aug-2003' and '06-Aug-2003'
> Results: 0 Rows
>
> SELECT * FROM TEST
> where data>='06-Aug-2003' and data<='06-Aug-2003'
> Results: 0 Rows
> What can I do? (I do not want to use '06-Aug-2003' and '07-Aug-2003')
>
>
|||> SELECT * FROM TEST
> where data>='06-Aug-2003' and data<='06-Aug-2003'
SELECT * FROM test
WHERE data>='20030806' and data<'20030807'
(a) between can be ambiguous, I tend to avoid it for datetime queries.
(b) since datetime columns have time also, the above is converted to
midnight, so you won't likely have rows where "data" is between a day at
midnight and the same day at midnight.
(c) never use SELECT * in production code.
(d) consider changing the column name. "data" is not very descriptive at
all; I would never guess that it is a datetime column...
Better way to do a large delete?
I have to do a delete from a table with about 25 million rows. I need to
delete about 4 million rows, based on the date in another table.
Table Session
SessionID SessionDate
1 1/25/2004
2 7/25/2001
I have created a view (SessionView) containing all sessionid's where the
sessiondate less than 3 years old.
(Create view SessionView As Select SessionID From Session Where
Session.sessiondate >= getdate() - 1095)
Table Claims
ClaimID    SessionID  etc  etc             etc
24578      1          15   HealthPractice  5554441234
6548975    2          52   ChirocPract     5551234567
What I need is to delete all rows from Table Claims that have sessionids in
SessionView.
Delete From Claims Where Claims.sessionID In (Select SessionID from
SessionView)
-In this case ClaimID 6548975 would be deleted but ClaimID 24578 would not.
But I must be doing something wrong (I am very new at SQL Server, and have
never done a large scale delete that wasn't just straight forward), because
the delete seems to take forever. Last time I had to stop it at just over 5
hours because the network admin needed to do some work on the server and
reboot it.
I have set the recovery to simple, and the server has no other active
databases or other applications running on it.
Any assistance appreciated.
Nancy Lytle
N_Lytle@.terplaum.umd.edu
|||You could also copy the rows you want to keep to another table, drop
the original one, and rename the copy.
|||Nancy,
try deleting in smaller chunks, say 1000 rows each. This should help.
Dean
"Nancy Lytle" <word_diva@.hotmail.com> wrote in message
news:er8ntEsIGHA.2668@.tk2msftngp13.phx.gbl...
> I have to do a delete from a table with about 25 million rows. I need to
> delete about 4 million rows, based on the date in another table
> Table Session
> SessionID SessionDate
> 1 1/25/2004
> 2 7/25/2001
> I have created a view (SessionView) containing all sessionid's where the
> sessiondate less than 3 years old.
> (Create view SessionView As Select SessionID From Session Where
> Session.sessiondate >= getdate() - 1095)
> Table Claims
> ClaimID SessionID etc etc etc
> 24578 1 15 HealthPractice
> 5554441234
> 6548975 2 52
> ChirocPract 5551234567
> What I need is to delete all rows from Table Claims that have sessionids
> in SessionView.
> Delete From Claims Where Claims.sessionID In (Select SessionID from
> SessionView)
> -In this case ClaimID 6548975 would be deleted but ClaimID 24578 would
> not.
> But I must be doing something wrong (I am very new at SQL Server, and have
> never done a large scale delete that wasn't just straight forward),
> because the delete seems to take forever. Last time I had to stop it at
> just over 5 hours because the network admin needed to do some work on the
> server and reboot it.
> I have set the recovery to simple, and the server has no other active
> databases or other applications running on it.
> Any assistance appreciated.
> Nancy Lytle
> N_Lytle@.terplaum.umd.edu
>
|||Another thing could be to drop all indexes, delete your rows, and recreate
the indexes.
It's all about how much you can do with the table. Do you have a
maintenance window where no one is using the database, or is it always
online?
|||Nancy Lytle (word_diva@.hotmail.com) writes:
> I have created a view (SessionView) containing all sessionid's where the
> sessiondate less than 3 years old.
> (Create view SessionView As Select SessionID From Session Where
> Session.sessiondate >= getdate() - 1095)
>...
> What I need is to delete all rows from Table Claims that have sessionids
> in SessionView.
> Delete From Claims Where Claims.sessionID In (Select SessionID from
> SessionView)
> -In this case ClaimID 6548975 would be deleted but ClaimID 24578 would
> not.
> But I must be doing something wrong (I am very new at SQL Server, and
> have never done a large scale delete that wasn't just straight forward),
> because the delete seems to take forever. Last time I had to stop it at
> just over 5 hours because the network admin needed to do some work on
> the server and reboot it.
Rather than creating a view, insert all the sessionids to delete into
a temp table.
You could also try this syntax:
DELETE Claims
FROM Claims c
JOIN #temp t ON c.session_id = t.session_id
If Claims does not have its clustered index on SessionId, it could be
worth to do
CREATE CLUSTERED INDEX session_id_ix ON Claims(session_id)
WITH DROP_EXISTING
for this task, and then restore the original index.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx
|||Thanks, I was trying to avoid that because I actually have about 10 tables
that I have to handle in a similiar fashion, and copying was also going
slowly.
I think I will end up using Dean's suggestion and chunking it out.
Thanks,
Nancy
"Alexander Kuznetsov" <AK_TIREDOFSPAM@.hotmail.COM> wrote in message
news:1138312336.363053.109410@.o13g2000cwo.googlegroups.com...
> you could also copy the rows you want to keep to another table, drop
> the orginal one, and rename the copy
>
|||I think this is the way I will have to go, so I will just use smaller date
intervals, until I get everything deleted.
Thanks,
Nancy
"Dean" <dvitner@.nospam.gmail.com> wrote in message
news:OvslVLsIGHA.2828@.TK2MSFTNGP12.phx.gbl...
> nancy,
> try deleting in smaller chunks, say 1000 rows each. this should help.
> dean
> "Nancy Lytle" <word_diva@.hotmail.com> wrote in message
> news:er8ntEsIGHA.2668@.tk2msftngp13.phx.gbl...
>
|||When bulk deleting 4 million rows, the slowness you are experiencing
probably has more to do with transaction logging than anything else. I
assume this must a one time purge, but if it is a recurring process, then
you will probably want to reconsider your data model and workflow.
Start by dropping any indexes from the table that would not assist in
qualifying the rows needed for deletion. For example, if you are deleting
based on transaction date, then you would not need to retain an index on
LastName or AccountType. This will reduce the amount of transaction logging
during the process, and besides, all the indexes will need to be
defragmented after the process has completed anyway.
You can try creating a loop and deleting the rows based on a date column;
however, this usually results in batches that vary greatly in size. For
example, your company may have had a lot more transactions in 2005 than in 2004.
Using the technique below, you can delete in batches of exactly 1000 rows.
set rowcount 1000
while 1 = 1
begin
    delete from mytable where . . .
    if @@rowcount = 0 break
    checkpoint
end
set rowcount 0
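On SQL Server 2005 the same batching idea can be written with DELETE TOP (n) instead of SET ROWCOUNT; a sketch, with the Claims table name from the thread and a hypothetical temp table of doomed session ids, per Erland's suggestion:

```sql
-- Batched delete sketch (SQL Server 2005+). #sessions is a hypothetical
-- temp table holding the SessionIDs whose claims should be deleted.
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) c
    FROM Claims AS c
    JOIN #sessions AS s
      ON s.SessionID = c.SessionID

    IF @@ROWCOUNT = 0 BREAK

    CHECKPOINT  -- lets the log truncate between batches under simple recovery
END
```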
"Nancy Lytle" <word_diva@.hotmail.com> wrote in message
news:er8ntEsIGHA.2668@.tk2msftngp13.phx.gbl...
Better performance without an index?
I have a small table (600 rows) that is used in a query with a relatively large table (200,000 rows). Strangely, I get the best performance from the query by removing all indexes from the small table (including the primary key). This seems a little odd to me, and I'm wondering if this is a common scenario, and if there is some general rule that can be applied when indexing small tables.
I have tried changing the primary key index (clustered vs nonclustered) and adding other indexes to both tables, but I always get the best performance by removing all indexes from the small table. The performance difference is significant, execution time goes from 2 seconds to just over 1 second, and this is a query that gets executed quite a bit.
I can't delete the PK index as the table needs a PK for data integrity reasons. So I'm not sure how to achieve the non-index performance without removing the index.
Any suggestions would be much appreciated. By the way it's SQL Server 2005 Express.
Without indexes, you will always do a table scan. However, with such a small table, a table scan is probably much better than hopping through the index tree to get the data. Though, I do not see how a clustered index would be a bad thing here. Can you show us some code (DDL for both tables, including all keys/constraints)?
|||The small table:
create table EmployeeOffice (
EmployeeID uniqueidentifier not null,
OfficeID uniqueidentifier not null,
Sequence smallint not null,
CanAccessAllCases bit default 0 not null,
IsManager bit default 0 not null)
go
alter table EmployeeOffice
add constraint EmployeeOffice_PK primary key nonclustered (EmployeeID, OfficeID)
The large table:
create table CaseInfo (
CaseNumber int not null,
OfficeID uniqueidentifier not null,
EmployeeID uniqueidentifier null,
TeamID uniqueidentifier null,
... -- there's about another 20 columns
InvoiceRequired bit default 0 not null,
LastChanged timestamp not null)
go
alter table CaseInfo
add constraint CaseInfo_PK primary key clustered (CaseNumber)
There's an Employee and an Office table as well with RI contraints between the four tables but they are not used in this particular query. The query is fairly large and involves other tables but this particular behaviour is specific to these two tables.
The query (simplified):
SELECT CaseNumber, InvoiceRequired,...
FROM CaseInfo
WHERE EmployeeID = 'BCA6F76E-A2EF-4FDA-AB38-061047ADBFDE'
or OfficeID in (Select OfficeID From EmployeeOffice Where EmployeeID = 'BCA6F76E-A2EF-4FDA-AB38-061047ADBFDE'
and (CanAccessAllCases=1 or IsManager=1))
In the test database ALL the records in the CaseInfo table have the same office, and the subquery returns no rows. The execution plan shows that when the index exists, it creates an "Index Spool" with 131,992 rows (which is the number of rows in CaseInfo where EmployeeID does not match). If I use a table hint to ignore the index on this table (or delete it), the "Index Spool" is not created.
|||
A few things:
1. I would rewrite the query to use join instead of subquery.
e.g.
Code Snippet
SELECT CaseNumber, InvoiceRequired
FROM CaseInfo ci JOIN EmployeeOffice eo ON ci.EmployeeID=eo.EmployeeID
WHERE ci.EmployeeID = 'BCA6F76E-A2EF-4FDA-AB38-061047ADBFDE'
union all
SELECT CaseNumber, InvoiceRequired
FROM CaseInfo ci JOIN EmployeeOffice eo ON ci.OfficeID=eo.OfficeID
WHERE eo.EmployeeID = 'BCA6F76E-A2EF-4FDA-AB38-061047ADBFDE'
and (eo.CanAccessAllCases = 1 or eo.IsManager = 1)
2. Create an index on CaseInfo(EmployeeID,OfficeID)
3. Index spool is used to cache the data to allow faster lookup. This is not necessarily a bad thing.
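Point 2 above can be sketched directly against the CaseInfo DDL posted earlier in the thread (the index name is arbitrary):

```sql
-- Nonclustered index to support the EmployeeID / OfficeID lookups.
CREATE INDEX IX_CaseInfo_Employee_Office
    ON CaseInfo (EmployeeID, OfficeID)
```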
Friday, February 24, 2012
Beta 2 Management studio navigation
-- I can see how to design a table in the tree, but how do you quickly view all rows in a table like you could in Enterprise Manager?
-- Where do you go to set up a new DTS script if it is not done with the wizard?
Thanks.
If you have a question about SQL Server 2005 Beta 2, please ask it in
the newsgroups that have been established for it. You can find out how
to access those newsgroups where you got the beta version of SQL Server
2005.
Steve Kass
Drew University
bogus_boy wrote:
>A couple of questions:
>-- I can see how to desgin a table in the tree, but how do you quickly view all rows in a table like you could in enterprise manager?
>-- Where do you go to set up a new DTS script if it is not done with the wizard?
>Thanks.
>
Best Way To Update Existing Rows
Updating existing rows in a SQL Server 2005 destination table
--I have a CSV file and every row needs to perform an update on a production table
The most straightforward approach is to use an OLE DB Command Data Flow Transformation object that sends one SQL statement per incoming row. How do I configure this object?
That might be okay if you've only got a small number of updates; however, if you've got a large set, I prefer landing the set in a temporary table and then doing set-based updates in a following Execute SQL Task. Can you please tell me how I can set up the Execute SQL Task to do set-based updates in the control flow?
thanks in advance
Dave
Mr Pro Tools wrote:
IUpdating existing rows in a SQL server 2005 destination table
--I have a CSV file and every row needs to perform an update on a production table
The most straight forward is to use an OLEDB Command Data Flow Transformation Object that sends one SQL statement per row coming in. How do I configure this object ?
I assume you can write SQL, right?
All explained here: http://msdn2.microsoft.com/en-us/library/ms141138.aspx
Mr Pro Tools wrote:
That might be okay if you've only got a small number of updates, however if you've got a large set, I prefer landing this set to a temporary table and then doing Set based updates in a following Execute SQL task. Can you please tell me how I can set up the Execute SQL Task to do set based updates in the Control flow?
thanks in advance
Dave
Again, I assume you can write SQL. Just paste your SQL into the 'SQL Statement' property of the Execute SQL Task.
-Jamie
|||
Thanks Jamie
The first method works. Profiler is telling me it's taking about a second per update,
but that is too slow, as I have 900,000 records to update across two tables, each having 7 million records, which will take:
889,641 records in the file supplied / 3,600 per hour (1 per second)
= 250 hours
= 10 days
The Execute SQL Task looks like another option, and sure, I can paste my sproc call Exec dbo.MySproc ?,? into the SQL Statement window.
How do I map ?,? to each row in the Execute SQL Task? In other words, what are the main properties I have to set for row-by-row processing, and is it much faster than the OLE DB Command Transformation method, which is nice and simple and slow?
thanks in advance
Dave
|||Mr Pro Tools wrote:
Thanks Jamie
First method works --Profiler is telling me its taking about a second per update
but is too slow as I have 900,000 records to update across two tables each having 7 million records which will take
889,641 records in the file supplied / 3600 per hour (1 per second)
= 250 hours
= 10 days
the Excecute SQL task looks like another option and sure I can paste into the SQL Statement window my sproc Exec dbo.MySproc ?,?
How do I map ?,? to each row in the Execute SQL Task--in other words what are the main properties I have to set for row by row processing and is it much faster than the OLE DB Command Transformation method --which is nice and simple and slow
thanks in advance
Dave
I'm confused (seems to happen a lot these days). Why are you trying to map any columns at all using '?' ?
All you have to do is load the data into a temporary staging area (using OLE DB Destination) and then issue a SQL statement along the lines of:
UPDATE t
SET t.<col> = te.<col> ,......
FROM target t INNER JOIN temp te ON <some join columns>
Also, if the OLE DB Command really is taking one second per row, then your bottleneck is not the OLE DB Command; it's something else.
-Jamie
|||Mr Pro Tools wrote:
How do I map ?,? to each row in the Execute SQL Task--in other words what are the main properties I have to set for row by row processing and is it much faster than the OLE DB Command Transformation method --which is nice and simple and slow
Dave,
The OLE DB Command is just slow because it performs the same command on a row-by-row basis. The intention of using an Execute SQL Task is to be able to execute a SQL command over a set of rows at once. What Jamie is suggesting is to push the rows in your CSV file to a staging table and then, in the control flow, issue an UPDATE statement against your target table using the staging table rows. That would be a one-time update.
|||UPDATE t
SET t.<col> = te.<col> ,......
FROM target t INNER JOIN temp te ON <some join columns>
Yeah, that is simple enough. What I'm getting at is: can the Execute SQL Task do row-by-row processing like the DTS DDQ task, and if so, how do I set the properties?
In my case the update should only occur if a particular column in the temp staging table has a particular value.
If it has, we do the update; otherwise we do a delete.
That's why I had ? as parameters for a stored proc.
|||No, the Execute SQL Task does not do row-by-row processing. As already established, that's what the OLE DB Command does.
-Jamie
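The conditional update-or-delete the poster described can still be done set-based in an Execute SQL Task; a sketch in which all names (Target, Staging, KeyCol, Flag, Payload) are assumptions, not from the thread:

```sql
-- Hypothetical set-based update/delete driven by a staged CSV load.

-- Update target rows whose staged counterpart carries the "update" flag.
UPDATE t
SET    t.Payload = s.Payload
FROM   Target  AS t
JOIN   Staging AS s ON s.KeyCol = t.KeyCol
WHERE  s.Flag = 1

-- Delete target rows whose staged counterpart does not carry the flag.
DELETE t
FROM   Target  AS t
JOIN   Staging AS s ON s.KeyCol = t.KeyCol
WHERE  s.Flag <> 1
```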
Sunday, February 19, 2012
Best way to select a Constant String from a Table ?
What is the best way to generate a constant string from a table (only 1 row)?
I have some SQL statements returning a few rows. I want to also return a row
with a constant string along with this SQL statement.
e.g.
Select * from orders
Union
Select 'End of Order Select' from '
I can try selecting it from the orders table, but this should return only one row.
Any ideas ?
- Anand
|||Sorry for the trouble, found out the answer that I can just do
Select 'End of Order Select'
|||hi Anand
> - Anand|||hi Anand
This will work fine if you are note selecting any value from a table. If u
require a value
from a table along with a constant value, then u require to do like this:
SELECT TOP 1 'Const Value', <COLUMNS> FROM <TABLE>
you query can be modified as
Select * from orders
Union ALL
Select 'End of Order Select'
try using UNION ALL if u definately want the text to be displayed.
in case of UNION the second table will not display a value if the same value
exists
in the main table.
Hope this gives u a better picture
best Regards,
Chandra
http://chanduas.blogspot.com/
http://groups.msn.com/SQLResource/
---
|||A few comments:
> Select 'End of Order Select'
Use square brackets or double-quotes around the produced column name instead. The column you produce
in the result is an identifier, and non-standard identifiers are delimited with double-quotes in ANSI
SQL (and SQL Server); SQL Server also allows square brackets. Why SQL Server allows single quotes for
identifiers in this particular case is beyond my understanding, very strange.
Don't do SELECT *. It might just have been an example, but imagine if the table structure changes
and you add or remove columns. The UNION won't work.
Also, don't expect the SELECT with a constant to come last unless you do an ORDER BY for the UNION.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
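Tibor's last point can be sketched with an explicit sort key; the CAST is needed because both branches of a UNION must have compatible types (the OrderID column name is an assumption):

```sql
-- Force the constant row to sort last with an explicit sort key.
SELECT CAST(OrderID AS varchar(30)) AS [Order], 0 AS SortKey
FROM Orders
UNION ALL
SELECT 'End of Order Select', 1
ORDER BY SortKey
```

The sort key appears in the output; wrap the UNION in a derived table if it needs to be hidden.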
Best way to retrieve rows matching an array of Ids
My database table has approx 400,000 rows, with an integer Id as primary key.
I have an array of ids, with N elements where N is typically between 500 and
5,000.
I want to retrieve into a collection all rows whose id matches a value in
this array.
What is the 'best' way to do this and/or what are the tradeoffs to consider?
I can imagine a number of solutions as follows:
1. A parametered query which takes a single Id parameter and returns 1 row.
Implies N round trips to the server to retrieve N rows.
2. A parametered query with an IN clause which takes n parameters "... WHERE
ID IN (@p1, @p2, ... , @pn)". Implies N/n round trips to the server where
each round trip retrieves up to n rows. What is the maximum / optimal value
for n in this case?
3. Build the SQL string dynamically, with an IN clause generated by
concatenating up to n Id values. Implies N/n round trips to the server where
each round trip retrieves up to n rows. What is the maximum / optimal value
for n in this case?
4. Create a temporary table with a single integer ID column and insert the N
Ids from my array (how?). Join the original table with this temporary table
to retrieve all matching Ids.
...
Any suggestions?|||You can try the solution offered in http://www.aspfaq.com/2248 ... also
check out the link there to Erland's article on array handling.
"Joe" <Joe@.discussions.microsoft.com> wrote in message
news:656DF445-E0EC-495C-B62E-0A535EBC2366@.microsoft.com...
> My database table has approx 400,000 rows, with an integer Id as primary
> key.
> I have an array of ids, with N elements where N is typically between 500
> and
> 5,000.
> I want to retrieve into a collection all rows whose id matches a value in
> this array.
> What is the 'best' way to do this and/or what are the tradeoffs to
> consider?
> I can imagine a number of solutions as follows:
> 1. A parametered query which takes a single Id parameter and returns 1
> row.
> Implies N round trips to the server to retrieve N rows.
> 2. A parametered query with an IN clause which takes n parameters "...
> WHERE
> ID IN (@.p1, @.p2, ... , @.pn)". Implies N/n round trips to the server where
> each round trip retrieves up to n rows. What is the maximum / optimal
> value
> for n in this case?
> 3. Build the SQL string dynamically, with an IN clause generated by
> concatenating up to n Id values. Implies N/n round trips to the server
> where
> each round trip retrieves up to n rows. What is the maximum / optimal
> value
> for n in this case?
> 4. Create a temporary table with a single integer ID column and insert the
> N
> Ids from my array (how?). Join the original table with this temporary
> table
> to retrieve all matching Ids.
> ...
> Any suggestions?|||Very useful resources, thanks, it's a lot clearer. In my case, the number
of values is potentially greater than will fit into an 8K string. So I think
the approach will be:
1. Create the temp table
2. Concatenate as many Ids as possible into a string which does not exceed
8K and call an SP which extracts integers from the string and inserts into
the temp table.
3. Repeat step 2 until all ids in my array are processed.
4. Execute a query with a join to the temp table
5. Drop the temp table.
This will be done from a .NET application. I am concerned about the
lifetime of the temporary table. If I get a SqlException while running the
above, my code will typically close its SqlConnection object - which won't
actually close the underlying connection, just return it to the connection
pool. Presumably this means the temp table could be left lying around for
the next unsuspecting user of that connection, right?
I guess the workaround would be to check for the existence of the temp table
before step 1 above and drop it if it already exists.
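A rough T-SQL sketch of step 2 under these assumptions (the AddIdsFromList name, the #Ids temp table, and a simple comma-separated format are all hypothetical):

```sql
-- the caller creates #Ids(Id int) once, then calls this proc per
-- <=8000-char chunk; a proc can see a temp table created by its caller
CREATE PROCEDURE AddIdsFromList
    @List varchar(8000)
AS
BEGIN
    DECLARE @Pos int, @Next int
    SET @Pos = 1
    WHILE @Pos <= LEN(@List)
    BEGIN
        -- find the next comma (or pretend one sits past the end)
        SET @Next = CHARINDEX(',', @List, @Pos)
        IF @Next = 0 SET @Next = LEN(@List) + 1
        INSERT INTO #Ids (Id)
        VALUES (CAST(SUBSTRING(@List, @Pos, @Next - @Pos) AS int))
        SET @Pos = @Next + 1
    END
END
```

Step 4 is then a plain join, e.g. SELECT t.* FROM MyTable t JOIN #Ids i ON t.Id = i.Id, and the caller drops #Ids when done. The sketch assumes a well-formed list with no empty tokens.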
"Aaron Bertrand [SQL Server MVP]" wrote:
> You can try the solution offered in http://www.aspfaq.com/2248 ... also
> check out the link there to Erland's article on array handling.
>
> "Joe" <Joe@.discussions.microsoft.com> wrote in message
> news:656DF445-E0EC-495C-B62E-0A535EBC2366@.microsoft.com...
>
>|||Arrays? Collections? Those are not SQL structures. We only have
tables. And we avoid procedural code, cursors, dynamic SQL and string
casting as much as possible -- which is about 99.99% of the time.
Load your list of ids into a one column table and use "WHERE foobar_id
IN (SELECT search_id FROM WorkingList)" in the body of the procedure.
Avoid kludges.|||I think we already established that a good approach is to load the ids into
a
(temporary) table. The question was, what is the 'best' way to load these
ids into the temporary table.
From the resources linked by Aaron Bertrand, it seems that a good approach
is to pass the array of values as a string - this means that all the values
can be passed in a single call.
Do you agree or what alternative would you recommend and why?
"--CELKO--" wrote:
> Arrays? Collections? Those are not SQL structures. We only have
> tables. And we avoid procedural code, cursors, dynamic SQL and string
> casting as much as possible -- which is about 99.99% of the time.
> Load your list of ids into a one column table and use "WHERE foobar_id
> IN (SELECT search_id FROM WorkingList)" in the body of the procedure.
> Avoid kludges.
>
Tuesday, February 14, 2012
Best way to insert large amounts of data from a webform to SQL Server 2005
Hi
I have a VB.net web page which generates a datatable of values (3 columns and on average about 1000-3000 rows).
What is the best way to get this data table into an SQL Server? I can create a table on SQL Server no problem but I've found simply looping through the datatable and doing 1000-3000 insert statements is slow (a few seconds). I'd like to make this as streamlined as possible so was wondering is there is a native way to insert all records in a batch via ADO.net or something.
Any ideas?
Thanks
Ed
Hi,
If you are going to insert or update a single table, use a SqlDataAdapter. But it needs a primary or unique key to create the update query via SqlCommandBuilder; otherwise it just inserts all rows into your table.
If you also want to update in the above scenario, there are two ways.
1. Create a staging table in your database and insert your records using the SqlBulkCopy class, which can insert very large numbers of rows per second (I'm not sure about the exact rate), then create a stored procedure to update or insert rows from the staging table into the main table.
2. You can write your update and insert commands in the SqlDataAdapter's UpdateCommand/InsertCommand properties.
For example, create a table testtable with id (int) and name (text). The SQL query will be like this:
update testtable
set name = @nameColumn
where id = @idColumn
if @@ROWCOUNT = 0
insert into testtable (id, name) values (@idColumn, @nameColumn)
Then add SqlParameters (@idColumn, @nameColumn) mapped to your data table column names; the adapter will execute the query when you call the Update method.
hope it helps
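A sketch of the stored procedure in option 1, assuming hypothetical Staging and testtable tables that share id/name columns:

```sql
CREATE PROCEDURE MergeFromStaging
AS
BEGIN
    -- update rows that already exist in the main table
    UPDATE t
    SET t.name = s.name
    FROM testtable t
    JOIN Staging s ON s.id = t.id

    -- insert rows that do not exist yet
    INSERT INTO testtable (id, name)
    SELECT s.id, s.name
    FROM Staging s
    WHERE NOT EXISTS (SELECT 1 FROM testtable t WHERE t.id = s.id)
END
```

SqlBulkCopy fills Staging first; running both statements inside one transaction would keep the table consistent if anything fails partway.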
Best way to handle existing rows
The method brent describes is discussed in more detail here: http://www.sqlis.com/default.aspx?311
It's also worth saying that if the data in the source is timestamped when it changes, then you can use that timestamp to pull out only the data changed since the previous load, i.e. reduce the amount of data that you have to process.
-Jamie
Friday, February 10, 2012
Best way to compare two ENTIRE rows in separate tables?
Here's the basic lowdown:
I get some data every night from an external system (cache') via DTS. This is more or less my "master" data which drives my application. I have just been informed of an interesting constraint. If any of the data changes in the external system, those changes do not become effective until the first day of the ensuing fiscal quarter.
I'm solving this by running the DTS as normal, but populating a "duplicate" table which I will evaluate once per quarter for any changes. This is also the preferred solution because they would like to see a snapshot between current cache' data and the data my application is currently working with.
So, I end up with two identically structured tables. Both tables have the same primary key and can be linked by an id field with relative ease. What I would like to do is a full row comparison once this join is established.
Right now I explicitly check the value of each column. ie:
WHERE t1.field1 <> t2.field1 OR t1.field2 <> t2.field2 OR t1.field3 <> t2.field3 ... etc
I'm hoping there is something buried in TSQL that I just don't know about that can handle comparing entire rows and tell me if they're different. Or perhaps there's another approach all-together.
Any thoughts?|||in my opinion, a JOIN with a column-by-column comparison is the best way to go
WHERE t1.field1 <> t2.field1 OR t1.field2 <> t2.field2 OR t1.field3 <> t2.field3
another approach:
select pkey
from (
    select * from table1
    union
    select * from table2
) u
group by pkey
having count(*) = 2
Since UNION removes duplicate rows, this query will give you all the pkeys which have something different
of course, then you gotta dive back into the tables to see the actual data, so ...|||I'm hoping there is something buried in TSQL that I just don't know about that can handle comparing entire rows and tell me if they're different. Or perhaps there's another approach all-together.
There is.
select checksumsa.pkey
from (select pkey, binary_checksum(*) as checkvalue from a) as checksumsa
inner join (select pkey, binary_checksum(*) as checkvalue from b) as checksumsb
    on checksumsa.pkey = checksumsb.pkey
where checksumsa.checkvalue <> checksumsb.checkvalue
BUT... binary_checksum values are not guaranteed to be different 100% of the time. There is a slight chance that the data could be modified in a way that leaves the checksum value unchanged, though the chances are slim.
Have you considered placing a trigger on your secondary table to set a flag when a record is updated? In the end, that may be your best approach.|||Sounds like Delta processing
http://weblogs.sqlteam.com/brettk/archive/2004/04/23/1281.aspx|||When using binary checksums (which is the way I do this comparison across several databases and tables we have) you also have to be aware that there can be differences reported if one of the servers being used in the comparison is a LINKED server...
http://www.dbforums.com/showthread.php?t=1213716|||Good stuff guys! I'll be leveraging a bit of what Rudy and Brett posted for what I need. I was positive there had to be a more elegant method than what I was doing...
Thanks duders!
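A minimal sketch of the trigger-flag idea mentioned above (SecondaryTable, pkey and the IsChanged column are hypothetical names):

```sql
-- mark every updated row so the quarterly job can pick it up later
CREATE TRIGGER trg_FlagChanges ON SecondaryTable
AFTER UPDATE
AS
BEGIN
    UPDATE s
    SET s.IsChanged = 1
    FROM SecondaryTable s
    JOIN inserted i ON i.pkey = s.pkey
END
```

The quarterly comparison then only has to look at rows WHERE IsChanged = 1 and reset the flag afterwards.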
best way to compare last row with new insert
for MS SQL 2000
I need a UsersHistory table where I cannot INSERT 2 rows one after the other with the same user name
I can get
Bob
David
Bob
but not
Bob
Bob
David
something like an INDEX on rows
or
INSERT INTO UsersHistory (name) VALUES ('bob') IF MAX(name) <> 'bob' ?
what is the best way to do it ?
thank you|||You need some other column to order by, say a datetime. It could store when the user last logged in/logged out/whatever. Why? Because the physical order of rows in a table has no meaning. Since you are storing a history, the thing you want to order by is a time, so you need a datetime column.
Then you could do something like this:
-- assumes you already have a username in a @name variable
declare @time datetime
select @time = max(LoggedInAt) from UsersHistory
if not exists(select * from UsersHistory where LoggedInAt=@time and UserName=@name)
begin
insert into UsersHistory (UserName, LoggedInAt) values (@name, getdate())
end|||I already have that datetime column = getdate()
but I don't understand the meaning of your query; if the last user row is Bob the next row cannot be Bob. It doesn't depend on the DateTime, it can be 10 minutes ... one year
I must keep a history of LastUsers for certain events in the database; the LastUser is the last one as long as nobody does something to replace it (I know my english is terrific ! :-))
the last inserted row is also the MAX(id)
thanks a lot Jezemine|||Do something like:
INSERT INTO UsersHistory (name) VALUES ('bob') IF (SELECT fname from UsersHistory where id = max(id)) <> 'bob'
Note: not actual code
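A runnable version of that idea, assuming UsersHistory has an identity column id (a sketch; not safe under concurrent inserts without extra locking):

```sql
-- reject an insert whose name matches the most recent row;
-- MAX(id) identifies the last insert because id is an IDENTITY column
IF NOT EXISTS (
    SELECT 1
    FROM UsersHistory
    WHERE id = (SELECT MAX(id) FROM UsersHistory)
      AND name = 'bob'
)
    INSERT INTO UsersHistory (name) VALUES ('bob')
```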
best way to combine columns
Hi all,
I have a table with multiple rows with the same ID.
a) How do I combine all columns into one row with the same ID?
b) Is this better to do the combine in the store procedure/trigger or a sub that involked by some sort of datarepeater, datagrid controls?
Since I am trying to brush up my sql. I appreciate any examples on top of conceptual solution.
Thanks
I take that my question is not so clear so here is the example of what I wanted to do:
I have a sql 2000 table like this:
ID col1 col2
7777 itemx 12/02/07 00:00:10
7777 itemy 12/02/07 10:00:00
7777 itemz 12/02/07 12:10:60
8888 itemA 12/02/07 01:01:00
8888 itemB 12/02/07 02:00:00
..........................................
I like to combine all rows with the same ID together like the followings:
7777 itemx itemy itemz 12/02/07
8888 itemA itemB...................
The question has 3 parts:
1) what is the best way to do this? Using a trigger/stored procedure, or just a VB sub with some sort of datarepeater/datalist controls?
2) can you show me some examples of each way if any?
3) Can crystal report do something like this?
I am open to creating another table or just plain writing them out on the page.
Thanks
|||
Someone mentioned to me that 'trigger' could cause some confusion in reading the question. So, please do skip the trigger part.
|||Depending on how many records will be pulled out, you could do it at the report level or at the DB level. You could write a function that takes the ID as parameter and returns a concatenated string for items and use the function in the SELECT.
SELECT Id, dbo.fnGetItems(ID), col2...
FROM ...
and create the function with a SELECT as
SELECT @val = ISNULL(@val,'') + ' ' + Convert(Varchar, col1) FROM YourTable WHERE ID = @Id
and return the @val.
|||Thanks ndinakar. Anyone has any other solutions/ideas or clearer/complete solution? thanks
|||Hi tvfoto,
This depends on if this table has a primary key.
If yes, I would suggest you read everything into a DataSet. Process the data combination
within the DataSet and update it back to the server, because in the .NET code, you will
have better flexibility for the combination logic.
If the table doesn't have a primary key, as ndinakar suggested, you can use some stored
procedure to achieve this.
HTH. If this does not answer your question, please feel free to mark it as Not Answered and post your reply. Thanks!
|||This table does not have a primary key. The following code works; however, I don't know how to separate the items out as individual columns so I can give them a proper heading. Any suggestion? Thanks
//my user function
create function GetItems(@mId int)
returns varchar(1000)
As
Begin
declare @values varchar(1000)
Set @values = ''
Select @values = @values + ',' + myItemColumnName from myTable where ID = @mId
return @values
End
Go
//my aspx code
<body>
<form id="form1" runat="server">
<div>
<asp:SqlDataSource ID="SqlDataSource1" runat="server"
    ConnectionString="<%$ ConnectionStrings:myTableConnectionString %>"
    SelectCommand="SELECT DISTINCT [ID], dbo.GetItems(ID) as [xyzValues] FROM [myTable]">
</asp:SqlDataSource>
<asp:Repeater ID="Repeater1" runat="server" DataSourceID="SqlDataSource1">
<HeaderTemplate>
<table style="background:#ebebeb">
<tr><td colspan="5"> </td></tr>
<tr style="text-align:left; background:yellow">
<th>ID</th>
<th> </th>
<th>item1,item2,item3,item4, item5...</th> ///////header for different items here...not a good way
</tr>
</HeaderTemplate>
<ItemTemplate>
<tr>
<td style="width:25px"><%# DataBinder.Eval(Container.DataItem, "ID") %></td>
<td> </td>
<td><%# DataBinder.Eval(Container.DataItem, "xyzValues") %></td>
.................................
</div></form></body>
Best way of dumping the rows of a table to another table
Hi,
I have two table as follows
Table Cat
{
ID
Update Datetime
}
Table Master
{
UniqueId
Cat Datetime
}
I want to insert millions of rows from Cat into the Master table, which can be achieved with the following query
Code Snippet
INSERT INTO Master (UniqueId, Cat)
SELECT ID, [Update] FROM Cat
Question: Is there any better way to achieve the same operation with better performance? That is to say, the execution should take less time than the statement above.
Thanks for your time
~Mohan
There are some elaborate things you might do (looping, BCP etc) but the time to develop and test those will MORE than eat up the time it takes to run a probably-quite-efficient insert like you've created above.
You might experiment with disabling the indexes on the target table, and then rebuilding them after the insert is complete (if your application will tolerate the indexes being offline for a period). Indexes tend to generally slow down inserts.
Code Snippet
ALTER INDEX ALL on Master DISABLE;
INSERT INTO Master (UniqueId, Cat)
SELECT ID, [Update] FROM Cat
ALTER INDEX ALL ON Master REBUILD;
|||Hello Mohan,
The 'insert into' statement writes to the log and will take a long time to insert millions of rows. I would use the following to make it quicker...
1. Drop any existing second table. In your case it is the 'Master' table.
2. Execute this query - select * into master from Cat
3. This will create and populate the rows into master table. Note 'master' table will have the same column names as in Cat.
4. You can rename the columns using 'sp_rename'. Look it up in BOL. Check if you can rename the columns in designer.
Regards....
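For step 4, the renames might look like this (assuming the copied table kept Cat's column names ID and Update):

```sql
-- after SELECT * INTO, the new table keeps Cat's column names;
-- rename them to the names Master is supposed to have
EXEC sp_rename 'Master.ID', 'UniqueId', 'COLUMN'
EXEC sp_rename 'Master.Update', 'Cat', 'COLUMN'
```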
|||Hi ,
If we do not have indexes on the table, then the query will be the same as the one I mentioned. In that case it will be time consuming to insert millions of rows, right?
~Mohan
|||Hello AsianIndian,
I can't drop the table, the column names are different, and I cannot change the column names either, as the Master table has a different intention.
Can you please let me know if there is any other way of doing it
~Mohan
|||Hello Mohan,
1. Do you have any rows in Master table? If yes, approx how many?
2. Did you try BCP OUT and BCP IN?
regards......
|||Hello,
BCP is a command line utility to read from or write to a file. I don't think we can use it to dump data from one table to another table.
~Mohan
Disabling the indexes is a very good idea. You might also see if you can partition the rows you are inserting into separate batches of some length (based on testing)
The problem is that each insert requires a log write. So one row at a time in a normal RDBMS operation can take longer than doing multiples. However, if you try to insert too many rows at a time, you get a situation where it takes a lot more time because things cannot be committed, using more memory.
However, along the lines that the other person mentioned, I would consider writing a simple SSIS (2005 and up) or DTS (2000 and earlier) package to bulk insert the data (might be labeled fast load) along with the disabling/dropping of indexes. I have used this technique on several occasions to reduce a process of inserting rows that took over an hour to just taking a few minutes.
|||Hi,
I agree with rusag2 and Louis Davidson.
I also would like to suggest using SSIS or DTS to transfer the huge amount of data, and running it on the server through a scheduled job (instead of on the client), which makes it perform much faster than running it in Management Studio or Query Analyzer.
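One way to batch the insert in T-SQL itself, as a sketch (assumes Master.UniqueId stores Cat.ID, so already-copied rows can be skipped on each pass):

```sql
-- copy Cat into Master in 100,000-row batches so each implicit
-- transaction, and therefore the log, stays a manageable size
WHILE 1 = 1
BEGIN
    INSERT INTO Master (UniqueId, Cat)
    SELECT TOP (100000) c.ID, c.[Update]
    FROM Cat c
    WHERE NOT EXISTS (SELECT 1 FROM Master m WHERE m.UniqueId = c.ID)

    IF @@ROWCOUNT = 0 BREAK   -- nothing left to copy
END
```

The batch size is a tuning knob; too small wastes round trips through the NOT EXISTS scan, too large runs into the long-transaction problem described above.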
Best way for the performance
I would like to know which way is better for the performance of my
system:
Creating a huge table with more than 5 000 000 000 rows or multiple
tables with the same structure containing 5 000 000 rows?
Here the context of our application:
Our application offers the possibility to create up to 3000 projects.
Each project contains a list of tags and a list of value-timestamp for
each tag.
So, a project can contain up to 500 tags with a maximum of 5000 values
for each tag. In a hypothetical scenario where everything is at
maximun, only one table containing every value-timestamp would be huge
with more than 7 500 000 000 rows.
My first thought was to create a table containning the values for each
project.
Project
===========
Project_ID - PK
ValDataName_ID
... etc...
ValDataName
===========
ValDataName_ID - PK
TableName (represents the ValDataXXX)
Project_ID
ValDataXXX where XXX is a unique id
===========
ValDataXXX_ID - PK
Tag_ID
TagVal
TagTime
So, I would have a table (ValDataName) containing the names of each
ValData table linked to the project and a table containing my projects.
Then, I would have up to 5000 tables of type ValData.
Since my ValData tables are completely independant, any action
affecting one table does not affect the other.
So, which way would be the better approach?
Thanks.Francis B. wrote:
> I would like to know which way is better for the performance of my
> system:
> Creating a huge table with more than 5 000 000 000 rows or multiple
> tables with the same structure containing 5 000 000 rows?
> Here the context of our application:
> Our application offers the possibility to create up to 3000 projects.
> Each project contains a list of tags and a list of value-timestamp for
> each tag.
> So, a project can contain up to 500 tags with a maximum of 5000 values
> for each tag. In a hypothetical scenario where everything is at
> maximun, only one table containing every value-timestamp would be huge
> with more than 7 500 000 000 rows.
> My first thought was to create a table containning the values for each
> project.
> Project
> ===========
> Project_ID - PK
> ValDataName_ID
> ... etc...
> ValDataName
> ===========
> ValDataName_ID - PK
> TableName (represents the ValDataXXX)
> Project_ID
> ValDataXXX where XXX is an unique id
> ===========
> ValDataXXX_ID - PK
> Tag_ID
> TagVal
> TagTime
> So, I would have a table (ValDataName) containing the names of each
> ValData table linked to the project and a table containing my projects.
> Then, I would have up to 5000 tables of type ValData.
> Since my ValData tables are completely independant, any action
> affecting one table does not affect the other.
> So, which way would be the better approach?
> Thanks.
You didn't mention one very important piece of information. What
version of SQL Server are you using? SQL Server 2005 has a lot of new
scalability features, especially table partitioning. For a major new
application I assume SQL Server 2005 will be the obvious choice for
you.
Achieving a scalable solution will depend very much on the design of
your indexing, queries, your application and your hardware
implementation. You've told us nothing of those. I can only suggest
that creating new tables in a bid to improve performance should
generally be a long way down your list of options. Usually there are
much more effective ways to optimise performance. More likely you might
choose to implement horizontal partitioning in order to make certain
admin tasks easier. How you do that depends a lot on the version and
edition you are using.
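For SQL Server 2005, the horizontal partitioning mentioned above looks roughly like this (a sketch with hypothetical boundary values and names, partitioning the value table by timestamp rather than by creating thousands of ValDataXXX tables):

```sql
-- partition function: one partition per year of TagTime values
CREATE PARTITION FUNCTION pfByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2005-01-01', '2006-01-01', '2007-01-01');

-- map every partition to the default filegroup for simplicity
CREATE PARTITION SCHEME psByYear
AS PARTITION pfByYear ALL TO ([PRIMARY]);

-- the big table is created on the scheme instead of a filegroup
CREATE TABLE TagValues (
    Project_ID int NOT NULL,
    Tag_ID int NOT NULL,
    TagVal float NOT NULL,
    TagTime datetime NOT NULL
) ON psByYear (TagTime);
```

Queries and maintenance then touch only the relevant partitions, while the logical model stays one table.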
My other suggestion would be that you rethink very, very carefully
about logical design. When I see tables with names like "ValDataXXX"
and columns with names like "Tag_ID" and "TagVal" I just get an awful
sinking feeling that this is some naive data-model-on-the-fly kind of
solution - what is sometimes euphemistically called an "EAV" model. If
I'm right then you can probably say goodbye to scalability, performance
and data integrity. Your database will have none of them.
--
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||On 19 Sep 2006 12:59:25 -0700, "Francis B."
<francisbeliveau@.gmail.com> wrote:
>I would like to know which way is better for the performance of my
>system:
>Creating a huge table with more than 5 000 000 000 rows or multiple
>tables with the same structure containing 5 000 000 rows?
>Here the context of our application:
>Our application offers the possibility to create up to 3000 projects.
>Each project contains a list of tags and a list of value-timestamp for
>each tag.
No way should a project management system have 10^9 rows.
Beyond that, I second what David has already said.
J.