Friday, February 24, 2012

best way to update row by row?

Hello all,
I am (dangerously) new to SQL but alas, have been appointed as the new
DB-everything (admin, programmer, designer, etc.). I have been reading
up on SQL scripting and am not sure about the most efficient way to
proceed and was hoping someone could point me in the right direction.
If anyone could point me to fairly simple, good examples of looping in
SQL script for the purposes of processing and updating data from one
database (or table(s)) to another I would appreciate it - I've been
slogging around for a while and haven't been able to find anything that
compares the methods and/or provides a clear example for emulation.
I know exactly how I would do it using a recordset in ASP to handle the
logic of looping through each record in a table but have not been able
to find "the correct way" to do this strictly in SQL - it seems people
recommend against cursors. So what do you do?
I have an example in a book that utilizes a WHILE loop based on a
variable that stores RECORDCOUNT.
There seems to be a lot of anti-cursor sentiments - though cursors are
made specifically for looping through each record (fetch)
Perhaps the answer is that it's better to update row by row using
ADO/VB.
Any input, suggestions and/or links to good tutorials are much
appreciated. I've been doing a lot of reading on my own but each new
tutorial seems to contradict the last.
mail at mahalie dot com|||The Best Practice in a relational database is to use set-based processing
instead of procedural 'cursor' processing whenever possible. In other
words, do all work at once instead of 'row by row'. A simple set-based
UPDATE example
--move all employees in department 12345 to department 98765
UPDATE Employees
SET Department = '98765'
WHERE Department = '12345'

> I have an example in a book that utilizes a WHILE loop based on a
> variable that stores RECORDCOUNT.
>
This technique is probably nothing more than a pseudo-cursor. Although
technically not a cursor, performance is often similar to cursor processing.
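To illustrate, that kind of loop typically looks something like the sketch below (the table
and column names are made up for illustration, not from your post):

-- WHILE-loop "pseudo-cursor" (hypothetical table dbo.SomeTable with an ID key,
-- a SomeColumn to change and a Processed flag). Each pass touches exactly one
-- row, so the work is still done row by row even though no CURSOR is declared.
DECLARE @RowsLeft INT, @CurrentID INT

SELECT @RowsLeft = COUNT(*) FROM dbo.SomeTable  -- the RECORDCOUNT-style variable

WHILE @RowsLeft > 0
BEGIN
    -- pick one unprocessed row
    SELECT TOP 1 @CurrentID = ID
    FROM dbo.SomeTable
    WHERE Processed = 0
    ORDER BY ID

    -- do the per-row work
    UPDATE dbo.SomeTable
    SET SomeColumn = 'new value', Processed = 1
    WHERE ID = @CurrentID

    SET @RowsLeft = @RowsLeft - 1
END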

> There seems to be a lot of anti-cursor sentiments - though cursors are
> made specifically for looping through each record (fetch)
There are times when one must resort to individual row processing. Cursors
(client or server) are appropriate in that case. However, there are many
situations where cursors can and should be avoided.
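When individual row processing really is unavoidable, the basic skeleton is short. A sketch
only - the table and column names below are placeholders, not from your post:

-- Minimal server-side cursor skeleton (hypothetical dbo.Employees / EmployeeID)
DECLARE @EmployeeID INT

DECLARE emp_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT EmployeeID FROM dbo.Employees

OPEN emp_cursor
FETCH NEXT FROM emp_cursor INTO @EmployeeID

WHILE @@FETCH_STATUS = 0
BEGIN
    -- per-row work for @EmployeeID goes here (an UPDATE, an EXEC, etc.)
    FETCH NEXT FROM emp_cursor INTO @EmployeeID
END

CLOSE emp_cursor
DEALLOCATE emp_cursor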

> Perhaps the answer is that it's better to update row by row using
> ADO/VB.
This is just a client-side cursor. The point being that it is often
possible to achieve the desired result in a single SQL statement on the
server side rather than any sort of server or client looping construct.
Set-based processing allows the database engine to optimize the statement as
a unit.
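To make that concrete: the row-by-row loop you would write in ASP usually collapses into a
single statement. A rough sketch, using invented source and target table names (not from
your post):

-- Set-based equivalent of 'for each row, find the matching row and update it'
-- (hypothetical tables TargetEmployees and SourceEmployees joined on EmployeeID)
UPDATE TargetEmployees
SET Department = (SELECT s.NewDepartment
                  FROM SourceEmployees AS s
                  WHERE s.EmployeeID = TargetEmployees.EmployeeID)
WHERE EXISTS (SELECT *
              FROM SourceEmployees AS s
              WHERE s.EmployeeID = TargetEmployees.EmployeeID)

You describe only the result you want; the engine decides how to visit the rows.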
Hope this helps.
Dan Guzman
SQL Server MVP
"mahalie" <mahalie@.gmail.com> wrote in message
news:1134446568.289271.282450@.g43g2000cwa.googlegroups.com...
> Hello all,
> I am (dangerously) new to SQL but alas, have been appointed as the new
> DB-everything (admin, programmer, designer, etc.). I have been reading
> up on SQL scripting and am not sure about the most efficient way to
> proceed and was hoping someone could point me in the right direction.
> If anyone could point me to fairly simple, good examples of looping in
> SQL script for the purposes of processing and updating data from one
> database (or table(s)) to another I would appreciate it - I've been
> slogging around for a while and haven't been able to find anything that
> compares the methods and/or provides a clear example for emulation.
> I know exactly how I would do it using a recordset in ASP to handle the
> logic of looping through each record in a table but have not been able
> to find "the correct way" to do this strictly in SQL - it seems people
> recommend against cursors. So what do you do?
> I have an example in a book that utilizes a WHILE loop based on a
> variable that stores RECORDCOUNT.
> There seems to be a lot of anti-cursor sentiments - though cursors are
> made specifically for looping through each record (fetch)
> Perhaps the answer is that it's better to update row by row using
> ADO/VB.
> Any input, suggestions and/or links to good tutorials are much
> appreciated. I've been doing a lot of reading on my own but each new
> tutorial seems to contradict the last.
> mail at mahalie dot com
>|||Thank you for the input!
I should have pointed out that set processing is not really an option.
That is, this example wouldn't apply.
--move all employees in department 12345 to department 98765
UPDATE Employees
SET Department = '98765'
WHERE Department = '12345'
Each row in the table(s) needs to be matched to a specific row in the
other existing table, and updated (conditionally, of course!) using all
sorts of fun lookups to boot!
So it sounds like a cursor is the way to go, especially if other
methods like the row count loop are just pseudo-cursors creating the
same overhead and there's no major benefit to the ASP pages carrying
some of the logic processing burden?
For what it's worth - I'll be able to do this update without lock
issues as it's an internal set of databases and run the script when
everyone's gone!|||Please post DDL + sample data.
Look at David Portas's example of how the UPDATE...FROM syntax may affect your
results.
CREATE TABLE Countries
(countryname VARCHAR(20) NOT NULL PRIMARY KEY,
capitalcity VARCHAR(20));
CREATE TABLE Cities
(cityname VARCHAR(20) NOT NULL,
countryname VARCHAR(20) NOT NULL
REFERENCES Countries (countryname),
CONSTRAINT PK_Cities
PRIMARY KEY (cityname, countryname));
INSERT INTO Countries (countryname, capitalcity) VALUES ('USA', NULL);
INSERT INTO Countries (countryname, capitalcity) VALUES ('UK', NULL);
INSERT INTO Cities VALUES ('Washington', 'USA');
INSERT INTO Cities VALUES ('London', 'UK');
INSERT INTO Cities VALUES ('Manchester', 'UK');
The MS syntax makes it all too easy for the developer to slip up by
writing ambiguous UPDATE...FROM statements where the JOIN criteria are
not unique on the right side of the join.
Try these two identical UPDATE statements with a small change to the
primary key in between.
UPDATE Countries
SET capitalcity = cityname
FROM Countries JOIN Cities /* evil UPDATE... FROM syntax */
ON Countries.countryname = Cities.countryname;
SELECT * FROM Countries;
ALTER TABLE Cities DROP CONSTRAINT PK_Cities;
ALTER TABLE Cities ADD CONSTRAINT PK_Cities PRIMARY KEY (countryname,
cityname);
UPDATE Countries
SET capitalcity = cityname
FROM Countries JOIN Cities /* don't do this! */
ON Countries.countryname = Cities.countryname;
SELECT * FROM Countries;
You get this from the first SELECT statement:
countryname          capitalcity
-------------------- --------------------
UK                   London
USA                  Washington
and this from the second:
countryname          capitalcity
-------------------- --------------------
UK                   Manchester
USA                  Washington
(though these results aren't guaranteed - that's part of the problem).
Why did the result change? The physical implementation has affected the
meaning of the code, with serious, potentially disastrous consequences.
How can you even test your code if its results are subject to change
due to the vagaries of storage, indexing and caching?
With the ANSI syntax there is no ambiguity. The UPDATE statement
compels the programmer to design an unambiguous assignment subquery
that returns no more than a single value.
UPDATE Countries
SET capitalcity =
(SELECT MIN(cityname)
FROM Cities
WHERE Countries.countryname = Cities.countryname);
At the very least this forces the developer to reconsider whether the
UPDATE statement makes logical sense. You might want to make an effort
to learn Standard SQL instead of a dialect that can change at any time,
which will not port, cannot be understood by other programmers, etc.
"mahalie" <mahalie@.gmail.com> wrote in message
news:1134449190.817097.83020@.z14g2000cwz.googlegroups.com...
> Thank you for the input!
> I should have pointed out that set processing is not really an option.
> That is, this example wouldn't apply.
> --move all employees in department 12345 to department 98765
> UPDATE Employees
> SET Department = '98765'
> WHERE Department = '12345'
> Each row in the table(s) needs to be matched to a specific row in the
> other existing table, and updated (conditionally, or course!) using all
> sorts of fun look ups to boot!
> So it sounds like a cursor is the way to go, especially if other
> methods like the row count loop are just pseudo-cursors creating the
> same overhead and there's no major benefit to the ASP pages carrying
> some of the logic processing burden?
> For what it's worth - I'll be able to do this update without lock
> issues as it's an internal set of databases and run the script when
> everyone's gone!
>|||mahalie wrote:
> Thank you for the input!
> I should have pointed out that set processing is not really an option.
> That is, this example wouldn't apply.
> --move all employees in department 12345 to department 98765
> UPDATE Employees
> SET Department = '98765'
> WHERE Department = '12345'
> Each row in the table(s) needs to be matched to a specific row in the
> other existing table, and updated (conditionally, or course!) using all
> sorts of fun look ups to boot!
> So it sounds like a cursor is the way to go, especially if other
> methods like the row count loop are just pseudo-cursors creating the
> same overhead and there's no major benefit to the ASP pages carrying
> some of the logic processing burden?
> For what it's worth - I'll be able to do this update without lock
> issues as it's an internal set of databases and run the script when
> everyone's gone!
Nothing you have said implies that a cursor is the best or only
solution. There's no fundamental reason why you can't write set based
code that will "match to a specific row in the other table and update
conditionally using look ups". There are many reasons to choose a set
based solution and not all of them are to do with performance and
scalability. Simplicity, ease of maintenance and portability are
usually advantages of set based code as well.
The best way to get help with your question is to post DDL, sample data
and expected results. See: http://www.aspfaq.com/etiquette.asp?id=5006
David Portas
SQL Server MVP
--|||Mahalie,
Cursors are often discouraged mainly due to the conflict with rolling back
transactions. Just as a safety precaution, use aliases instead of the
tablename.fieldname approach. This will allow you to easily replace UPDATE
with SELECT to check the result set before you commit the update.
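For example (a sketch with made-up table names - the point is only the alias):

-- Preview the rows that will change and the values they will get...
SELECT t.Department, s.NewDepartment
FROM TargetEmployees AS t
JOIN SourceEmployees AS s
    ON s.EmployeeID = t.EmployeeID

-- ...then swap the SELECT for an UPDATE against the same alias once it looks right
UPDATE t
SET t.Department = s.NewDepartment
FROM TargetEmployees AS t
JOIN SourceEmployees AS s
    ON s.EmployeeID = t.EmployeeID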
Just my twist on it,
Adam Turner
"mahalie" wrote:

> Hello all,
> I am (dangerously) new to SQL but alas, have been appointed as the new
> DB-everything (admin, programmer, designer, etc.). I have been reading
> up on SQL scripting and am not sure about the most efficient way to
> proceed and was hoping someone could point me in the right direction.
> If anyone could point me to fairly simple, good examples of looping in
> SQL script for the purposes of processing and updating data from one
> database (or table(s)) to another I would appreciate it - I've been
> slogging around for a while and haven't been able to find anything that
> compares the methods and/or provides a clear example for emulation.
> I know exactly how I would do it using a recordset in ASP to handle the
> logic of looping through each record in a table but have not been able
> to find "the correct way" to do this strictly in SQL - it seems people
> recommend against cursors. So what do you do?
> I have an example in a book that utilizes a WHILE loop based on a
> variable that stores RECORDCOUNT.
> There seems to be a lot of anti-cursor sentiments - though cursors are
> made specifically for looping through each record (fetch)
> Perhaps the answer is that it's better to update row by row using
> ADO/VB.
> Any input, suggestions and/or links to good tutorials are much
> appreciated. I've been doing a lot of reading on my own but each new
> tutorial seems to contradict the last.
> mail at mahalie dot com
>|||> I should have pointed out that set processing is not really an option.
> That is, this example wouldn't apply.
> --move all employees in department 12345 to department 98765
> UPDATE Employees
> SET Department = '98765'
> WHERE Department = '12345'
> Each row in the table(s) needs to be matched to a specific row in the
> other existing table, and updated (conditionally, or course!) using all
> sorts of fun look ups to boot!
This can easily be done in a single UPDATE statement as Uri pointed out in
his examples. I wouldn't go so far as to say that the proprietary
UPDATE...FROM syntax is 'evil' since I find it especially handy when
multiple columns are to be updated. Note that the gotchas with poorly
formulated queries that Uri pointed out apply to cursors as well.
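For example, something along these lines - names invented, and the join key is assumed to
be unique on the source side so the ambiguity Uri shows can't occur:

-- UPDATE...FROM is handy when several columns come from the same join
-- (hypothetical tables; RecordID is assumed unique in SourceDegrees)
UPDATE t
SET t.Degree = s.Degree,
    t.Specialty = s.Specialty,
    t.Institution = s.Institution
FROM TargetDegrees AS t
JOIN SourceDegrees AS s
    ON s.RecordID = t.RecordID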
As David suggested, please post your DDL and script so that we can perhaps
demonstrate a set-based approach for your situation.
Hope this helps.
Dan Guzman
SQL Server MVP
"mahalie" <mahalie@.gmail.com> wrote in message
news:1134449190.817097.83020@.z14g2000cwz.googlegroups.com...
> Thank you for the input!
> I should have pointed out that set processing is not really an option.
> That is, this example wouldn't apply.
> --move all employees in department 12345 to department 98765
> UPDATE Employees
> SET Department = '98765'
> WHERE Department = '12345'
> Each row in the table(s) needs to be matched to a specific row in the
> other existing table, and updated (conditionally, or course!) using all
> sorts of fun look ups to boot!
> So it sounds like a cursor is the way to go, especially if other
> methods like the row count loop are just pseudo-cursors creating the
> same overhead and there's no major benefit to the ASP pages carrying
> some of the logic processing burden?
> For what it's worth - I'll be able to do this update without lock
> issues as it's an internal set of databases and run the script when
> everyone's gone!
>|||This is a small piece of the overall project. I thought I was asking a
more theoretical question - didn't want to bother you all with the
grimy details. But you asked and I'll take any help I can get, so here
we go with the DDL. Sorry if this is more info than needed - I thought
it would be easiest for anyone actually wanting to look at this to be
able to cut and paste the whole shebang at once...so below includes
create db, tables and sample data inserts for both databases.
Note I cannot make ANY structural changes to DB_2 - it's the backend of
a large application, I can only write to it. I can, however, make any
modifications to DB_1 necessary that would make porting the data from
it to DB_2 easier.
What I'm trying to do in pseudo-pseudocode:
FOR EACH EMPLOYEE IN DB_1..Employees
    Check to see if the employee exists in DB_2, using email address to match
    (empID in DB_1 is just an ID field and doesn't correspond to DB_2) -
    If they don't exist, return an error message, move to the next employee
    If the employee exists, get the employee number (DB_2..EM.Employee)
    Look up all degrees for that employee matching DB_1..Employees.empID =
    DB_1..Education.empID
    FOR EACH degree found, add it to DB_2:
        look up the DB_2 degree code by matching DB_1..Education.eduDegree to
        DB_2..CFGEMDegree.Description
        (if the Description does not exist, add it - the code doesn't auto
        increment, so create the number by getting the last CFGEMDegree.Code,
        casting it as int and adding 1)
        add the degree info from DB_1..Education to DB_2..EMDegree (using the
        DB_2..CFGEMDegree.Code value instead of DB_1..Education.eduDegree, and
        DB_2..EM.Employee instead of DB_1..Education.empID).
As I mentioned, I can figure out how to do this using recordsets and
ASP, and even scripting in SQL using cursors (I think, haven't tried it
yet but it looks straightforward). But if it's both better and
possible to use set based code, please enlighten me because I don't
understand how!
I'm going to have to do similar processing for other employee related
tables - lists of project experience and licenses, but once I'm on the
path with this it should be transferable.
Many, MANY, thanks in advance!
/*create DB_1*/
CREATE DATABASE [DB_1]
ON (NAME = N'DB_1_dat',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL\data\DB_1.mdf',
    SIZE = 2, FILEGROWTH = 10%)
LOG ON (NAME = N'DB_1_log',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL\data\DB_1.ldf',
    SIZE = 2, FILEGROWTH = 10%)
COLLATE SQL_Latin1_General_CP1_CI_AS
GO
exec sp_dboption N'DB_1', N'autoclose', N'false'
GO
exec sp_dboption N'DB_1', N'bulkcopy', N'true'
GO
exec sp_dboption N'DB_1', N'trunc. log', N'true'
GO
exec sp_dboption N'DB_1', N'torn page detection', N'true'
GO
exec sp_dboption N'DB_1', N'read only', N'false'
GO
exec sp_dboption N'DB_1', N'dbo use', N'false'
GO
exec sp_dboption N'DB_1', N'single', N'false'
GO
exec sp_dboption N'DB_1', N'autoshrink', N'false'
GO
exec sp_dboption N'DB_1', N'ANSI null default', N'false'
GO
exec sp_dboption N'DB_1', N'recursive triggers', N'false'
GO
exec sp_dboption N'DB_1', N'ANSI nulls', N'false'
GO
exec sp_dboption N'DB_1', N'concat null yields null', N'false'
GO
exec sp_dboption N'DB_1', N'cursor close on commit', N'false'
GO
exec sp_dboption N'DB_1', N'default to local cursor', N'false'
GO
exec sp_dboption N'DB_1', N'quoted identifier', N'false'
GO
exec sp_dboption N'DB_1', N'ANSI warnings', N'false'
GO
exec sp_dboption N'DB_1', N'auto create statistics', N'true'
GO
exec sp_dboption N'DB_1', N'auto update statistics', N'true'
GO
USE DB_1
GO
/* tables in DB_1 (origin of data) */
CREATE TABLE [Employees] (
[empID] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[empEmail] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_Employees] PRIMARY KEY CLUSTERED
(
[empID]
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [Education] (
[eduID] [int] IDENTITY (1, 1) NOT NULL ,
[empID] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[eduDegree] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[eduSpecialty] [varchar] (60) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL ,
[eduInstitution] [varchar] (60) COLLATE SQL_Latin1_General_CP1_CI_AS
NULL ,
[eduYear] [smallint] NOT NULL CONSTRAINT [DF_Education_eduYear]
DEFAULT (0),
CONSTRAINT [FK_Education_Employees] FOREIGN KEY
(
[empID]
) REFERENCES [Employees] (
[empID]
) ON DELETE CASCADE ON UPDATE CASCADE
) ON [PRIMARY]
GO
/* Create Database DB_2 - destination of data */
CREATE DATABASE [DB_2]
ON (NAME = N'ART_Data',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL\Data\DB_2.mdf',
    SIZE = 100, FILEGROWTH = 10%)
LOG ON (NAME = N'ART_Log',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL\Data\DB_2_log.ldf',
    SIZE = 9, FILEGROWTH = 10%)
COLLATE SQL_Latin1_General_CP1_CI_AS
GO
exec sp_dboption N'DB_2', N'autoclose', N'false'
GO
exec sp_dboption N'DB_2', N'bulkcopy', N'false'
GO
exec sp_dboption N'DB_2', N'trunc. log', N'true'
GO
exec sp_dboption N'DB_2', N'torn page detection', N'true'
GO
exec sp_dboption N'DB_2', N'read only', N'false'
GO
exec sp_dboption N'DB_2', N'dbo use', N'false'
GO
exec sp_dboption N'DB_2', N'single', N'false'
GO
exec sp_dboption N'DB_2', N'autoshrink', N'false'
GO
exec sp_dboption N'DB_2', N'ANSI null default', N'false'
GO
exec sp_dboption N'DB_2', N'recursive triggers', N'false'
GO
exec sp_dboption N'DB_2', N'ANSI nulls', N'false'
GO
exec sp_dboption N'DB_2', N'concat null yields null', N'false'
GO
exec sp_dboption N'DB_2', N'cursor close on commit', N'false'
GO
exec sp_dboption N'DB_2', N'default to local cursor', N'false'
GO
exec sp_dboption N'DB_2', N'quoted identifier', N'false'
GO
exec sp_dboption N'DB_2', N'ANSI warnings', N'false'
GO
exec sp_dboption N'DB_2', N'auto create statistics', N'true'
GO
exec sp_dboption N'DB_2', N'auto update statistics', N'true'
GO
/*corresponding tables in DB_2 (destination of data) */
USE DB_2
GO
CREATE TABLE [EM] (
[Employee] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL ,
[LastName] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[FirstName] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[MiddleName] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[EMail] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [EMPK] PRIMARY KEY NONCLUSTERED
(
[Employee]
) WITH FILLFACTOR = 90 ON [PRIMARY]
)
GO
/*Degrees Table*/
CREATE TABLE [EMDegree] (
[RecordID] [varchar] (32) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL ,
[Employee] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL ,
[Seq] [smallint] NOT NULL CONSTRAINT [DF__EMDegree__Seq__04659998]
DEFAULT (0),
[Degree] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Specialty] [varchar] (60) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Institution] [varchar] (60) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
[YearEarned] [smallint] NOT NULL CONSTRAINT
[DF__EMDegree__YearEa__0559BDD1] DEFAULT (0),
CONSTRAINT [EMDegreePK] PRIMARY KEY NONCLUSTERED
(
[RecordID]
) WITH FILLFACTOR = 90 ON [PRIMARY] ,
CONSTRAINT [FK_EMDegree_Employee_EM] FOREIGN KEY
(
[Employee]
) REFERENCES [EM] (
[Employee]
)
) ON [PRIMARY]
GO
/*Degree Codes Table */
CREATE TABLE [CFGEMDegree] (
[Code] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Description] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT
NULL ,
CONSTRAINT [CFGEMDegreePK] PRIMARY KEY NONCLUSTERED
(
[Code]
) WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]
GO
USE DB_1
GO
/*sample data, for DB_1 */
/*employees*/
INSERT INTO [Employees] ([empID],[empEmail]) VALUES ('CathyG','cathyg@.domain.com')
INSERT INTO [Employees] ([empID],[empEmail]) VALUES ('CharlesM','charlesm@.domain.com')
INSERT INTO [Employees] ([empID],[empEmail]) VALUES ('DennisC','dennisc@.domain.com')
INSERT INTO [Employees] ([empID],[empEmail]) VALUES ('DorisD','DorisD@.domain.com')
INSERT INTO [Employees] ([empID],[empEmail]) VALUES ('jenniferp','jenniferp@.domain.com')
INSERT INTO [Employees] ([empID],[empEmail]) VALUES ('MahalieP','mahaliep@.domain.com')
INSERT INTO [Employees] ([empID],[empEmail]) VALUES ('mercedesd','mercedesd@.domain.com')
INSERT INTO [Employees] ([empID],[empEmail]) VALUES ('TeresaN','teresan@.domain.com')
/*degrees*/
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('CharlesM','MFA','Birds','Yale Design School',99)
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('mercedesd','BS','Gymnastics','Yale',2000)
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('DorisD','BS','comp sci','uw',1985)
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('DorisD','Masters','architecture','uw',1999)
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('jenniferp','AA','arch','yale',2004)
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('TeresaN','BA','Urban Planning','U of W',1975)
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('TeresaN','BA','Architecture','U of W',1985)
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('MahalieP','BA','Jargon','U of W',1998)
INSERT INTO [Education] ([empID],[eduDegree],[eduSpecialty],[eduInstitution],[eduYear])
VALUES ('MahalieP','Masters','Bologne','MIT',2002)
USE DB_2
GO
/*sample data, for DB_2 */
/*employees*/
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('01416','Goldmeir','Cathy','M','CathyG@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00001','Maple','Charles','R','charlesm@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00002','Carter','Dennis','Richard','dennisc@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00101','Day','Doris','E','DorisD@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00201','Peterson','Jennifer','May','jenniferp@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00005','Porter','Mahalie','M','MahalieP@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00203','Davidson','Mercedes','Lynn','mercedesd@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00007','Spencer','Thomas','John','ThomasS@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00302','Norton','Teresa','Ann','TeresaN@.domain.com')
INSERT INTO [EM] ([Employee],[LastName],[FirstName],[MiddleName],[EMail])
VALUES ('00009','Smithfield','Herbert','James','HerbertS@.domain.com')
/*degrees*/
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('002011','01416','01','Architecture','University of Virginia',1985)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('002012','00001','01','Interior Design','University of North Carolina',1996)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('000021','00002','01','Art History','Univeristy of San Diego',1962)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('000022','00002','02','Architecture','Pennsylvannia State University',1970)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('000031','00101','01','Design','George Mason University',1992)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('ADMIN1024963974101','00101','03','Architecture','Virginia State University',1995)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('SANDERSON1012487091374','00101','02','Civil Engineering','University of Florida',1982)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('SANDERSON1012487091424','00201','01','Structural Engineering','University of Florida',1985)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('SANDERSON1012487091474','00302','01','Structural Engineering','College of William and Mary',1988)
INSERT INTO [EMDegree] ([RecordID],[Employee],[Degree],[Specialty],[Institution],[YearEarned])
VALUES ('SANDERSON1012854189480','00009','01','Sociology','Old Dominion University',1987)
/*degrees code table*/
INSERT INTO [CFGEMDegree] ([Code],[Description]) VALUES ('01','Bachelor of Arts')
INSERT INTO [CFGEMDegree] ([Code],[Description]) VALUES ('02','Bachelor of Science')
INSERT INTO [CFGEMDegree] ([Code],[Description]) VALUES ('03','Bachelor of Architecture')
INSERT INTO [CFGEMDegree] ([Code],[Description]) VALUES ('04','Masters of Arts')
INSERT INTO [CFGEMDegree] ([Code],[Description]) VALUES ('05','Masters of Science')|||mahalie wrote:

> This is a small piece of the overall project. I thought I was asking a
> more theoretical quesiton - didn't want to bother you all with the
> grimy details. But you asked and I'll take any help I can get, so here
> we go with the DDL. Sorry if this is more info than needed - I thought
> it would be easiest for anyone actually wanting to look at this to be
> able to cut and paste tthe whole shebang at once...so below includes
> create db, tables and sample data inserts for both databases.
> Note I cannot make ANY structural changes to DB_2 - it's the backend of
> a large application, I can only write to it. I can, however, make any
> modifications to DB_1 necessary that would make porting the data from
> it to DB_2 easier.
> What I'm trying to do in pseudo-pseudocode:
> FOR EACH EMPLOYEE IN DB_1..Employees
> Check to see if employee exists in DB_2 using email address to match
> (empID in DB_1 is just and ID field and doesn't correpsond to DB_2) -
> If they don't exist, retrun an error message, move to next employee
> If the employee exists, get employee number (DB_2..EM.Employee)
> Look up all degrees for that employee matching DB_1..Employees.empID =
> DB_1..Education.empID
> For each employee, add degree
> lookup DB_2 degree code by matching DB_1.Edcuation.eduDegree to
> DB_2..CFGEMDegree.Description
> (if the Description does not exist, add it - the code doesn't auto
> increment so create number by getting last CFGEMDegree.Code, cast as
> int, add 1)
> add degree info DB_1..Education to DB_2..Degrees (using
> DB_2.CFGEMDegree..Code value instead of DB_1..Education.eduDegree, and
> DB_2..EM.Employee instead of DB_1..Education.empID).
> As I mentioned, I can figure out how to do this using recordsets and
> ASP, and even scripting in SQL using cursors (I think, haven't tried it
> yet but it looks straight forward). But if it's both better and
> possible to use set based code, please enlighten me because I don't
> understand how!
> I'm going to have to do similar processing for other employee related
> tables - lists of project experience and licenses, but once I'm on the
> path with this it should be transferrable.
>
> Many, MANY, thanks in advance!
>
This is an excellent example of how important it is to "think in SQL".
You've conceptualized the problem in terms of procedural steps and that
mindset unfortunately doesn't help you to write efficient code in SQL's
declarative language. That's why you thought of using a cursor. My
advice is always that only experts should write cursors. Only when you
master "real" SQL do you gain the knowledge and experience to judge
when a cursor makes sense. Until then, it's probably safest to assume
that a cursor is the wrong solution. OK, that's the sermon over with...
:-).
You have at least two steps it seems. 1) Insert the missing degree
codes in DB2. 2) Return a result set containing employee id, email and
the DB2 degree code. We know there are two steps because INSERTs are
always separate operations from SELECTs.
The INSERT is the only tricky part. Given that your degree names in DB1
and DB2 aren't very comparable I'm not clear how you want to handle
them. Possibly you'll need to review the data by eye and decide exactly
what codes are new ones and what already exist. Given that you want to
automate it I'll leave the precise logic to you but here's a partial
example to prove that it's possible. Notice that the temp table doesn't
use a cursor - it just helps us generate the code. (In SQL2005 there's
a much neater solution without a temp table, but I'm assuming 2000
here).
DECLARE @t TABLE (description VARCHAR(50) NOT NULL PRIMARY KEY) ;
/* Pull the missing degrees */
INSERT INTO @t (description)
SELECT DISTINCT D1.edudegree
FROM DB_1.dbo.education AS D1
LEFT JOIN DB_2.dbo.cfgemdegree AS D2
    ON D2.description =
       CASE D1.edudegree
           WHEN 'BS' THEN 'Bachelor of Science'
           WHEN 'BA' THEN 'Bachelor of Arts'
           /* ... etc */
           ELSE D1.edudegree
       END
WHERE D2.code IS NULL ;
/* Generate the code and insert */
INSERT INTO DB_2.dbo.cfgemdegree (description, code)
SELECT T1.description,
       RIGHT('0' + CAST(COUNT(*) +
           (SELECT CAST(MAX(code) AS INT)
            FROM DB_2.dbo.cfgemdegree) AS VARCHAR(2)), 2) AS code
FROM @t AS T1, @t AS T2
WHERE T1.description >= T2.description
GROUP BY T1.description ;
Once you've sorted out the codes, the main part of the query is easy.
I'm assuming that you have matching descriptions for the degrees in
each database at this stage. If not, then you'll want to repeat
something like the CASE expression in the above INSERT. Alternatively
it might be easier to create an intermediate table to perform the
translation degree code -> description.
SELECT E1.empid, E1.empemail,
CASE WHEN E2.employee IS NULL
THEN 'Error!'
ELSE 'OK'
END AS emp_exists,
E2.employee,
D2.code
FROM DB_1.dbo.employees AS E1
LEFT JOIN DB_2.dbo.em AS E2
ON E1.empemail = E2.email
LEFT JOIN DB_1.dbo.Education AS D1
ON E1.empid = D1.empid
LEFT JOIN DB_2.dbo.cfgemdegree AS D2
ON D2.description = D1.edudegree ;
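If you do go the intermediate-table route, a sketch might look like the following. The
DegreeMap table and its rows are invented for illustration; since you can change DB_1, it
could live there:

/* Hypothetical translation table in DB_1: degree abbreviation -> DB_2 description */
USE DB_1
GO
CREATE TABLE dbo.DegreeMap
    (edudegree VARCHAR(10) NOT NULL PRIMARY KEY,
     description VARCHAR(50) NOT NULL)
GO
INSERT INTO dbo.DegreeMap (edudegree, description) VALUES ('BA', 'Bachelor of Arts')
INSERT INTO dbo.DegreeMap (edudegree, description) VALUES ('BS', 'Bachelor of Science')
/* ...one row per abbreviation that needs translating */
GO
/* The final query then joins through the map instead of repeating the CASE */
SELECT E1.empid, E1.empemail, E2.employee, D2.code
FROM DB_1.dbo.employees AS E1
LEFT JOIN DB_2.dbo.em AS E2
    ON E1.empemail = E2.email
LEFT JOIN DB_1.dbo.Education AS D1
    ON E1.empid = D1.empid
LEFT JOIN DB_1.dbo.DegreeMap AS M
    ON M.edudegree = D1.edudegree
LEFT JOIN DB_2.dbo.cfgemdegree AS D2
    ON D2.description = M.description ;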
Hope this helps.
David Portas
SQL Server MVP
--|||David, thanks - sermon appreciated and your examples are exactly what I
was trying to get at without knowing what I was looking for. My
scripting background must be pretty obvious!
I'd just finished writing a temp table with new codes and distinct
degree types to UNION with the cfgemdegree table when I read your
response, so I do think I'm not beyond hope.
All the aliases and joins have my head spinning but I think I
understand!
THANKS!!
Would it be worth creating a view that includes the
DB_2.dbo.cfgemdegree description in the DB_2.dbo.emdegree table to make
the joins a little more comprehensible or would that be more of a
maintenance issue? I.e., what's the best practice in a case like this?
