Thursday, March 22, 2012

Bigint vs int

I had set up my tables using BigInts for my Surrogate keys (identity) to
make sure I could never run out of numbers.
I am thinking maybe that was overkill and my slow my system down (I know it
takes up more space, but that is not really an issue).
Are the Bigints going to cause me a problem down the line?
Also, if I start changing the BigInts to Ints in my Database and Stored
Procedure and miss some of the Stored procedures, would that cause me any
problems. I also have the ASP.Net pages that call the Stored Procedure set
to BigInts - would there be a problem if I miss some of these. In
otherwords, would there be a problem with mixing and matching the types
until I get them all taken care of?
Thanks,
Tom>I had set up my tables using BigInts for my Surrogate keys (identity) to
>make sure I could never run out of numbers.

> Are the Bigints going to cause me a problem down the line?
Problems? No. The only issue is that all of your keys are twice as large
as for an int. This will not "hurt" anything, but it is a waste of
resources. Tables with GUIDs as keys can perform just fine also, and they
take 16 byts. It is all into how much disk space and memory you need. It
will be slightly more for bigints.
The real question is whether you really have a need for 2 billion (well, 4
billion if you don't mind negatives after you exhaust the positive values)
values in your tables? I only have had two tables in my life where this was
possible (and only one has reached this amount) and they were both used to
track clicks on a pretty large web click tracking software table. I am
actually more likely to end up with small- and tiny- int keys than I am
bigint. But you have to ask yourself, if there is a possibility of that
kind of data, what do you want to do?
If this is an OLTP system, I would suggest you probably will want to delete
old data before doing this. A billion of anything is quite a bit.

> Also, if I start changing the BigInts to Ints in my Database and Stored
> Procedure and miss some of the Stored procedures, would that cause me any
> problems. I also have the ASP.Net pages that call the Stored Procedure
> set to BigInts - would there be a problem if I miss some of these. In
> otherwords, would there be a problem with mixing and matching the types
> until I get them all taken care of?
Not "big" problems, unless you try to pass a value that fits in a bigint and
not in an int. If you aren't using stored procs, you may have some issues
with type mismatches cause funky plans, but not enough to avoid switching.
----
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP
"tshad" <tscheiderich@.ftsolutions.com> wrote in message
news:uI2qqGhmFHA.1372@.TK2MSFTNGP10.phx.gbl...
>I had set up my tables using BigInts for my Surrogate keys (identity) to
>make sure I could never run out of numbers.
> I am thinking maybe that was overkill and my slow my system down (I know
> it takes up more space, but that is not really an issue).
> Are the Bigints going to cause me a problem down the line?
> Also, if I start changing the BigInts to Ints in my Database and Stored
> Procedure and miss some of the Stored procedures, would that cause me any
> problems. I also have the ASP.Net pages that call the Stored Procedure
> set to BigInts - would there be a problem if I miss some of these. In
> otherwords, would there be a problem with mixing and matching the types
> until I get them all taken care of?
> Thanks,
> Tom
>|||You can actually find an answer to your question using a calculator. Conside
r
the number of rows inserted daily and you can predict when you 'run out of
numbers'.
My guess: not in your life-time - even with integer.
ML|||"Louis Davidson" <dr_dontspamme_sql@.hotmail.com> wrote in message
news:OWXwVvhmFHA.3312@.tk2msftngp13.phx.gbl...
>
> Problems? No. The only issue is that all of your keys are twice as large
> as for an int. This will not "hurt" anything, but it is a waste of
> resources. Tables with GUIDs as keys can perform just fine also, and they
> take 16 byts. It is all into how much disk space and memory you need. It
> will be slightly more for bigints.
>
But if they are not used in any of my indexes, would there be any
performance problems?

> The real question is whether you really have a need for 2 billion (well, 4
> billion if you don't mind negatives after you exhaust the positive values)
> values in your tables? I only have had two tables in my life where this
> was possible (and only one has reached this amount) and they were both
> used to track clicks on a pretty large web click tracking software table.
> I am actually more likely to end up with small- and tiny- int keys than I
> am bigint. But you have to ask yourself, if there is a possibility of
> that kind of data, what do you want to do?
Actually, I don't know if I would come close, but I don't want to build this
system - which could take a couple of years to finish, only to find out that
the numbers are filling up faster than I expected.

> If this is an OLTP system, I would suggest you probably will want to
> delete old data before doing this. A billion of anything is quite a bit.
I would definately be deleting the records as I go, but the identity field
would still be increasing by one even if I dumped half the records.

>
> Not "big" problems, unless you try to pass a value that fits in a bigint
> and not in an int. If you aren't using stored procs, you may have some
> issues with type mismatches cause funky plans, but not enough to avoid
> switching.
If the issues aren't bad issues than I could just start making the changes
and deal with the differences as I go along.
Thanks,
Tom.

> --
> ----
--
> Louis Davidson - http://spaces.msn.com/members/drsql/
> SQL Server MVP
>
> "tshad" <tscheiderich@.ftsolutions.com> wrote in message
> news:uI2qqGhmFHA.1372@.TK2MSFTNGP10.phx.gbl...
>|||"ML" <ML@.discussions.microsoft.com> wrote in message
news:EB9108DE-3804-46E8-A4FF-D31A58D48666@.microsoft.com...
> You can actually find an answer to your question using a calculator.
> Consider
> the number of rows inserted daily and you can predict when you 'run out of
> numbers'.
> My guess: not in your life-time - even with integer.
>
Probably true.
Which is why I am in the process of looking closer at my tables and making
the changes now, before I go much further.
Thanks,
Tom
> ML|||> But if they are not used in any of my indexes, would there be any
> performance problems?
If you are not indexing an identity value, then what are you using it for?
Identity value are pretty much only good as keys to go and fetch a single
row.

> Actually, I don't know if I would come close, but I don't want to build
> this system - which could take a couple of years to finish, only to find
> out that the numbers are filling up faster than I expected.
This is a valid point. I would suggest that you do some math to see how
long this will last. Like if you do 5 million new rows a year, you have
well over 100 years before you reach 500 million, so 400 years (400
*5million = 2 billion, right? I have worked a lot this w!) before you
will have to switch to use the negative values for the next 400 years :)
But it is up to you to decide just how many rows you will need and size
appropriately ;)

> If the issues aren't bad issues than I could just start making the changes
> and deal with the differences as I go along.
If you actually don't need them in indexes (UNIQUE and PRIMARY KEY
constraints use indexes also) then it really won't make very much difference
at all.

> I would definately be deleting the records as I go, but the identity field
> would still be increasing by one even if I dumped half the records.
True, but you might be able to start your values over to fill in the gaps
once you hit the max value if the data that gets deleted is in increasing
identity value. Especially if you are using this as a surrogate key. Usage
will dictate. If you post your structures and how these values are actually
being used it might help... Good luck :)
----
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP
"tshad" <tscheiderich@.ftsolutions.com> wrote in message
news:%23vCQUQimFHA.3568@.TK2MSFTNGP10.phx.gbl...
> "Louis Davidson" <dr_dontspamme_sql@.hotmail.com> wrote in message
> news:OWXwVvhmFHA.3312@.tk2msftngp13.phx.gbl...
> But if they are not used in any of my indexes, would there be any
> performance problems?
>
> Actually, I don't know if I would come close, but I don't want to build
> this system - which could take a couple of years to finish, only to find
> out that the numbers are filling up faster than I expected.
>
> I would definately be deleting the records as I go, but the identity field
> would still be increasing by one even if I dumped half the records.
>
> If the issues aren't bad issues than I could just start making the changes
> and deal with the differences as I go along.
> Thanks,
> Tom.
>
>|||On Fri, 5 Aug 2005 18:13:38 -0700, tshad wrote:

>"Louis Davidson" <dr_dontspamme_sql@.hotmail.com> wrote in message
>news:OWXwVvhmFHA.3312@.tk2msftngp13.phx.gbl...
>But if they are not used in any of my indexes, would there be any
>performance problems?
Hi Tom,
Using a surrogate key without including it in an index is pointless.
The reason to add a surrogate key, is to serve as a surrogate for the
business key. This can be useful if other tables link to the table and
the business key is long - instead of duplicating the business key in
all refering tables, you duplicate the surrogate key. This of course
requires that the surrogate key is declared either PRIMARY KEY or
UNIQUE.
All PRIMARY KEY and UNIQUE constraints auto-generate an index. This is
not the case for FOREIGN KEY constraints, but in most cases, it makes
sense to manually add an index on the FOREIGN KEY column(s).
The surrogate key will be in the index for it's own PRIMARY KEY or
UNIQUE constraint, in all refering tables, and in all indexes you define
for the FOREIGN KEY constraints. If the PRIMARY KEY or UNIQUE for the
surrogate key is clustered, there'll be a copy of the surrogate key in
each nonclustered index for the same table as well.
Extra bytes in the surrogate key result in more space taken for all the
tables and all the indexes where the surrogate key is used. That means
that less rows and/or less index entries fit on a page, which will
affect the performance (more logical reads, and a lower cache-hit ratio,
resulting in more physical reads as well).
How much this will affect your total performance is hard to tell. Adding
two bytes to a 3000-byte row is not quite as significant as adding two
bytes to a 20-byte row.
Don't make your surrogate keys longer than they need to be, "just to be
on the safe side", as this WILL affect performance. On the other hand,
don't make your surrogate keys too short in order to save some
performance, as the amount of work it takes you to grow the keys later
is disproportional to the performance saved.
And if you use IDENTITY, then always start numbering at the lowest value
the datatype permits. The standard setting (starting at one) disregards
half the available values. Sure, you can roll over to the negatives once
the positivies are exhausted - but why would you?
Just my 0,02.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||> And if you use IDENTITY, then always start numbering at the lowest value
> the datatype permits. The standard setting (starting at one) disregards
> half the available values. Sure, you can roll over to the negatives once
> the positivies are exhausted - but why would you?
This has always been hard to do in my experience, and I have fought with so
many programmers/dba's about this. I would agree that if you calulate that
you will need over 2 billion rows before like twice your expected system
life expectancy it might be possibl, but most of the time the large number
of digits are too much to try to type while the system is in it's infancy
and people are troubleshooting problems.
----
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:mkk9f19ss8ge7igqclbpmb4gvkmi2e1p85@.
4ax.com...
> On Fri, 5 Aug 2005 18:13:38 -0700, tshad wrote:
>
> Hi Tom,
> Using a surrogate key without including it in an index is pointless.
> The reason to add a surrogate key, is to serve as a surrogate for the
> business key. This can be useful if other tables link to the table and
> the business key is long - instead of duplicating the business key in
> all refering tables, you duplicate the surrogate key. This of course
> requires that the surrogate key is declared either PRIMARY KEY or
> UNIQUE.
> All PRIMARY KEY and UNIQUE constraints auto-generate an index. This is
> not the case for FOREIGN KEY constraints, but in most cases, it makes
> sense to manually add an index on the FOREIGN KEY column(s).
> The surrogate key will be in the index for it's own PRIMARY KEY or
> UNIQUE constraint, in all refering tables, and in all indexes you define
> for the FOREIGN KEY constraints. If the PRIMARY KEY or UNIQUE for the
> surrogate key is clustered, there'll be a copy of the surrogate key in
> each nonclustered index for the same table as well.
> Extra bytes in the surrogate key result in more space taken for all the
> tables and all the indexes where the surrogate key is used. That means
> that less rows and/or less index entries fit on a page, which will
> affect the performance (more logical reads, and a lower cache-hit ratio,
> resulting in more physical reads as well).
> How much this will affect your total performance is hard to tell. Adding
> two bytes to a 3000-byte row is not quite as significant as adding two
> bytes to a 20-byte row.
> Don't make your surrogate keys longer than they need to be, "just to be
> on the safe side", as this WILL affect performance. On the other hand,
> don't make your surrogate keys too short in order to save some
> performance, as the amount of work it takes you to grow the keys later
> is disproportional to the performance saved.
> And if you use IDENTITY, then always start numbering at the lowest value
> the datatype permits. The standard setting (starting at one) disregards
> half the available values. Sure, you can roll over to the negatives once
> the positivies are exhausted - but why would you?
> Just my ? 0,02.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||That helps a lot.
I will probably start moving some of my bigints to ints as everyone is
probably correct in the overkill causing a little more performance problems
just to "be safe".
As you say, the surrogate key would really have to be in the indexes or what
would the point of the surrogate be. I hadn't started putting my indexes
together yet, so didn't notice that I would have to use them in then
indexes. So it would be better for the surrogates to be smaller if
possible.
Thanks,
Tom

> Hi Tom,
> Using a surrogate key without including it in an index is pointless.
> The reason to add a surrogate key, is to serve as a surrogate for the
> business key. This can be useful if other tables link to the table and
> the business key is long - instead of duplicating the business key in
> all refering tables, you duplicate the surrogate key. This of course
> requires that the surrogate key is declared either PRIMARY KEY or
> UNIQUE.
> All PRIMARY KEY and UNIQUE constraints auto-generate an index. This is
> not the case for FOREIGN KEY constraints, but in most cases, it makes
> sense to manually add an index on the FOREIGN KEY column(s).
> The surrogate key will be in the index for it's own PRIMARY KEY or
> UNIQUE constraint, in all refering tables, and in all indexes you define
> for the FOREIGN KEY constraints. If the PRIMARY KEY or UNIQUE for the
> surrogate key is clustered, there'll be a copy of the surrogate key in
> each nonclustered index for the same table as well.
> Extra bytes in the surrogate key result in more space taken for all the
> tables and all the indexes where the surrogate key is used. That means
> that less rows and/or less index entries fit on a page, which will
> affect the performance (more logical reads, and a lower cache-hit ratio,
> resulting in more physical reads as well).
> How much this will affect your total performance is hard to tell. Adding
> two bytes to a 3000-byte row is not quite as significant as adding two
> bytes to a 20-byte row.
> Don't make your surrogate keys longer than they need to be, "just to be
> on the safe side", as this WILL affect performance. On the other hand,
> don't make your surrogate keys too short in order to save some
> performance, as the amount of work it takes you to grow the keys later
> is disproportional to the performance saved.
> And if you use IDENTITY, then always start numbering at the lowest value
> the datatype permits. The standard setting (starting at one) disregards
> half the available values. Sure, you can roll over to the negatives once
> the positivies are exhausted - but why would you?
> Just my ? 0,02.
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||"Louis Davidson" <dr_dontspamme_sql@.hotmail.com> wrote in message
news:ObhzBZimFHA.708@.TK2MSFTNGP09.phx.gbl...
> If you are not indexing an identity value, then what are you using it for?
> Identity value are pretty much only good as keys to go and fetch a single
> row.
You're right. I would have to use the surrogate in the index.

>
> This is a valid point. I would suggest that you do some math to see how
> long this will last. Like if you do 5 million new rows a year, you have
> well over 100 years before you reach 500 million, so 400 years (400
> *5million = 2 billion, right? I have worked a lot this w!) before you
> will have to switch to use the negative values for the next 400 years :)
> But it is up to you to decide just how many rows you will need and size
> appropriately ;)
I will need to do that as I haven't really worked that out.

>
changes
> If you actually don't need them in indexes (UNIQUE and PRIMARY KEY
> constraints use indexes also) then it really won't make very much
difference
> at all.
>
field
> True, but you might be able to start your values over to fill in the gaps
> once you hit the max value if the data that gets deleted is in increasing
> identity value. Especially if you are using this as a surrogate key.
Usage
> will dictate. If you post your structures and how these values are
actually
> being used it might help... Good luck :)
Here is a stripped down version of some of my tables.
What I have is mulitple companies. Each company can have many users.
There would be many positions available. The JobTitle cannot be unique as
different companies could have the same JobTitles - so I need to use an
PositionID as my surrogate key and then index the JobTitle + PositionID.
There are many Applicants and each Applicant can have multiple positions
(ApplicantPosition table). And each ApplicantPosition table can have many
skills (ApplicantSkills table).
The tables could be set up as follows (except I will probably be changing
many of the bigints to ints):
CREATE TABLE [dbo].[Companies] (
[CompanyID] [bigint] IDENTITY (1, 1) NOT NULL Primary Key,
[Name] [varchar] (45) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX Name_ID_Company_Idx on Companies(Name,CompanyID)
CREATE TABLE [dbo].[logon] (
[UserID] [bigint] IDENTITY (1, 1) NOT NULL Primary Key ,
[CompanyID] [bigint] NULL ,
[FirstName] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[LastName] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Email] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[UserName] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Password] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DateCreated] [datetime] NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX Name_ID_Logon_Idx on Companies(UserName,UserID)
CREATE NONCLUSTERED INDEX Name_Password_Logon_Idx on
Companies(UserName,Password)
CREATE TABLE [dbo].[Position] (
[CompanyID] [bigint] COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[PositionID] [int] IDENTITY (1, 1) NOT NULL Primary Key,
[CompanyID] [bigint] NULL ,
[JobTitle] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DatePosted] [datetime] NULL ,
[DateUpdated] [datetime] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX Name_ID_Position_Idx on
Companies(JobTitle,PositionID)
CREATE TABLE [dbo].[Applicant] (
[CompanyID] [bigint] COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[UserID] [bigint] NOT NULL ,
[ApplicantID] [bigint] IDENTITY (1, 1) NOT NULL Primary Key,
[PositionID] [int] NOT NULL
[Name] [varchar] (45)
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX Name_ID_Applicant_Idx on
Companies(Name,ApplicantID)
CREATE TABLE [dbo].[ApplicantPosition] (
[CompanyID] [bigint] COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[UserID] [bigint] NOT NULL ,
[ApplicantPositionID] [bigint] IDENTITY (1, 1) NOT NULL
ApplicantPositionID,
[ApplicantID] [bigint] NOT NULL ,
[PositionID] [int] NOT NULL ,
[FirstName] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[LastName] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[Status] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX Name_ID_ApplicantPosition_Idx on
ApplicantPosition(LastName,FirstName,App
licantPositionID)
CREATE TABLE [dbo].[ApplicantSkills] (
[CompanyID] [bigint] COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[UserID] [bigint] NULL ,
[ApplicantSkillsID] [bigint] IDENTITY (1, 1) NOT NULL Primary Key,
[ApplicantID] [bigint] NOT NULL ,
[Skill] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Level] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX Skill_ID_ApplicantSkillsID on
ApplicantSkills(Skill,ApplicantSkillsID)
Thanks,
Tom

> --
> ----
--
> Louis Davidson - http://spaces.msn.com/members/drsql/
> SQL Server MVP
>
> "tshad" <tscheiderich@.ftsolutions.com> wrote in message
> news:%23vCQUQimFHA.3568@.TK2MSFTNGP10.phx.gbl...
to
of
(well,
tiny-
bit.
field
Stored
matching
bigint
changes
>
--
to
know
Stored
matching
>

No comments:

Post a Comment