Not a question, just a post about something I ran into today that surprised me a little.
I have a huge dataset - about 700m rows, one column of which is a tinyint. I needed to take an average of that column, so I did this:
select avg(mytinyint) from mytable
which returned:
Msg 8115, Level 16, State 2, Line 6
Arithmetic overflow error converting expression to data type int.
It turns out, if the sum() over the column is bigger than MAX_INT = 2147483647, then avg() will throw this error. This surprised me a little - I guess I was assuming sql server would use bigint internally when computing integer avg, but it uses int.
so you have to do this:
select avg(cast(mytinyint as bigint)) from mytable
what's annoying to me is, it's impossible that an average of type T could be larger or smaller than the max/min allowed value of a type T. This means sql server devs could have internally always used bigint for sum and count, and then returned cast(sum/count as T) from avg() without possibility of overflow (unless sum or count was bigger than the max bigint! you are on your own if that happens i guess!).
Because of this, you have to cast your column to a bigint to make it work if your dataset is large enough, even if the column is only 1 measly byte! kind of a waste of space if you ask me. :)
here's a little example of what I'm talking about, that doesn't require you to import a 700m row dataset :)
declare @.t table (id int)
insert into @.t select 1 union all select 2147483647
select avg(cast(id as bigint)) from @.t -- works
select avg(id) from @.t -- failshmm...:rolleyes:|||Very interesting. I will check it out tomorrow. Thanks for posting it.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment