Monday, March 19, 2012

Big inconsistency!

Hi,

i have a sqlserver 2005 mergereplication.

some facts:

SQL Server 2005 9.0.3175 Standard Edition german

1 Distributor / Publisher (same Server)

10 Subscriber

30 GB Database with 200 Tables ; in sum 35,000,000 rows

Every subscriber can change data.

Sometimes (i think in 1 of 1000 cases) following curious error occured:

the subscriber update data (for example varchar value), but the changes aren′t replicated to the publisher.

In case of our 35Mil rows, this is not acceptable.

What can I do? I have no specified error, no errormessage, but the data isn′t correct.

There is big inconsistency!!!

Any assistance gratefully received,

Best Regards

Thorsten Ueberschaer

1. can you reproduce the problem? i.e. can you make a change to a subscriber and for the change NOT to appear on the Publisher and/or other subscribers within reasonable time [with network connectivity, and several runs of the Distribution Agents] ?

2. is this a permissions problem? obviously the DA's need access rights to be able to replay changes elsewhere.

3. have all subscribers been initialised ? should all start with same data [otherwise an UPDATE will fail later if there is no matching PK]

4. have you got sensible identities ? for INSERTs you should review the identity settings

sp_addmergearticle
[ , [ @.vertical_partition = ] 'vertical_partition' ]
[ , [ @.auto_identity_range = ] 'auto_identity_range' ]
[ , [ @.pub_identity_range = ] pub_identity_range ]
[ , [ @.identity_range = ] identity_range ]
[ , [ @.subscriber_upload_options = ] subscriber_upload_options ]
[ , [ @.identityrangemanagementoption = ] 'identityrangemanagementoption' ]
and do you have "NOT FOR REPLICATION" to prevent frictions ?

5. what collision resolver are you using? - perhaps a merge collision is being lost by resolver

- suggest more logging [e.g. change agent profile to verbose]

HTH

Dick

|||

hello

1. no i can′t reproduce the problem.....

2. the replicationagents work in sysadmin-mode, in my opinion is this not a permission problem

3. yes, all subscribers have been initialised with the same data

4. all tables have uniqueidentifier-pk and -rowguid column, all fk′s are "not for replication"

5. 195 of our tables solve the collision with the standard collision resolver, 5 with "later wins", but i think, that is not a conflict problem, because, after a conflict all subscriber must have the same date (the winner or the loser)

any idea?

thanks

thorsten ueberschaer

|||

Find a row that has not replicated. Pick off the value in the rowguid column. Search the MSmerge_contents table and see if a row exists for this rowguid value. That's step #1. If it isn't logged into MSmerge_contents by the triggers, it is never going to replicate.

|||

hi

this was my first idea

but in msmerge_contents a row exists for my rowguid value

bye

|||

What is the generation value? Have you tried to validate your subscriptions? What is the max generation that exists on the publisher and the subscriber in question?

|||

the generationvalue is 218016

the maxgeneration on publisher is 4647424 and on subscriber is 4049693

i don′t understand my problem

sometimes it works, sometimes not

no exception, no error

it′s not reproduceable

i don′t like it :-(

|||

If it happens more than once, it is reproducible somehow. It's a matter of figuring out the exact combination of actions that causes it. No, they aren't always straightforward. Tracking some of these things down can take weeks or months to isolate the problem. I have 6 open support cases dealing with issues in the merge metadata, the newest one has been open for over 2 weeks and while we can see the bad entries and know how to manually fix them, we still can't come up with a way to reproduce the problem. The oldest one I have on the books has been there for over 6 months and a repro still hasn't been isolated.

Then there is the one with subscribers mysteriously getting blown out which took over a month and a half before we finally figured out what the heck was happening and came up with an artificial way of reproducing it. Turns out that DNS was handing out the same IP address to more than one machine, from the perspective of SQL Server, but the DNS entry was not updated right away. This caused SQL Server to connect to MachineA which DNS resolved to an IP address which was actually MachineB thereby causing replication to think that everything was screwed up and to remove the subscription information. Try reliably reproducing something like that which only creates itself under some very specific VPN connection cicumstances which have to occur within a specific time sequence, between cycles of the merge engine.

Now that you have the generation number. Take a look in MSmerge_genhistory on the publisher and subscriber and see if an entry for this generation exists.

|||

i have this problem sinc april :-(

i took a look in msmerge_genhistory, and the result is strange:

on publisher: the data is correct, there is a row

on the subscriber which generate the update: the data is also correct

on next subscriber, the data isn′t correct, there is no entry in msmerge_genhistory

|||

I'm sorry, can't help with April. If you have this issue going back 6 months, that makes it extraordinarily difficult to try and track it down. If there isn't a row in MSmerge_genhistory for that generation, it means that the merge engine did not see it as a change to be propagated for some reason and therefore the change did not move. Again, I don't know why it would do that and it isn't going to be easy to find, because you have 6 months worth of change in this environment that the answer is buried within.

Have you ever run validation? If so, what were the results?

No comments:

Post a Comment