Know your server

A post by Jeff Atwood got my attention last week. He describes how hard it was to track down a problem in the StackOverflow beta. They kept getting deadlock logs during the day, and in the end the problem turned out to be MS SQL Server blocking readers:

You can attach the profiler to catch the deadlock event and see the actual commands that are deadlocking. I did that, and found there was always one particular SQL command involved:

UPDATE [Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0

If it detects a deadlock, SQL Server forces one of the deadlocking commands to lose — specifically the one that uses the least resources. The statement on the losing side varied, but in our case the losing deadlock statement was always a really innocuous database read, like so:

SELECT *
FROM [Posts]
WHERE [ParentId] = @p0

To fix the problem they had to instruct MS SQL Server to use dirty reads on each of those selects. What kept hammering in my head all these days was not how modern databases can still block readers, but the lesson we can take from the episode. Whatever the server and its architectural limitations, the important thing is to understand them and to know your application’s environment very well.

In the case of StackOverflow, Jeff found that a possible solution was dirty reads. If the transactions performed by the website took a long time, dirty reads would not be an option. But knowing his application very well, he knew that was not the case, so choosing dirty reads would have no practical consequences, regardless of what ACID purists would say. And I confess I turned up my nose when I first read "dirty read".
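For readers who have never used them, here is a minimal sketch of how a dirty read can be requested in T-SQL. It is not necessarily the exact change Jeff made. The column list and the value assigned to @p0 are assumptions for illustration; only the table and column names come from the statements quoted above.

```sql
-- Hypothetical parameter value, for illustration only.
DECLARE @p0 INT;
SET @p0 = 42;

-- Option 1: a per-table hint. This SELECT takes no shared locks on [Posts]
-- and may therefore see uncommitted changes.
SELECT [Id], [AnswerCount], [LastActivityDate]
FROM [Posts] WITH (NOLOCK)
WHERE [ParentId] = @p0;

-- Option 2: the same effect for every statement that follows
-- on this connection, until the isolation level is changed again.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT [Id], [AnswerCount], [LastActivityDate]
FROM [Posts]
WHERE [ParentId] = @p0;
```

The hint keeps the trade-off local to the queries you have actually reasoned about, while the session-level setting applies it to everything that runs afterwards on that connection.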

Thanks to its multi-generational architecture, Firebird does not block readers, so it’s natural that this problem may seem strange to us. But each architecture has its pros and cons. This same multi-generational architecture, if not used correctly, can become a problem. That is the case with long-running transactions, or "stuck transactions".

Such transactions cause record versions to accumulate in the database. (These versions can also be called generations. It’s multi-generational architecture, remember?) The more versions accumulate, the more work Firebird has to do to find the correct version of each record. If a transaction stays stuck for long enough, the Firebird server can get slow to the point of looking like it’s dead. How long that takes depends on the database load and the server’s hardware. It could be a month, it could be an hour.
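One way to look for a stuck transaction, assuming Firebird 2.1 or later (where the MON$ monitoring tables are available), is sketched below. How large a gap counts as "too large" is a judgment call that depends on your workload, so treat the queries as a starting point rather than a diagnosis.

```sql
-- Transaction counters for the current database. A gap between
-- MON$OLDEST_ACTIVE and MON$NEXT_TRANSACTION that keeps growing
-- suggests that some old transaction is still open.
SELECT MON$OLDEST_TRANSACTION, MON$OLDEST_ACTIVE, MON$NEXT_TRANSACTION
FROM MON$DATABASE;

-- Active transactions (MON$STATE = 1), oldest first, with the
-- attachment that owns each one.
SELECT MON$TRANSACTION_ID, MON$ATTACHMENT_ID, MON$TIMESTAMP
FROM MON$TRANSACTIONS
WHERE MON$STATE = 1
ORDER BY MON$TIMESTAMP;
```

The attachment id from the second query can be looked up in MON$ATTACHMENTS to see which user and client address hold the transaction open.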

I’ve seen it happen many times. On some of those occasions I saw managers argue for hours that MS SQL Server or Oracle don’t have such a problem. The fact is that they have other architectural details that you must take into account in your application. For example, some databases block readers. Simply switching databases, as some of those managers suggested, is not a solution.

Although having StackOverflow powered by Firebird would not be a bad idea. ;)
