Monday 31 October 2011

Things That go Dead in the Night

Happy Halloween.

This post will be brief. Many people will think that it's a statement of the obvious but it's amazing how much time can be wasted on issues like this, especially in large organisations.

Sometimes we see systems behave rather strangely during performance tests and the teams managing the tests can sometimes be very inventive in their explanations of the issue. This particular issue has crossed my desk a couple of times (too many) in recent weeks and has been presented as 'a strange unexplained issue', 'file corruption', 'network issue' or 'SAN failure'.

Without building up any further mystery or invoking the supernatural, the answer was simple - the file system was allowed to become full. This can cause all sorts of interesting and varied problems to your system and the execution of any tests. In some cases it will fail completely, in others it might continue almost without anything appearing to be wrong and there are any number of variations in between these extremes.

One common factor is that it is quite normal for there to be no helpful error message like 'disk full' visible because often the file system that becomes full is the one being used to write the log file. This is where the creativity comes into play - people don't see an obvious error, so their imagination gets to work.

Scheduled housekeeping and other such automated cleanup solutions can add to the fun because they can sometimes free up space before somebody notices the problem... so then people may tell you that they are certain that the disk wasn't full because it isn't full now.

So if your test goes dead in the night, either temporarily or permanently, please do check your disk space monitoring (you do monitor your disk space utilisation don't you?). Alternatively, scrutinise your log files very carefully - are there any strange 'silent' periods that should contain messages?

Don't have Nightmares.