News and Notes

27 April Outage [resolved]

Permalink: ID #1265
Date: 2015-04-28 09:42 AM
Author: Daryl Herzmann
Tags: outage

The IEM website was very slow or unavailable between about 3:20 and 4:30 PM on Monday, 27 April 2015. This was caused by a cascading failure: a backup database server flooded a file server with I/O requests, which slowed down another process that reads data from that file server. Oye. The primary database server is about an order of magnitude faster than the backup server, so the write load the primary server generates sometimes slows down the backup server.

I am moving the backup database instance to a different disk system to prevent this from happening in the future. Thanks for your patience.