We received a call this morning from IBM (Softlayer) that our server wasn't responding shortly after 5am. After rebooting and checking the logs, we determined that a network card was faulty, and we replaced it. We run dual network cards to prevent this situation, but the card that failed also took the other card offline, which isn't supposed to happen.
To help mitigate future downtime, we have upgraded our alert services to monitor our servers every 15 minutes instead of every 90 minutes, and to monitor specific websites rather than specific services. If there is an issue (server load or other), we will receive a call from IBM/Softlayer support. This is a new server as of Oct. 2013, and this was an unexpected hardware failure.
Our IBM/Softlayer support manager also recommended we perform what's called a kernel upgrade to the operating system. We have scheduled that procedure for this coming Saturday, the 25th, at 12am (midnight). The great thing about modern technology is that this process will not require taking the server offline, so services will continue as normal.
Again, our apologies for the unexpected downtime. We have put upgraded services in place for deeper monitoring and even faster responses.
If you have any questions, please feel free to call my office at 501-588-1979.
Regards,
Eric Caldwell