drunkmonkey wrote:
elbitjusticiero wrote:
TheProwler wrote:
I don't know shit about your design or what table was lost.
This is the important part.
Not really. It just means he can't walk them through step-by-step on how to fix it. He was still spot on about the failure.
That's pretty presumptuous. I also don't know what happened, what state the system was in, etc. I choose to believe that the admins did all they could to restore the system in a reasonable amount of time. That's the critical part. They may well have been able to recover almost everything, but it might have taken days or weeks. I, for one, appreciate the decision to bring the system back as quickly as they did. This is NOT a financial system with SOX compliance, there's no critical medical data stored, and no one is going to suffer some horrible fate because of the rollback.
TheProwler made some assumptions that I haven't seen verified. Do you know what the IT budget is? Are you assuming they have the resources for a real-time multi-colo snapshot backend system? Do you know what the issue was? I'm sure that for any complex system you design, given enough information about the components and how they interact, I can hypothesize a failure scenario where you'll be lucky to be able to recover from tape. It's all a question of how many resources can be devoted to maintaining operational uptime. You can do a lot with an infinite budget, but I assume that's not the case.