Adding the post-mortem to the top of this incident for posterity:
The root cause of the incident was a cascading failure that resulted in the downing of a Ceph cluster in FRA1. Our initial efforts to recover without physical intervention were unsuccessful, and in an effort to provide enough resources to bring the individual nodes back online, a member of our datacenter team moved on-site to perform physical augments on the cluster. Once the augment was completed, we carefully began bringing the cluster back online under close monitoring. During this time, we identified a failure state caused by the accruing backlog and paused to allow it to clear.
Previously: Service has now been restored to Kraftwerk. We are awaiting a fuller post-mortem from DigitalOcean on why this outage was so lengthy. We apologize for the unplanned downtime. If you continue to experience any issues with your account, please open a support ticket with us and we will be happy to take a look.
Previously: Our datacenter has given us an ETA of 2 hours or less to resolution. We will update as soon as service is restored.
Previously: Sadly, DigitalOcean has still not restored connectivity. Following the updates at http://status.digitalocean.com/incidents/8sk3mbgp6jgl is the best way to stay apprised of the situation. We continue to monitor in hopes of restoring service as soon as possible.
Previously: We are still awaiting a fix for this incident from our datacenter and are monitoring the status at http://status.digitalocean.com/incidents/8sk3mbgp6jgl. Unfortunately, connections to Kraftwerk remain impacted while the work continues.
Previously: Our datacenter in Frankfurt is experiencing issues with network storage, causing partial outages on the Kraftwerk server located there. We are monitoring and will update when the issue is resolved.