Earlier today we received an automated alert that app.lightdash.cloud was unavailable and returning 502 errors. The underlying cause was that Lightdash was slowing down due to the amount of usage in Lightdash Cloud at the time. The slower response times triggered an automated process that restarts the Lightdash servers. This process should normally trigger only when a server has already crashed. In this incident the restart was a mistake: the server had not crashed and was simply running more slowly than expected.
To resolve the issue, we've added significantly more resources to our Lightdash Cloud servers to prevent slow response times. We've also raised the threshold at which very slow response times automatically restart the servers.
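
The restart behaviour described above can be illustrated with a minimal sketch. This is not the actual Lightdash Cloud configuration: the names `ProbeResult` and `should_restart`, and the parameters `timeout_s` and `failure_threshold`, are hypothetical and chosen only for illustration. The sketch assumes a health check that probes the server periodically and restarts it only after several consecutive probes have failed or exceeded a latency limit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProbeResult:
    """Outcome of one health probe (hypothetical type).

    latency_s is None when the server did not respond at all.
    """
    latency_s: Optional[float]


def should_restart(
    results: Sequence[ProbeResult],
    timeout_s: float,
    failure_threshold: int,
) -> bool:
    """Return True only if the last `failure_threshold` probes all failed.

    A probe counts as failed if there was no response or if the
    response took longer than `timeout_s` seconds.
    """
    if failure_threshold < 1:
        raise ValueError("failure_threshold must be at least 1")
    if len(results) < failure_threshold:
        # Not enough history yet to justify a restart.
        return False
    recent = results[-failure_threshold:]
    return all(r.latency_s is None or r.latency_s > timeout_s for r in recent)


# A server that is slow (2.5 s per probe) but still responding.
slow = [ProbeResult(2.5), ProbeResult(2.5), ProbeResult(2.5)]
# A server that has stopped responding entirely.
crashed = [ProbeResult(None), ProbeResult(None), ProbeResult(None)]

strict = should_restart(slow, timeout_s=1.0, failure_threshold=3)
relaxed = should_restart(slow, timeout_s=10.0, failure_threshold=3)
crashed_relaxed = should_restart(crashed, timeout_s=10.0, failure_threshold=3)
```

With a strict latency limit, the slow but healthy server is restarted, which corresponds to the mistaken restart in this incident. With a more generous limit, the slow server is left running, while a server that has stopped responding is still restarted.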