Lightdash Cloud instability (slow response times)
Incident Report for Lightdash

Today at 13:37 (UTC) we noticed that all API endpoints were becoming very slow. This affected all users and led to API response times of over a minute.

Our first response was to greatly increase the resources available to the Lightdash server, both the number of servers and their size. After this change, all services remained stable.

We have investigated our logs to understand exactly which user actions were making the server so busy. In addition to the overall volume of users, we saw a higher than usual number of Databricks users. We already have a fix in testing that improves Databricks performance, and we expect to release it in the coming hours. To follow further performance improvements, see this milestone on GitHub:

Posted May 09, 2023 - 15:43 UTC

This incident has been resolved.
Posted May 09, 2023 - 15:33 UTC
We are monitoring performance.
Posted May 09, 2023 - 15:22 UTC
The Lightdash API is experiencing very slow response times for all users, causing some actions to take minutes or fail to execute.

We're currently investigating.
Posted May 09, 2023 - 13:48 UTC
This incident affected: Lightdash Cloud (Lightdash Cloud (US)).