Today at 13:37 (UTC) we noticed that all API endpoints for https://app.lightdash.cloud had started to run very slowly. This affected all users and caused API response times of over a minute.
Our first response was to greatly increase the resources available to the Lightdash server, raising both the number and the size of servers. After this change, all services stabilized.
We have investigated our logs to understand exactly which user actions were making the server so busy. In addition to the volume of users, we noticed a higher-than-usual number of Databricks users. We already have a fix in testing that improves Databricks performance, and we expect to release it in the coming hours. For further performance improvements, you can follow this milestone on GitHub: https://github.com/lightdash/lightdash/milestone/91