Downtime

Delay in request processing and event delivery

Aug 28 at 10:16am EDT
Affected services
Delivery HTTP Uptime
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Dashboard

Resolved
Aug 30 at 10:46am EDT

All services are now fully restored, and the backlog has been processed. The system is back to normal operating conditions.

We are continuing to monitor. We will provide a full post-mortem and have started implementing remediation to prevent similar issues from occurring again. More communication to follow.

Thank you for your patience.

Updated
Aug 29 at 10:29pm EDT

The interruption in delivery has been resolved, and delivery is back to normal.

Also, the backfill of the 30 days of events in the Dashboard is near completion. We will post an update once it's finished.

Updated
Aug 29 at 09:11pm EDT

We are currently experiencing an interruption in delivery. We are looking into it and will provide another update shortly.

Updated
Aug 29 at 10:59am EDT

The delay in ingestion has been cleared. Ingestion is now back to normal. All systems are back online.
There is still a significant backlog of messages to process. We are scaling services now to handle the backlog as fast as possible.

We will update again shortly.

Updated
Aug 29 at 10:14am EDT

Ingestion is currently experiencing a delay of up to 15 minutes.

Regarding the events page in the dashboard, reloading of the remaining 27 days for organizations with 30 days of retention is now 35% complete.

Updated
Aug 29 at 03:17am EDT

Regarding the events page in the dashboard, repopulation of data for the last 3 days is complete. The remaining 27 days for organizations with 30 days of retention are now repopulating.

Updated
Aug 28 at 05:29pm EDT

Requests and events continue to be processed as normal within a few seconds, and we've restored all platform features, such as retries.

Some historical events may not have been processed yet, and only a few projects, particularly those that routinely exceed their project throughput, should be missing events. Those are still being processed.

Lastly, the events page in the dashboard is repopulating the data for the last 3 days. As of now, 1 day out of 3 has been fully populated. Once the last 3 days are completely restored, we'll restore the remaining 27 days for organizations with 30 days of retention.

Updated
Aug 28 at 04:29pm EDT

We've managed to stabilize a subset of the traffic. Most sources should process requests within a few seconds, but we have roughly 10% of sources that are still impacted by the delay.

We're still working on the issue.

Created
Aug 28 at 10:16am EDT

We're experiencing a delay in processing requests and delivering events. No data has been lost.