Delay in request processing and event delivery
Resolved
Aug 30 at 10:46am EDT
All services are now fully restored, and the backlog has been processed. The system is back to normal operating conditions.
We are continuing to monitor. We will provide a full post-mortem and have started implementing remediation to prevent similar issues from recurring. More communication to follow.
Thank you for your patience.
Affected services
Dashboard
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime
Updated
Aug 29 at 10:29pm EDT
The interruption in delivery has been resolved, and delivery is back to normal.
Also, the backfill of the 30 days of events in the Dashboard is nearing completion. We will post an update once it has finished.
Affected services
Dashboard
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime
Updated
Aug 29 at 09:11pm EDT
We are currently experiencing an interruption in delivery. We are investigating and will provide another update shortly.
Affected services
Dashboard
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime
Updated
Aug 29 at 10:59am EDT
The delay in ingestion has been cleared. Ingestion is now back to normal. All systems are back online.
There is still a significant backlog of messages to process. We are scaling services now to handle the backlog as fast as possible.
We will update again shortly.
Affected services
Dashboard
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime
Updated
Aug 29 at 10:14am EDT
Ingestion is currently delayed by up to 15 minutes.
Regarding the events page in the dashboard, the reload of the remaining 27 days for organizations with 30 days of retention is now 35% complete.
Affected services
Dashboard
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime
Updated
Aug 29 at 03:17am EDT
Regarding the events page in the dashboard, the data for the last 3 days has been fully repopulated. The remaining 27 days for organizations with 30 days of retention are now repopulating.
Affected services
Dashboard
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime
Updated
Aug 28 at 05:29pm EDT
Requests and events continue to be processed as normal within a few seconds, and we've restored all platform features, such as retries.
Some historical events may not have been processed yet. Only a few projects, particularly those that routinely exceed their project throughput, should be missing events, and those events are still being processed.
Lastly, the events page in the dashboard is repopulating the data for the last 3 days. As of now, 1 day out of 3 has been fully populated. Once the last 3 days are completely restored, we'll restore the remaining 27 days for organizations with 30 days of retention.
Affected services
Dashboard
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime
Updated
Aug 28 at 04:29pm EDT
We've managed to stabilize a subset of the traffic. Most sources should process requests within a few seconds, but roughly 10% of sources are still affected by the delay.
We're still working on the issue.
Affected services
Retries
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime
Created
Aug 28 at 10:16am EDT
We're experiencing a delay in processing requests and delivering events. No data has been lost.
Affected services
Bulk & Automatic Retries, Pausing & Unpausing
Delivery HTTP Uptime