Increase in U10
Incident Report for Code Climate
Resolved
We confirmed that the U10 analysis errors were caused by a leader election in our Kafka cluster. We have not seen any such errors since approximately 11:15 EDT. Leader elections are an infrequent but expected event in our system, and we have work planned to make clients more resilient to them. In response to this incident, we will increase the priority of that work.
Posted Jun 29, 2016 - 12:38 EDT
Monitoring
We were alerted to an increase in what we call "U10" analysis errors (errors with an unknown or unexpected cause) between 11:07 and 11:12 EDT. We're currently confirming a suspected root cause (a leader election in our Kafka cluster), verifying that the cluster is now stable, and exploring ways to make our analysis more resilient to such events in the future.
Posted Jun 29, 2016 - 11:18 EDT