Incidents represent time-sensitive threats to your infrastructure, such as URL downtime, a missed heartbeat, or a low-level infrastructure alert.
The current on-call person is alerted when an incident is created.
If the current on-call person doesn't acknowledge the incident within a specified time period, the entire team is alerted.
You can configure on-call escalations and escalation periods for each Monitor and Heartbeat separately.
Example: we are monitoring facebook.com every 30 seconds.
Our on-call escalation is configured to alert the entire team if the current on-call person doesn't acknowledge the incident within 3 minutes.
facebook.com goes down at 3:25 AM
A new incident is created
Better Uptime calls the current on-call person and sends them an SMS and an e-mail
The current on-call person is asleep
They don't acknowledge the incident and continue dreaming 😴
After 3 minutes pass, Better Uptime alerts (call, SMS, e-mail) the entire team
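The timeline above can be sketched in a few lines of Python. This is only an illustrative model, not Better Uptime's implementation: the function name `alert_schedule`, its parameters, and the calendar date are assumptions made for the example. It also assumes that an acknowledgement arriving before the escalation period ends is what suppresses the team-wide alert.

```python
from datetime import datetime, timedelta


def alert_schedule(started_at, escalation_period, acknowledged_at=None):
    """Return (time, recipient) pairs for one incident in this simplified model."""
    # The current on-call person is alerted as soon as the incident is created.
    alerts = [(started_at, "current on-call person")]
    escalate_at = started_at + escalation_period
    # Assumption: only an acknowledgement before the escalation time stops escalation.
    if acknowledged_at is None or acknowledged_at >= escalate_at:
        alerts.append((escalate_at, "entire team"))
    return alerts


# Incident starts at 3:25 AM (the date is arbitrary) with a 3-minute escalation period.
started = datetime(2024, 1, 1, 3, 25)
schedule = alert_schedule(started, timedelta(minutes=3))
for when, who in schedule:
    print(when.strftime("%H:%M"), "->", who)
```

With no acknowledgement, the schedule contains the initial alert at 03:25 and the team-wide alert at 03:28. Passing an `acknowledged_at` value earlier than 03:28 leaves only the first entry.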
After you acknowledge the incident, no other team members will be alerted.
You can manually acknowledge the incident in the upper-right corner on the incident detail page.
When Better Uptime calls you, you are prompted to press 1 to acknowledge the incident.
If you don't want to acknowledge the incident, for example because you are away from your computer and can't start resolving the problem right away, just hang up and other team members will be alerted.
To acknowledge the incident, click the Acknowledge incident button in the e-mail you receive when a new incident is created.
Once the incident is acknowledged, you will be able to resolve it.
Incidents are automatically resolved after the endpoint becomes available again.
You can manually resolve the incident by clicking Resolve in the upper-right corner of the incident detail page, or wait until it is resolved automatically.
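The acknowledge and resolve flow can be pictured as a small state model. The sketch below is hypothetical: the `Incident` class, its method names, and the status strings are invented for illustration and are not part of any Better Uptime API. It assumes that manual resolution requires a prior acknowledgement, while automatic resolution on recovery does not.

```python
class Incident:
    """Hypothetical, simplified model of the incident lifecycle."""

    def __init__(self):
        self.status = "started"

    def acknowledge(self):
        # Acknowledging only has an effect on an incident that is still open.
        if self.status == "started":
            self.status = "acknowledged"

    def resolve(self):
        # Assumption: manual resolution is only available after acknowledgement.
        if self.status != "acknowledged":
            raise ValueError("acknowledge the incident before resolving it")
        self.status = "resolved"

    def endpoint_recovered(self):
        # Assumption: automatic resolution does not depend on acknowledgement.
        self.status = "resolved"


manual = Incident()
manual.acknowledge()
manual.resolve()

automatic = Incident()
automatic.endpoint_recovered()

print(manual.status, automatic.status)
```

Both incidents end in the `resolved` state: one through a manual click after acknowledgement, the other because the endpoint became available again.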
We take a screenshot and save a raw response of your website every time an incident caused by downtime happens. They can be extremely useful when figuring out exactly what happened.
You can find the screenshot and the response in the headline on an incident detail page.
In some circumstances a screenshot is not available. For example, when a response timeout is reached, there is typically no response to capture.
You can collaborate on resolving an incident with your colleagues using comments. Upload screenshots, share your insights, and work through the incident together.
You can use Markdown in the comments.
Post mortems are short summaries of incidents.
They typically describe why the incident happened, an estimated cost, and how to prevent similar incidents in the future.
The best teams write and share their post mortems after each significant incident.
To write a post mortem, add a comment on the incident that includes the words "post mortem". See the example below.