Health indicator


When do things get colored in red?

The color red is used to indicate an unhealthy status of entities and their metrics.

Hint: We differentiate between indicating problems with the color red and throwing notifications.

Why do things get red?

Obviously, because we want to emphasize unhealthy components. That one's easy. However, we also use red because we want to give our users a click path to the root cause: As a user, any time I see something red on the screen, I want to be able to click on it and continue clicking on red until I find the root cause of the problem.

What is unhealthy?

In order to use the color red correctly, we have to decide when elements are unhealthy and when they are not.

  • Problem relation: Entities (or their metrics) are unhealthy if they are part of a problem (i.e., they are in a problem context).

  • Open events: If there is an open event on a metric, we deem this unhealthy. For example, if we detect a spike on a host's CPU metric and an event is triggered, this is unhealthy (even if there is no related problem).

  • Domain knowledge: We mark data and metrics as unhealthy if we know from domain knowledge that they are unhealthy. For example, if we show a list of web requests and some of them return a 404, we mark those requests as unhealthy.

  • 100% failure rate: An entity is deemed unhealthy if all the child entities it contains are unhealthy. For example, a "web request method" in Dynatrace is basically an aggregate of all the requests handled with one particular URL in one particular timeframe. If all of those requests fail (e.g., with 500s), we decide that the web request method is unhealthy. (Think of this as propagation or bubbling: any time all the elements of a collection are unhealthy, the collection itself becomes unhealthy.)
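The four rules above can be read as a single health check. The following is a minimal sketch, not actual Dynatrace code: the `Entity` class, its field names, and `is_unhealthy` are all hypothetical and exist only to illustrate how the rules combine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Hypothetical model of a monitored entity; all field names are assumptions."""
    name: str
    in_problem_context: bool = False   # rule 1: problem relation
    has_open_event: bool = False       # rule 2: open events
    known_bad: bool = False            # rule 3: domain knowledge (e.g. a 404)
    children: List["Entity"] = field(default_factory=list)


def is_unhealthy(entity: Entity) -> bool:
    """Return True if any of the four rules marks the entity as unhealthy."""
    if entity.in_problem_context or entity.has_open_event or entity.known_bad:
        return True
    # Rule 4: 100% failure rate. A non-empty collection is unhealthy
    # only when every one of its children is unhealthy.
    return bool(entity.children) and all(is_unhealthy(c) for c in entity.children)


failing = [Entity(f"request-{i}", known_bad=True) for i in range(3)]
mixed = failing[:2] + [Entity("request-ok")]

print(is_unhealthy(Entity("method-a", children=failing)))  # True
print(is_unhealthy(Entity("method-b", children=mixed)))    # False
```

Note that a single healthy child is enough to stop the propagation: `method-b` stays healthy even though two of its three requests fail.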

What about yellow and green?

  • Yellow: We do not use yellow to indicate the state of a customer environment. There are several reasons for this. One of them is that yellow states (i.e., warnings) are almost always ignored by users and are therefore not relevant. To be the better monitoring solution, we take it upon ourselves to decide for the user what is important and what is not. We therefore distinguish only between unhealthy and healthy states, with nothing in between.

  • Green: While unhealthy means red (to draw attention), we do not paint everything else green. The reason is simple: as soon as something is healthy, the information about it becomes just that: information. Information does not need a special attention-drawing color and is therefore simply shown black on white.
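In code, the two-state model might look like the sketch below. The `HealthState` enum and the `status_color` helper are hypothetical names chosen for illustration; the point is only that there is no warning state and no color for healthy.

```python
from enum import Enum
from typing import Optional


class HealthState(Enum):
    # Deliberately only two states: there is no WARNING in between.
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


def status_color(state: HealthState) -> Optional[str]:
    """Return the accent color for a state, or None for the default text color."""
    # Only unhealthy draws attention; healthy information stays black on white.
    return "red" if state is HealthState.UNHEALTHY else None
```

Returning `None` for healthy, rather than `"green"`, keeps the decision about the default text color with the surrounding component.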

Examples

Entity views

Entity views should indicate unhealthy metrics as well as other unhealthy entities that are connected.

Service

Host

Instances

Problem context

In a problem context, only events related to this problem should be shown in red.

Problem context service

Unhealthy entities that are not related to this problem are not shown.
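As a rough illustration of this filtering, the sketch below keeps only the events linked to the problem being viewed. The `Event` record, its `problem_id` field, and the function name are assumptions made for the example, not part of any real API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical event record; problem_id is None if no problem is linked."""
    title: str
    problem_id: Optional[str] = None


def red_events_in_problem_context(events: List[Event], problem_id: str) -> List[Event]:
    """Keep only the events that belong to the problem being viewed."""
    return [e for e in events if e.problem_id == problem_id]


events = [
    Event("CPU saturation", problem_id="P-1"),
    Event("Failure rate increase", problem_id="P-2"),
    Event("Memory spike"),  # open event without a related problem
]

for event in red_events_in_problem_context(events, "P-1"):
    print(event.title)  # CPU saturation
```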

Problem context comparison

Custom charting

Custom charting

UI components

Tables without unhealthy metric in column

Unhealthy entities in tables are indicated with a vertical red line. Information about the unhealthy metric is shown in a second line. If more than one metric is affected, the message 'Multiple problems' is shown.
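The second-line rule can be sketched as a small helper. The function name and the use of plain metric names as input are assumptions; only the 'Multiple problems' wording comes from the guideline itself.

```python
from typing import List, Optional


def unhealthy_row_message(unhealthy_metrics: List[str]) -> Optional[str]:
    """Return the second-line text for a table row, or None for a healthy row."""
    if not unhealthy_metrics:
        return None                  # healthy row: no red line, no second line
    if len(unhealthy_metrics) == 1:
        return unhealthy_metrics[0]  # name the single affected metric
    return "Multiple problems"       # wording taken from the guideline


print(unhealthy_row_message(["Failure rate"]))                   # Failure rate
print(unhealthy_row_message(["Failure rate", "Response time"]))  # Multiple problems
```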

Tables

Tables with unhealthy metric in column

Tables

Tables with multiple unhealthy entities

Tables

Tabs

Unhealthy metrics hidden behind tabs need to be indicated in red. (This refers to the high failure rate of a service, not of a single request.)
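One way to decide which tabs to mark is sketched below. The mapping from tab label to per-metric health flags is a hypothetical data shape chosen for the example.

```python
from typing import Dict, List


def red_tabs(tabs: Dict[str, List[bool]]) -> List[str]:
    """Return the labels of tabs that hide at least one unhealthy metric.

    Each value lists, per metric behind the tab, whether it is unhealthy.
    """
    return [label for label, flags in tabs.items() if any(flags)]


tabs = {
    "Response time": [False, False],
    "Failure rate": [True],
    "CPU": [],
}
print(red_tabs(tabs))  # ['Failure rate']
```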

Tabs

Other components

Metrics such as failure rate, JavaScript errors, and 5xx errors can be displayed in red.

Charts

Dashboard tiles should indicate unhealthy components too.

Dashboard tiles

Infographics also show the health status of connected entities.

Infographics

Examples

Tabs styleguide

Chart styleguide

Table styleguide

Dashboard tile styleguide