When do things get colored in red?
The color red is used to indicate an unhealthy status of entities and their metrics.
Hint: We distinguish between indicating problems with the color red and sending notifications.
Why do things turn red?
Obviously, because we want to emphasize unhealthy components. That one's easy. However, we also use red because we want to give our users a click path to the root cause: as a user, any time I see something red on the screen, I want to be able to click on it and keep clicking on red until I find the root cause of the problem.
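The click path can be pictured as a walk through connected entities: start at a red element and keep following red connections until you reach a red element with no red connections left. The sketch below is a minimal illustration under stated assumptions. The `Entity` class, its `healthy` flag, and its `dependencies` list are hypothetical stand-ins for the real entity model, which this guideline does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical entity model: a name, a health flag, and connected entities.
    name: str
    healthy: bool = True
    dependencies: list["Entity"] = field(default_factory=list)


def click_path(start: Entity) -> list[Entity]:
    """Follow red (unhealthy) connections from `start` until none are left.

    The last element of the returned path is the candidate root cause.
    """
    path = [start]
    visited = {id(start)}  # dataclass instances are unhashable, so track ids
    current = start
    while True:
        # Pick the first unhealthy connection that has not been visited yet.
        next_red = next(
            (d for d in current.dependencies
             if not d.healthy and id(d) not in visited),
            None,
        )
        if next_red is None:
            return path
        path.append(next_red)
        visited.add(id(next_red))
        current = next_red
```

For example, an unhealthy app that depends on an unhealthy service, which in turn depends on a healthy cache and an unhealthy database, yields the path app, service, database. The sketch greedily takes the first red connection; in the product the user chooses which red element to click next.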
What is unhealthy?
In order to use the color red correctly, we have to decide when elements are unhealthy and when they are not.
Problem relation: Entities (or their metrics) are unhealthy if they are part of a problem (i.e. they are in a problem context).
Open events: If there is an open event on a metric, we deem the metric unhealthy. For example, if we detect a spike in a host's CPU metric and an event is triggered, the metric is unhealthy, even if there is no related problem.
Domain knowledge: We mark data and metrics as unhealthy if we know from domain knowledge that they are unhealthy. For example, if we show a list of web requests and some of them end in a 404, we know that those requests are unhealthy.
100% failure rate: An entity is deemed unhealthy if all of the child entities it contains are unhealthy. For example, a "web request method" in Dynatrace is basically an aggregate of all the requests handled with one particular URL in one particular timeframe. If all of those requests fail (e.g. with 500s), we deem the web request method unhealthy. (Think of this as propagation or bubbling: whenever all elements of a collection are unhealthy, the collection itself becomes unhealthy.)
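The four rules above can be combined into a single health check. This is a minimal sketch, not the product's implementation. The `Node` class and its fields `in_problem_context`, `open_events`, `status_code`, and `children` are hypothetical, and the domain-knowledge rule is assumed here to mean "an HTTP status of 400 or higher is a failure", which generalizes the 404 and 500 examples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical model of an entity, metric, or single data item.
    name: str
    in_problem_context: bool = False   # part of a problem
    open_events: int = 0               # number of open events on this node
    status_code: Optional[int] = None  # e.g. HTTP status of one web request
    children: list["Node"] = field(default_factory=list)


def is_unhealthy(node: Node) -> bool:
    # Rule 1, problem relation: anything in a problem context is unhealthy.
    if node.in_problem_context:
        return True
    # Rule 2, open events: an open event is enough, even without a problem.
    if node.open_events > 0:
        return True
    # Rule 3, domain knowledge: ASSUMED here to be "HTTP 4xx/5xx is a failure".
    if node.status_code is not None and node.status_code >= 400:
        return True
    # Rule 4, 100% failure rate: a collection is unhealthy only if ALL of its
    # children are unhealthy (an empty collection never bubbles up as red).
    if node.children and all(is_unhealthy(c) for c in node.children):
        return True
    return False
```

A web request method whose requests all returned 500 or 503 is reported as unhealthy; as soon as one request in the same timeframe succeeds, the method no longer qualifies under the 100% failure rate rule, although the failed requests themselves stay red.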
What about yellow and green?
Yellow: We do not use yellow to indicate the state of a customer's environment. There are several reasons for this. One of them is that yellow states (i.e. warnings) are almost always ignored by users and are therefore not relevant. To be the better monitoring solution, we take it upon ourselves to decide for the user what is important and what is not. Therefore, we distinguish only between unhealthy and healthy states, with nothing in between.
Green: While unhealthy means red (to draw attention), we do not paint everything else green. The reason for this is simple: as soon as something is healthy, the information about it becomes just that: information. Information does not need an attention-drawing color and is therefore shown as plain black on white.
Entity views should indicate unhealthy metrics as well as unhealthy connected entities.
In a problem context, only events related to this problem should be shown in red.
Unhealthy entities that are not related to this problem are not shown.
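Restricting red to the current problem amounts to a filter over open events. A minimal sketch, assuming a hypothetical `Event` type whose `problem_id` is `None` when the event belongs to no problem, and assuming the input list contains only open events:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical open event; problem_id is None if no problem is related.
    name: str
    problem_id: Optional[str] = None


def events_to_highlight(events: list[Event], problem_id: Optional[str]) -> list[Event]:
    """Return the open events that should be shown in red.

    Outside a problem context (problem_id is None), every open event counts.
    Inside a problem context, only events related to that problem count.
    """
    if problem_id is None:
        return list(events)
    return [e for e in events if e.problem_id == problem_id]
```

The same open events therefore produce different red highlights depending on where the user is looking: all of them on a plain entity view, only the related ones inside a problem.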
Tables without a column for the unhealthy metric
Unhealthy entities in tables are indicated with a vertical red line. Information about the unhealthy metric is shown on a second line. If more than one metric is affected, the message 'Multiple problems' is shown.
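The row rule can be expressed as a small rendering function. This sketch is illustrative only: the `Row` class is hypothetical, and the affected metric's name stands in for whatever "information about the unhealthy metric" the real table displays.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    # Hypothetical table row: entity name plus names of its unhealthy metrics.
    entity: str
    unhealthy_metrics: list[str] = field(default_factory=list)


def render_row(row: Row) -> tuple[bool, list[str]]:
    """Return (show_red_line, text_lines) for one table row.

    The first line is always the entity name. An unhealthy row gets a vertical
    red line and a second line naming the affected metric, or
    'Multiple problems' when more than one metric is affected.
    """
    lines = [row.entity]
    if not row.unhealthy_metrics:
        return False, lines
    if len(row.unhealthy_metrics) == 1:
        lines.append(row.unhealthy_metrics[0])
    else:
        lines.append("Multiple problems")
    return True, lines
```

Collapsing several metrics into 'Multiple problems' keeps every row at a fixed height of at most two lines; the details are one click away on the entity itself.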
Tables with a column for the unhealthy metric
Tables with multiple unhealthy entities
Unhealthy metrics hidden behind tabs need to be shown in red (for example, a high failure rate of a service, not of a single request).
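One reading of this rule, assumed here rather than stated above, is that a tab itself is marked red whenever at least one metric behind it is unhealthy, so the click path never dead-ends at a hidden problem. A minimal sketch, with a hypothetical input that maps each tab name to its metrics and their health:

```python
def red_tabs(tabs: dict[str, dict[str, bool]]) -> set[str]:
    """Return the names of tabs that should be marked red.

    `tabs` maps a tab name to {metric name: is_unhealthy}. A tab is marked red
    if at least one metric behind it is unhealthy; a tab with no metrics is
    never red.
    """
    return {tab for tab, metrics in tabs.items() if any(metrics.values())}
```

Note that this uses "any" rather than the "all" of the 100% failure rate rule: a tab is a navigation aid, not an aggregate entity, so a single unhealthy metric is enough to point the user toward it.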
Dashboard tiles should indicate unhealthy components too.
Infographics also show the health status of connected entities.