The Mostly Color Channel: January 2018

Friday, January 12, 2018

Annotating detected outliers

Twitter's anomaly detection function for R, AnomalyDetectionTs, is excellent but minimalistic. Its input is a two-column data frame: the first column holds the timestamps and the second holds the observations. In addition to a plot, the output is a data frame comprising timestamps, values, and, optionally, expected values.

In practice, we usually have some semantic information that we would also like to include in the output, so we do not have to refer back to the original data. Fortunately, there is a quick-and-dirty way to add a description to the outlier data frame.

We start with the annotated data frame containing at least columns with the timestamps, the observations, and factors providing contextual or semantic information on each observation. We then create a simple data frame with just the first two columns, which we pass to the outlier detection function.
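As a minimal sketch of this step, the snippet below builds a small annotated data frame and derives the simple two-column frame from it. The data are made up for illustration. The column names timestamp and note mirror the ones the function in this post reads, while value is an assumed name for the observation column. The detector call is left as a comment because it needs the AnomalyDetection package and a much longer series than this toy example.

```r
# Made-up annotated data: timestamps, observations, and a semantic note
annotated <- data.frame(
  timestamp = as.POSIXct(c("2017-01-17 06:53:00",
                           "2017-01-17 06:54:00",
                           "2017-01-17 06:55:00"), tz = "UTC"),
  value = c(209, 101, 99),                      # 'value' is an assumed column name
  note = factor(c("gear display flashing", "normal", "normal"))
)

# Simple data frame with just the first two columns, in the same row order
simple <- annotated[, c("timestamp", "value")]

# With the package installed and a realistic series, the call would look like:
# outliers <- AnomalyDetection::AnomalyDetectionTs(simple, plot = TRUE)
print(simple)
```

Keeping the row order identical in both frames is what lets a row index found in the simple frame be reused on the annotated one.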

We can write a trivial function that for each outlier finds the row index in the simple data frame and looks up the semantic information in the annotated data frame:

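AddDescription <- function(series1, series2, outliers) {
  # series1:  simple data frame (timestamp, observation) passed to the detector
  # series2:  annotated data frame in the same row order, with a 'note' column
  # outliers: detector result; outliers$anoms has columns timestamp and anoms
  quantity <- NROW(outliers$anoms)  # 0 when no outliers were found
  if (quantity < 1) return(NULL)
  result <- NULL
  for (i in seq_len(quantity)) {
    # first row whose timestamp matches this outlier (NA if there is none)
    rowIndex <- which(series1$timestamp == outliers$anoms$timestamp[i])[1]
    newRow <- data.frame(timestamp = outliers$anoms$timestamp[i],
                         outlier_value = outliers$anoms$anoms[i],
                         description = as.character(series2$note[rowIndex]))
    result <- rbind(result, newRow)
  }
  result
}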

This function is just an elementary example. It is easy to extend it so that each outlier carries more detailed information compiled from the full data frame.
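One possible extension, sketched under stated assumptions: look up all outlier rows at once and pull several columns from the annotated data frame. The data are made up, vehicle is a hypothetical extra column, and the outliers list is only a stand-in with the two fields the function above reads (anoms$timestamp and anoms$anoms), not a real detector result.

```r
# Made-up annotated data; 'vehicle' is a hypothetical extra column
annotated <- data.frame(
  timestamp = as.POSIXct(c("2017-01-17 06:53:00",
                           "2017-01-17 06:54:00",
                           "2017-09-19 09:10:00"), tz = "UTC"),
  value = c(209, 101, 206),
  note = c("gear display flashing", "normal", "gear shift failure"),
  vehicle = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Stand-in for the detector result (assumed shape, not real output)
outliers <- list(anoms = data.frame(
  timestamp = annotated$timestamp[c(1, 3)],
  anoms = c(209, 206)
))

# Compare timestamps as seconds since the epoch to find each outlier's row
idx <- match(as.numeric(outliers$anoms$timestamp),
             as.numeric(annotated$timestamp))

# Assemble the enriched outlier table from the matched rows
detailed <- data.frame(
  timestamp = outliers$anoms$timestamp,
  outlier_value = outliers$anoms$anoms,
  description = annotated$note[idx],
  vehicle = annotated$vehicle[idx],
  stringsAsFactors = FALSE
)
print(detailed)
```

Looking up all rows in one step avoids growing the result with rbind inside a loop, which matters little for a handful of outliers but keeps the code short when more columns are added.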

Time series with outliers at green markers

Outliers with descriptions

    timestamp            outlier_value  description
1   2017-01-17 06:53:00  209            gear display flashing
2   2017-09-19 09:10:00  206            gear shift failure
3   2017-11-17 07:26:00  211            check engine lamp on

Dates are a sore point of analytics: they always get you. When no time zone is specified, i.e., tz = "", R assumes the local time zone. In the data frame returned by Twitter's AnomalyDetectionTs function, the time column has UTC as its time zone. Therefore, the following statement is useful after the call to AnomalyDetectionTs:

anomalies$anoms$timestamp <- as.POSIXct(anomalies$anoms$timestamp, tz = "")
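To see why the time zone matters for a timestamp lookup, here is a small sketch in base R. It parses the same wall-clock string under two zones and compares the results. The zone America/Los_Angeles is only an example standing in for a local time zone.

```r
stamp <- "2017-01-17 06:53:00"

# Same wall-clock text, interpreted in two different time zones
utc   <- as.POSIXct(stamp, tz = "UTC")
local <- as.POSIXct(stamp, tz = "America/Los_Angeles")  # example zone only

# The time zone is stored as an attribute on the POSIXct value
print(attr(utc, "tzone"))

# The two values are different instants, so an equality lookup finds nothing
print(utc == local)
print(which(utc == local))

# Offset between the two interpretations, in hours
offset <- as.numeric(difftime(utc, local, units = "hours"))
print(offset)
```

A lookup such as series1$timestamp == outliers$anoms$timestamp[i] silently returns no match when the two sides were built from the same wall-clock text under different zones.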

