Friday, September 1, 2017

Hawkular Alerts with Jaeger / OpenTracing

Two recent blogs discuss how OpenTracing instrumentation can be used to collect application metrics:

A further interesting integration can be the addition of Hawkular Alerts to the environment.

As the previous blog and demo discuss, Hawkular Alerts is a generic, federated alerts system that can trigger events, alerts, and notifications from different, independent systems such as Prometheus, ElasticSearch, and Kafka.

Here we can combine the two. Let's follow the directions for the OpenTracing demo (using the Jaeger implementation) and add Hawkular Alerts.

What this can show is OpenTracing application metrics triggering alerts when (as in this example) OpenTracing spans encounter a larger-than-expected error rate.

(Note: these instructions assume you are using Kubernetes / Minikube - see the Hawkular OpenTracing blogs linked above for more details on these instructions)

1. START KUBERNETES

Here we start minikube giving it enough resources to run all of the pods necessary for this demo. We also start up a browser pointing to the Kubernetes dashboard, so you can follow the progress of the remaining instructions.
  • minikube start --cpus 4 --memory 8192
  • minikube dashboard
2. DEPLOY PROMETHEUS
  • kubectl create -f https://raw.githubusercontent.com/coreos/prometheus-operator/master/bundle.yaml
  • kubectl create -f https://raw.githubusercontent.com/objectiser/opentracing-prometheus-example/master/prometheus-kubernetes.yml
    • (Note: The above command might not work depending on your version - if you get errors, download a copy of prometheus-kubernetes.yml and edit it, changing “v1alpha1” to “v1”)
3. DEPLOY JAEGER
  • kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/all-in-one/jaeger-all-in-one-template.yml
The following will build and deploy the Jaeger example code that will produce the OpenTracing data for the demo:
  • mkdir -p ${HOME}/opentracing ; cd ${HOME}/opentracing
  • git clone git@github.com:objectiser/opentracing-prometheus-example.git
  • cd opentracing-prometheus-example/simple
  • eval $(minikube docker-env)
  • mvn clean install docker:build
  • kubectl create -f services-kubernetes.yml
    • (Note: The above command might not work depending on your version - if you get errors, edit services-kubernetes.yml, changing “v1alpha1” to “v1”)
4. DEPLOY HAWKULAR-ALERTS AND CREATE ALERT TRIGGER

The following will deploy Hawkular Alerts and create the trigger definition that will fire an alert when the Jaeger OpenTracing data indicates an error rate that is over 30%.
  • kubectl create -f https://raw.githubusercontent.com/hawkular/hawkular-alerts/master/dist/hawkular-alerts-k8s.yml
  • Use “minikube service hawkular-alerts --url” to determine the Hawkular Alerts URL and point your browser to the path “/hawkular/alerts/ui” at that URL (i.e. http://host:port/hawkular/alerts/ui).
  • From the browser page running the Hawkular Alerts UI, enter a tenant name in the top right text field (“my-organization” for example) and click the “Change” button.
  • Navigate to the “Triggers” page (found in the left-hand nav menu).
  • Click the kebab menu icon at the top and select “New Trigger”.
  • In the text area, enter the following to define a new trigger that will trigger alerts when the Prometheus query shows that there is a 30% error rate or greater in the accountmgr or ordermgr servers:
  •  {
       "trigger":{
          "id":"jaeger-prom-trigger",
          "name":"High Error Rate",
          "description":"Data indicates high error rate",
          "severity":"HIGH",
          "enabled":true,
          "autoDisable":false,
          "tags":{
             "prometheus":"Test"
          },
          "context":{
             "prometheus.url":"http://prometheus:9090"
          }
       },
       "conditions":[
          {
             "type":"EXTERNAL",
             "alerterId":"prometheus",
             "dataId":"prometheus-test",
             "expression":"(sum(increase(span_count{error=\"true\",span_kind=\"server\"}[1m])) without (pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind) / sum(increase(span_count{span_kind=\"server\"}[1m])) without (pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind)) > 0.3"
          }
       ]
    }
  • Now navigate back to the “Dashboard” page (again via the left-hand nav menu). From this Dashboard page, look for alerts when they are triggered. We'll next start generating the data that will trigger these alerts.
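
If you would rather generate the trigger definition than hand-edit the escaped quotes in the Prometheus expression, something like the following Python sketch could be used. It only rebuilds the same JSON shown above; the function names and the threshold and window parameters are my own inventions for illustration and are not part of Hawkular Alerts.

```python
import json

# Labels removed by "without (...)", copied from the trigger definition above.
DROPPED_LABELS = (
    "pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind"
)


def build_expression(threshold=0.3, window="1m"):
    """Return the PromQL error-ratio expression used by the condition."""
    errors = (
        'sum(increase(span_count{error="true",span_kind="server"}[%s])) '
        "without (%s)" % (window, DROPPED_LABELS)
    )
    total = (
        'sum(increase(span_count{span_kind="server"}[%s])) '
        "without (%s)" % (window, DROPPED_LABELS)
    )
    return "(%s / %s) > %s" % (errors, total, threshold)


def build_trigger(threshold=0.3, window="1m"):
    """Return the full trigger definition as a plain dict."""
    return {
        "trigger": {
            "id": "jaeger-prom-trigger",
            "name": "High Error Rate",
            "description": "Data indicates high error rate",
            "severity": "HIGH",
            "enabled": True,
            "autoDisable": False,
            "tags": {"prometheus": "Test"},
            "context": {"prometheus.url": "http://prometheus:9090"},
        },
        "conditions": [
            {
                "type": "EXTERNAL",
                "alerterId": "prometheus",
                "dataId": "prometheus-test",
                "expression": build_expression(threshold, window),
            }
        ],
    }


# json.dumps takes care of escaping the quotes inside the expression.
print(json.dumps(build_trigger(), indent=3))
```

The printed JSON can be pasted into the “New Trigger” text area as described above.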
5. GENERATE SOME SAMPLE OPEN TRACING APPLICATION DATA
  • export ORDERMGR=$(minikube service ordermgr --url)
  • ${HOME}/opentracing/opentracing-prometheus-example/simple/genorders.sh
Once the data starts to be collected, you will see alerts in the Hawkular Alerts UI whenever the error rate over the past minute exceeds 30% (as per the Prometheus query).

If you look at the alerts information in the Hawkular Alerts UI, you’ll see the conditions that triggered the alerts, such as:
Time: 2017-09-01 17:41:17 -0400
External[prometheus]: prometheus-test[Event [tenantId=my-organization, id=1a81471d-340d-4dba-abe9-5b991326dc80, ctime=1504302077288, category=prometheus, dataId=prometheus-test, dataSource=_none_, text=[1.504302077286E9, 0.3333333333333333], context={service=ordermgr, version=0.0.1}, tags={}, trigger=null]] matches [(sum(increase(span_count{error="true",span_kind="server"}[1m])) without (pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind) / sum(increase(span_count{span_kind="server"}[1m])) without (pod,instance,job,namespace,endpoint,transaction,error,operation,span_kind)) > 0.3]
Notice the “ordermgr” service (version "0.0.1") had an error rate of 0.3333 (33%) which caused the alert since it is above the allowed 30% threshold.
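
To make the numbers in that event a little more concrete, here is a small Python sketch that picks apart the "text" value from the sample above. It assumes the pair is a Prometheus-style [timestamp in epoch seconds, value] sample, which is how it looks but is not something the alert output states explicitly; the function name is made up for this example.

```python
import json
from datetime import datetime, timezone

THRESHOLD = 0.3  # the "> 0.3" bound from the trigger's Prometheus expression


def parse_sample(text):
    """Split a "[timestamp, value]" pair into a UTC datetime and a float.

    Assumption: the first element is seconds since the Unix epoch.
    """
    epoch_seconds, value = json.loads(text)
    when = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return when, float(value)


# The "text" field copied from the alert shown above.
when, rate = parse_sample("[1.504302077286E9, 0.3333333333333333]")

print(when.strftime("%Y-%m-%d %H:%M:%S"), "UTC")
print("error rate: %.1f%%" % (rate * 100))
print("above threshold:", rate > THRESHOLD)
```

Under that assumption the timestamp works out to 21:41:17 UTC, which lines up with the 17:41:17 -0400 time shown on the alert, and one failed span out of three is enough to cross the 30% line.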

At this point, the Hawkular Alerts UI provides the ability for system admins to log notes about the issue, acknowledge the alert, and mark the alert resolved if the underlying issue has been fixed. These lifecycle functions (also available as REST operations) are just part of the value that Hawkular Alerts adds.

You could do more complex things such as only trigger this alert if this Prometheus query generated results AND some other condition was true (say, ElasticSearch logs match a particular pattern, or if a Kafka topic had certain data). This demo merely scratches the surface, but does show how Hawkular Alerts can be used to work with OpenTracing to provide additional capabilities that may be found useful by system administrators and IT support personnel.

Friday, August 11, 2017

Hawkular Alerts with Prometheus, ElasticSearch, Kafka

Federated Alerts

Hawkular Alerts aims to be a federated alerting system. That is to say, it can fire alerts and send notifications that are triggered by data coming from a number of third-party external systems.
Thus, Hawkular Alerts is more than just an alerting system for use with Hawkular Metrics. In fact, it can be used independently: you do not need to be running Hawkular Metrics at all to take advantage of the functionality provided by Hawkular Alerts.
This is a key differentiator between Hawkular Alerts and other alerting systems. Most alerting systems only alert on data coming from their respective storage systems (e.g. the Prometheus Alert Engine alerts only on Prometheus data). Hawkular Alerts, on the other hand, can trigger alerts based on data from various systems.

Alerts vs. Events

Before we begin, a quick clarification is in order. When it is said that Hawkular Alerts fires an "alert" it means some data came into Hawkular Alerts that matched some conditions which triggered the creation of an alert in Hawkular Alerts backend storage (which can then trigger additional actions such as sending emails or calling a webhook). An "alert" typically refers to a problem that has been detected, and someone should take action to fix it. An alert has a lifecycle attached to it - alerts are opened, then acknowledged by some user who will hopefully fix the problem, then resolved when the problem can be considered closed.
However, there can be conditions that occur that do not represent problems but nevertheless are events you want recorded. There is no lifecycle associated with events and no additional actions are triggered by events, but "events" are fired by Hawkular Alerts in the same general manner as "alerts" are.
In this document, when it is said that Hawkular Alerts can fire "alerts" based on data coming from external third-party systems such as Prometheus, ElasticSearch, and Kafka, this also means events can be fired as well as alerts. In other words, you can record any event (not just a "problem", aka "alert") that can be gleaned from the data coming from those external systems.
See alerting philosophy for more.
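
As a rough illustration of the distinction, the Python sketch below models an event as a plain record and an alert as an event that also carries a lifecycle. It only models the sequence described above (opened, then acknowledged, then resolved); the class and method names are hypothetical and are not the Hawkular Alerts API.

```python
from enum import Enum


class Status(Enum):
    OPEN = "OPEN"
    ACKNOWLEDGED = "ACKNOWLEDGED"
    RESOLVED = "RESOLVED"


# Only the sequence described in the text: opened -> acknowledged -> resolved.
NEXT_STATUS = {
    Status.OPEN: Status.ACKNOWLEDGED,
    Status.ACKNOWLEDGED: Status.RESOLVED,
}


class Event:
    """Something worth recording; it has no lifecycle."""

    def __init__(self, text):
        self.text = text


class Alert(Event):
    """A detected problem that moves through a lifecycle."""

    def __init__(self, text):
        super().__init__(text)
        self.status = Status.OPEN
        self.notes = []

    def advance(self, user, note=""):
        """Move to the next lifecycle status, recording who did it."""
        if self.status not in NEXT_STATUS:
            raise ValueError("alert is already resolved")
        self.status = NEXT_STATUS[self.status]
        self.notes.append((user, self.status.value, note))
        return self.status


alert = Alert("Low Stock")
alert.advance("admin", "ordering more widgets")  # OPEN -> ACKNOWLEDGED
alert.advance("admin", "shelves restocked")      # ACKNOWLEDGED -> RESOLVED
print(alert.status.value)
```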

Demo

There is a recorded demo found here that will illustrate what this document is describing. After you read this document, you should watch the demo to gain further clarity on what is being explained. The demo is the multiple-sources example which you can run yourself found here (note: at the time of writing, this example is only found in the next branch, to be merged in master soon).

Prometheus

Hawkular Alerts can take the results of Prometheus metric queries and use the queried data for triggers that can fire alerts.
This Hawkular Alerts trigger will fire an alert (and send an email) when a Prometheus metric indicates our store’s inventory of widgets is consistently low (as defined by the Prometheus query you see in the "expression" field of the condition):
{
   "trigger":{
      "id": "low-stock-prometheus-trigger",
      "name": "Low Stock",
      "description": "The number of widgets in stock is consistently low.",
      "severity": "MEDIUM",
      "enabled": true,
      "tags": {
         "prometheus": "Prometheus"
      },
      "actions":[
         {
            "actionPlugin": "email",
            "actionId": "email-notify-owner"
         }
      ]
   },
   "conditions":[
      {
         "type": "EXTERNAL",
         "alerterId": "prometheus",
         "dataId": "prometheus-dataid",
         "expression": "rate(products_in_inventory{product=\"widget\"}[30s])<2"
      }
   ]
}

Integration with Prometheus Alert Engine

As a side note, though not demonstrated in the example, Hawkular Alerts also has an integration with Prometheus' own Alert Engine. This means the alerts generated by Prometheus itself can be forwarded to Hawkular Alerts, which can in turn use them for additional processing, perhaps combining them with data that is unavailable to Prometheus in order to fire other alerts. For example, Hawkular Alerts can take Prometheus alerts as input and feed them into other conditions that trigger on the Prometheus alert along with ElasticSearch logs.

ElasticSearch

Hawkular Alerts can examine logs stored in ElasticSearch and trigger alerts based on patterns that match within the ElasticSearch log messages.
This Hawkular Alerts trigger will fire an alert (and send an email) when ElasticSearch logs indicate sales are being lost because items are out of stock (as defined by the condition, which looks for a log category of "FATAL"; in the store’s logs this happens to mean a lost sale). Notice dampening is enabled on this trigger - the alert fires only after the condition has matched 3 consecutive times.
{
   "trigger":{
      "id": "lost-sale-elasticsearch-trigger",
      "name": "Lost Sale",
      "description": "A sale was lost due to inventory out of stock.",
      "severity": "CRITICAL",
      "enabled": true,
      "tags": {
         "Elasticsearch": "Localhost instance"
      },
      "context": {
         "timestamp": "@timestamp",
         "filter": "{\"match\":{\"category\":\"inventory\"}}",
         "interval": "10s",
         "index": "store",
         "mapping": "level:category,@timestamp:ctime,message:text,category:dataId,index:tags"
      },
      "actions":[
         {
            "actionPlugin": "email",
            "actionId": "email-notify-owner"
         }
      ]
   },
   "dampenings": [
      {
         "triggerMode": "FIRING",
         "type": "STRICT",
         "evalTrueSetting": 3
      }
   ],
   "conditions":[
      {
         "type": "EVENT",
         "dataId": "inventory",
         "expression": "category == 'FATAL'"
      }
   ]
}
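
To show what the dampening in this trigger does, here is a minimal Python sketch of "STRICT" dampening with an "evalTrueSetting" of 3: the trigger fires only once the condition has been true three evaluations in a row. This is my own simplified model and not the Hawkular Alerts implementation; in particular, resetting the counter after firing is an assumption, and the class name is made up.

```python
class StrictDampening:
    """Fire only after N consecutive true condition evaluations."""

    def __init__(self, eval_true_setting):
        self.eval_true_setting = eval_true_setting
        self.consecutive_true = 0

    def evaluate(self, condition_matched):
        """Record one evaluation and return True if the trigger should fire."""
        if not condition_matched:
            # Any false evaluation breaks the streak.
            self.consecutive_true = 0
            return False
        self.consecutive_true += 1
        if self.consecutive_true >= self.eval_true_setting:
            # Assumption: start counting afresh once the trigger has fired.
            self.consecutive_true = 0
            return True
        return False


dampening = StrictDampening(eval_true_setting=3)

# Log categories as they might arrive from the store's logs.
categories = ["FATAL", "FATAL", "INFO", "FATAL", "FATAL", "FATAL"]
fired = [dampening.evaluate(c == "FATAL") for c in categories]
print(fired)
```

In this run the single "INFO" entry breaks the first streak, so only the last evaluation fires.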

Kafka

Hawkular Alerts can examine data retrieved from Kafka message streams and trigger alerts based on that Kafka data.
This Hawkular Alerts trigger will fire an alert when data over a Kafka topic indicates a large purchase was made to fill the store’s inventory (as defined by the condition which evaluates to true when any number over 17 is received on the Kafka topic):
{
   "trigger":{
      "id": "large-inventory-purchase-kafka-trigger",
      "name": "Large Inventory Purchase",
      "description": "A large purchase was made to restock inventory.",
      "severity": "LOW",
      "enabled": true,
      "tags": {
         "Kafka": "Localhost instance"
      },
      "context": {
         "topic": "store",
         "kafka.bootstrap.servers": "localhost:9092",
         "kafka.group.id": "hawkular-alerting"
      },
      "actions":[ ]
   },
   "conditions":[
      {
         "type": "THRESHOLD",
         "dataId": "store",
         "operator": "GT",
         "threshold": 17
      }
   ]
}
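
The threshold condition itself is simple enough to sketch in a few lines of Python. Only the "GT" operator appears in the trigger above; the other operator names in the table are my assumption about what the remaining comparisons are called, and the function is purely illustrative.

```python
import operator

# "GT" comes from the trigger above; the other names are assumed.
OPERATORS = {
    "LT": operator.lt,
    "GT": operator.gt,
    "LTE": operator.le,
    "GTE": operator.ge,
}


def matches(condition, value):
    """Return True if a numeric value satisfies a THRESHOLD condition."""
    compare = OPERATORS[condition["operator"]]
    return compare(value, condition["threshold"])


condition = {
    "type": "THRESHOLD",
    "dataId": "store",
    "operator": "GT",
    "threshold": 17,
}

# Numbers as they might be received on the "store" topic.
for purchase in (5, 17, 18, 40):
    print(purchase, matches(condition, purchase))
```

Note that a purchase of exactly 17 does not match, since the condition is strictly "greater than".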

But, Wait! There’s More!

The above only covers the different ways Hawkular Alerts retrieves data for use in determining what alerts to fire. What is not covered here is that Hawkular Alerts can stream data in the other direction as well - it can send alert and event data to things like an ElasticSearch server or a Kafka broker. There are additional examples (mentioned below) that demonstrate this capability.
The point is that Hawkular Alerts should be seen as a common alerting engine that can be shared by multiple third-party systems, acting as both a consumer and a producer - a consumer of data from external third-party systems (which is used to fire alerts and events) and a producer that sends notifications of alerts and events to external third-party systems.

More Examples

Take a look at the Hawkular Alerts examples for more on using external systems as data sources for triggering alerts (note: at the time of writing, some examples, such as the Kafka ones, are only in the next branch).

Tuesday, August 1, 2017

Hawkular Alerts 2.0 UI WIP

Hawkular Alerts 2.0 UI

A quick 10-minute demo has been published to illustrate the progress that was made by the hAlerts team on the new UI.

This is a work-in-progress, and things will change, but the UI is actually functional now.

The video is best viewed in full-screen mode. The video link is: https://www.youtube.com/watch?v=bb9SaJudPlU



Monday, November 21, 2016

Hawkular OpenShift Demo - Running Outside OpenShift

Below is a quick 8 minute demo of the Hawkular OpenShift Agent.

For more information, see: https://github.com/hawkular/hawkular-openshift-agent


Monday, November 14, 2016

Hawkular OpenShift Agent - First Demo

Below is a quick 10 minute demo of the Hawkular OpenShift Agent.

For more information, see: https://github.com/hawkular/hawkular-openshift-agent




Thursday, October 20, 2016

Hawkular OpenShift Agent is Born

A new Hawkular agent has been published on github.com - Hawkular OpenShift Agent.

It is implemented in Go and the main use case for which it was created is to be able to collect metrics from OpenShift pods. The idea is you run Hawkular OpenShift Agent (HOSA) on an OpenShift node and HOSA will listen for pods to come up and down on the node. As pods come online, the pods will tell the agent what (if any) metrics should be collected. As pods go down, the agent will stop collecting metrics from all endpoints running on that pod.

Today, only Prometheus endpoints (using either the binary or text protocol) can be scraped; Jolokia endpoints are next on the list to be implemented. HOSA should therefore be able to collect metrics from either type of endpoint in the near future.
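
The pod tracking described above can be pictured with a short sketch. HOSA itself is written in Go; this is Python purely for illustration, and the class, method names, pod names, and URLs are all hypothetical rather than taken from the agent's code.

```python
class EndpointRegistry:
    """Tracks which metric endpoints to scrape for the pods on a node."""

    def __init__(self):
        self._endpoints_by_pod = {}

    def pod_started(self, pod_name, endpoints):
        """Register the endpoints a pod declared, if it declared any."""
        if endpoints:
            self._endpoints_by_pod[pod_name] = list(endpoints)

    def pod_stopped(self, pod_name):
        """Stop collecting from every endpoint that belonged to the pod."""
        self._endpoints_by_pod.pop(pod_name, None)

    def endpoints_to_scrape(self):
        """Return all currently registered endpoints, ordered by pod name."""
        return [
            url
            for pod in sorted(self._endpoints_by_pod)
            for url in self._endpoints_by_pod[pod]
        ]


registry = EndpointRegistry()
registry.pod_started("store-1", ["http://10.0.0.5:8080/metrics"])
registry.pod_started("batch-1", [])  # declares nothing, so nothing is collected
registry.pod_started("store-2", ["http://10.0.0.6:8080/metrics"])
registry.pod_stopped("store-1")

print(registry.endpoints_to_scrape())
```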

For more information - how to build and configure it - refer to the Hawkular OpenShift Agent README.

Monday, October 17, 2016

Pulling in a Go Dependency From a Fork, Branch, or Github PR using Glide

While writing a Go app, I decided to use Glide as the dependency management system (I tried Godep first, but even on the first day of using it, my dependencies were getting screwed up, lost, builds would mysteriously break - so I decided to switch to Glide, which seems much better).

I was using the Hawkular Go Client library because I needed to write metric data to Hawkular Metrics. So in my glide.yaml, I had this:
- package: github.com/hawkular/hawkular-client-go
  subpackages:
  - metrics
This simply tells Glide that I want to use the latest master of the client library (I'm not using a versioned library yet. I guess I should start doing that).

Anyway, I needed to add a feature to the Hawkular Go Client. So I forked the git repository, created a branch in my fork where I implemented the new feature, and submitted a Github pull request from my own branch. Rather than wait for the PR to be merged, I wanted Glide to pull in my own branch in my forked repo so I could immediately begin using the new feature. It was as simple as adding three lines to my glide.yaml and running "glide update":
- package: github.com/hawkular/hawkular-client-go
  repo: git@github.com:jmazzitelli/hawkular-client-go.git
  vcs: git
  ref: issue-8
  subpackages:
  - metrics
This tells Glide that the Hawkular Go client package is now located at a different repository (my fork located at github.com) under a branch called "issue-8".

Running "glide update" pulled in the Hawkular Go client from my fork's branch and placed it in my vendor/ directory. I can now start using my new feature in my Go app without waiting for the PR to be merged. Once the PR is merged, I can remove those three lines, "glide update" again, and things should be back to normal.