This is a follow-up post in the series about resource automation in the Cloud. In this part, we will look at monitoring. Monitoring is not the easiest thing to do in distributed systems, because you have to monitor a large number of instances. The challenge is to find out what you want to monitor. If you run an application (such as a SaaS platform), you might not be interested in the performance of a single instance but in the performance of the application itself. Likewise, you might not be interested in the I/O performance of an instance but in the overall experience your application delivers. Finding the right metrics takes significant monitoring experience.
Let us look at the basics of how monitoring works. There are two key concepts for monitoring instances:

  • Agent-less Monitoring
  • Agent-based Monitoring
If we talk about Agent-less monitoring, we have two possibilities:

  • Remotely analyse the system through a remote API (e.g. log data on the file system)
  • Analyse Network Packets: SNMP (Simple Network Management Protocol) is often used for that
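
To make the first possibility more concrete, here is a minimal sketch of analysing log data that was fetched remotely, without installing anything on the monitored system. The log format ("<timestamp> <LEVEL> <message>"), the function name and the sample lines are assumptions made for illustration; how the lines are retrieved depends on the remote API in use and is left out.

```python
from collections import Counter


def summarize_log_levels(log_lines):
    """Count log entries per severity level.

    Assumes a hypothetical format: "<timestamp> <LEVEL> <message>".
    Lines that do not have at least two fields are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


# Made-up lines, standing in for data returned by a remote API.
sample = [
    "2024-01-01T10:00:00 INFO request served",
    "2024-01-01T10:00:01 ERROR upstream timeout",
    "2024-01-01T10:00:02 ERROR upstream timeout",
]
levels = summarize_log_levels(sample)
```

The monitoring station does all the work here, which is why the monitored system itself stays untouched.
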
What is good about agent-less monitoring?
  • No client agent to deploy
  • Lightweight
  • No application to install or run on the client. Typically doesn't consume resources on the system
  • Lower cost
  • Option to close or lock down a system so that no new applications can be installed

What is bad about agent-less monitoring?

  • No in depth metrics for granular analysis
  • Can be affected by networking issues
  • Security

On the other hand, we can use agent-based monitoring.

With agent-based monitoring, a software component runs on each server. The software collects data on the server about different metrics such as CPU load, I/O throughput, application performance, … It then sends this data to a master server, which is in charge of aggregating the data. This gives an overview of the overall system performance. If the agent is not managed by a monitoring station, the system performance might be affected, which is why a lightweight agent is necessary.
What is bad about agent-based monitoring?
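
The flow described above (agents collect, the master aggregates) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `Agent` and `Master`, the metric names and the host names are hypothetical, the metrics are supplied by a plain callable instead of being read from the operating system, and the network transfer is replaced by a direct method call.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict


@dataclass
class Master:
    """Hypothetical master server that aggregates reports from all agents."""
    reports: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def receive(self, host: str, metrics: Dict[str, float]) -> None:
        # Keep only the latest report per host.
        self.reports[host] = metrics

    def overview(self) -> Dict[str, float]:
        # Average every metric across hosts to get a system-wide view.
        names = {name for m in self.reports.values() for name in m}
        return {
            name: mean(m[name] for m in self.reports.values() if name in m)
            for name in names
        }


@dataclass
class Agent:
    """Hypothetical lightweight agent running on one server."""
    host: str
    collect: Callable[[], Dict[str, float]]  # reads the local metrics

    def report(self, master: Master) -> None:
        # A real agent would send this over the network; here it is a direct call.
        master.receive(self.host, self.collect())


master = Master()
Agent("web-1", lambda: {"cpu_load": 0.25, "io_mb_s": 40.0}).report(master)
Agent("web-2", lambda: {"cpu_load": 0.75, "io_mb_s": 60.0}).report(master)
summary = master.overview()
```

Averaging is just one possible aggregation. Depending on the metric, a sum, a maximum or a percentile may say more about the application as a whole.
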
  • Need to deploy agents to systems
  • Each running System needs to have an Agent installed in order to work. This can be automated
  • Internal certification for deployment on production systems in some companies
  • Up-front Cost
  • Requires Software or custom Development

What is good about agent-based monitoring?

  • Deeper and more granular data collection, e.g. about the performance of a specific application and the CPU utilization
  • Tighter service integration
  • Control applications and services on remote nodes
  • Higher network security
  • Encrypted proprietary protocols
  • Lower risk of downtime
  • Easier to react, e.g. if "Apache" has high load
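
As a small illustration of the last point, the sketch below reacts to high load reported by agents. Everything in it is an assumption made for the example: the load values, the threshold of 0.9, the host names, and the `action` callback, which stands in for whatever the agent is allowed to do on the node (for instance restarting a service).

```python
def hosts_over_threshold(load_by_host, threshold):
    """Return the hosts whose reported load exceeds the threshold, sorted by name."""
    return sorted(h for h, load in load_by_host.items() if load > threshold)


def react(load_by_host, threshold, action):
    """Call the (hypothetical) action for every overloaded host."""
    overloaded = hosts_over_threshold(load_by_host, threshold)
    for host in overloaded:
        action(host)
    return overloaded


# Made-up load values for an Apache service on three hosts.
apache_load = {"web-1": 0.4, "web-2": 0.95, "web-3": 0.91}
handled = []  # records which hosts the action was triggered for
react(apache_load, threshold=0.9, action=handled.append)
```
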