This is the follow-up post in our series “Resource Automation”. In this part we focus on Event Lifecycle Management and the challenges it poses for resource automation. By Event Lifecycle Management, we mean what happens when an event occurs in the datacenter or cloud. Events can be many things, but most are of the type “error” or “warning”. When an error occurs, it is raised as an event and the necessary steps are taken. This typically runs through four phases:
- Alerting. The time needed to realise that there is a problem, typically anywhere from 15 minutes to more than an hour.
- Identification. Identifying the cause of a problem and the likely solution.
- Correction. Correcting the error.
- Validation. Validating that the error is gone.
Correcting an error this way often takes up to a day! So optimising each phase leads to a significant cost reduction.
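To make the phases above a little more concrete, here is a minimal Python sketch of how the time spent per phase could be tracked for a single event. The names `Phase` and `EventRecord` are hypothetical, and the durations are invented for illustration only; they are not measurements from a real system.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from enum import Enum


class Phase(Enum):
    """The four lifecycle phases listed above."""
    ALERTING = "alerting"
    IDENTIFICATION = "identification"
    CORRECTION = "correction"
    VALIDATION = "validation"


@dataclass
class EventRecord:
    """Time spent in each phase for one event (hypothetical model)."""
    name: str
    durations: dict = field(default_factory=dict)

    def record(self, phase: Phase, spent: timedelta) -> None:
        # Accumulate, so a phase that is entered twice is counted in full.
        self.durations[phase] = self.durations.get(phase, timedelta()) + spent

    def total(self) -> timedelta:
        # Overall time from the event occurring to the validated fix.
        return sum(self.durations.values(), timedelta())

    def slowest_phase(self) -> Phase:
        # The phase where optimisation would pay off most.
        # Raises ValueError if nothing has been recorded yet.
        return max(self.durations, key=self.durations.get)


# Illustrative numbers only.
event = EventRecord("disk full on a storage node")
event.record(Phase.ALERTING, timedelta(minutes=45))
event.record(Phase.IDENTIFICATION, timedelta(hours=3))
event.record(Phase.CORRECTION, timedelta(hours=2))
event.record(Phase.VALIDATION, timedelta(minutes=30))

print(event.total())                # 6:15:00
print(event.slowest_phase().value)  # identification
```

Recording each phase separately, rather than only the total, shows where the time actually goes, which is what you need to know before deciding which phase to automate first.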