The Resource Automation Series is finished for now, but it will continue once there is new and exciting content. There is still a lot to be covered regarding resource automation. With this post, I want to give an overview of what was discussed.
In the first post, I discussed if and how Cloud Computing will change datacenter design. We talked about standardisation in the datacenter and how Cloud Computing affects it. The next post was about automation in datacenters. Cloud Computing requires us to think differently, since we are no longer talking about 10 or 100 servers but rather tens of thousands. This also leads to various problems, which are discussed in this blog post.
Resource automation also requires cooperation between development teams and operations teams, which is often referred to as “DevOps”. Deployment itself can follow different strategies. Resource automation leads to several problems that need to be addressed; one approach is to apply Datacenter Event Lifecycle Management. Resource automation should lead to self-service IT, also called the “Software-Defined Datacenter” (VMware). Monitoring is an important task for resource automation in the Cloud. With resource automation, we have several possibilities to automate processes, and we discuss how to identify processes as well as datacenter automation and integration in this post.
I hope you liked this tutorial on Cloud Computing and resource automation. Should you have any ideas, feel free to comment below. Our next series will be about Software Architectures for the Cloud, so stay tuned ;).
Posts
This is the last post in our series about resource automation in the Cloud. Today, we will look at datacenter automation and integration, which has to take the following into account:
- Existing systems
- Processes
- Environments
Key areas for Datacenter Automation are:
- Reducing labor costs by allowing reduction or reallocation of people
- Improving service levels through faster measurement and reaction times
- Improving efficiency by freeing up skilled resources to do smarter work
- Improving performance and availability by reducing human errors and delays
- Improving productivity by allowing companies to do more work with the same resources
- Improving agility by allowing rapid reaction to change, delivering new processes and applications faster
- Reducing reliance on high-value technical resources and personal knowledge
A key driver for datacenter integration and automation is SOA (Service-Oriented Architecture), which allows much better integration of different services all over the datacenter. Drivers for integration are:
- Flexibility. Rapid response to change, enabling shorter time to value
- Improved performance and availability. Faster reactions producing better service levels
- Compliance. Procedures are documented, controlled and audited
- Return on investment. Do more with less, reduce cost of operations and management
If you decide to automate tasks in your datacenter, there are some areas where you should start:
- The most manual process
- The most time-critical process
- The most error-prone process
To identify and codify these processes, you should:
- Break down high-level processes into smaller, granular components
- Identify where lower-level processes can be “packaged” and reused in multiple high-level components
- Identify process triggers (e.g. end-user requests, time events) and end-points (e.g. notifications, validation actions)
- Identify linkages and interfaces between steps of each process
- Codify any manual steps wherever possible
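To make the last point more concrete, here is a minimal sketch of how a manual runbook could be codified: a high-level process composed of small, reusable low-level steps, with an explicit trigger and a notification end-point. The step functions and the shared context are purely illustrative and not tied to any particular tool.

```python
from datetime import datetime

def provision_vm(ctx):
    # low-level, reusable component
    ctx["vm"] = f"vm-{ctx['request_id']}"

def install_webserver(ctx):
    # low-level, reusable component that can be packaged into other processes
    ctx["service"] = f"httpd on {ctx['vm']}"

def notify(ctx):
    # end-point: notification back to the requester
    print(f"[{datetime.now():%H:%M:%S}] done: {ctx}")

# the high-level process is an ordered composition of low-level components
PROCESS = [provision_vm, install_webserver, notify]

def run(trigger_payload):
    """Trigger: e.g. an end-user request from a self-service portal or a time event."""
    ctx = dict(trigger_payload)   # linkage: a shared context passed between the steps
    for step in PROCESS:
        step(ctx)
    return ctx

if __name__ == "__main__":
    run({"request_id": "1234"})
```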
This is a follow-up post in the series about resource automation in the Cloud. In this part, we will look at monitoring. Monitoring is not the easiest thing to do in distributed systems, because you have to monitor a large number of instances. The challenge is to find out what you actually want to monitor. If you run an application (such as a SaaS platform), you might not be interested in the performance of a single instance but in the performance of the application itself. You might not be interested in the I/O performance of an instance but rather in the overall experience your application delivers. Finding those metrics requires significant experience with monitoring.
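As a small illustration, the following sketch measures an application-level metric (end-to-end request latency against the application's public endpoint) instead of a per-instance figure such as disk I/O. The URL and the number of samples are placeholders.

```python
import statistics
import time
import urllib.request

URL = "https://example.com/health"   # placeholder: the application's public endpoint

samples = []
for _ in range(20):
    start = time.monotonic()
    urllib.request.urlopen(URL, timeout=5).read()
    samples.append((time.monotonic() - start) * 1000)   # latency in milliseconds

# what the users actually experience, aggregated over all instances behind the endpoint
print(f"median latency: {statistics.median(samples):.1f} ms")
print(f"p95 latency:    {statistics.quantiles(samples, n=20)[-1]:.1f} ms")
```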
Let us look at how monitoring basically works. There are two key approaches to monitoring instances:
- Agent-less Monitoring
- Agent-based Monitoring
With agent-less monitoring, data is collected remotely in one of two ways:
- Remotely analyse the system with a remote API (e.g. log data on the file system)
- Analyse network packets: SNMP (Simple Network Management Protocol) is often used for this
What is good about agent-less monitoring?
- No client agent to deploy
- Lightweight
- No application to install or run on the client; it typically doesn't consume resources on the system
- Lower cost
- Option to close or lock down a system and not allow new applications to be installed
What is bad about agent-less monitoring?
- No in-depth metrics for granular analysis
- Can be affected by networking issues
- Security concerns
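For illustration, here is a minimal agent-less probe that queries a host over SNMP without installing anything on it. It assumes the classic synchronous pysnmp high-level API (recent pysnmp releases expose an asyncio variant instead); the target address and community string are placeholders.

```python
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

error_indication, error_status, error_index, var_binds = next(getCmd(
    SnmpEngine(),
    CommunityData("public", mpModel=1),        # SNMPv2c with a read-only community string
    UdpTransportTarget(("192.0.2.10", 161)),   # placeholder target host and SNMP port
    ContextData(),
    ObjectType(ObjectIdentity("SNMPv2-MIB", "sysUpTime", 0)),
))

if error_indication:
    # agent-less probes fail when the network or the SNMP service is unreachable
    print(f"probe failed: {error_indication}")
else:
    for name, value in var_binds:
        print(f"{name} = {value}")
```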
On the other hand, we can use agent-based monitoring. What is bad about agent-based monitoring?
- Need to deploy agents to the systems
- Each running system needs to have an agent installed in order to be monitored; this can be automated
- In some companies, agents require internal certification before they may be deployed on production systems
- Up-front cost
- Requires software or custom development
What is good about agent-based monitoring?
- Deeper and more granular data collection, e.g. about the performance of a specific application and its CPU utilisation
- Tighter service integration
- Control applications and services on remote nodes
- Higher network security
- Encrypted proprietary protocols
- Lower risk of downtime
- Easier to react, e.g. if “Apache” has high load
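As a sketch of the agent-based approach, the following hypothetical agent samples local metrics on the node itself and pushes them to a central collector. The collector URL is a placeholder, and the load-average call is Unix-only.

```python
import json
import os
import socket
import time
import urllib.request

COLLECTOR = "https://monitoring.example.com/ingest"   # placeholder collector endpoint

def collect():
    # in-depth, local data an agent-less probe could not easily see
    load1, load5, load15 = os.getloadavg()
    return {
        "host": socket.gethostname(),
        "ts": int(time.time()),
        "load1": load1,
        "load5": load5,
        "load15": load15,
    }

def push(sample):
    req = urllib.request.Request(
        COLLECTOR,
        data=json.dumps(sample).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)   # the agent pushes to the central collector

if __name__ == "__main__":
    while True:   # the agent has to run on every monitored node
        push(collect())
        time.sleep(60)
```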
This is a follow-up post to the series on resource automation in the Cloud. This time we will talk about self-service IT. Self-service IT is an important factor for automation: it basically enables users to solve their problems without having to deal with the technology. For instance, if the marketing department needs to run a website for a campaign, IT should enable the department to start this “out of the box”: an empty website template (e.g. WordPress-based) should be started. Furthermore, scaling should be enabled, since the load will change over time. The website should also be branded in the corporate design. The goal of self-service IT is that the IT department provides tools and services that give users more independence.
Another example of self-service IT is launching virtual instances. This is a rather easy thing to accomplish, as it can be handled by self-service platforms such as OpenStack, Eucalyptus or various vendor platforms. To achieve the scenario described above, much more work is necessary: if you want to ease the job of your marketing colleagues, you have to prepare not only virtual images but also scripts and templates to build the website. However, more and more self-service platforms are emerging, and they will certainly come with more features and possibilities over time.
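As an illustration of the instance-launch case, here is a sketch using the openstacksdk Python library. The cloud name, image, flavor, network and cloud-init payload are placeholders that a self-service portal would fill in for the user.

```python
import base64
import openstack

conn = openstack.connect(cloud="mycloud")        # credentials come from clouds.yaml

image = conn.compute.find_image("ubuntu-22.04")  # placeholder names
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("private")

# hypothetical cloud-init payload that would install and brand the website template
user_data = base64.b64encode(b"#cloud-config\npackages: [apache2]\n").decode()

server = conn.compute.create_server(
    name="campaign-site",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
    user_data=user_data,
)
server = conn.compute.wait_for_server(server)
print(f"ready: {server.name} -> {server.access_ipv4}")
```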
This is the follow-up post to our series “Resource Automation”. In this part we will focus on Event Lifecycle Management and the associated challenges we face in terms of resource automation. By Event Lifecycle Management, we basically mean what happens when events occur in the datacenter or cloud. Events can be different things, but most likely they are of the type “error” or “warning”. If an error occurs, it is raised as an event and the necessary steps are taken. The lifecycle of such an event typically consists of four phases:
- Alerting. The time needed to realise that there is a problem; typically between 15 minutes and more than an hour.
- Identification. Identifying the cause of a problem and the likely solution.
- Correction. Correcting the error.
- Validation. Validating that the error is now gone.
Altogether, correcting an error often takes up to a day! So optimising each phase leads to a significant cost reduction.
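To illustrate how automation compresses these phases, here is a hypothetical sketch of an automated event handler. The identify_cause, apply_fix and is_healthy functions are placeholders for whatever your tooling actually provides.

```python
import time

def identify_cause(event):
    # identification: map the event to a likely cause and solution
    return "service restart required"

def apply_fix(event, cause):
    # correction: run the remediation step
    print(f"correcting '{event['error']}' on {event['host']}: {cause}")

def is_healthy(event):
    # validation: probe the service again
    return True

def handle(event):
    t0 = time.monotonic()   # alerting: the event arrives immediately, not after manual discovery
    cause = identify_cause(event)
    apply_fix(event, cause)
    assert is_healthy(event), "fix did not resolve the event"
    print(f"resolved in {time.monotonic() - t0:.2f} s")

handle({"error": "httpd not responding", "host": "web-07"})
```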
The next few posts will deal with resource automation in the Cloud. As with other tasks, this is not something you solve in one day, nor will it be easy. In this first post, we will look at the main issues before we dig deeper into resource automation itself.
The major areas of resource automation problems are:
- Service Level Monitoring
- Event Lifecycle Management
- IT Self Service
Service Level Monitoring
Service level monitoring relies on information from several sources:
- Management Tools
- Monitors
- Logs
- Services
There are several reasons why managing service levels manually is not very effective:
- Higher staff levels and training costs
- Higher service desk costs
- Increased risk of errors and downtime
- Slower remediation, leading to downtime and missed service levels
- Increased business pressure on IT service
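As a tiny example of why this is worth automating, the following sketch derives monthly availability from recorded downtime and checks it against a 99.9 % target instead of assembling the SLA report by hand. The outage figures are made up.

```python
outages_minutes = [12, 3, 30]        # example downtime events recorded this month
minutes_in_month = 30 * 24 * 60

availability = 1 - sum(outages_minutes) / minutes_in_month
target = 0.999                        # 99.9 % SLA target

print(f"availability: {availability:.4%}")
if availability < target:
    print("SLA breached - trigger the remediation and reporting workflow")
```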