Important How-To’s for Open Source Tools such as OpenStack and Eucalyptus

We all remember the day when Oracle's Big Boss, Larry Ellison, ranted against the cloud (see here). And guess what? Oracle is now (finally) getting serious about the cloud – or at least, it is trying to.

It is now 7 years or more since companies like Microsoft, IBM and others decided that the cloud is worth investing in (and they gained a competitive advantage from it). Now Oracle has finally decided that it needs to get into the cloud sooner rather than later (even though it is already late).

I have worked with cloud computing for 10 years now, and looking at Oracle's cloud service offers, I have to admit that they are at a stage where Microsoft, IBM, Google and Amazon were about 7 years ago. Their service offering is – bad. The way their services are designed is – sub-standard. They push the topic heavily, having figured out that they have no other chance if they want to remain a large player. The question is whether it is already too late for them.

Anyway, I wouldn't trust Oracle when it comes to cloud computing. Some sources actually state that Oracle is using nasty tricks to get its customers onto the cloud (see here). If this is true, I wouldn't recommend the Oracle cloud to anyone.

I bet that Oracle will remain an irrelevant player in the cloud – if they stick with their current approach.


Disclaimer: this is my personal opinion.

Cloud computing has changed how we handle IT in several ways. Common tasks that used to take a lot of time are now highly automated, and much more is still to come. Another interesting development is "Software defined X". This basically means that infrastructure elements get automated as well, which makes them more scalable and easier for applications to utilize. A term used frequently lately is "Software defined Networking", but there is another one that sounds promising, especially for cloud computing and Big Data: Software defined Storage.

Software defined Storage promises to abstract the way we use storage. This is especially useful for large-scale systems, as no one really wants to care about how content is distributed across different servers. This should basically be hidden from end users (software developers). For instance, if you are using a storage system for your website, you want an API like Amazon's S3: there is no need to worry about which physical machine your files are stored on – you just specify the desired region. The back-end system (in this case, Amazon S3) takes care of the rest.

Software defined Storage explained

Architecturally, you simply communicate with the abstraction layer, which takes care of distribution, redundancy and other factors.
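
To make this concrete, here is a minimal sketch using boto3, the AWS SDK for Python (the bucket name is a made-up example). You specify a region and a key; on which physical machine the bytes end up is entirely the back-end's business:

```python
import boto3

# Talk to the storage abstraction layer; only a region is specified,
# never a physical machine. "my-example-bucket" is a hypothetical name.
s3 = boto3.client("s3", region_name="eu-west-1")

# Store a file. S3 decides internally how to distribute and replicate it.
s3.put_object(Bucket="my-example-bucket", Key="site/index.html",
              Body=b"<html>Hello</html>")

# Read it back through the same abstraction.
obj = s3.get_object(Bucket="my-example-bucket", Key="site/index.html")
print(obj["Body"].read())
```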

At present, there are several systems available that take care of this: next to well-known services such as Amazon S3, there are also solutions such as the Hadoop Distributed File System (HDFS) or GlusterFS.


Header Image Copyright: nyuhuhuu. Licensed under the Creative Commons 2.0.


The Apache Software Foundation announced that Apache Storm is now a top-level Apache project. But what is Apache Storm about? Basically, Apache Storm is a project for analysing data streams in near real time. Storm works on messages and analyses what is going on. Storm originates from Twitter, which uses it for their streaming API. Storm is about processing time-critical data, and it guarantees that your data gets processed. It is fault tolerant and scalable.

Apache Storm is useful for fraud protection in gambling, banking and financial services, but not only there: Storm can be used wherever real-time or time-critical applications are necessary. At the moment, Storm can process one million tuples per second per node. This is massive, given the fact that Storm is all about scaling out – imagine adding 100 nodes!

Apache Storm works with tuples that come from spouts. A spout reads from a messaging system such as Apache Kafka; Storm supports many more messaging systems and can easily be extended through its abstraction layer. Storm consists of some major concepts, illustrated in the following image:

Apache Storm

Nimbus is the master node, similar to Hadoop's JobTracker. ZooKeeper is used for cluster coordination, and the Supervisor runs the worker processes. Each worker process consists of some subsets: an executor, which is a thread spawned by the worker, and the task itself.


The major concepts in Apache Storm are four elements: streams, spouts, bolts and topologies.

Tuples in Apache Storm

Streams are an unbounded sequence of tuples, a spout is a source of streams, bolts process input streams and create new output streams, and a topology is a network of spouts and bolts.
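
Storm topologies are normally written in Java against Storm's own API; purely to illustrate how tuples flow from a spout through bolts, here is a toy Python sketch of the same dataflow (this is not Storm code):

```python
from collections import deque

class Spout:
    """Source of a stream: emits tuples, e.g. read off a message queue."""
    def __init__(self, messages):
        self.queue = deque(messages)

    def next_tuple(self):
        return self.queue.popleft() if self.queue else None

class SplitBolt:
    """Bolt: consumes an input stream and emits a new output stream."""
    def process(self, sentence):
        yield from sentence.split()

class CountBolt:
    """Bolt: terminal step that aggregates the word stream."""
    def __init__(self):
        self.counts = {}

    def process(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1

# The "topology" is simply the wiring of the spout and the bolts.
spout, split, count = Spout(["to be or not to be"]), SplitBolt(), CountBolt()
while (sentence := spout.next_tuple()) is not None:
    for word in split.process(sentence):
        count.process(word)
print(count.counts)  # -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The header image is provided under a Creative Commons license by MattysFlicks.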

Since there are many cloud providers out there and I often come across the problem of switching between different platforms (such as Google AppEngine, Amazon S3, …), I have decided to write a single client that will work with all the different platforms – or at least with as many as possible. I have created a project on Google Code here, and I will start by writing a first draft of the interfaces. As a first step, I will include Amazon S3. I hope that more people will join this project and help me create something great 😉
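
To give an idea of the direction, here is a rough sketch of what the interfaces could look like (all names are placeholders of my own; the S3 backend assumes boto3):

```python
from abc import ABC, abstractmethod

import boto3

class CloudStorage(ABC):
    """Provider-neutral storage interface; concrete clients plug in below."""

    @abstractmethod
    def put(self, container: str, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, container: str, key: str) -> bytes: ...

class S3Storage(CloudStorage):
    """Amazon S3 as the first backend implementation."""

    def __init__(self, region: str = "us-east-1"):
        self._s3 = boto3.client("s3", region_name=region)

    def put(self, container, key, data):
        self._s3.put_object(Bucket=container, Key=key, Body=data)

    def get(self, container, key):
        return self._s3.get_object(Bucket=container, Key=key)["Body"].read()

# Application code depends only on the interface, so a later AppEngine or
# other backend can be swapped in without touching the call sites.
def backup(storage: CloudStorage, key: str, data: bytes) -> None:
    storage.put("backups", key, data)
```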

This post is part of the Open Source Cloud Computing series. For an Overview, please click on the Tag.

Networking with CloudStack

Networking with CloudStack can be set up with two topologies: the basic topology works like Amazon Web Services (AWS) and enables guest isolation via IP filtering. More networking possibilities are delivered with the "advanced" networking option, which allows multiple networks in a zone. Each individual network in an advanced setup needs a specific network type: guest, management, public or storage.

Multi-tenancy

CloudStack provides multi-tenancy with the concepts of accounts, domains and users. An account is typically a tenant, and each account may contain several users. A domain allows the datacenter provider to group similar account types and ease their management. CloudStack can be integrated with LDAP services such as Active Directory. Another concept is the "project": a group of users working on similar tasks. Within a marketing department there might be different projects such as a "product launch web site", and several users might need to work on each of them. Billing can be based either on a user's consumption or on a project's consumption, which allows even more detailed billing on a project basis. Projects can also be limited in their resource usage.

Header Image Copyright by Horia Varlan

This post is part of the Open Source Cloud Computing series. For an Overview, please click on the Tag.

The Management Server

The Management Server is the entry point to the CloudStack cloud. It manages all nodes and exposes the API as well as the graphical user interface (GUI). Typically, the Management Server runs on a dedicated machine or virtual machine. It runs on Tomcat and uses a MySQL database for persistence. The Management Server assigns public and private IP addresses, and it deals with the allocation of storage to the guests as virtual disks. CloudStack also allows the management of snapshots, templates and ISO images, which is likewise provided by the Management Server.
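
All of this is driven through the API that the Management Server exposes. Here is a minimal sketch of how a client talks to it; the host name and keys are placeholders, and the signing steps follow CloudStack's documented HMAC-SHA1 scheme:

```python
import base64, hashlib, hmac, urllib.parse, urllib.request

def cloudstack_request(endpoint, api_key, secret_key, command, **params):
    """Send a signed CloudStack API call: the signature is an HMAC-SHA1
    over the sorted, lower-cased query string, base64- and URL-encoded."""
    params.update({"command": command, "apikey": api_key, "response": "json"})
    query = "&".join(f"{k}={urllib.parse.quote(str(v), safe='')}"
                     for k, v in sorted(params.items()))
    digest = hmac.new(secret_key.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest), safe="")
    with urllib.request.urlopen(f"{endpoint}?{query}&signature={signature}") as r:
        return r.read().decode()

# Hypothetical endpoint and keys; listZones shows which zones the caller sees.
print(cloudstack_request("http://mgmt.example.com:8080/client/api",
                         "MY_API_KEY", "MY_SECRET_KEY", "listZones"))
```

The same helper works for any other API command, for instance deployVirtualMachine, by swapping the command name and parameters.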

Cloud Infrastructure

The Cloud Infrastructure consists of several layers. The lowest level is the host itself: a node on which the virtual instances run. Nodes usually get added to a cluster; a cluster contains several nodes and has a primary storage attached. Clusters are part of a pod, which is typically a hardware rack including a layer-2 switch. Pods in turn are part of a "zone", which represents a datacenter.

CloudStack Organisation

Zones are the largest entity in a CloudStack deployment. A zone normally represents a datacenter. Building several zones has the same benefits as building more datacenters: it enables replication and redundancy. CloudStack distinguishes between public and private zones. With this concept, it is possible to provide a public zone to all users and several private zones to specific users such as the marketing or accounting department. When a new instance gets started, the user must select the zone in which it should be launched.

Clusters provide the ability to group similar nodes. They normally share the same or very similar hardware and the same hypervisor, are in the same subnet, and share a primary storage. In large datacenters, clusters can be built for different hardware groups, such as nodes with high memory, nodes with high CPU, or GPU-based nodes. The concept of clusters offers plenty of possibilities to distinguish between different kinds of hardware. iSCSI or NFS servers provide the primary storage, which is shared within a cluster and stores the disk images of all virtual machines running in that cluster. Secondary storage is associated with the zone, and its purpose is to store templates, snapshots and ISO images.
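
To make the hierarchy easier to picture, here is a tiny sketch of the containment relationships in code (the field names are mine, not CloudStack's):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Host:
    """A node on which the virtual instances run."""
    hostname: str

@dataclass
class Cluster:
    """Similar nodes that share a primary storage."""
    hosts: List[Host] = field(default_factory=list)
    primary_storage: str = "nfs://primary/cluster01"

@dataclass
class Pod:
    """Typically one hardware rack with a layer-2 switch."""
    clusters: List[Cluster] = field(default_factory=list)

@dataclass
class Zone:
    """A datacenter; the secondary storage lives at this level."""
    pods: List[Pod] = field(default_factory=list)
    secondary_storage: str = "nfs://secondary/zone01"
```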

Header Image Copyright by marya

CloudStack is currently available in version 4.0. It was originally initiated by Cloud.com, which was later acquired by Citrix. The source code of CloudStack is available as open source, and it is maintained as an Apache project. The goal of CloudStack is similar to that of the other three projects described: to provide Infrastructure as a Service software. CloudStack supports commercial as well as open source hypervisors: on the commercial side it currently implements Citrix XenServer and VMware vSphere, and on the open source side there is support for Xen and KVM running on Ubuntu or CentOS.

CloudStack is built to run tens of thousands of virtual servers in geographically distributed regions. There is one Management Server for all clusters, which makes cluster-wide management servers unnecessary. CloudStack configures each node automatically regarding storage and networking. Internally managed virtual appliances take care of firewalling, routing, DHCP, VPN access, console proxying, storage access and storage replication. CloudStack also offers a graphical user interface (GUI) to ease configuration, and its API supports Amazon Web Services (AWS) EC2 and S3. Additionally, CloudStack provides an extensibility API that allows solution providers to extend its capabilities.

CloudStack consists of two major components: the Management Server and the Cloud Infrastructure. The Management Server controls the Cloud Infrastructure, and there is typically one of its kind. The Cloud Infrastructure consists of various nodes running virtual instances, each of them managed by the Management Server. The Cloud Infrastructure runs on one or more dedicated servers, but in a minimal installation it can also run on the same machine as the Management Server.

CloudStack Overview

Header Image Copyright by Alexandre Dulaunoy

OpenNebula has some main components: Interfaces & API, Groups and Users, Networking, Storage, Hosts and Clusters.

  • Interfaces & API. The two main interfaces for interacting with OpenNebula are the Command Line Interface (CLI) and the Graphical User Interface (GUI), the latter also known as "Sunstone". OpenNebula offers different APIs for developers to extend its functionality or to build on top of OpenNebula; these APIs are currently available as an Amazon Web Services (AWS) implementation and an OCCI implementation.
  • Groups and Users. OpenNebula supports different groups and users, and it can integrate with directory services such as LDAP and Microsoft's Active Directory. Multi-tenancy is supported by default, which eases billing and accounting. OpenNebula comes with the following standard user types: administrators, regular users, public users and service users. Administrators are in charge of administrative tasks within OpenNebula, regular users use its functionality through the self-service portal, public users are restricted users that may only use a subset of the functionality, and service users are accounts through which services consume the APIs or interfaces of OpenNebula.
  • Networking. The networking interface in OpenNebula is fully extensible, which allows integration into almost any existing datacenter. There is also support for VLANs and Open vSwitch.
  • Storage. OpenNebula supports different storage systems such as file system storage, distributed network storage or block storage.
  • Hosts. OpenNebula supports the following hypervisors on the host: Xen, KVM and VMware. A host has three main components: host management, cluster management and host monitoring. Host management is implemented by the "onehost" command and allows common operations on hosts such as initial setup or machine lifecycle management. Cluster management allows placing a host in a specific cluster and is implemented by the "onecluster" command. Host monitoring is done by the information manager (IM) driver; monitoring allows administrators to gather information about the health of a host (see the sketch after this list).
  • Clusters. Clusters are pools of hosts that share networking and data stores. A cluster can be compared to a zone. Clusters typically fulfil different needs, such as separating production from testing environments.
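
As mentioned in the Hosts item above, here is a small sketch of fetching host monitoring data programmatically. Sunstone and the CLI tools sit on top of OpenNebula's XML-RPC interface; the endpoint and credentials below are placeholders, and the method follows OpenNebula's XML-RPC reference as I know it:

```python
import xmlrpc.client

# Sunstone and the CLI tools (onehost, onecluster, ...) are built on this
# XML-RPC interface. Endpoint and credentials are placeholders.
server = xmlrpc.client.ServerProxy("http://frontend.example.com:2633/RPC2")
session = "oneadmin:password"

# one.hostpool.info returns a success flag, an XML body and error codes;
# the XML lists every host with the monitoring data gathered by the IM driver.
result = server.one.hostpool.info(session)
if result[0]:
    print(result[1])  # raw XML describing host states and capacities
```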

OpenNebula allows grouping different hosts into a virtual data center (VDC) within a cluster. Hosts can also be grouped into zones, which allows better administration of similar hosts.

Header Image Copyright by European Southern Observatory

This post is part of the Open Source Cloud Computing series. For an Overview, please click on the Tag.

OpenNebula is open source software for Infrastructure as a Service solutions; it started as a research project in 2005, and the first public release came in 2008. Ubuntu, Debian and openSUSE currently support OpenNebula, and the project is funded by European institutions. OpenNebula provides Amazon Web Services (AWS) EC2 and Elastic Block Storage (EBS) APIs, as well as the OGF OCCI (Open Cloud Computing Interface) API, and it offers a self-service portal to its users. OpenNebula has several third-party tools for software stack automation, and it is easy to integrate an application marketplace into OpenNebula platforms. Administrators have their own portal, called "Sunstone", and OpenNebula provides a Unix-inspired command line interface (CLI). The OpenNebula Marketplace allows virtual appliances to be managed and run in OpenNebula environments.

Billing is straightforward, as a fine-grained accounting and monitoring system is available. Account controls and quota management allow administrators to set limits on compute, storage and network utilization. To enable this, OpenNebula has multi-tenancy built into the system, and it can be integrated with popular directory services such as LDAP or Active Directory.

OpenNebula distinguishes between clusters and virtual data centers. Clusters are pools of hosts that share data stores, and they also support virtual networks dedicated to load balancing, high availability and high-performance computing. Virtual data centers are isolated virtual infrastructures where an administrator can manage the compute, storage and network capacity. OpenNebula is built for high availability, with a persistent database as its backend.

A key challenge for OpenNebula is to allow the management of large enterprise data centers. To fulfill these needs, OpenNebula supports a complete life cycle for virtual resource management, which can be extended with a hooking system. The virtual infrastructure can be controlled and monitored, and its usage can be accounted to the correct tenants.

Header Image Copyright by Bob Familiar

This post is part of the Open Source Cloud Computing series. For an Overview, please click on the Tag.

Walrus

Walrus is also called "WS3" and is the storage service provided by Eucalyptus. The storage service provides simple storage functionality, which is exposed via RESTful and SOAP APIs. Walrus takes care of storing virtual machine images, storing snapshots and serving files. As with all other public-facing services in Eucalyptus, these services are based on the Amazon Web Services API.

Containers in Walrus storage are called "buckets", and their names have to be unique across accounts, just as with Amazon Web Services (AWS). Some naming restrictions are:

  • Container names may contain lowercase letters, numbers, periods (.), underscores (_) and dashes (-)
  • Container names must start with a number or letter
  • Names must be between 3 and 255 characters long
  • A name must not be formatted like an IP address (e.g., 192.168.5.4)
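
These rules are easy to encode; here is a small sketch of a validator for the restrictions listed above (my own helper, not part of the Walrus API):

```python
import re

# The naming rules from the list above, encoded as regular expressions.
NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{2,254}$")  # charset, start, length
IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")         # IP-address-like names

def is_valid_bucket_name(name: str) -> bool:
    """True if the name satisfies the Walrus/S3 rules listed above."""
    return bool(NAME_RE.match(name)) and not IP_RE.match(name)

assert is_valid_bucket_name("my-bucket.2014")
assert not is_valid_bucket_name("192.168.5.4")   # formatted like an IP address
assert not is_valid_bucket_name("ab")            # shorter than 3 characters
```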

The maximum file size in a Walrus container is 5 terabytes, and files can be either public or private. Before a container can be deleted it must be empty, which means that all files have to be deleted first. Files are identified via unique keys represented by Uniform Resource Identifiers (URIs).

Common actions performed on Walrus storage are creating containers, storing data in containers, downloading data and granting or denying permissions. These actions can be performed via the RESTful or SOAP interfaces. Walrus storage distinguishes two major read options: consistent reads and eventually consistent reads. The latter is faster but might serve inconsistent data, whereas the former may have higher latency but always returns consistent data.
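
Since Walrus implements the S3 API, a standard S3 client can simply be pointed at a Walrus endpoint. A sketch with boto3; host, port, path and credentials are placeholders and depend on the concrete Eucalyptus installation:

```python
import boto3

# Point an ordinary S3 client at the Walrus endpoint instead of AWS.
walrus = boto3.client(
    "s3",
    endpoint_url="http://walrus.example.com:8773/services/Walrus",
    aws_access_key_id="EUCA_ACCESS_KEY",
    aws_secret_access_key="EUCA_SECRET_KEY",
)

walrus.create_bucket(Bucket="my-images")                 # create a container
with open("vm.img", "rb") as f:                          # store data
    walrus.put_object(Bucket="my-images", Key="vm.img", Body=f)
walrus.download_file("my-images", "vm.img", "copy.img")  # download data
```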

Storage Controller

The Storage Controller is comparable to Elastic Block Storage (EBS) in Amazon Web Services. Elastic Block Storage is fast storage for virtual image files. The Storage Controller takes care of creating persistent EBS devices, and block storage devices are typically provided to the instances over the ATA over Ethernet (AoE) or iSCSI protocol.
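
Because the Storage Controller implements the EBS part of the EC2 API, block devices can be created with any EC2-compatible client. A sketch with boto3; the endpoint, credentials, zone name and instance ID are placeholders:

```python
import boto3

# Point an ordinary EC2 client at the cloud's endpoint instead of AWS.
ec2 = boto3.client(
    "ec2",
    endpoint_url="http://clc.example.com:8773/services/compute",
    aws_access_key_id="EUCA_ACCESS_KEY",
    aws_secret_access_key="EUCA_SECRET_KEY",
)

# Create a 10 GB persistent volume and attach it to a running instance,
# where it appears as a block device (served via iSCSI or AoE).
vol = ec2.create_volume(AvailabilityZone="cluster01", Size=10)
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-12345678", Device="/dev/vdb")
```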

The header image is provided by jar (away for a while) under the Creative Commons license.