How to: Start and Stop Cloudera on Azure with the Azure CLI


The Azure CLI is my favorite tool to manage Hadoop clusters on Azure. Why? Because I can now use the tools I am used to from Linux on my Windows PC. On Windows 10, I am using the Ubuntu Bash for that, which gives me all the major tools for managing remote Hadoop clusters. One thing I do frequently is starting and stopping Hadoop clusters based on Cloudera. If you are coming from PowerShell, this might be rather painful, since you can only start each VM in the cluster sequentially, meaning that a cluster consisting of 10 or more nodes is rather slow to start and might take hours! In the Azure CLI I can easily do this by specifying "--nowait", and everything runs in parallel. The only disadvantage is that I won't get any notification when the cluster is ready. But I work around this with a simple hack: ssh'ing into the cluster (since I
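A sketch of the parallel start/stop, written against today's `az` CLI (the older cross-platform CLI spelled the flag `--nowait`; in `az` it is `--no-wait`). The resource group name `cloudera-rg` is a placeholder for the group that holds your cluster VMs:

```shell
#!/bin/sh
# Requires the Azure CLI; bail out gracefully when it is not installed.
command -v az >/dev/null 2>&1 || { echo "az CLI not found, skipping"; exit 0; }

RG=cloudera-rg   # placeholder: the resource group containing the cluster VMs

# Dispatch a start for every VM in the group; --no-wait returns immediately,
# so all nodes boot in parallel instead of one after another.
for vm in $(az vm list -g "$RG" --query "[].name" -o tsv); do
  az vm start -g "$RG" -n "$vm" --no-wait
done

# Stopping works the same way; "deallocate" also stops compute billing.
for vm in $(az vm list -g "$RG" --query "[].name" -o tsv); do
  az vm deallocate -g "$RG" -n "$vm" --no-wait
done
```

Because `--no-wait` returns before the VMs are actually running, you still need something like the ssh-based check mentioned above to know when the cluster is ready.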

read more How to: Start and Stop Cloudera on Azure with the Azure CLI

RACEing to agile Big Data Analytics


I am happy to announce the work we did over the last months within Teradata. We developed a lightweight process model for Big Data Analytics projects, which is called "RACE". The model is agile and distills the know-how of more than 25 consultants who have worked in over 50 Big Data Analytics projects in recent months. Teradata also co-developed CRISP-DM, the industry-leading process for data mining. Now we have invented a new process for agile projects that addresses the new challenges of Big Data Analytics. Where does the ROI come from? This was one of the key questions we addressed when developing RACE. The economics of Big Data Discovery Analytics are different from traditional Integrated Data Warehousing economics. ROI comes from discovering insights in highly iterative projects run over very short time periods (usually 4 to 8 weeks). Each meaningful insight or successful use case that can be actioned generates ROI. The total ROI is a sum of all the

read more RACEing to agile Big Data Analytics

What is necessary to achieve interoperability in the Cloud?


What is necessary to achieve interoperability in the Cloud? As described in the previous sections, three major interoperability approaches arise. First, there is the standardisation approach, next there is the middleware approach, and last but not least there is the API approach. This is also supported by [Hof09] and [Gov10]. In addition, [Gov10] suggests building abstraction layers in order to achieve interoperability and portability. There are two main levels where interoperability is necessary. One is the management level. This deals with handling the virtual machine(s), applying load balancing, configuring DNS, auto-scaling and other tasks that come with IaaS solutions. However, this level is mainly relevant to IaaS solutions, as PaaS solutions already take care of most of it. The other is the services level. The services level covers basically everything that comes with application services, such as messaging, data storage and databases. Figure: Cloud interoperability approaches These requirements are described in several relevant

read more What is necessary to achieve interoperability in the Cloud?

Discussion of existing standards and frameworks for Cloud Interoperability


As discussed in the previous sections, several standards and interoperability frameworks are available. Most of them are infrastructure-related. The standards and frameworks can generally be clustered into three groups. The first is the "Standards" group, which consists of OCCI and the DMTF standards. The second is the "Middleware" group. This group contains mOSAIC, the PaaS Semantic Interoperability Framework and Frascati. The third is the "Library" group. This group contains concrete implementations that provide a common API for several cloud platforms. The two projects here are Apache jclouds and Apache Libcloud. Figure: Interoperability in the Cloud OCCI provides great capabilities for infrastructure solutions, but nothing is done for individual services. The same applies to the standards proposed by the Distributed Management Task Force. The libraries and frameworks draw a similar picture. Apache jclouds and Apache Libcloud provide some interoperability features for infrastructure services. As for platform services, only the blob

read more Discussion of existing standards and frameworks for Cloud Interoperability

PaaS Semantic Interoperability Framework (PSIF)


PaaS Semantic Interoperability Framework (PSIF) Loutas et al. define semantic interoperability as "the ability of heterogeneous Cloud PaaS systems and their offerings to overcome the semantic incompatibilities and communicate" [Lou11]. The target of this framework is to give developers the ability to move their application(s) and data seamlessly from one provider to another. Loutas et al. propose a three-dimensional model addressing semantic interoperability for public cloud solutions [Lou11]. Fundamental PaaS Entities The fundamental PaaS entities consist of several models: the PaaS System, the PaaS Offering, an IaaS Offering, Software Components and an Application [Lou11]. Levels of Semantic Conflicts Loutas et al. [Lou11] assume that three major semantic conflicts can arise for PaaS offerings. The first is an interoperability problem between metadata definitions. This occurs when different data models describe one PaaS offering. The second arises when the same data is interpreted differently, and the third when different pieces of data have similar meaning.
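The three conflict levels can be made concrete with a hypothetical pair of metadata records describing the same PaaS offering (the field names and values below are invented for illustration and are not taken from [Lou11]):

```python
# Two providers describe the same PaaS offering with different metadata
# models: field names, units, and vocabularies all differ (hypothetical
# example records, not from [Lou11]).
offer_a = {"runtime": "java", "memory_mb": 512, "scaling": "auto"}
offer_b = {"language": "Java", "ram": "0.5 GB", "elastic": True}

# Conflict 1: different data models describe one offering -- the key sets differ.
assert set(offer_a) != set(offer_b)

def to_mb(value):
    """Normalise a memory value (int in MB, or a '<n> GB' string) to megabytes."""
    if isinstance(value, str) and value.endswith("GB"):
        return int(float(value.split()[0]) * 1024)
    return int(value)

# Conflict 2: the same datum is interpreted differently -- 512 vs "0.5 GB"
# only match once the units are normalised.
assert to_mb(offer_a["memory_mb"]) == to_mb(offer_b["ram"])  # both 512 MB

# Conflict 3: different pieces of data carry similar meaning --
# "scaling": "auto" and "elastic": True express the same capability.
assert (offer_a["scaling"] == "auto") == offer_b["elastic"]
```

Resolving all three layers of mismatch is exactly what PSIF's semantic model is meant to support.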

read more PaaS Semantic Interoperability Framework (PSIF)

Interoperability in the cloud: mOSAIC and Frascati


mOSAIC mOSAIC is a European project supported by the European Union [Dan10]. The target of the project was to build a unified application programming interface (API) for Cloud services that is not only available in Java but also in other languages. mOSAIC is platform- and language-agnostic and supports a large number of platforms. The mOSAIC framework itself is a middleware that runs on top of each cloud provider and abstracts away provider specifics. The platform then exposes its own API to clients. The mOSAIC project is built in a layered architecture. On the lowest level, there is the native API or protocol. This is either a REST, SOAP or RPC interface or a language-specific library. On the next level, a driver API is found. This API can easily be exchanged for different platforms such as Amazon's S3. On top of that sits an interoperability API that allows programming-language interoperability. Cloud resources can be accessed via the connector API.
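A miniature sketch of that layering (hypothetical class names, not the actual mOSAIC API): a provider-specific driver sits at the bottom, and application code only talks to a provider-agnostic connector, so swapping the driver swaps the cloud platform:

```python
# Driver/connector layering in the style mOSAIC describes -- all names here
# are invented for illustration; this is not the real mOSAIC API.
from abc import ABC, abstractmethod

class BlobDriver(ABC):
    """Driver layer: one implementation per provider-native API (e.g. S3)."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryDriver(BlobDriver):
    """Stand-in for a real provider driver; stores blobs in a dict."""
    def __init__(self):
        self._store = {}
    def put(self, key, data):
        self._store[key] = data
    def get(self, key):
        return self._store[key]

class BlobConnector:
    """Connector layer: the uniform API applications program against.
    Exchanging the injected driver changes the platform, not the client code."""
    def __init__(self, driver: BlobDriver):
        self._driver = driver
    def save(self, key, data):
        self._driver.put(key, data)
    def load(self, key):
        return self._driver.get(key)

connector = BlobConnector(InMemoryDriver())
connector.save("report.txt", b"hello")
print(connector.load("report.txt"))  # prints b'hello'
```

The design point is dependency inversion: the connector depends only on the abstract driver interface, which is what lets mOSAIC stay platform-agnostic.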

read more Interoperability in the cloud: mOSAIC and Frascati

Interoperability libraries in the cloud: Apache jClouds and Libcloud


Apache jClouds Apache jclouds is a framework provided by the Apache Software Foundation. The framework is written in Java and provides an independent library for typical cloud operations. At present (November 2014), Apache jclouds provides two kinds of services: a compute service and a blob service [Apa14b]. Apache jclouds can be used from Java and Clojure. The library offers an abstraction for more than 30 cloud providers, including AWS, Azure, OpenStack and Rackspace. Apache jclouds is primarily built for infrastructure interoperability. On the platform layer, only blob storage is currently supported. The focus of jclouds is to support a large variety of platforms rather than to implement a large variety of services. Blob storage in Apache jclouds works with the concepts of containers, folders and blobs. The library supports access control lists for objects. Multipart uploads are also supported, which allows jclouds to handle large files. [Apa14c] Libcloud [Apa14d] Apache Libcloud is similar
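The container/blob workflow looks roughly like this in Apache Libcloud, the Python counterpart discussed in this post. The credentials and names are placeholders, and the live calls are guarded so the sketch degrades gracefully when the library or an account is missing:

```python
# Rough sketch of Libcloud's storage API (containers and objects, the
# counterpart to jclouds' containers and blobs). "my-access-key",
# "my-secret", "interop-demo" and "report.csv" are placeholders.
try:
    from libcloud.storage.types import Provider
    from libcloud.storage.providers import get_driver
except ImportError:
    print("apache-libcloud is not installed; skipping")
else:
    try:
        # Pick the S3 driver; switching Provider.* switches the cloud platform.
        driver_cls = get_driver(Provider.S3)
        driver = driver_cls("my-access-key", "my-secret")
        container = driver.create_container("interop-demo")
        driver.upload_object("report.csv", container, "report.csv")
    except Exception as exc:
        # Without real credentials the calls above cannot succeed.
        print("skipping live calls:", exc)
```

Note how the provider only appears in the `get_driver` call; the rest of the code is written against the common API, which is exactly the interoperability feature both libraries aim for.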

read more Interoperability libraries in the cloud: Apache jClouds and Libcloud

Standards in the Cloud: Open Cloud Computing Interface and DMTF


Open Cloud Computing Interface (OCCI) The Open Grid Forum created the Open Cloud Computing Interface (OCCI), which they claim to be one of the first standards in the cloud. OCCI was initially built to deliver portability and interoperability for IaaS platforms. In its initial implementation, it was used for different tasks around deployment, scaling and monitoring of virtual machines in the cloud. The standard also supports common tasks on other cloud layers such as SaaS and PaaS [Ope14a]. OCCI is not a specific library that enables interoperability and portability. Rather, it is a set of standards that can be implemented by individual platforms. The standard consists of three elements: the core description, the infrastructure description for the IaaS domain and a rendering description. The rendering is used to provide a REST HTTP service [Ope14b]. OCCI is implemented by a large number of cloud platforms. Major platforms such as Apache CloudStack, OpenStack, OpenNebula and Eucalyptus implement the standard. However, large public cloud
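In the OCCI HTTP rendering, resources are typed via `Category` headers and configured via `X-OCCI-Attribute` headers. Roughly, a request creating a compute resource might look like this (the host name is invented for illustration):

```http
POST /compute/ HTTP/1.1
Host: cloud.example.com
Category: compute; scheme="http://schemas.ogf.org/occi/infrastructure#"; class="kind"
X-OCCI-Attribute: occi.compute.cores=2
X-OCCI-Attribute: occi.compute.memory=4.0
```

Because the rendering is plain HTTP, any platform that implements it can be driven by the same generic client, which is the interoperability argument behind OCCI.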

read more Standards in the Cloud: Open Cloud Computing Interface and DMTF

Current initiatives in cloud computing interoperability


Gonidis [Gon11] describes an approach to enabling interoperability for PaaS solutions. It is about building a standard for existing cloud platforms. However, this is a challenge, given that every platform provider has its own proprietary API. Furthermore, services available on one platform aren't available on another. In the following posts, existing interoperability frameworks, solutions and standards for Platform as a Service are evaluated. Standard initiatives Current standards initiatives are the Open Cloud Computing Interface (OCCI) and the Distributed Management Task Force (DMTF) initiatives. In the next posts, I will outline them in detail. Libraries and Frameworks Libraries and frameworks for Cloud interoperability are Apache jclouds and Libcloud. They will be described in the following posts. Middleware solutions Middleware solutions for Cloud interoperability are mOSAIC, the PaaS Semantic Interoperability Framework (PSIF), the Frascati-based multi-PaaS solution and SimpleCloud. They will be described in the following posts. This post is part of a work done on Cloud interoperability. You can access the

read more Current initiatives in cloud computing interoperability

Interoperability challenges for Platform as a Service


On the IaaS layer, work on cloud interoperability has already been conducted [Ste13a], [Ste13b]. The authors describe in "Challenges in the Management of Federated Heterogeneous Scientific Clouds" the problem and a feasible solution for migrating virtual machines between providers. Another challenge identified is the layer between the vendor API and the user. This problem is addressed by the same authors in the paper "Building an On-Demand Virtual Computing Market in Non-Commercial Communities", where a market is introduced to handle that problem. The concept of the market is then described in detail in "Take a Penny, Leave a Penny Scaling out to Off-premise Unused Cloud Resources" [Ste13b], where a solution is presented that allows users to use different cloud vendors with one abstract API. Gonidis et al. [Gon11] give a first overview of the challenges of Platform as a Service interoperability. Platform as a Service promises to speed up application development [Mei11] by utilizing services. Platforms such as Microsoft's Azure,

read more Interoperability challenges for Platform as a Service