MultiStack

BigData on MultiCloud

View the Project on GitHub siel-iiith/MultiStack


Demand for running Hadoop in the cloud is growing, and various solutions have emerged in both the public and private domains. As early solutions, they provide basic features but only scratch the surface; the real benefits of taking Hadoop to the cloud are yet to be seen. In this presentation we discuss our experience building MultiStack, an orchestrator for deploying, managing and monitoring Hadoop jobs across clouds. It exploits the cloud to offer features such as auto-scaling, aggregated monitoring, job-specific optimizations, isolation and smarter job scheduling within and across clouds.

Traditionally, Hadoop has run outside the cloud and has been restricted to a single cluster. MultiStack allows enterprises to run jobs across multiple clouds and clusters seamlessly. It reduces job completion time and improves resource utilization through machine-learning-based job scheduling. The MultiStack scheduler uses fine-grained monitoring data to scale Hadoop clusters automatically and can use spot instances to lower cost. We plan to integrate Hive, Mahout and Giraph to drive data analytics in the cloud.
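
To make the idea of scaling a cluster from monitoring data concrete, here is a minimal sketch of a threshold-based scaling decision. It is not MultiStack's scheduler, which is described above as machine-learning based; it swaps in a much simpler rule purely for illustration. The `ClusterMetrics` fields, the `desired_node_count` function, and all thresholds are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    """Hypothetical monitoring snapshot for one Hadoop cluster."""
    nodes: int                  # worker nodes currently running
    pending_tasks: int          # tasks waiting for a free slot
    slots_per_node: int         # task slots each worker provides
    avg_cpu_utilization: float  # mean CPU utilization across workers, 0.0 to 1.0


def desired_node_count(m: ClusterMetrics, min_nodes: int = 1,
                       max_nodes: int = 20, low_cpu: float = 0.3) -> int:
    """Return a target worker count using a simple threshold rule (assumed policy)."""
    if m.pending_tasks > 0:
        # Scale out: add enough nodes to absorb the backlog (ceiling division).
        extra = -(-m.pending_tasks // m.slots_per_node)
        target = m.nodes + extra
    elif m.avg_cpu_utilization < low_cpu:
        # Scale in one node at a time to avoid oscillating.
        target = m.nodes - 1
    else:
        target = m.nodes
    # Keep the result within the configured bounds.
    return max(min_nodes, min(max_nodes, target))
```

A caller would feed this function a fresh snapshot on each monitoring interval and ask the cloud provider for the difference between the target and the current node count. Whether the extra nodes are on-demand or spot instances is a separate cost decision that this sketch does not model.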

Features

Use Cases

MultiStack-deployment

Architecture

MultiStack Server Architecture

MultiStack API

Description of the REST API for v0.1.0

RoadMap

License

The use and distribution terms for this software are covered by the Apache 2.0 License (http://www.apache.org/licenses/LICENSE-2.0.html) which can be found in the file LICENSE at the root of this distribution. By using this software in any fashion, you are agreeing to be bound by the terms of this license. You must not remove this notice, or any other, from this software.

Contributors