grascomp Graduate School in Computing Science
Documents and Links

INGI Fall 2012 Doctoral School Day in Cloud Computing

Tuesday 20 November 2012 Université catholique de Louvain

The Computing Science and Engineering pole (INGI) of the Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM) at the Université catholique de Louvain is organizing a full-day session of doctoral lectures on the broad theme of Large Scale and Cloud Computing.  The program consists of extended, didactic presentations (45 minutes each) on selected topics in the thematic area, given by an international panel of invited researchers.  The presentations will be given in English.

This seminar allows Belgian doctoral researchers to learn about the latest advances and trends in the broad field of Large Scale and Cloud Computing and to meet with colleagues and researchers from Belgium and abroad, all within a single day.  It also offers an opportunity for the presenters to disseminate their research achievements and meet their peers in the Belgian academic community.

Talks available

All slides and related papers are available here.


9h00-9h15: Introduction to the Cloud Computing Day

9h15-10h00: Ahmad Al-Shishtawy, KTH, Kista, Sweden, ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores
10h00-10h45: Johann Bourcier, Irisa-Univ. Rennes 1 & INRIA, France, Models@Run.time to Support Adaptation in Future Internet Services

10h45-11h00: Coffee break (Maxwell Building, room A164)

11h00-11h45: Annette Bieniusa, University of Kaiserslautern, Germany, Scalable Consistency for Replicated Data
11h45-12h30: Martin Quinson, LORIA, Nancy, France, Simulation of Next Generation Systems

12h30-13h30: Lunch (Maxwell Building, room A164)

13h30-14h15: Nico Kruber, Zuse Institut Berlin, Germany, Scalable Data Models with the Transactional Key-Value Store Scalaris
14h15-15h00: João Leitão, INESC-ID, Lisbon, Portugal, Managing the Topology of Unstructured Overlay Networks
15h00-15h45: Eva Kalyvianaki, Imperial College London, United Kingdom, Data Stream Processing in the Cloud

15h45-16h00: Coffee break (Maxwell Building, room A164)

16h00-16h45: Héctor Fernández, Vrije Universiteit Amsterdam, Netherlands, ConPaaS: A Cloud Platform for Hosting Elastic Applications
16h45-17h30: Marco Canini, TU Berlin & T-Labs, Germany, Network Reliability in the Software Era – Finding Bugs in OpenFlow-Based Software Defined Networks

17h30-17h45: Closing Remarks


The lectures will take place in the Maxwell Building, Auditorium A105, 1348 Louvain-la-Neuve (Google Maps).  This building is part of the ICTEAM Institute at the Louvain Engineering School and is a 10-minute walk from the Louvain-la-Neuve train station.  For people coming by car, there is a freely accessible parking lot, Parking Malin du Rédimé, next to the Maxwell Building (near the large white satellite antenna you see when driving into the campus along the Avenue Georges Lemaître).


The Doctoral School Day is targeted at Belgian doctoral students but is open to all.  Participation will be credited to doctoral students upon request, as part of the GRASCOMP doctoral school programme.  Attendance is free but registration is required by subscribing to the course website (see instructions below).  Coffee breaks and lunch are included in the registration.

To register:

 Visit the GRASCOMP Campus Website at

 If you have not yet done so, create a user account for yourself.  Make sure you provide a valid e-mail address.

 Enroll in course COMP053 (no key is needed).


For any questions, please contact Peter Van Roy (Peter.VanRoy (at)


Ahmad Al-Shishtawy
KTH, Kista, Sweden

ElastMan: Autonomic Elasticity Manager for Cloud-Based Key-Value Stores

The increasing spread of elastic Cloud services, together with the pay-as-you-go pricing model of Cloud computing, has led to the need for an elasticity controller. The controller automatically resizes an elastic service in response to changes in workload, in order to meet Service Level Objectives (SLOs) at a reduced cost. However, the variable performance of Cloud virtual machines and nonlinearities in Cloud services, such as the diminishing reward of adding a service instance as the scale increases, complicate the controller design. We present the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores. ElastMan combines feedforward and feedback control. Feedforward control is used to respond to spikes in the workload by quickly resizing the service to meet SLOs at a minimal cost. Feedback control is used to correct modeling errors and to handle diurnal workload variations. To address nonlinearities, our design of ElastMan leverages the near-linear scalability of elastic Cloud services in order to build a scale-independent model of the service. By combining feedforward and feedback control, ElastMan efficiently handles both diurnal and rapid changes in workload in order to meet SLOs at a minimal cost. Our evaluation shows the feasibility of our approach to automating Cloud service elasticity.
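The combination of the two controllers can be sketched in a few lines. This is an illustrative toy, not the actual ElastMan implementation: the per-node capacity figure, the proportional gain, and all function names are invented for the example.

```python
import math

def feedforward(workload_rps, capacity_per_node=1000):
    """Model-based sizing: nodes needed for the observed workload, assuming
    near-linear scalability (each node serves ~capacity_per_node req/s)."""
    return max(1, math.ceil(workload_rps / capacity_per_node))

def feedback(current_nodes, latency_ms, slo_ms=50, gain=0.5):
    """Proportional correction driven by the measured SLO violation."""
    error = (latency_ms - slo_ms) / slo_ms
    return round(gain * error * current_nodes)

def target_size(workload_rps, current_nodes, latency_ms):
    """Feedforward absorbs workload spikes; feedback corrects model error."""
    return max(1, feedforward(workload_rps) + feedback(current_nodes, latency_ms))
```

With these invented numbers, a 5000 req/s workload that is meeting its 50 ms SLO keeps its 5 nodes, while the same workload at 100 ms latency triggers the feedback term and grows the cluster.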


Ahmad Al-Shishtawy has recently received his PhD from the KTH Royal Institute of Technology, Stockholm, Sweden.  He is currently a senior researcher at KTH and a member of the Computer Systems Laboratory (CSL) at the Swedish Institute of Computer Science (SICS). Earlier, he completed his B.Sc. and M.Sc. at the Faculty of Computer and Information Sciences (FCIS), Ain Shams University, Cairo, Egypt.  Ahmad’s research interests include autonomic computing, Cloud computing, large-scale distributed systems, distributed algorithms, peer-to-peer systems, and Cloud federations. Currently, his research is focused on automating the elasticity control of Cloud-based elastic storage services using techniques such as control theory and machine learning.

Johann Bourcier
Irisa-Univ. Rennes 1 & INRIA, France

Models@Run.Time to Support Adaptation in Future Internet Services

The proliferation of smart mobile nodes, such as smartphones, tablets and sensors, together with rapidly growing cloud infrastructures, has led to the emergence of a new class of services which we call Future Internet Services. These services aim to bridge the gap between the Internet of Things and the Internet of Services. Despite the available technologies and devices, the development of these services raises major challenges from a software engineering point of view. This presentation will focus on the challenge of building dynamically distributed adaptive systems over heterogeneous execution infrastructures. The presentation will highlight the use of a relatively new approach called Models@Run.Time to support developers who build these systems. It will provide insight into Kevoree, a tool which leverages the Models@Run.Time approach to provide a suitable abstraction for dealing with dynamic adaptation in the context of Future Internet Services.
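A minimal way to picture the models@run.time idea (the function and model representation below are invented for illustration, not Kevoree's actual API): the platform keeps a model of the running system, an adaptation is expressed as a new target model, and the component changes needed to reach it are computed by diffing the two.

```python
def plan_adaptation(current_model, target_model):
    """Diff two architecture models (here just collections of component
    names) into the start/stop actions that migrate the running system."""
    current, target = set(current_model), set(target_model)
    return {"start": sorted(target - current),
            "stop": sorted(current - target)}
```

For example, moving from components {web, db} to {web, cache} yields one component to start and one to stop. A real platform such as Kevoree works with far richer models (bindings, channels, node types) and applies the resulting plan atomically.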


Johann Bourcier is currently an associate professor at the University of Rennes 1. He holds a PhD from Grenoble University in Autonomic Computing for Pervasive Services. He spent one year as a postdoc at Imperial College London in the Distributed System Engineering group. His current research interests centre on self-adaptive services for Future Internet Services, bridging the gap between pervasive services and cloud services. He has published several articles on self-adaptive services in pervasive environments and, more recently, in highly distributed environments.

Annette Bieniusa
University of Kaiserslautern, Germany

Scalable Consistency for Replicated Data

Replicating dynamically updated data is a principal mechanism in large-scale distributed systems, but it suffers from a fundamental tension between scalability and data consistency.  Eventual consistency sidesteps the synchronization bottleneck, but remains ad-hoc, error-prone, and difficult to prove correct.

In this talk, I will introduce a promising approach to synchronization-free sharing of mutable data: conflict-free replicated data types (CRDTs).  Complying with simple mathematical properties (namely commutativity of concurrent updates, or monotonicity of object states in a semi-lattice), any CRDT provably converges, provided all replicas eventually receive all operations.  A CRDT requires no synchronization: an update can execute immediately, irrespective of network latency, faults, or disconnection; it is highly scalable and fault-tolerant.
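A grow-only counter is the canonical small example of these properties; the sketch below is illustrative and unrelated to any particular CRDT library. Merging takes the element-wise maximum, which is commutative, associative and idempotent, so replicas converge no matter in which order they exchange state.

```python
class GCounter:
    """State-based grow-only counter CRDT: one slot per replica."""

    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.slots = [0] * n_replicas

    def increment(self, amount=1):
        self.slots[self.id] += amount   # each replica writes only its own slot

    def value(self):
        return sum(self.slots)

    def merge(self, other):
        """Element-wise max: safe to apply in any order, any number of times."""
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]
```

Two replicas that increment independently and then merge, in either order, agree on the total without any synchronisation.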

Building on the CRDT concept, we implemented the Swiftcloud CRDT store.  Swiftcloud aims to deploy CRDT objects at extreme scale, close to clients at the network edge. To simplify programming with CRDTs, a conflict-free transaction presents the application with a consistent snapshot of the database and ensures that its results are transmitted atomically.  We have implemented a social-network application, a file system, and other applications on top of Swiftcloud; experiments show performance improvements of several orders of magnitude over a more classical synchronisation-based approach.

This is joint work with Marek Zawirski and Marc Shapiro of LIP6, Nuno Preguica and Sérgio Duarte of Universidade Nova de Lisboa, and Carlos Baquero, of Universidade do Minho.


Annette Bieniusa is a lecturer at the University of Kaiserslautern. She received a PhD in Computer Science from the University of Freiburg in 2011 and spent one year as a postdoctoral researcher at INRIA Paris-Rocquencourt.  Her research interests include the semantics of concurrent and distributed programming, with a focus on replication, synchronization, and programming language concepts such as Software Transactional Memory.  She has been involved in several national and international research projects, including ConcoRDanT (ANR, France), Streams (ANR, France), and JCell (BMBF, Germany).

Martin Quinson
LORIA, Nancy, France

Simulation of Next Generation Systems

Recent and foreseen technical evolutions make it possible to build information systems of unprecedented dimensions. The potential power of the resulting distributed systems offers new possibilities in terms of applications, be they scientific, such as multi-physics simulations in High Performance Computing (HPC); commercial, in the Cloud with the data centers underlying the Internet; or public, in very large peer-to-peer systems. Evaluating computer systems of such scale raises severe methodological challenges. Simply executing them is not always possible, as it requires building the complete system beforehand, and it may not even be enough when uncontrolled external load prevents reproducibility.  Simulation is an appealing alternative for studying such systems. It may not always be sufficient to capture the whole complexity of the phenomena, but it makes it easy to capture important trends, while ensuring the controllability and reproducibility of experiments.
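At its core, a simulator of this kind is a loop over a virtual-time event queue. The sketch below is a toy illustration only; SimGrid itself is a mature C framework with validated network and host models.

```python
import heapq

def run_simulation(initial_events, handler):
    """Process (time, payload) events in virtual-time order; the handler
    may schedule further events by returning new (time, payload) pairs.
    Payloads should be comparable so simultaneous events order cleanly."""
    queue = list(initial_events)
    heapq.heapify(queue)
    trace = []
    while queue:
        time, payload = heapq.heappop(queue)
        trace.append((time, payload))
        for event in handler(time, payload):
            heapq.heappush(queue, event)
    return trace
```

For instance, a handler that models a 5-time-unit network delay turns each "send" event into a later "recv" event, and the trace comes out in virtual-time order even though no real time passes.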

I will present the SONGS scientific project (Simulation Of Next Generation Systems), of which I am the scientific leader. This project aims at designing a unified and open simulation framework for the evaluation of Grids, Peer-to-Peer systems, Clouds and HPC systems. The rationale for addressing these seemingly different domains is that they actually have a lot of similarities in terms of hardware and software, and in terms of the resulting scientific questions.  The SONGS project builds upon the SimGrid simulation framework, which can simulate very large systems efficiently and accurately. This tool has been used for the experiments of over a hundred publications worldwide.

During this talk, I will briefly present the SimGrid environment and assess its accuracy, scalability and usability. I will present ongoing work on how to address the challenge of Open Science to ensure the reproducibility of the discoveries. I will then focus on Cloud scientific challenges, presenting how SimGrid can be used in practice for large-scale studies, both on provider or client side, and the ongoing efforts to ease further such studies.


Martin Quinson has been a lecturer at the University of Lorraine since 2005. His research interests cover distributed computing, clouds and HPC. In particular, his research focuses on assessing the quality of distributed applications and on the experimental evaluation of distributed algorithms.  He has published over 30 research articles in peer-reviewed journals and conferences.  Since 2007 he has led several scientific projects around the SimGrid scientific instrument, such as the SONGS project, which is funded by the French research agency ANR for about 1.8M€ over four years.

Nico Kruber
Zuse Institut Berlin, Germany

Scalable Data Models with the Transactional Key-Value Store Scalaris

Scalaris is a scalable, transactional, distributed key-value store. It implements a distributed hash table based on a set of algorithms organised in three layers: P2P, replication and transactions. The P2P layer organises Scalaris nodes in a ring via the successor/predecessor relation. During scale-in and scale-out operations, data needs to be migrated to new nodes. Migration algorithms with low downtime for the transferred data and atomic transfers are needed to support fast responses to spontaneous changes in resource demands.  Scalaris' transaction layer implements strong consistency, atomicity and isolation via quorum-based approaches. These allow Scalaris to support more advanced data models and to maintain consistency across multiple key-value pairs.
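The quorum intersection argument behind such approaches can be made concrete in a few lines. This is a deliberately simplified sketch, not Scalaris code (Scalaris' transaction layer uses an adapted Paxos-style commit): a write updates any majority of the replicas and a read contacts any majority, so the two sets must overlap in at least one replica that holds the latest version.

```python
class Replica:
    def __init__(self):
        self.version, self.value = 0, None

def quorum_write(replicas, value):
    """Write to a majority, tagging the value with a higher version."""
    majority = len(replicas) // 2 + 1
    new_version = max(r.version for r in replicas) + 1
    for r in replicas[:majority]:       # stand-in for "any majority that responds"
        r.version, r.value = new_version, value

def quorum_read(replicas):
    """Read from a (possibly different) majority; pick the newest version."""
    majority = len(replicas) // 2 + 1
    contacted = replicas[-majority:]    # deliberately a different majority
    return max(contacted, key=lambda r: r.version).value
```

With 5 replicas, the write here touches replicas 0-2 and the read contacts replicas 2-4; replica 2 lies in both, so the read returns the freshly written value.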

This talk will give an introduction to Scalaris and some of the algorithms it is made of. It will then present the demonstrator application "Wiki on Scalaris" and will focus on how to create data models that scale. Several approaches to improve non-optimal data models will be analysed with regard to scalability and performance.



Nico Kruber is a research assistant at the Zuse Institute Berlin (ZIB) working towards his PhD. He received his diploma degree in Computer Science from the Humboldt University of Berlin in 2009 and has since been working at ZIB. His research interests include distributed systems with a focus on DHTs, gossiping, load balancing and NoSQL data models. He was involved in the MoSGrid (BMBF, Germany) project and currently works in 4CaaSt (EU-FP7).

João Leitão
INESC-ID, Lisbon, Portugal

Managing the Topology of Unstructured Overlay Networks

The peer-to-peer (P2P) paradigm emerged more than 10 years ago as a viable alternative to overcome limitations of the client-server model, namely in terms of scalability, fault-tolerance, and even operational costs. This paradigm has gained significant popularity with its successful application in the context of file sharing. The success of these applications is illustrated by systems such as Napster, eMule, Gnutella, and, more recently, BitTorrent. To ensure the scalability of these solutions, many P2P services operate on top of unstructured overlay networks, which are logical networks deployed at the application level. Unstructured overlay networks establish random neighboring associations among participants of the system. Although the random nature of these overlays is desirable for many P2P services, the resulting topology may present sub-optimal characteristics, for instance from the point of view of link latency. This may have a significant impact on the performance of P2P services executed over these overlays.

This talk will discuss several techniques that can be employed to manage the topology of unstructured overlay networks, making it possible to impose some form of relaxed topology on these overlays in a decentralised fashion. The talk will also discuss experimental results, obtained through both simulation and prototype deployments over PlanetLab, that measure the benefits achieved by each technique in the context of specific case-study applications.
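One family of such techniques biases neighbour selection: a node keeps the randomness that makes the overlay robust, but swaps its worst link for a better candidate learnt through gossip. The sketch below is an invented illustration of a single optimisation step, not any specific protocol from the talk.

```python
def optimize_view(view, candidates, latency):
    """One topology-management step: replace the highest-latency neighbour
    with the best gossip-learnt candidate, if that candidate is better.
    Real protocols additionally preserve some purely random links."""
    if not view or not candidates:
        return view
    worst = max(view, key=lambda peer: latency[peer])
    best = min(candidates, key=lambda peer: latency[peer])
    if latency[best] < latency[worst]:
        return [p for p in view if p != worst] + [best]
    return view
```

Applied periodically at every node, such a step gradually imbues the random overlay with latency awareness while leaving each node's degree unchanged.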


João Leitão graduated in Computer Engineering (2006) and obtained a Master's degree (2007) from the Faculty of Sciences of the University of Lisbon. He holds a PhD in Computer Engineering from the Instituto Superior Técnico of the Technical University of Lisbon (2012), and conducts his research in the context of the Distributed Systems Group of INESC-ID Lisboa. His research interests focus on the design of fault-tolerant and efficient large-scale systems, with particular emphasis on peer-to-peer systems and the design of overlay networks. He is a member of the ACM and the IEEE.

Eva Kalyvianaki
Imperial College London, United Kingdom

Data Stream Processing in the Cloud

As users of big data applications demand fresh processing results, we witness a new breed of stream processing systems designed to address the challenges of handling unprecedented volumes of data and queries.

In this talk I will describe two different systems that address competing challenges of real-time big data processing. Federated stream processing systems, which utilise nodes from multiple independent domains, are increasingly found in multi-provider cloud deployments. To pool resources from several sites and take advantage of network locality, submitted continuous queries are split into query fragments, which are executed collaboratively by different sites. When supporting many concurrent users, however, queries may exhaust available processing resources, thus requiring constant load shedding. Given that individual domains have autonomy over how they allocate query fragments to their nodes, ensuring global fairness in terms of the processing quality experienced by individual queries is an open challenge in such a scenario. I will describe THEMIS, a federated stream processing system for resource-starved, multi-site deployments. It executes queries in a globally fair fashion and provides users with constant feedback on the experienced processing quality for their queries. Our evaluation shows that, compared to a random shedding approach, THEMIS can achieve a low global spread of processing quality across federated queries, independently of specific query semantics.
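The notion of a fair shedding policy can be illustrated with a toy greedy rule (invented for this example, not the actual THEMIS algorithm): when load must be dropped, always shed from the query currently enjoying the highest processing quality, which drives the spread of quality across queries toward zero.

```python
def fair_shed(qualities, total_to_shed, step=0.05):
    """qualities: dict query -> fraction of its tuples processed, in [0, 1].
    Greedily sheds step-sized slices from the currently best-off query."""
    qualities = dict(qualities)
    shed = 0.0
    while shed + 1e-9 < total_to_shed:
        best = max(qualities, key=qualities.get)
        cut = min(step, total_to_shed - shed, qualities[best])
        if cut <= 0:
            break
        qualities[best] -= cut
        shed += cut
    return qualities
```

Starting from qualities 0.9 and 0.5 and shedding 0.4 in total, only the better-off query is cut, and both queries end at (nearly) the same quality.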

We also witness a new breed of stream processing systems that are designed to scale to a large number of cloud-hosted machines. Such systems face new requirements: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators at runtime when the processing load increases; (ii) with deployment on hundreds of VMs, failures are common---systems must be fault-tolerant with fast recovery times yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered automatically without affecting query results. Our system exposes internal operator state explicitly and enables the stream processing system to manage it. Following this approach, we introduce combined mechanisms for scale-out and fault tolerance in cloud-deployed systems: dynamic operator scale-out and upstream backup with checkpointing. The system scales out by allocating new VMs from a small pre-allocated pool and automatically partitioning checkpointed operator state. The checkpointed state is also used to recover failed stateful operators before replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform.
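The state-partitioning step can be pictured as hash-partitioning a checkpointed key-value dictionary across new operator instances (an invented sketch, not the authors' system):

```python
def partition_state(checkpoint, n_instances):
    """Split a checkpointed key -> value operator state across n instances."""
    parts = [{} for _ in range(n_instances)]
    for key, value in checkpoint.items():
        parts[hash(key) % n_instances][key] = value
    return parts

def route(key, n_instances):
    """Route a tuple to the instance that owns its key's partition."""
    return hash(key) % n_instances
```

Because routing and partitioning use the same hash, every tuple reaches the instance holding the state for its key, and the same checkpoints double as the recovery point when an instance fails. (Python's string hash is randomised per process, so a real system would use a stable hash shared by all nodes.)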


Eva Kalyvianaki is a Post-Doctoral Researcher in the large-scale distributed systems group at the Department of Computing, Imperial College London. Before this, she obtained her PhD from the Computer Laboratory, University of Cambridge. Her main research interests are in systems, distributed systems, data stream processing, cloud computing and autonomic computing.

Héctor Fernández
Vrije Universiteit Amsterdam, Netherlands

ConPaaS: A Cloud Platform for Hosting Elastic Applications

Cloud computing opens new perspectives for hosting applications. From an application developer's point of view, migrating an application to the cloud may be as simple as replacing one physical machine from a traditional IT system with an equivalent virtual machine in the cloud. The cloud, however, provides many more opportunities for building sophisticated applications, especially with regard to resource consumption. Fully utilizing the cloud to host complex applications is not easy. Most clouds today offer Infrastructure-as-a-Service (IaaS) functionality, which means they rent out basic resources such as virtual machines, disks and networks. Application developers must therefore handle the complexity of deploying applications composed of many inter-related components, implementing automatic resource provisioning, orchestrating application re-configurations such that users do not notice any downtime, and so on. Tailoring such mechanisms for each application is tedious and error-prone, as subtle implementation bugs may appear only in production.

To address these concerns, we are designing and developing ConPaaS, an open-source runtime environment for hosting applications in Cloud environments. Within the Cloud computing paradigm, ConPaaS belongs to the platform-as-a-service family, in which a variety of systems aim to offer the full power of the cloud to application developers while shielding them from its associated complexity. In ConPaaS, an application is defined as a composition of one or more distributed components called services. Each service is self-managed and elastic: it can deploy itself on the cloud, monitor its own performance, and increase or decrease its computational power based on workload demand.
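The "monitor and resize" loop of such a self-managed service can be reduced to a toy threshold rule (the thresholds and names are invented for this example; ConPaaS's actual provisioning logic is richer):

```python
def scaling_decision(utilisation_samples, high=0.8, low=0.3):
    """Return +1 (add a VM), -1 (release a VM) or 0 (no change) from
    recent utilisation samples of a service's instances."""
    avg = sum(utilisation_samples) / len(utilisation_samples)
    if avg > high:
        return +1
    if avg < low:
        return -1
    return 0
```

In the ConPaaS model each service runs such a loop for itself, which is what makes a composition of services elastic without a central coordinator.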


Héctor Fernández is a postdoctoral research scientist and a member of the ConPaaS team at VU University Amsterdam. He received his PhD in Computer Science from the University of Rennes 1 in 2012 in the area of service-oriented computing, in particular decentralized workflow coordination in distributed infrastructures. Héctor's current research interests include service coordination, resource management and fault-tolerance in large-scale infrastructures.

Marco Canini
TU Berlin & T-Labs, Germany

Network Reliability in the Software Era – Finding Bugs in OpenFlow-based Software Defined Networks

Nowadays users expect highly dependable network connectivity and services. However, several recent episodes demonstrate that software errors and operator mistakes continue to cause undesired disturbances and outages. SDN (Software Defined Networking) is a new kind of network architecture that decouples the control plane from the data plane---a vision currently embodied in OpenFlow. By logically centralizing the control plane computation, SDN provides the opportunity to remove complexity from our networks and introduce new functionality into them. On the other hand, as network programmability increases and software plays a greater role, the risk that buggy software may disrupt an entire network also increases.

In this talk, I will present efficient, systematic techniques for testing the SDN software stack at both its highest and lowest layers. That is, our testing techniques target, at the top layer, the OpenFlow controller programs and, at the bottom layer, the OpenFlow agents---the software that each switch runs to enable remote programmatic access to its forwarding tables.

Our NICE (No bugs In Controller Execution) tool applies model checking to explore the state space of an unmodified controller program composed with an environment model of the switches and the hosts. Scalability is the main challenge, given the diversity of data packets, the large system state, and the many possible event orderings. To address this, we propose a novel way to augment model checking with symbolic execution of event handlers (to identify representative packets that exercise code paths on the controller), and effective strategies for generating event interleavings likely to uncover bugs. Our prototype tests Python applications on the popular NOX platform. In testing three real applications, we uncover eleven bugs.
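The state-space exploration at the heart of this approach can be miniaturised as exhaustive enumeration of event interleavings (a toy sketch; NICE additionally relies on symbolic execution and interleaving-reduction heuristics to tame the real state space):

```python
from itertools import permutations

def explore_interleavings(events, apply_event, initial_state, invariant):
    """Try every ordering of a small event set; return the orderings that
    drive the system into a state violating the invariant."""
    violations = []
    for order in permutations(events):
        state = initial_state()
        for event in order:
            state = apply_event(state, event)
        if not invariant(state):
            violations.append(order)
    return violations
```

A two-event example already shows the flavour of bug this finds: if a packet can reach the switch before the controller's forwarding rule is installed, one of the two interleavings drops the packet.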

Our SOFT (Systematic OpenFlow Testing) tool automates testing the interoperability of OpenFlow switches. Our key insight is in automatically identifying the testing inputs that cause different OpenFlow agent implementations to behave inconsistently. To this end, we first symbolically execute each agent under test in isolation to derive which set of inputs causes which behavior. We then crosscheck all distinct behaviors across different agent implementations and evaluate whether a common input subset causes inconsistent behaviors. Our evaluation shows that our tool identified several inconsistencies between the publicly available Reference OpenFlow switch and Open vSwitch implementations.
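The crosschecking step itself reduces to comparing, per input class, the behaviours observed from each implementation (a simplified sketch; deriving the input classes is the hard part, done by symbolic execution in SOFT):

```python
def crosscheck(behaviour_maps):
    """behaviour_maps: one dict per implementation, mapping an input class
    to the behaviour observed for it. Returns the common inputs on which
    the implementations disagree -- each a potential interoperability bug."""
    common = set.intersection(*(set(m) for m in behaviour_maps))
    return sorted(i for i in common
                  if len({m[i] for m in behaviour_maps}) > 1)
```

For instance, two switch agents that agree on one packet class but handle another differently yield exactly that second class as a reported inconsistency.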


Marco Canini is a senior research scientist at T-Labs, a joint institute of TU Berlin and Telekom Innovation Laboratories. Marco obtained his Ph.D. degree in Computer Science and Engineering from the University of Genoa in 2009 after spending the last year as a visiting student at the University of Cambridge, Computer Laboratory. He holds a laurea degree with honors in Computer Science and Engineering from the University of Genoa. He also held positions at Intel Research and Google, and he was a postdoctoral researcher at EPFL from 2009 to 2012.