Analyzing and Evaluating the Resilience of Scheduling Scientific Applications on High Performance Computing Systems Using a Simulation-based Methodology

Author: Nitin Sukhija
Total Pages: 172
Release: 2015

Large-scale systems provide a powerful computing platform for running large and complex scientific applications. However, the inherent complexity, heterogeneity, wide distribution, and dynamism of these computing environments can degrade the performance of the scientific applications executing on them. Load imbalance arising from a variety of sources, such as application, algorithmic, and systemic variations, is one of the major contributors to this performance degradation. In general, load balancing is achieved via scheduling. Moreover, frequently occurring resource failures drastically affect the execution of applications running on high performance computing systems. Therefore, studying how to support integrated scheduling and fault-tolerance mechanisms that guarantee the resilience of deployed applications to failures becomes of paramount importance. Recently, several research initiatives have started to address the issue of resilience. However, these efforts have focused mainly on system-level resilience, with less emphasis on resilience at the application level. It is therefore increasingly important to extend the concept of resilience to application-level scheduling techniques, establishing a holistic approach that addresses the performability of these applications on high performance computing systems. This can be achieved by developing a comprehensive modeling framework for evaluating the resilience of such techniques on heterogeneous computing systems, one that assesses the impact of failures and workloads in an integrated way. This dissertation presents an experimental methodology based on discrete event simulation for analyzing and evaluating the resilience of scheduling scientific applications on high performance computing systems.
With the aid of this methodology, a wide class of dependencies between the application and the computing system is captured within a deterministic model that quantifies the performance impact expected from changes in application and system characteristics. Ideally, the results obtained with the proposed simulation-based performance prediction framework enable an introspective design and investigation of scheduling heuristics, making it possible to reason about how best to optimize various, often antagonistic, objectives, such as minimizing application makespan and maximizing reliability.
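To make the makespan-versus-reliability trade-off more concrete, here is a minimal discrete event simulation sketch in Python. It is an illustration only and is not the dissertation's simulator. The function `simulate_schedule` and its parameters are hypothetical. The sketch assumes that tasks are dispatched in order to whichever processor becomes free first, that heterogeneity is modeled only through per-processor speeds, and that each processor fails independently at a constant (exponential) rate while busy. Under those assumptions, the schedule completes without failure with probability exp(-Σ λ_p · busy_p).

```python
import heapq
import math


def simulate_schedule(task_work, speeds, failure_rates):
    """Hypothetical sketch: discrete event simulation of list scheduling.

    task_work     -- work units of each task, dispatched in the given order
    speeds        -- work units per time unit for each processor (heterogeneity)
    failure_rates -- assumed constant failure rate (per time unit) of each processor

    Returns (makespan, reliability). Reliability assumes independent,
    exponentially distributed failures that matter only while a processor is busy.
    """
    # Event queue of (time at which a processor becomes free, processor id).
    events = [(0.0, p) for p in range(len(speeds))]
    heapq.heapify(events)
    busy = [0.0] * len(speeds)  # accumulated busy time per processor
    makespan = 0.0

    for work in task_work:
        # Process the earliest "processor free" event and start the next task there.
        now, p = heapq.heappop(events)
        duration = work / speeds[p]
        finish = now + duration
        busy[p] += duration
        makespan = max(makespan, finish)
        heapq.heappush(events, (finish, p))

    # Probability that no processor fails during its busy time (assumed exponential model).
    reliability = math.exp(-sum(rate * t for rate, t in zip(failure_rates, busy)))
    return makespan, reliability


if __name__ == "__main__":
    # A slow, reliable processor (p0) next to a fast, failure-prone one (p1).
    makespan, reliability = simulate_schedule(
        task_work=[4, 3, 2, 2, 1], speeds=[1.0, 2.0], failure_rates=[0.001, 0.01]
    )
    print(f"makespan={makespan:.2f}, reliability={reliability:.4f}")
```

You can vary the dispatch order, the processor speeds, or the assumed failure rates and rerun the simulation. Doing so shows how makespan and reliability can pull in opposite directions, which is the kind of antagonistic objective pair described above.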

Resilience Assessment and Evaluation of Computing Systems
Language: en
Pages: 485
Authors: Katinka Wolter
Categories: Computers
Type: BOOK - Published: 2012-11-02 - Publisher: Springer Science & Business Media

The resilience of computing systems includes their dependability as well as their fault tolerance and security. It defines the ability of a computing system to …

Scheduling Problems
Language: en
Pages: 156
Authors: Rodrigo Righi
Categories: Computers
Type: BOOK - Published: 2020-07-08 - Publisher: BoD – Books on Demand

Scheduling is defined as the process of assigning operations to resources over time to optimize a criterion. Problems with scheduling comprise both a set of res…

Scheduling in Parallel Computing Systems
Language: en
Pages: 177
Authors: Shaharuddin Salleh
Categories: Computers
Type: BOOK - Published: 2012-12-06 - Publisher: Springer Science & Business Media

Scheduling in Parallel Computing Systems: Fuzzy and Annealing Techniques advocates the viability of using fuzzy and annealing methods in solving scheduling prob…

Design and Analysis of Scheduling Techniques for Throughput Processors
Language: en
Authors: Adwait Jog
Type: BOOK - Published: 2015 - Publisher:

Throughput Processors such as Graphics Processing Units (GPUs) are becoming an inevitable part of every computing system because of their ability to accelerate …