Browsing by author "Kessler, Christoph"
1 - 6 of 6
- Journal article: Energy-Efficient Static Scheduling of Streaming Task Collections with Malleable Tasks (PARS-Mitteilungen: Vol. 30, No. 1, 2013). Kessler, Christoph; Eitschberger, Patrick; Keller, Jörg. We investigate the energy efficiency of streaming task collections with parallelizable or malleable tasks on a manycore processor with frequency scaling. Streaming task collections differ from classical task sets in that all tasks run concurrently, so cores typically run several tasks that are scheduled round-robin at user level. A stream of data flows through the tasks, and intermediate results are forwarded to other tasks as in a pipelined task graph. We first show that, under continuous frequency scaling and reasonable assumptions about the user-level scheduler, task mapping for streaming task collections is equivalent to task mapping for ordinary task collections when a makespan (i.e., a throughput requirement of the streaming application) is given and the energy consumed is to be minimized. We then show that, with discrete frequency scaling, processors may need to switch frequencies and idle times can still occur, in contrast to continuous frequency scaling. We formulate the mapping of (streaming) task collections onto a manycore processor with discrete frequency levels as an integer linear program. Finally, we propose two heuristics that reduce energy consumption compared to the previous results by improving load balancing through the parallel execution of a parallelizable task. We evaluate the effects of the heuristics analytically and experimentally on the Intel SCC.
- Journal article: Energy-Efficient Static Scheduling of Streaming Task Collections with Malleable Tasks (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 30, No. 1, 2013). Kessler, Christoph; Eitschberger, Patrick; Keller, Jörg. We investigate the energy efficiency of streaming task collections with parallelizable or malleable tasks on a manycore processor with frequency scaling. Streaming task collections differ from classical task sets in that all tasks run concurrently, so cores typically run several tasks that are scheduled round-robin at user level. A stream of data flows through the tasks, and intermediate results are forwarded to other tasks as in a pipelined task graph. We first show that, under continuous frequency scaling and reasonable assumptions about the user-level scheduler, task mapping for streaming task collections is equivalent to task mapping for ordinary task collections when a makespan (i.e., a throughput requirement of the streaming application) is given and the energy consumed is to be minimized. We then show that, with discrete frequency scaling, processors may need to switch frequencies and idle times can still occur, in contrast to continuous frequency scaling. We formulate the mapping of (streaming) task collections onto a manycore processor with discrete frequency levels as an integer linear program. Finally, we propose two heuristics that reduce energy consumption compared to the previous results by improving load balancing through the parallel execution of a parallelizable task. We evaluate the effects of the heuristics analytically and experimentally on the Intel SCC.
- Journal article: Flexible Scheduling and Thread Allocation for Synchronous Parallel Tasks (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 29, No. 1, 2012). Kessler, Christoph; Hansson, Erik. We describe a task model and a dynamic scheduling and resource allocation mechanism for synchronous parallel tasks executed on SPMD-programmed synchronous shared-memory MIMD parallel architectures with uniform, unit-time memory access and strict memory consistency, known in the literature as PRAMs (Parallel Random Access Machines). Our task model provides a two-tier programming model for PRAMs that flexibly combines SPMD and fork-join parallelism within the same application. It offers flexibility through dynamic scheduling and late resource binding while preserving the PRAM execution properties within each task; the only limitation is that the number of threads assigned to a task cannot exceed what the underlying architecture provides. In particular, our approach opens up automatic performance tuning at run time by controlling the thread allocation for tasks based on run-time predictions. Through a prototype implementation of a synchronous parallel task API in the SPMD-based PRAM language Fork and an experimental evaluation with example programs on the SB-PRAM simulator, we show that realizing the task model on an SPMD-programmable PRAM machine is feasible with moderate run-time overhead per task.
- Conference paper: Hybrid Parallel Sort on the Cell Processor (9th workshop on parallel systems and algorithms – workshop of the GI/ITG special interest groups PARS and PARVA, 2008). Keller, Jörg; Kessler, Christoph; König, Kalle; Heenes, Wolfgang. Sorting large data sets has always been an important application and has therefore been one of the benchmark applications for new parallel architectures. We present a parallel sorting algorithm for the Cell processor that combines elements of bitonic sort and merge sort and reduces the bandwidth to main memory by pipelining. We present runtime results of a partial prototype implementation and simulation results for the complete sorting algorithm that promise performance advantages over previous implementations.
- Conference paper: Load balancing of irregular parallel divide-and-conquer algorithms in group-SPMD programming environments (ARCS'06, 19th International Conference on Architecture of Computing Systems, 2006). Eriksson, Mattias; Kessler, Christoph; Chalabine, Mikhail
- Journal article: A Quantitative Comparison of PRAM based Emulated Shared Memory Architectures to Current Multicore CPUs and GPUs (PARS-Mitteilungen: Vol. 31, No. 1, 2014). Hansson, Erik; Alnervik, Erik; Kessler, Christoph; Forsell, Martti. The performance of current multicore CPUs and GPUs is limited in computations that make frequent use of communication or synchronization between subtasks executed in parallel, because directory-based cache systems scale poorly and/or the cost of synchronization is high. Emulated Shared Memory (ESM) architectures, which rely on multithreading and efficient synchronization mechanisms, have been developed to solve these problems, which affect both the performance and the programmability of current machines. In this paper, we present a preliminary comparison of the performance of three hardware-implemented ESM architectures with state-of-the-art multicore CPUs and GPUs. The benchmarks are selected to cover different patterns of parallel computation and thus reveal the performance potential of ESM architectures relative to current multicores.
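The two 2013 entries by Kessler, Eitschberger and Keller note that with discrete frequency scaling a processor may need to switch frequencies and that idle time can remain. The toy single-core calculation below is a minimal sketch of why switching can help, assuming a cubic power model P(f) = f**3, zero idle power and free, instantaneous frequency switches; none of these assumptions come from the paper, and the sketch is not the paper's integer linear program or its heuristics. The frequency levels and workload are hypothetical.

```python
"""Sketch: meeting a makespan with discrete frequency levels.

Illustrative assumptions only (not taken from the listed papers):
dynamic power P(f) = f**3 in arbitrary units, zero idle power,
free and instantaneous frequency switches, and run time = work / f.
"""


def power(f: float) -> float:
    # Hypothetical convex power model; real processors differ.
    return f ** 3


def energy_continuous(work: float, makespan: float) -> float:
    # Continuous scaling: run at exactly work/makespan, with no idle time.
    f = work / makespan
    return power(f) * makespan


def energy_single_level(work: float, makespan: float, levels: list[float]) -> float:
    # Slowest discrete level that meets the makespan; the core then idles
    # (idle energy assumed zero). min() raises ValueError if no level is fast enough.
    f = min(l for l in levels if l * makespan >= work)
    return power(f) * (work / f)


def energy_two_levels(work: float, makespan: float, levels: list[float]) -> float:
    # Split the makespan between the two levels adjacent to work/makespan.
    target = work / makespan
    higher = [l for l in levels if l >= target]
    lower = [l for l in levels if l < target]
    if not higher:
        raise ValueError("makespan cannot be met at any frequency level")
    hi = min(higher)
    if hi == target or not lower:
        # Exact match, or target below the slowest level: one level suffices.
        return energy_single_level(work, makespan, levels)
    lo = max(lower)
    # Solve t_hi + t_lo = makespan and hi * t_hi + lo * t_lo = work.
    t_hi = (work - lo * makespan) / (hi - lo)
    t_lo = makespan - t_hi
    return power(hi) * t_hi + power(lo) * t_lo


if __name__ == "__main__":
    levels = [0.4, 0.6, 0.8, 1.0]  # hypothetical frequency levels
    work, makespan = 0.7, 1.0
    print(round(energy_continuous(work, makespan), 3))            # 0.343
    print(round(energy_single_level(work, makespan, levels), 3))  # 0.448
    print(round(energy_two_levels(work, makespan, levels), 3))    # 0.364
```

With these made-up numbers, running the whole job at level 0.8 and then idling costs more than splitting the makespan between levels 0.6 and 0.8, and both exceed the continuous-scaling ideal. Because the assumed power function is convex, mixing the two neighbouring levels is the cheaper way to hit the target frequency on average.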