Listing by keyword "GPU"
1 - 6 of 6
- Text document: Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing (BTW 2021, 2021). Kumaigorodski, Alexander; Lutz, Clemens; Markl, Volker. Comma-separated values (CSV) is a widely used format for data exchange. Due to the format's prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input close to the speed of the I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining transfer rates of 100 Gbit/s and higher. At the same time … (A minimal illustrative sketch of data-parallel CSV delimiter indexing is given after this listing.)
- Text document: Full-Scale File System Acceleration on GPU (Tagungsband des FG-BS Frühjahrstreffens 2024, 2024). Maucher, Peter; Kittner, Lennard; Rath, Nico; Lucka, Gregor; Werling, Lukas; Khalil, Yussuf; Gröninger, Thorsten; Bellosa, Frank. Modern HPC and AI computing solutions regularly use GPUs as their main source of computational power. This creates a significant imbalance for storage operations in GPU applications, as every such storage operation has to be signalled to and handled by the CPU. In GPU4FS, we propose a radical solution to this imbalance: move the file system implementation into the application and run the complete file system on the GPU. This requires multiple changes across the file system stack, from the actual storage layout up to the file system interface. Additionally, this approach frees the CPU from file system management tasks, allowing more meaningful use of the CPU. In our preliminary implementation, we show that a fully featured file system running on the GPU with minimal CPU interaction is possible, and even bandwidth-competitive depending on the underlying storage medium.
- Conference paper: Improving GPU Matrix Multiplication by Leveraging Bit Level Granularity and Compression (BTW 2023, 2023). Fett, Johannes; Schwarz, Christian; Kober, Urs; Habich, Dirk; Lehner, Wolfgang. In this paper we introduce BEAM, a novel approach to performing GPU-based matrix multiplication on compressed elements. BEAM allows flexible handling of bit sizes for both input and output elements. First evaluations show promising speedups compared to an uncompressed state-of-the-art matrix multiplication algorithm provided by NVIDIA. (A minimal illustrative sketch of matrix multiplication over bit-packed elements is given after this listing.)
- Journal article: In-Depth Analysis of OLAP Query Performance on Heterogeneous Hardware (Datenbank-Spektrum: Vol. 21, No. 2, 2021). Broneske, David; Drewes, Anna; Gurumurthy, Bala; Hajjar, Imad; Pionteck, Thilo; Saake, Gunter. Classical database systems are now facing the challenge of processing high-volume data feeds at unprecedented rates as efficiently as possible while also minimizing power consumption. Since CPU-only machines hit their limits, co-processors like GPUs and FPGAs are investigated by database system designers for their distinct capabilities. As a result, database systems over heterogeneous processing architectures are on the rise. In order to better understand their potentials and limitations, in-depth performance analyses are vital. This paper provides performance data from benchmarking a portable operator set for column-based systems on CPU, GPU, and FPGA – all processing devices available within the same system. We consider TPC-H query Q6 and additionally a hash join to profile the execution across the systems. We show that system memory access and/or buffer management remains the main bottleneck for device integration, and that architecture-specific execution engines and operators offer significantly higher performance. (A minimal illustrative sketch of a TPC-H Q6-style filtered aggregation is given after this listing.)
- Journal article: Scaling Results for a Discontinuous Galerkin Finite-Element Wave Solver on Multi-GPU Systems (PARS: Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware: Vol. 28, No. 1, 2011). Rietmann, Max; Schenk, Olaf; Burkhart, Helmar.
- Text document: Unparalleled Parallelism? CPU & GPU Architecture Trends and Their Implications for HPC Software (Tagungsband des FG-BS Frühjahrstreffens 2021, 2021). Morgenstern, Laura; Kabadshow, Ivo; Werner, Matthias. The free lunch is over – again? In 2004, Herb Sutter observed the stagnation of clock frequencies and predicted hyperthreading and multicore capabilities as the drivers of performance growth on CPUs. This prediction, and the resulting advice to focus more on concurrency to achieve sustainable application performance, has become the daily reality of HPC software engineers. In this paper, we compare trends in the development of CPU and GPU architectures and examine their implications for the parallelization and portability of HPC software. The data analysis still reveals levelling clock frequencies, but this time also for GPUs. Additionally, an increasing amount of hardware parallelism can be observed for both architectures.
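The following is a minimal CUDA sketch of the kind of data-parallel delimiter indexing referenced for the CSV loading entry above (Kumaigorodski et al.). It is not the paper's implementation: it assumes ASCII input without quoted fields, and the kernel and buffer names are illustrative.

```cuda
// Illustrative sketch only (not the paper's loader): each thread strides over
// the raw CSV buffer and records the byte offsets of field and row delimiters.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

__global__ void index_delimiters(const char *buf, size_t len,
                                 unsigned long long *offsets,
                                 unsigned long long *count) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
         i < len; i += stride) {
        char c = buf[i];
        if (c == ',' || c == '\n') {
            unsigned long long slot = atomicAdd(count, 1ULL); // reserve an output slot
            offsets[slot] = i;                                // record delimiter position
        }
    }
}

int main() {
    const char csv[] = "id,value\n1,42\n2,7\n";
    size_t len = strlen(csv);

    char *d_buf; unsigned long long *d_off, *d_cnt;
    cudaMalloc(&d_buf, len);
    cudaMalloc(&d_off, len * sizeof(unsigned long long)); // worst case: every byte is a delimiter
    cudaMalloc(&d_cnt, sizeof(unsigned long long));
    cudaMemcpy(d_buf, csv, len, cudaMemcpyHostToDevice);
    cudaMemset(d_cnt, 0, sizeof(unsigned long long));

    index_delimiters<<<32, 256>>>(d_buf, len, d_off, d_cnt);

    unsigned long long cnt = 0;
    cudaMemcpy(&cnt, d_cnt, sizeof(cnt), cudaMemcpyDeviceToHost);
    printf("found %llu delimiters\n", cnt); // expected: 6 (3 commas + 3 newlines)

    cudaFree(d_buf); cudaFree(d_off); cudaFree(d_cnt);
    return 0;
}
```

A real loader would additionally sort the offset index, handle quoting and chunk boundaries, and overlap parsing with RDMA or NVMe transfers; the atomic counter above is only the simplest way to collect offsets.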
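For the BEAM entry above (Fett et al.), the following CUDA sketch illustrates the general idea of multiplying matrices whose elements are stored bit-packed and unpacked on the fly. It is not the BEAM kernel; the fixed 4-bit element width, the naive one-thread-per-output scheme, and all names are assumptions made for illustration.

```cuda
// Illustrative sketch only (not the BEAM kernel): N x N matrix multiplication
// over 4-bit unsigned elements packed eight per 32-bit word.
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned unpack4(const unsigned *packed, int idx) {
    // Extract the idx-th 4-bit value (eight values per 32-bit word).
    return (packed[idx / 8] >> ((idx % 8) * 4)) & 0xF;
}

__global__ void matmul_packed4(const unsigned *A, const unsigned *B,
                               unsigned *C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N || col >= N) return;
    unsigned acc = 0;
    for (int k = 0; k < N; ++k)
        acc += unpack4(A, row * N + k) * unpack4(B, k * N + col);
    C[row * N + col] = acc; // uncompressed 32-bit output for simplicity
}

int main() {
    const int N = 8; // 8x8 demo: each row fits in one packed 32-bit word
    unsigned hA[N], hB[N], hC[N * N];
    for (int i = 0; i < N; ++i) { hA[i] = 0x11111111u; hB[i] = 0x22222222u; } // all 1s, all 2s

    unsigned *dA, *dB, *dC;
    cudaMalloc(&dA, sizeof(hA)); cudaMalloc(&dB, sizeof(hB)); cudaMalloc(&dC, sizeof(hC));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    matmul_packed4<<<dim3(1, 1), dim3(8, 8)>>>(dA, dB, dC, N);

    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("C[0][0] = %u\n", hC[0]); // expected: 8 * (1 * 2) = 16

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Operating on packed inputs trades extra unpacking arithmetic for reduced memory traffic; a production kernel would additionally tile the packed operands through shared memory.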
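For the OLAP entry above (Broneske et al.), the following CUDA sketch shows a TPC-H Q6-style filtered aggregation over column arrays, i.e. a selection-plus-sum operator of the kind such benchmarks execute. It is not the paper's operator set; the predicates and constants follow the standard Q6 definition, and the double-precision atomicAdd requires a GPU of compute capability 6.0 or newer.

```cuda
// Illustrative sketch only (not the paper's operators): TPC-H Q6-style
// revenue = SUM(extendedprice * discount) over rows passing the predicates.
// Compile with -arch=sm_60 or newer (double-precision atomicAdd).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void q6_kernel(const int *shipdate, const float *discount,
                          const int *quantity, const float *extendedprice,
                          int n, int date_lo, int date_hi, double *revenue) {
    double local = 0.0;
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        if (shipdate[i] >= date_lo && shipdate[i] < date_hi &&
            discount[i] >= 0.05f && discount[i] <= 0.07f &&
            quantity[i] < 24) {
            local += (double)extendedprice[i] * discount[i];
        }
    }
    atomicAdd(revenue, local); // coarse reduction; real engines reduce hierarchically
}

int main() {
    const int n = 4;
    int   shipdate[n]      = {19940101, 19940615, 19950101, 19940320};
    float discount[n]      = {0.06f, 0.05f, 0.06f, 0.02f};
    int   quantity[n]      = {10, 30, 10, 10};
    float extendedprice[n] = {1000.f, 2000.f, 3000.f, 4000.f};

    int *d_sd, *d_q; float *d_di, *d_ep; double *d_rev;
    cudaMalloc(&d_sd, sizeof(shipdate));  cudaMalloc(&d_di, sizeof(discount));
    cudaMalloc(&d_q, sizeof(quantity));   cudaMalloc(&d_ep, sizeof(extendedprice));
    cudaMalloc(&d_rev, sizeof(double));
    cudaMemcpy(d_sd, shipdate, sizeof(shipdate), cudaMemcpyHostToDevice);
    cudaMemcpy(d_di, discount, sizeof(discount), cudaMemcpyHostToDevice);
    cudaMemcpy(d_q, quantity, sizeof(quantity), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ep, extendedprice, sizeof(extendedprice), cudaMemcpyHostToDevice);
    cudaMemset(d_rev, 0, sizeof(double));

    q6_kernel<<<4, 128>>>(d_sd, d_di, d_q, d_ep, n, 19940101, 19950101, d_rev);

    double revenue = 0.0;
    cudaMemcpy(&revenue, d_rev, sizeof(double), cudaMemcpyDeviceToHost);
    printf("revenue = %.2f\n", revenue); // only row 0 qualifies: 1000 * 0.06 = 60.00

    cudaFree(d_sd); cudaFree(d_di); cudaFree(d_q); cudaFree(d_ep); cudaFree(d_rev);
    return 0;
}
```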