Browsing by keyword "distributed computing"
1 - 3 of 3
- Conference paper: DeLorean: A Storage Layer to Analyze Physical Data at Scale (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017)
  Kußmann, Michael; Berens, Maximilian; Eitschberger, Ulrich; Kilic, Ayse; Lindemann, Thomas; Meier, Frank; Niet, Ramon; Schellenberg, Margarete; Stevens, Holger; Wishahi, Julian; Spaan, Bernhard; Teubner, Jens
  Modern research in high energy physics depends on the ability to analyse massive volumes of data in a short time. In this article, we report on DeLorean, a new system architecture for high-volume data processing in the domain of particle physics. DeLorean combines the simplicity and performance of relational database technology with the massive scalability of modern cloud execution platforms (here, Apache Drill). Experiments show a four-fold performance improvement over state-of-the-art solutions.
- Text document: MAGPIE: A Scalable Data Storage System for Efficient High Volume Data Queries (BTW 2019, 2019)
  Lindemann, Thomas; Brinkmann, Patrick; Dalbah, Fadi; Hakert, Christian; Honysz, Philipp-Jan; Matuszczyk, Daniel; Müller, Nikolas; Schmulbach, Alexander; Todorinski, Stefan Petyov; Tüselmann, Oliver; Wonsak, Shimon; Teubner, Jens
  Modern challenges in storing and querying very large volumes of data require new approaches in the field of data storage systems. With MAGPIE, we introduce a hardware-software co-design that queries data efficiently by combining distributed storage with near-storage pre-processing, and that is designed to scale to very large systems.
- Text document: Optimized Theta-Join Processing (BTW 2021, 2021)
  Weise, Julian; Schmidl, Sebastian; Papenbrock, Thorsten
  The theta-join is a powerful operation for connecting tuples of different relational tables based on arbitrary conditions. The operation is a fundamental requirement for many data-driven use cases, such as data cleaning, consistency checking, and hypothesis testing. However, processing theta-joins without equality predicates is expensive, because practically all database management systems (DBMSs) translate theta-joins into a Cartesian product with a post-filter for non-matching tuple pairs. This seems to be necessary, because most join optimization techniques, such as indexing, hashing, Bloom filters, or sorting, do not work for theta-joins with combinations of inequality predicates based on <, ≤, ≠, ≥, >. In this paper, we therefore study and evaluate optimization approaches for the efficient execution of theta-joins. More specifically, we propose a theta-join algorithm that exploits the high selectivity of theta-joins to prune most join candidates early; the algorithm also parallelizes and distributes the processing (over CPU cores and compute nodes, respectively) for scalable query processing. The algorithm is built into our distributed in-memory database system prototype A2DB. Our evaluation on various real-world and synthetic datasets shows that A2DB significantly outperforms existing single-machine DBMSs, including PostgreSQL, and distributed data processing systems, such as Apache SparkSQL, in processing highly selective theta-join queries.
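
To illustrate the baseline that the last abstract describes, the following minimal Python sketch evaluates a theta-join as a Cartesian product with a post-filter. For contrast, it also shows how a single `<` predicate can be answered by sorting one side and binary-searching it. This is not the A2DB algorithm from the paper; the function names `theta_join_naive` and `less_than_join_sorted` and the sample data are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from itertools import product


def theta_join_naive(left, right, predicate):
    """Baseline: build every pair (Cartesian product), then post-filter."""
    return [(l, r) for l, r in product(left, right) if predicate(l, r)]


def less_than_join_sorted(left, right):
    """Join on l < r by sorting the right side once and binary-searching it.

    Only covers one inequality predicate; it is an illustrative shortcut,
    not a general theta-join algorithm.
    """
    right_sorted = sorted(right)
    result = []
    for l in left:
        # Index of the first right-side value strictly greater than l.
        start = bisect_right(right_sorted, l)
        result.extend((l, r) for r in right_sorted[start:])
    return result


# Hypothetical single-column inputs.
left = [5, 1, 9]
right = [4, 8, 2]

naive = theta_join_naive(left, right, lambda l, r: l < r)
pruned = less_than_join_sorted(left, right)

# Both strategies return the same set of pairs.
assert sorted(naive) == sorted(pruned)
```

The naive variant always inspects all `len(left) * len(right)` pairs, which is the cost the abstract attributes to the Cartesian-product translation. The sorted variant skips non-matching candidates for this one predicate, but it does not carry over directly to combinations of several inequality predicates.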