General purpose parallel computations on GPU devices

What We Do

GPU devices offer remarkable parallel processing capabilities. Our team develops and applies new algorithms and data structures that make full use of the computing power of modern SIMD (Single Instruction, Multiple Data) coprocessors. Our results find applications in data compression, information searching and indexing, graph processing, genomics and many other fields.


"New data intensive algorithms and structures for GPU processors"

National Science Centre, Sonata (decision: DEC-2012/07/D/ST6/02483) – 36 months
Budget: 433,400 PLN

"Computational cluster for genomical mathematics and computer science"

Ministry of Science and Higher Education (reg. II/0070/2016)
Budget: 100,000 PLN

"A system for efficient time series creation from textual data"

National Centre for Research and Development, Tango 3 (ID: 416322) – 12 months
Budget: 190,550 PLN

New data intensive algorithms and structures for GPU processors

The goal of the project is to design new parallel structures and algorithms, or adapt existing ones, for multi-core general-purpose graphics processing units. Among the expected results are a significant acceleration of existing solutions and an increase in the volume of data that can be processed. The scientific hypothesis is that this goal can be achieved with popular GPU devices and personal computers.

The project will focus on algorithms for data-intensive applications. Literature studies and an analysis of many algorithms implemented for GPUs showed the need for a universal and consistent design of structures and algorithms for graphics devices that takes into account current trends in processor evolution. In the project we will analyze solutions for MOLAP and time series databases. Evaluation and load testing on real-life data will make it possible to fine-tune the algorithms and structures to the requirements of graphics devices.

Given the nature of the proposed research, the work plan comprises the following tasks: design of the theoretical foundations of a parallel algorithm for a vector processor architecture; design of a prototype library that accounts for the limitations of particular GPU processors; implementation of a working prototype and efficiency measurements; optimization of the prototype for maximum efficiency; publication of the results.
All the research will be performed with the equipment available at the Faculty of Mathematics and Information Science, WUT, using PC workstations or professional computational nodes in a server room. The results will be statistically analyzed with respect to the following factors: scalability of an algorithm or data structure as the data volume changes; speedup relative to a comparable CPU-based solution; effective time, memory, memory-transfer and parallel-core complexity.

The demand for computation and the volume of data to be processed are growing rapidly. Even processors with streaming capabilities and parallel cores are often unable to run calculations with the necessary efficiency. Supercomputers, which can be found in computational centres, are too exotic and too expensive to become common. The answer to these issues is computational cards built upon multicore graphics processors, which outperform classical CPUs. What is missing are new representations of data structures and methods of processing them that will enable efficient programming of algorithms within the specific limitations of a GPU. By developing new data structures and new operations that utilize all available cores, the project will open up new possibilities for using graphics cards in many fields of science where a drastic improvement in data processing efficiency is crucial. The expected impact of the project is to make popular personal computers usable for this kind of analysis.

A system for efficient time series creation from textual data
The aim of the project is to carry out conceptual and R&D work investigating the possibility of implementing a system for processing textual logs using GPU processors. Such logs can contain billions of entries, which should be translated into numeric information and stored in a time series database within a limited time (possibly in real time). Distributed systems based on the map-reduce method are commonly used for this purpose in order to obtain the necessary efficiency. Using the algorithms that resulted from the base project will make it possible to build a prototype on a single machine equipped with a computational device. Such a solution will lead to savings in electricity consumption, space, resources and personnel costs. The new solution will be tested on data from an industrial partner. The analyses necessary to establish cooperation with the partner, covering system performance, implementation costs and the benefits of the application, will also be carried out.
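
To make the idea of translating log entries into numeric time series concrete, here is a minimal CPU-only sketch in Python. It is not the project's GPU implementation. The log line format (an ISO-like timestamp, a severity level, then free text), the one-minute bucket size and the function name `logs_to_series` are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def logs_to_series(lines):
    """Count log entries per (minute, level) bucket.

    Assumed (hypothetical) line format:
        2019-01-01T10:00:05 ERROR some free text
    Lines that do not match this format are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # too few fields to hold a timestamp and a level
        try:
            ts = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue  # first field is not a timestamp in the assumed format
        minute = ts.replace(second=0)  # truncate to a one-minute bucket
        counts[(minute, parts[1])] += 1
    # Sorted by time, then level: one numeric value per bucket
    return sorted(counts.items())
```

Each returned pair holds a (minute, level) key and the number of matching entries, which is the kind of numeric record a time series database could store. A production system would have to handle real log formats and far larger volumes.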


Key works

Recent contributions

2019 Krzysztof Kaczmarski, Albert Wolant. GPU R-Trie: Dictionary with ultra fast lookup. Concurrency and Computation: Practice and Experience 31(19). John Wiley & Sons, Inc. 2019.
2018 Krzysztof Kaczmarski, Albert Wolant. Radix Tree for Binary Sequences on GPU. In: Wyrzykowski R., Dongarra J., Deelman E., Karczewski K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. 2018.
2017 Krzysztof Kaczmarski, Piotr Przymus. Fixed length lightweight compression for GPU revised. Journal of Parallel and Distributed Computing. Elsevier. September 2017.
2016 Krzysztof Kaczmarski, Paweł Rzążewski, Albert Wolant: Parallel algorithms constructing the cell graph. Concurrency and Computation: Practice and Experience 29(23). John Wiley & Sons, Inc. 2016.
2015 Krzysztof Kaczmarski, Pawel Rzazewski, Albert Wolant: Massively Parallel Construction of the Cell Graph. Parallel Processing and Applied Mathematics Volume 9573 of the series Lecture Notes in Computer Science pp 559-569
2014 Krzysztof Kaczmarski, Piotr Przymus, Pawel Rzazewski: Improving High-Performance GPU Graph Traversal with Compression. ADBIS 2014:201-214
2014 Piotr Przymus, Krzysztof Kaczmarski, Krzysztof Stencel: A Bi-objective Optimization Framework for Heterogeneous CPU/GPU Query Plans. Fundam. Inform. (FUIN) 135(4):483-501 (2014)
2014 Piotr Przymus, Krzysztof Kaczmarski: Compression Planner for Time Series Database with GPU Support. T. Large-Scale Data- and Knowledge-Centered Systems (TLSDKCS) 15:36-63 (2014)
2014 Joanna Porter-Sobieraj, Sebastian Cygert, Daniel Kikola, Jan Sikorski, Marcin Slodkowski: Optimizing the computation of a parallel 3D finite difference algorithm for graphics processing units. Concurrency and Computation: Practice and Experience (CONCURRENCY) 27(6):1591-1602 (2015)
2013 Piotr Przymus, Krzysztof Kaczmarski: Time Series Queries Processing with GPU Support. ADBIS 2013. Volume 241 of the series Advances in Intelligent Systems and Computing pp 53-60. Springer-Verlag. ISBN: 978-3-319-01862-1
2013 Piotr Przymus, Krzysztof Kaczmarski: Dynamic Compression Strategy for Time Series Database Using GPU. ADBIS 2013. New Trends in Databases and Information Systems Volume 241 of the series Advances in Intelligent Systems and Computing pp 235-244. Springer-Verlag. ISBN: 978-3-319-01862-1
2013 Piotr Przymus, Krzysztof Kaczmarski, Krzysztof Stencel: A Bi-objective Optimization Framework for Heterogeneous CPU/GPU Query Plans. CS&P 2013:342-354


dr Krzysztof Kaczmarski


inż. Artur Niewiadomski


inż. Stanisław Piotrowski


Have a Project in Mind?

Please get in touch with the head of the group.