
Hello! I'm

Paul Groß

Hi, I'm Paul! Ever since I got my hands on a book about app development in 9th grade, computer science hasn't let me go. I'm fascinated by the diversity of the field, from highly technical, mathematical aspects to socio-political topics.
Get in Touch

My Recent Projects

Dash
A web application and DuckDB extension for creating interactive dashboards. Its AI chat integration lets users get insights from their data and create dashboards without writing SQL queries. Try it out at dash.builders.
Maple
A web application for generating office floor plans, enabling rapid prototyping for architects.

My App Projects

Spacecraft Command

A casual mobile game made in Unity 3D
Omes
A combination of a planner and a messenger
Gympion
A draft for an app for gyms.
Hexplorers
A turn-based strategy game
Gympion Admin
The administrative component of Gympion
Heavy Landing
My first game, released in 2014

Experience

2020 - today
Philomatech UG (haftungsbeschränkt)

Founder and CEO
Freelance project-based contract work.

Consulting, planning and development of mobile and web applications as well as mobile games.
2022 - 2024
maple. GmbH

Co-Founder
Developed a generative AI tool for office design, enabling rapid architectural prototyping.

Managed a team of freelancers.


2021
National Aeronautics & Space Administration (NASA)

C.S. Intern
Adapting a Python-based data-analysis application for long-term telemetry trending of the James Webb Space Telescope to integrate legacy data from a testing campaign.

Legacy telemetry data preparation, processing, and visualization.

Working remotely in an international team with a seven-hour time difference.
2018 - 2021
Airbus Defence & Space GmbH

Dual Student
Conceptualized a cloud migration strategy for existing applications. Developed an Angular-based web client for a geographic information system (GIS) application and a microservice-based, cloud-first backend adapter to integrate the existing backend services.

Development of a gazetteer-search component for a military geographic information system (GIS).

Education

Centrum Wiskunde & Informatica (CWI)

Master Thesis Internship → PhD Student

Master's thesis on factorized aggregates and cyclic joins in DuckDB.

Improved TPC-H SF100 performance by 6%.

Continued as a PhD student under Peter Boncz, focusing on string processing in database systems.
2024 - today
VU Amsterdam & University of Amsterdam

MSc Computer Science – Big Data Engineering

Co-founded maple as part of the Demonstrator Lab.

Completed courses in Big Data Engineering and Machine Learning.

Graduated cum laude.

Dutch grading: 9.0 | US equivalent: A/A+
2022 - 2024
Baden-Wuerttemberg Cooperative State University (DHBW)

BSc Computer Science – Information Technology

Received the 2021 Volunteers Award for 'outstanding academic engagement'.

Physics Tutor and student senator.

German grading: 1.4 | US equivalent: A–
2018 - 2021
Kreisgymnasium Riedlingen (KGR)

A-Level

School Award for "diligence and good performance", Award of the German Physical Society for "very good performance in physics", Award of the education partner Feinguss Blank GmbH for "diligence and good performance in the subjects math and physics".

Vice-chairman of the upper school association of the KGR, homework tutor.
2010 - 2018

Publications & Essays

Adaptive Factorization Using Linear-Chained Hash Tables

CIDR 2025 Conference Paper

Abstract: We introduce factorized aggregations and worst-case optimal joins in DuckDB with an adaptive mechanism that only uses them when they enhance query performance. This builds on the adoption of a new hash table design (“Linear-Chained”) for equi-joins. Our first insight is that the collision-free chains of this new design enable efficient factorized and worst-case optimal processing. We further defer the decision to use factorization and worst-case optimal joins from optimization to runtime. Our second insight is that we can obtain accurate statistics, even if the join inputs lack these (e.g. because they are sub-queries or Parquet files), by leveraging runtime heuristics and constructing efficient on-the-fly sketches, during the hash join build. Finally, we show that machine learning models using these metrics can achieve close to optimal performance with a high accuracy. Furthermore, we propose heuristic-based approaches that offer comparable performance to these models, while relying on cheaper to obtain run-time statistics and being more explainable.

PDF
2025
Master's Thesis: Lightweight Integration of Factorized d-Representations into DuckDB

Master's Thesis

Abstract: While factorization and Worst-Case Optimal Join (WCOJ) algorithms promise significant performance improvements, their widespread adoption lags behind because they are complex to implement and still face query optimization challenges. We aim to address this problem by proposing an adaptive, lightweight solution. We integrate factorized representations into DuckDB using pointers to hash table chains of a Linear-Chained hash table. This newly proposed hash table combines linear probing with chaining to insert a tuple into a collision chain only when the keys match, enabling collision-free chains. We use d-representations to (1) efficiently calculate aggregates and (2) perform cyclic joins in a worst-case optimal manner. By operating on d-representations, we avoid the explosion of intermediate results and efficiently reuse cached results, achieving speedups of up to 17.58x for aggregate computations and 16.77x for cyclic joins. To ensure that the new techniques are only employed when beneficial, we propose adaptive factorization, shifting the decision to use factorization from the planning stage to runtime. We can then collect statistics, which allows for accurate decision-making even for sub-queries or when querying Parquet files. The metrics are then used by machine learning models to predict whether factorization would be beneficial. These models demonstrate an accuracy of 88% in our benchmarks.
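The hash table idea described in the abstract can be illustrated with a small Python sketch. This is not the DuckDB implementation: the class name `LinearChainedTable`, its methods, and the growth policy are all hypothetical and chosen only for illustration. It assumes that key collisions are resolved by linear probing to the next slot, while rows with an equal key are appended to that slot's chain, so every chain holds rows of exactly one key.

```python
from typing import Any, Hashable, List, Optional


class LinearChainedTable:
    """Toy table: linear probing across slots, one chain per distinct key."""

    def __init__(self, capacity: int = 16) -> None:
        self._keys: List[Optional[Hashable]] = [None] * capacity
        # A slot is empty while its chain is None.
        self._chains: List[Optional[List[Any]]] = [None] * capacity
        self._used = 0

    def _slot(self, key: Hashable) -> int:
        """Probe linearly until an empty slot or a slot with an equal key."""
        cap = len(self._keys)
        i = hash(key) % cap
        while self._chains[i] is not None and self._keys[i] != key:
            i = (i + 1) % cap
        return i

    def _grow(self) -> None:
        """Double the capacity; each chain moves as a unit to its new slot."""
        old = [(k, c) for k, c in zip(self._keys, self._chains) if c is not None]
        cap = len(self._keys) * 2
        self._keys = [None] * cap
        self._chains = [None] * cap
        for k, c in old:
            i = self._slot(k)
            self._keys[i] = k
            self._chains[i] = c

    def insert(self, key: Hashable, row: Any) -> None:
        i = self._slot(key)
        if self._chains[i] is None:
            # New key: keep the load factor at or below 1/2 so probing terminates.
            if (self._used + 1) * 2 > len(self._keys):
                self._grow()
                i = self._slot(key)
            self._keys[i] = key
            self._chains[i] = []
            self._used += 1
        # Equal key: append to the chain, which never mixes different keys.
        self._chains[i].append(row)

    def chain(self, key: Hashable) -> List[Any]:
        """Return all rows stored under key (empty list if the key is absent)."""
        found = self._chains[self._slot(key)]
        return found if found is not None else []


table = LinearChainedTable(capacity=4)
table.insert(1, "a")
table.insert(5, "b")  # hashes to the same slot as 1 when capacity is 4
table.insert(1, "c")
print(table.chain(1), table.chain(5))
```

Because a chain contains rows of a single key, a probe compares the key once per chain rather than once per row, and the chain length alone gives the number of join matches for that key without enumerating them.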

PDF
2024
Finding Clusters of Similar-minded People on Twitter Regarding the Covid-19 Pandemic

MLDS 2021 Conference Paper

Abstract: In this paper we present two clustering methods to determine users with similar opinions on the Covid-19 pandemic and the related public debate in Germany. We believe they can help gain an overview of similar-minded groups and could support the prevention of fake-news distribution. The first method uses a new approach to create a network based on retweet-relationships between users and the most retweeted characters (influencers). The second method extracts hashtags from users' posts to create a “user feature vector” which is then clustered using a similarity matrix based on [1] to identify groups using the same language. With both approaches it was possible to identify clusters that seem to fit groups of different public opinion in Germany. However, we also found that clusters from one approach cannot be associated with clusters from the other due to filtering steps in the two methods.

arXiv
2021
Concept for the Use of Cloud Technologies in Existing Military Applications

Bachelor Thesis

No further information can be provided, sorry :(
2021