Hash Aggregations in Flink

Written by Alexander Alexandrov and Gábor Gévay and Andreas Kunft on Apr 7, 2016

Hash-based aggregations have been a long-standing item on the feature list for Flink (see FLINK-2237). We recently submitted a PR implementing the first step towards providing this fuctionality with FLINK-3477, which introduces a hash-based aggregation strategy for combiners. In this post, we review the main differences between sort- and hash-based aggregations and present a Peel-based experiments bundle which analyzes the benefits of the hash-based strategy.