Much like the legendary Gordian Knot, sometimes when a problem becomes too thorny to solve, you just need to approach it from a different perspective. Especially in the fast-paced world of data, it’s tempting to devise complex systems when a simple one works just as well.
A few months ago, the analytics team at JW Player started working on breaking apart the usage computation from our larger daily pipeline. This usage data needed to be more timely to help our customers avoid overage charges or unexpected loss of service. We called this new pipeline Usage-Mini because it would run more frequently on smaller batches of data as they arrive.
As part of splitting apart the usage aggregation, we redesigned the pipeline, making one key decision that has led to huge improvements in performance, monitoring, and stability. This change was to reformulate our definition of batch from fixed time to fixed size. What’s the big deal about that? Keep reading to understand the impact of this simple choice.