kaskada-ai/kaskada

Kaskada: Modern, open-source event-processing

kaskada.io | Docs

Kaskada is a unified event processing engine that provides all the power of stateful stream processing in a high-level, declarative query language designed specifically for reasoning about events in bulk and in real time.

Kaskada's query language builds on the best features of SQL to provide a more expressive way to compute over events. Queries are simple and declarative. Unlike SQL, they are also concise, composable, and designed for processing events. By focusing on the event-processing use case, Kaskada's query language makes it easier to reason about when things happen, state at specific points in time, and how results change over time.

Kaskada is implemented as a modern compute engine designed for processing events in bulk or in real time. Written in Rust and built on Apache Arrow, Kaskada can compute most workloads without the complexity and overhead of distributed execution.

Read more at kaskada.io. See the docs to get started.

Features

  • Stateful aggregations: Aggregate events to produce a continuous timestream whose value can be observed at arbitrary points in time.
  • Automatic joins: Every expression is associated with an “entity”, allowing tables and expressions to be automatically joined. Entities eliminate redundant boilerplate code.
  • Event-based windowing: Collect events as you move through time, and aggregate them with respect to other events. Ordered aggregation makes it easy to describe temporal interactions.
  • Pipelined operations: Pipe syntax allows multiple operations to be chained together. Write your operations in the same order you think about them. It's timestreams all the way down, making it easy to aggregate the results of aggregations.
  • Row generators: Pivot from events to time-series. Unlike grouped aggregates, generators produce rows even when there's no input, allowing you to react when something doesn't happen.
  • Continuous expressions: Observe the value of aggregations at arbitrary points in time. Timestreams are either “discrete” (instantaneous values or events) or “continuous” (values produced by stateful aggregations). Continuous timestreams let you combine aggregates computed from different event sources.
  • Native time travel: Shift values forward (but not backward) in time, allowing you to combine different temporal contexts without the risk of temporal leakage. Shifted values make it easy to compare a value “now” to a value from the past.
  • Simple, composable syntax: It is functions all the way down. No global state, no dependencies to manage, and no spooky action at a distance. Quickly understand what a query is doing, and painlessly refactor to make it DRY.
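
To make the ideas of stateful aggregation, entities, and continuous timestreams concrete, here is a minimal conceptual sketch in plain Python. It does not use Kaskada or its query language; the function names (`running_sum_timestream`, `observe`) and the sample events are hypothetical and exist only for illustration. It assumes each event is a `(time, entity, amount)` tuple and shows how a per-entity running sum can be observed at an arbitrary point in time.

```python
from bisect import bisect_right
from collections import defaultdict


def running_sum_timestream(events):
    """Build a per-entity running sum from (time, entity, amount) events.

    Returns {entity: (times, sums)}, where sums[i] is the aggregate
    value that holds from times[i] until the entity's next event.
    """
    streams = defaultdict(lambda: ([], []))
    for time, entity, amount in sorted(events):  # process in time order
        times, sums = streams[entity]
        previous = sums[-1] if sums else 0
        times.append(time)
        sums.append(previous + amount)
    return dict(streams)


def observe(streams, entity, at):
    """Read the continuous value for `entity` at time `at`.

    Returns None if the entity has no events at or before `at`.
    """
    times, sums = streams.get(entity, ([], []))
    index = bisect_right(times, at)  # count of events at or before `at`
    return sums[index - 1] if index else None


# Hypothetical sample data: two entities with interleaved events.
events = [(1, "alice", 10), (3, "bob", 5), (4, "alice", 7), (9, "bob", 1)]
streams = running_sum_timestream(events)

print(observe(streams, "alice", 2))  # value between alice's two events
print(observe(streams, "bob", 2))    # before bob's first event
```

The events are discrete, while the running sum is continuous: it has a value between events, which is what lets aggregates from different sources be compared at the same moment. In Kaskada itself this behavior is expressed declaratively in the query language rather than written by hand.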

Join Us!

We're building an active, inclusive community of users and contributors. Come get to know us on Slack - we'd love to meet you!

For specific problems, file an issue.

Discussion and Development

Most development discussions take place on GitHub in this repo.

Contributing

All contributions -- issues, fixes, documentation improvements, features and ideas -- are welcome.

See CONTRIBUTING.md for more details.