An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated Jun 10, 2024 - Scala
An open protocol for secure data sharing
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
Free High-Quality Financial Data in Azure
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
This project builds an End-to-End Azure Data Engineering Pipeline, performing ETL and Analytics Reporting on the AdventureWorks2017LT Database.
PawMark is a platform for developers to build, schedule and monitor data pipelines.
Analytical database for data-driven Web applications 🪶
A native Rust library for Delta Lake, with bindings into Python
This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome from the EMR Serverless application.
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
🦖 Efficiently evolve your old fixed-length data files into more modern file formats, fully parallelized!
Hackolade plugin for Delta Lake on Databricks
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Schema mappings in SQL and PySpark for ELT pipelines to normalize data to OCSF