RealTime StockStream

RealTime StockStream is a pipeline for processing live stock market data. It uses Apache Kafka for data ingestion, Apache Spark for stream processing, and Apache Cassandra for storage, and is aimed at financial data analysis 💹🕊️

Getting Started

This guide will walk you through setting up and running the RealTime StockStream on your local machine for development and testing.

Prerequisites

Ensure you have the following software installed:

Docker
Python (version 3.11 or higher)

Todo Features

Live Market Data Integration ⌛
Advanced Analytics Features ⌛
Interactive Data Visualization ⌛
Improved Scalability ⌛
User Customization Options ⌛
Stronger Security ⌛

Used Techs

Apache Kafka
Apache Cassandra
Apache ZooKeeper
Apache Spark
Python

Installation

Follow these steps to set up your development environment:

Setting Up Kafka

Create a Kafka Topic:

kafka-topics.sh --create --topic stocks --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

Supported Data Operations

Grouping Aggregation: Summarize data by groups.
Pivot Aggregation: Reshape data, converting rows to columns.
Rollups and Cubes: Perform hierarchical and combinational aggregations.
Ranking Functions: Assign ranks within data partitions.
Analytic Functions: Compute aggregates while maintaining row-level details.
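To make the five operations concrete, here is a minimal pure-Python sketch of what each one computes on a handful of trades. It is an illustration of the semantics only, not the project's Spark job; the sample data and every function name are hypothetical, and the ranking shown is row-number style (ties are not given equal ranks).

```python
from collections import defaultdict

# Hypothetical sample trades: (stock, trade_type, price, quantity)
TRADES = [
    ("AAPL", "buy", 100.0, 10),
    ("AAPL", "sell", 102.0, 5),
    ("MSFT", "buy", 200.0, 2),
    ("MSFT", "buy", 210.0, 4),
]


def group_quantity(trades):
    """Grouping aggregation: total quantity per stock."""
    totals = defaultdict(int)
    for stock, _type, _price, qty in trades:
        totals[stock] += qty
    return dict(totals)


def pivot_quantity(trades):
    """Pivot aggregation: one row per stock, one column per trade_type."""
    table = defaultdict(lambda: defaultdict(int))
    for stock, trade_type, _price, qty in trades:
        table[stock][trade_type] += qty
    return {stock: dict(cols) for stock, cols in table.items()}


def rollup_quantity(trades):
    """Rollup: totals for (stock, trade_type), (stock,), and the grand total ()."""
    totals = defaultdict(int)
    for stock, trade_type, _price, qty in trades:
        for key in ((stock, trade_type), (stock,), ()):
            totals[key] += qty
    return dict(totals)


def rank_by_price(trades):
    """Ranking: number trades by descending price within each stock partition."""
    by_stock = defaultdict(list)
    for trade in trades:
        by_stock[trade[0]].append(trade)
    ranked = []
    for rows in by_stock.values():
        rows.sort(key=lambda t: t[2], reverse=True)
        ranked.extend((rank, *row) for rank, row in enumerate(rows, start=1))
    return ranked


def with_avg_price(trades):
    """Analytic function: keep every row and attach the per-stock average price."""
    prices = defaultdict(list)
    for stock, _type, price, _qty in trades:
        prices[stock].append(price)
    avg = {stock: sum(p) / len(p) for stock, p in prices.items()}
    return [(*trade, avg[trade[0]]) for trade in trades]
```

A cube differs from the rollup above only in the set of keys: it would also total by `trade_type` alone, covering every combination of the grouping columns rather than just the hierarchy.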

Database Schema

Configuring Cassandra

Create a Keyspace and Table: Execute the following CQL commands to set up your Cassandra database:

CREATE KEYSPACE stockdata WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};

CREATE TABLE stockdata.stocks (
    stock text,
    trade_id uuid,
    price decimal,
    quantity int,
    trade_type text,
    trade_date date,
    trade_time time,
    PRIMARY KEY (stock, trade_id)
);
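As a rough sketch of how a row in `stockdata.stocks` might be represented on the Python side, the dataclass below mirrors the column types with standard-library equivalents. The class name `Trade` and the helper `upsert` are hypothetical; the dictionary only imitates the fact that Cassandra treats a write with an existing primary key `(stock, trade_id)` as an overwrite of that row.

```python
import uuid
from dataclasses import dataclass
from datetime import date, time
from decimal import Decimal


@dataclass(frozen=True)
class Trade:
    """One row of stockdata.stocks, using stdlib types for each CQL column."""
    stock: str            # text, partition key
    trade_id: uuid.UUID   # uuid, clustering column
    price: Decimal        # decimal
    quantity: int         # int
    trade_type: str       # text
    trade_date: date      # date
    trade_time: time      # time

    def primary_key(self):
        # PRIMARY KEY (stock, trade_id) uniquely identifies a row.
        return (self.stock, self.trade_id)


def upsert(table: dict, trade: Trade) -> None:
    """Imitate Cassandra write semantics: same primary key means overwrite."""
    table[trade.primary_key()] = trade


if __name__ == "__main__":
    table = {}
    tid = uuid.uuid4()
    upsert(table, Trade("AAPL", tid, Decimal("101.25"), 10, "buy",
                        date(2024, 1, 2), time(9, 30)))
    # Writing again with the same (stock, trade_id) replaces the earlier row.
    upsert(table, Trade("AAPL", tid, Decimal("101.50"), 10, "buy",
                        date(2024, 1, 2), time(9, 30)))
    print(len(table))
```

Because `stock` is the partition key, all trades for one symbol live in the same partition, which keeps per-stock reads on a single partition.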

System Architecture

Docker Compose

Launch Services: Use Docker Compose to start the ZooKeeper, Kafka, Cassandra, and Spark services, together with the Kafka producer and the Plotly dashboard:

version: '3.9'

name: "realtime-stock-market"

services:
  zookeeper:
    image: bitnami/zookeeper:latest
    ports:
      - "2181:2181"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    networks:
      stock-net:
        ipv4_address: 172.28.1.1

  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://172.28.1.2:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    depends_on:
      - zookeeper
    networks:
      stock-net:
        ipv4_address: 172.28.1.2
    volumes:
      - ./scripts/init-kafka.sh:/init-kafka.sh
    # entrypoint: ["/bin/bash", "init-kafka.sh"]
    restart: always

  cassandra:
    image: cassandra:latest
    ports:
      - "9042:9042"
    volumes:
      - ./init-cassandra:/init-cassandra
      - ./scripts/init-cassandra-schema.sh:/init-cassandra-schema.sh
    environment:
      - CASSANDRA_START_RPC=true
    networks:
      stock-net:
        ipv4_address: 172.28.1.3
    # entrypoint: ["/bin/bash", "init-cassandra-schema.sh"]
    restart: always

  spark:
    image: bitnami/spark:latest
    volumes:
      - ./spark:/opt/bitnami/spark/jobs
      - ./scripts/submit-spark-job.sh:/opt/bitnami/spark/submit-spark-job.sh
    ports:
      - "8080:8080"
    depends_on:
      - kafka
    networks:
      stock-net:
        ipv4_address: 172.28.1.4
    # entrypoint: ["sh", "-c", "./submit-spark-job.sh"]
    restart: always

  kafka_producer:
    build:
      context: ./kafka-producer
      dockerfile: kafka_producer.dockerfile
    depends_on:
      - kafka
    networks:
      stock-net:
        ipv4_address: 172.28.1.8
    restart: always

  plotly:
    build:
      context: ./plotly
      dockerfile: plotly.dockerfile
    volumes:
      - ./plotly/dashboard.py:/dashboard.py
    ports:
      - "8050:8050"
    depends_on:
      - cassandra
    networks:
      stock-net:
        ipv4_address: 172.28.1.9
    restart: always

networks:
  stock-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16

Run Docker Compose:
```
docker-compose up -d
```
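Once the containers are up, a quick way to see which published ports are accepting connections is a small TCP probe. The sketch below assumes the host ports from the Compose file (2181, 9092, 9042, 8080, 8050) and that the services are reachable on `localhost`; the function names are hypothetical. An open port only shows that something is listening, not that the service has finished starting.

```python
import socket

# Host ports published in the Compose file.
SERVICE_PORTS = {
    "zookeeper": 2181,
    "kafka": 9092,
    "cassandra": 9042,
    "spark": 8080,
    "plotly": 8050,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost", ports=None) -> dict:
    """Map each service name to whether its port accepts connections."""
    ports = SERVICE_PORTS if ports is None else ports
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:10s} {'up' if up else 'down'}")
```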

Usage

Run the Spark Job: Use the spark-submit command to run your Spark job.

$ spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 spark_job.py stocks

Produce and Consume Data: Start producing data to the stocks topic and monitor the pipeline's output.
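For local testing, simulated trades can stand in for a real feed. The sketch below generates JSON messages whose fields match the columns of `stockdata.stocks`; the ticker list, the price and quantity ranges, the `buy`/`sell` values, and the JSON encoding are all assumptions, since the project's actual message format is not shown here. The `send` parameter is a hypothetical callable taking `(topic, payload_bytes)`, so the generator can be exercised without a running broker.

```python
import json
import random
import uuid
from datetime import datetime, timezone

SYMBOLS = ["AAPL", "MSFT", "GOOG"]  # hypothetical tickers


def make_trade(rng: random.Random, now: datetime) -> dict:
    """Build one simulated trade with the same fields as the stocks table."""
    return {
        "stock": rng.choice(SYMBOLS),
        "trade_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "price": round(rng.uniform(50.0, 500.0), 2),
        "quantity": rng.randint(1, 1000),
        "trade_type": rng.choice(["buy", "sell"]),
        "trade_date": now.date().isoformat(),
        "trade_time": now.time().isoformat(timespec="seconds"),
    }


def produce(send, count: int, topic: str = "stocks", seed=None) -> None:
    """Generate `count` trades and hand each one to `send(topic, payload)`."""
    rng = random.Random(seed)
    for _ in range(count):
        now = datetime.now(timezone.utc)
        payload = json.dumps(make_trade(rng, now)).encode("utf-8")
        send(topic, payload)


if __name__ == "__main__":
    # Print instead of publishing; swap in a real client's send to publish.
    produce(lambda topic, payload: print(topic, payload.decode()), count=3)
```

With a Kafka client library, `send` could wrap that client's publish call, for instance `lambda topic, payload: producer.send(topic, payload)` for a kafka-python `KafkaProducer`.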

Monitoring and Logging

Check the logs for each service in their respective directories for monitoring and debugging.

Visualizations

To run the dashboard, run the following command:

$ cd plotly && python3 dashboard.py


anqorithm/RealTime-StockStream
