Skip to content

RealTime StockStream is a streamlined, simulation system for processing live stock market data. It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis

Notifications You must be signed in to change notification settings

anqorithm/RealTime-StockStream

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

40 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

RealTime StockStream

RealTime StockStream is a streamlined system for processing live stock market data. It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis πŸ’ΉπŸ•ŠοΈ

real-time-stock-stream

Getting Started

This guide will walk you through setting up and running the RealTime StockStream on your local machine for development and testing.

Prerequisites

Ensure you have the following software installed:

  • Docker
  • Python (version 3.11 or higher)

Todo Features

  1. Live Market Data Integration βŒ›
  2. Advanced Analytics Features βŒ›
  3. Interactive Data Visualization βŒ›
  4. Improved Scalability βŒ›
  5. User Customization Options βŒ›
  6. Stronger Security βŒ›

Used Techs

used techs

  • Appache Kafka
  • Appache Cassandra
  • Appache ZooKeeper
  • Appache Spark
  • Python

Installation

Follow these steps to set up your development environment:

Setting Up Kafka

  1. Create a Kafka Topic:
    kafka-topics.sh --create --topic stocks --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

Suppored Data Opreations

  1. Grouping Aggregation: Summarize data by groups.
  2. Pivot Aggregation: Reshape data, converting rows to columns.
  3. Rollups and Cubes: Perform hierarchical and combinational aggregations.
  4. Ranking Functions: Assign ranks within data partitions.
  5. Analytic Functions: Compute aggregates while maintaining row-level details.

Database Schema

stockdata-schema

Configuring Cassandra

  1. Create a Keyspace and Table: Execute the following CQL commands to set up your Cassandra database:
    CREATE KEYSPACE stockdata WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};
    
    CREATE TABLE stockdata.stocks (
        stock text,
        trade_id uuid,
        price decimal,
        quantity int,
        trade_type text,
        trade_date date,
        trade_time time,
        PRIMARY KEY (stock, trade_id)
    );

System Architecture

system-architecture

Docker Compose

  1. Launch Services: Use Docker Compose to start Kafka, Zookeeper, Cassandra, and Spark services:

     version: '3.9'
    
     name: "realtime-stock-market"
    
     services:
     zookeeper:
         image: bitnami/zookeeper:latest
         ports:
         - "2181:2181"
         environment:
         - ALLOW_ANONYMOUS_LOGIN=yes
         networks:
         stock-net:
             ipv4_address: 172.28.1.1
             
     kafka:
         image: bitnami/kafka:latest
         ports:
         - "9092:9092"
         environment:
         - KAFKA_BROKER_ID=1
         - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
         - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://172.28.1.2:9092
         - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
         - ALLOW_PLAINTEXT_LISTENER=yes
         depends_on:
         - zookeeper
         networks:
         stock-net:
             ipv4_address: 172.28.1.2
         volumes:
         - ./scripts/init-kafka.sh:/init-kafka.sh
         # entrypoint: ["/bin/bash", "init-kafka.sh"]
         restart: always
    
     cassandra:
         image: cassandra:latest
         ports:
         - "9042:9042"
         volumes:
         - ./init-cassandra:/init-cassandra
         - ./scripts/init-cassandra-schema.sh:/init-cassandra-schema.sh
         environment:
         - CASSANDRA_START_RPC=true
         networks:
         stock-net:
             ipv4_address: 172.28.1.3
         # entrypoint: ["/bin/bash", "init-cassandra-schema.sh"]
         restart: always
    
     spark:
         image: bitnami/spark:latest
         volumes:
         - ./spark:/opt/bitnami/spark/jobs
         - ./scripts/submit-spark-job.sh:/opt/bitnami/spark/submit-spark-job.sh
         ports:
         - "8080:8080"
         depends_on:
         - kafka
         networks:
         stock-net:
             ipv4_address: 172.28.1.4
         # entrypoint: ["sh", "-c", "./submit-spark-job.sh"]
         restart: always
    
     kafka_producer:
         build:
         context: ./kafka-producer
         dockerfile: kafka_producer.dockerfile
         depends_on:
         - kafka
         networks:
         stock-net:
             ipv4_address: 172.28.1.8
         restart: always
    
     plotly:
         build:
         context: ./plotly
         dockerfile: plotly.dockerfile
         volumes:
         - ./plotly/dashboard.py:/dashboard.py
         ports:
         - "8050:8050"
         depends_on:
         - cassandra
         networks:
             stock-net:
             ipv4_address: 172.28.1.9
         restart: always
    
     networks:
     stock-net:
         driver: bridge
         ipam:
         config:
             - subnet: 172.28.0.0/16
  2. Run Docker Compose:

    docker-compose up -d

Usage

  1. Run the Spark Job: Use the spark-submit command to run your Spark job.

    $ spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 spark_job.py stocks
  2. Produce and Consume Data: Start producing data to the stocks topic and monitor the pipeline's output.

Monitoring and Logging

Check the logs for each service in their respective directories for monitoring and debugging.

Visualizations

To run the dashbaord, you need to run the following command:

$ cd plotly & python3 dashboard.py

graph 1

graph 2

graph 3

graph 4

Testing

docker-compose-d

docker-monitoring

docker-ps

cqlsh

stocks-data-before

creat-kafka-topic

kafka-producer

spark-processing-1

spark-processing-1

cassandra

Tables Results

Stocks Table

stocks

Analysis Stocks Table

analytics_stocks

Analysis Stocks Table

grouped_stocks

Pivoted Stocks Table

grouped_stocks

Ranked Stocks Table

grouped_stocks

Rollup Stocks Table

grouped_stocks

Contributing

Contributions to RealTime StockStream are welcome, just open a PR 😊.

Authors

License

This project is licensed under the MIT License.

About

RealTime StockStream is a streamlined, simulation system for processing live stock market data. It uses Apache Kafka for data input, Apache Spark for data handling, and Apache Cassandra for data storage, making it a powerful yet easy-to-use tool for financial data analysis

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published