HarshitDawar55/Apache_Spark

Apache_Spark

✨🎉 This repository explains Apache Spark with practical examples 🎉✨

What is Apache Spark?

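  • It is a cluster computing engine for large-scale data processing, including near-real-time (streaming) workloads.
  • It is an open-source project maintained by the Apache Software Foundation.
  • It is fast, largely because it keeps intermediate data in memory instead of writing it to disk between steps.
  • It uses lazy evaluation: transformations are only recorded, and the computation runs when an action requests a result.
  • MapReduce writes intermediate results to disk and is poorly suited to real-time data; Spark was developed to address this.
  • It is used by large technology companies such as Oracle, Amazon, Microsoft, Visa, Cisco, Verizon, and Hortonworks.
  • In total, more than 3000 companies are reported to use Apache Spark.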
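
The lazy evaluation mentioned in the list above can be illustrated with a small sketch. This is a plain-Python analogy and not the Spark API: the built-in map and filter iterators stand in for Spark transformations, and list() stands in for an action. The names square, calls, and numbers are made up for this illustration.

```python
# Plain-Python analogy for lazy evaluation (assumption: this is NOT Spark code;
# built-in iterators only mimic how transformations and actions behave).

calls = []  # records every input the "transformation" function actually processes


def square(x):
    """Hypothetical transformation: square a number and log that it ran."""
    calls.append(x)
    return x * x


numbers = range(1, 6)

# "Transformations": these only build a lazy pipeline, nothing is computed yet.
squared = map(square, numbers)
evens = filter(lambda v: v % 2 == 0, squared)
calls_before_action = len(calls)

# "Action": consuming the pipeline forces the whole computation to run.
result = list(evens)
calls_after_action = len(calls)

print(calls_before_action, result, calls_after_action)
```

Before the action, square has not been called at all; the work happens only when list() asks for the results. Spark follows the same idea at cluster scale: transformations such as map and filter are deferred until an action such as collect or count is invoked.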

Downloading Apache Spark

Installing Apache Spark

Languages used for Programming in Spark:

  • Python
  • Scala

Instructions for running Apache Spark scripts

  • Copy the code from the respective script and paste it into the corresponding Spark shell (pyspark for Python scripts, spark-shell for Scala scripts) to run it.

Instructions for running python scripts

  • Open a terminal (Linux) or Command Prompt (Windows).
  • Run "python <script name>".

LICENSE

See the LICENSE file for the license terms.