Efficiently tackle large datasets and perform big data analysis with Spark and Python

TrainingByPackt/Big-Data-Processing-with-Apache-Spark-eLearning

Big Data Processing with Apache Spark eLearning

Processing big data in real time is challenging because of the demands of scalability, information consistency, and fault tolerance. This course shows you how to use Spark to make your overall analysis workflow faster and more efficient. You'll learn the core concepts and tools within the Spark ecosystem, such as Spark Streaming, the Spark Streaming API, the machine learning extension, and Structured Streaming.

What you will learn

  • Write your own Python programs that can interact with Spark
  • Implement data stream consumption using Apache Spark
  • Recognize common operations in Spark to process known data streams
  • Integrate Spark Streaming with Amazon Web Services
  • Create a collaborative filtering model with Python and the MovieLens dataset
  • Apply processed data streams to Spark machine learning APIs

Hardware requirements

For an optimal student experience, we recommend the following hardware configuration:

  • Processor: Intel Core i5 or equivalent
  • Memory: 4 GB RAM
  • Hard disk: 40 GB available space
  • An Internet connection

Software requirements

You’ll also need the following software installed in advance:

  • Operating system: Windows 7 SP1 64-bit, Windows 8.1 64-bit, or Windows 10 64-bit
  • Browser: Google Chrome (latest version)
  • PostgreSQL 9.0 or above
  • Spark 2.3.0
  • Amazon Web Services (AWS) account
