About The Course

The Expert Compute Solutions course on Apache Spark and Scala covers Big Data processing concepts in depth, along with approaches to solving complex data problems using Apache Spark. Apache Spark is an open-source cluster computing framework and general-purpose engine for large-scale data processing that can run on top of the Hadoop Distributed File System (HDFS). The course includes many practical, focused hands-on sessions.

Course Objectives

  • Understanding Big Data concepts and challenges

  • Approaches to solving Big Data problems with Apache Spark

  • In-depth knowledge of implementing Apache Spark concepts

  • Programming with Scala

  • Data processing using Spark

Target Audience

  • Software Professionals

  • Data Scientists

  • ETL developers

  • Analysts and Project Managers


Prerequisites

Intermediate programming skills in Scala or Python, or a basic understanding of functional and object-oriented programming.

Why learn Apache Spark & Scala?

Spark is an open-source alternative to MapReduce, designed to make it easier to build and run fast, sophisticated applications on Hadoop. Spark comes with libraries of machine learning (ML) and graph algorithms, and it also supports real-time streaming and SQL applications through Spark Streaming and Spark SQL (the successor to Shark), respectively. Spark applications can be written in Java, Scala, or Python, and have been reported to run 10 to 100 times faster than equivalent MapReduce applications, depending on the workload.

Companies Using Apache Spark & Scala

Yahoo, Amazon, Baidu, OpenTable, eBay, Alibaba Taobao, IBM, and others.

Course Curriculum

Introduction to Scala

  • Overview of Scala

  • Installing Scala

  • Scala Basics

  • IDE for Scala

Scala Programming

  • Variables & Methods

  • Functions

  • Array

  • Sets

  • Literals

  • Lists

  • Tuples

  • Options

  • Maps

  • Reserved Words

  • Operators

  • Precedence Rules

  • If statements

  • While Loops

  • Do-While Loops

  • Conditional Operators

  • Pattern Matching

  • Enumerations
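To give a flavour of the topics in this module, here is a minimal sketch that touches methods, lists, tuples, maps, options, and pattern matching. The object name `ScalaBasicsSketch`, the `describe` helper, and the sample scores are illustrative assumptions, not course material.

```scala
object ScalaBasicsSketch {
  // A method with an explicit parameter and return type
  def square(x: Int): Int = x * x

  // Pattern matching on an Option: either Some(value) or None
  // (the pass mark of 50 is a made-up threshold for illustration)
  def describe(score: Option[Int]): String = score match {
    case Some(s) if s >= 50 => s"passed with $s"
    case Some(s)            => s"failed with $s"
    case None               => "no score recorded"
  }

  def main(args: Array[String]): Unit = {
    val numbers = List(1, 2, 3, 4)           // immutable List
    val squares = numbers.map(square)        // apply a method to every element
    val pair: (String, Int) = ("alice", 72)  // a Tuple of two elements
    val scores = Map(pair, "bob" -> 41)      // a Map built from key-value tuples

    println(squares)
    println(describe(scores.get("alice")))   // Map.get returns an Option
    println(describe(scores.get("carol")))   // missing key yields None
  }
}
```

Looking up a key with `get` rather than indexing directly is what makes the `Option` and pattern-matching topics fit together: a missing key becomes an ordinary value to match on instead of an exception.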

Traits & OOP in Scala

  • Traits

  • Classes & Objects

  • Functional Programming
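As a rough illustration of how these three topics combine, the sketch below defines a trait, two classes that extend it, and a small function written in a functional style. All names (`Shape`, `Rectangle`, `Circle`, `totalArea`) are hypothetical examples chosen for this outline.

```scala
object TraitsSketch {
  // A trait declares behaviour that classes can mix in;
  // it may contain both abstract and concrete members.
  trait Shape {
    def name: String
    def area: Double
    def describe: String = s"$name with area $area"
  }

  class Rectangle(val width: Double, val height: Double) extends Shape {
    def name: String = "Rectangle"
    def area: Double = width * height
  }

  class Circle(val radius: Double) extends Shape {
    def name: String = "Circle"
    def area: Double = math.Pi * radius * radius
  }

  // Functional style: transform the list with map, then combine with sum,
  // without mutable state or explicit loops.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  def main(args: Array[String]): Unit = {
    val shapes = List(new Rectangle(2.0, 3.0), new Circle(1.0))
    shapes.foreach(s => println(s.describe))
    println(totalArea(shapes))
  }
}
```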

Introduction to Spark

  • Problems with Traditional Large-Scale Systems

  • Introducing Spark

  • What is Spark?

Spark Basics

  • Spark Installation

  • Spark Shell

  • File operations in Spark

  • Spark Streaming Overview


  • RDD Operations

  • Key-Value Pair RDD

  • MapReduce and Pair RDD Operations

  • Scala and Hadoop Integration
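The MapReduce pattern behind key-value pair RDDs can be sketched with a word count. To keep the example runnable without a cluster, the version below uses plain Scala collections as a stand-in for an RDD; this is an assumption made for illustration, not Spark code. In Spark, the same pipeline would typically use `flatMap`, `map`, and `reduceByKey` on an RDD created with `sc.textFile`.

```scala
object WordCountSketch {
  // Counts word occurrences using the map -> group -> reduce pattern.
  // A Seq[String] stands in for an RDD of text lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))  // split each line into words
      .filter(_.nonEmpty)                    // drop empty tokens
      .map(word => (word, 1))                // map phase: emit (key, 1) pairs
      .groupBy(_._1)                         // shuffle phase: group pairs by key
      .map { case (word, pairs) =>           // reduce phase: sum counts per key
        (word, pairs.map(_._2).sum)
      }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("to be or", "not to be"))
    counts.foreach { case (word, n) => println(s"$word: $n") }
  }
}
```

On a real pair RDD, `reduceByKey` combines values within each partition before shuffling, which the local `groupBy` used here does not model.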

Parallel Programming Using Spark

  • RDD Partitions and HDFS Data Locality

  • Working with Partitions

  • Executing Parallel Operations

Caching and Persistence

  • RDD Lineage

  • Caching Overview

  • Distributed Persistence

Spark Algorithms

  • Spark SQL

  • Spark Streaming

  • MLlib

  • GraphX

Course Completion Certification

  • Once you complete the course, you will receive a course completion certificate.

POC (Proof Of Concept) Certification

  • You will be provided with real-time data sets for your POC projects. On successful completion of your POC project (reviewed by an expert), you will receive a certificate with performance-based grading.