About The Course

The Expertcompute course on Apache Spark and Scala covers Big Data processing concepts in depth, along with approaches to solving complex data problems using Apache Spark. Apache Spark is an open-source cluster computing framework and general-purpose engine for large-scale data processing that can run on top of the Hadoop Distributed File System (HDFS).

Course Objectives

  • Understanding Big Data concepts and challenges

  • Approaches to solving Big Data problems with Apache Spark

  • In-depth knowledge of implementing Apache Spark concepts

  • Programming with Scala

  • Data processing using Spark

Target Audience

  • Software Professionals

  • Data Scientists

  • ETL developers

  • Analysts and Project Managers

Prerequisites

Intermediate programming skills in either Scala or Python, or a basic understanding of functional and object-oriented programming.

Why learn Apache Spark & Scala?

Spark is an open-source alternative to MapReduce, designed to make it easier to build and run fast, sophisticated applications on Hadoop. Spark comes with a library of machine learning (ML) and graph algorithms, and also supports real-time streaming and SQL apps, via Spark Streaming and Spark SQL (the successor to Shark), respectively. Spark apps can be written in Java, Scala, or Python, and have been clocked running 10 to 100 times faster than equivalent MapReduce apps.

Companies Using Apache Spark & Scala

Yahoo, Amazon, Baidu, OpenTable, eBay, Alibaba Taobao, IBM, etc.

Course Curriculum

Introduction to Scala

  • Overview of Scala

  • Installing Scala

  • Scala Basics

  • IDE for Scala

Scala Programming

  • Variables & Methods

  • Functions

  • Arrays

  • Sets

  • Literals

  • Lists

  • Tuples

  • Options

  • Maps

  • Reserved Words

  • Operators

  • Precedence Rules

  • If statements

  • While Loops

  • Do-While Loops

  • Conditional Operators

  • Pattern Matching

  • Enumerations
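
To give a feel for several of the topics above (Maps, Options, Tuples, and Pattern Matching), here is a minimal sketch. The object name `ScalaBasicsSketch`, the `hours` data, and the helper names are made up for illustration and are not taken from the course material.

```scala
object ScalaBasicsSketch {
  // A Map from module name to length in hours (made-up sample data).
  val hours: Map[String, Int] = Map("Scala" -> 8, "Spark" -> 12)

  // Map.get returns an Option: Some(value) if the key exists, None otherwise.
  // Pattern matching handles each case, with a guard for long modules.
  def describe(module: String): String = hours.get(module) match {
    case Some(h) if h >= 10 => s"$module: long module ($h h)"
    case Some(h)            => s"$module: short module ($h h)"
    case None               => s"$module: not in the list"
  }

  // A tuple groups values of different types; a pattern takes it apart.
  def swap(pair: (String, Int)): (Int, String) = pair match {
    case (name, n) => (n, name)
  }

  def main(args: Array[String]): Unit = {
    List("Scala", "Spark", "Hive").foreach(m => println(describe(m)))
    println(swap(("Spark", 12)))
  }
}
```

Returning an Option instead of null makes the "missing key" case explicit, so the compiler can help check that it is handled.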

Traits & OOP in Scala

  • Traits

  • Classes & Objects

  • Functional Programming
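
As a rough illustration of how traits, classes, and functional style fit together in Scala, consider the sketch below. The `Shape`, `Rectangle`, and `Square` names are hypothetical examples chosen for this sketch, not part of the course material.

```scala
object TraitsSketch {
  // A trait declares behaviour that classes can mix in.
  trait Shape {
    def name: String   // abstract: each class supplies its own
    def area: Double   // abstract: each class supplies its own
    // Concrete method defined once in the trait and shared by all shapes.
    def describe: String = s"$name with area $area"
  }

  // Case classes give immutable fields and structural equality for free.
  case class Rectangle(width: Double, height: Double) extends Shape {
    val name = "rectangle"
    def area: Double = width * height
  }

  case class Square(side: Double) extends Shape {
    val name = "square"
    def area: Double = side * side
  }

  // Functional style: transform the list with map, then combine with sum,
  // instead of mutating a running total in a loop.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  def main(args: Array[String]): Unit = {
    val shapes = List(Rectangle(2.0, 3.0), Square(2.0))
    shapes.foreach(s => println(s.describe))
    println(totalArea(shapes))
  }
}
```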

Introduction to Spark

  • Problems with Traditional Large-Scale Systems

  • Introducing Spark

  • What is Spark?

Spark Basics

  • Spark Installation

  • Spark Shell

  • File operations in Spark

  • Spark Streaming Overview

RDDs

  • RDD Operations

  • Key-Value Pair RDD

  • MapReduce and Pair RDD Operations

  • Scala and Hadoop Integration
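
The MapReduce-style pattern behind pair RDD operations can be sketched as a word count. To keep the example runnable without a Spark installation, plain Scala collections stand in for an RDD here; this is an assumption made for illustration, not Spark code. On a real pair RDD, the grouping and summing step would typically be a single `reduceByKey(_ + _)` call, executed in parallel across the cluster.

```scala
object PairOperationsSketch {
  // Plain Scala collections stand in for an RDD in this sketch (assumption).
  def wordCount(lines: Seq[String]): Map[String, Int] = {
    // "Map" phase: split each line into words and emit (word, 1) pairs.
    val pairs: Seq[(String, Int)] =
      lines
        .flatMap(line => line.toLowerCase.split("\\s+").toSeq)
        .filter(_.nonEmpty)
        .map(word => (word, 1))

    // "Reduce" phase: group the pairs by key and sum the counts per key.
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    wordCount(Seq("to be or not to be", "to do")).foreach(println)
  }
}
```

The key-value shape is what matters: once data is expressed as pairs, counting, summing, and joining by key all follow the same pattern.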

Spark Libraries

  • Spark SQL

  • Spark Streaming

  • MLlib

  • GraphX