05696601 Introduction to Big Data Ecosystem 🏖️
Academic Year 2025 | Semester 2
Course Description
Everyone talks about Big Data, but what exactly is it? This course demystifies the term, which has become one of the most popular buzzwords in IT. The course is organized around the three main characteristics of Big Data, known as the 3Vs (Volume, Variety, Velocity). Students will learn the challenges posed by each characteristic along with techniques for tackling them. These techniques include database partitioning and replication, distributed processing, schemaless data models, object storage, data warehouses, data lakes, lakehouses, and stream processing. Modern data platforms (e.g., Spark, MongoDB, Neo4j, MinIO, DuckLake, Databricks, Apache Kafka) will be used to demonstrate these techniques.
Course Instructor
Course Information
Course Evaluation
Schedule
Date | Topic
Nov 24 | Lecture #1 - Introduction to Big Data (Online; A1 Out)
Dec 1 | Lecture #2 - Horizontal Scaling (Online)
Dec 8 | Lecture #3 - Programming Models (Online; A1 Due, A2 Out)
Dec 15 | Lecture #4 - Schemaless Data Models (Online)
Dec 22 | Lab #1 - Spark and Spark SQL (A2 Due, A3 Out)
Dec 29 | New Year Week: Lecture #5 - Distributed Object Store (Video Only)
Jan 5 | Lab #2 - MongoDB
Jan 12 | Lab #3 - Neo4j (A3 Due, A4 Out)
Jan 19 | Midterm Week (No Class)
Jan 26 | Lab #4 - MinIO (Project Proposal Due)
Feb 2 | Lecture #6 - Data Warehouse, Data Lake, and Lakehouse (A4 Due)
Feb 9 | Lab #5 - DuckLake (Project Checkpoint 1 Due)
Feb 16 | Lab #6 - Databricks
Feb 23 | Lecture #7 - Stream Processing (Project Checkpoint 2 Due)
Mar 2 | Lab #7 - Apache Kafka
Mar 9 | Project Presentation
Mar 25 | Final Exam
Assignment