05696601 Introduction to Big Data Ecosystem 🏖️
Academic Year 2025 | Semester 2
Course Description
Everyone talks about Big Data, Big Data, and more Big Data... but what is it? This course demystifies the term, which has become one of the biggest buzzwords in IT. The course is organized around the three main characteristics of Big Data, also known as the 3Vs (Volume, Variety, Velocity). Students will learn the challenges posed by each characteristic along with techniques to tackle them. These techniques include database partitioning and replication, distributed processing, schemaless data models, object storage, data warehouses, data lakes, lakehouses, and stream processing. Modern data platforms (e.g., Spark, MongoDB, Neo4j, MinIO, DuckLake, Apache Kafka) will be used to demonstrate these techniques.
⚠️ Recommended for students who received a grade of B or higher in DBP
Course Instructor
Course Information
Course Evaluation
Schedule
⚠️ The first three sessions will NOT be in person, as the instructor is still stuck in the 🇺🇸
| Date | Topic | Notes |
|------|-------|-------|
| Nov 24 | Lecture #1 - Introduction to Big Data | Online; A1 Out |
| Dec 1 | Lecture #2 - Horizontal Scaling and Database Partitioning | Online |
| Dec 8 | Lecture #3 - Database Replication | Video Only; A1 Due; A2 Out |
| Dec 15 | Lecture #4 - Big Data Programming Model | |
| Dec 22 | Lab #1 - Spark and Spark SQL | A2 Due; A3 Out |
| Dec 29 | Lecture #5 - Schemaless Data Model | New Year Week; Video Only |
| Jan 5 | Lab #2 - MongoDB | |
| Jan 12 | Lab #3 - Neo4j | A3 Due; A4 Out |
| Jan 19 | Midterm Week (No Class) | |
| Jan 26 | Lecture #6 - Object Storage | Project Proposal Due |
| Feb 2 | Lab #4 - MinIO | A4 Due |
| Feb 9 | Lecture #7 - Data Warehouse, Data Lake, and Lakehouse | Project Checkpoint 1 Due |
| Feb 16 | Lab #5 - DuckLake | |
| Feb 23 | Lecture #8 - Stream Processing | Project Checkpoint 2 Due |
| Mar 2 | Lab #6 - Apache Kafka | |
| Mar 9 | Project Presentation | |
| Mar 25 | Final Exam | |
Assignment