05696601 Introduction to Big Data Ecosystem 🏖️🌏
Academic Year 2025 | Semester 2
Course Description
This course is organized around the three main characteristics of Big Data, known as the 3Vs (Volume, Variety, Velocity). Students will learn the challenges posed by each characteristic and the techniques used to tackle them. These techniques include database partitioning and replication, distributed processing, schemaless data models, object storage, data warehouses, data lakes, lakehouses, and stream processing. Modern data platforms (e.g., Spark, Ray, MongoDB, Neo4j, MinIO, dbt, DuckLake, Databricks, Apache Kafka) are used to demonstrate these techniques.
Course Information
  • Time: Monday 9:00 - 12:00
  • Location: Sc08 Room 714
  • Office Hours: Monday 13:00 - 16:00
Course Evaluation
  • Homework Assignment 20%
  • Lab Assignment 20%
  • Project 40%
  • Final Exam 20%
Course Staff

Yuttapichai Kerdcharoen (Guide)
Instructor
Malapchai Chaisihat (Nueng)
Head Teaching Assistant
Schedule
Date | Topic
Nov 24 | Lecture #1 - Introduction to Big Data (Online; A1 Out; Slides, Video)
Dec 1 | Lecture #2 - Evolution of Data Technologies (Online; Slides, Video)
Dec 8 | Lecture #3 - Distributed Databases (Sharding and Replication) (Online; A1 Due; Slides)
Dec 15 | Lab #1 - Building Data Pipelines with dbt
Dec 22 | Lecture #4 - Distributed Processing
Dec 29 | Lecture #5 - Schemaless Data Models (New Year Week; Video Only)
Jan 5 | Lab #2 - MongoDB 101 / Neo4j 101 (A2 Out)
Jan 12 | Lecture #6 - Modern Data Warehouse, Data Lake, and Lakehouse
Jan 19 | Midterm Week (No Class)
Jan 26 | Lab #3 - Building Data Lakes with DuckLake and MinIO (Project Proposal Due)
Feb 2 | Lab #4 - PySpark and Ray Data for Distributed Data Transformation (A2 Due)
Feb 9 | Lab #5 - Databricks (Project Checkpoint 1 Due)
Feb 16 | Lecture #7 - Stream and Incremental Processing
Feb 23 | Lab #6 - Apache Kafka (Project Checkpoint 2 Due)
Mar 2 | Lecture #8 - Emerging Big Data Technologies
Mar 9 | Project Presentation
Mar 25 | Final Exam
Assignment