Yuttapichai "Guide" Kerdcharoen
Ph.D. Candidate @ Carnegie Mellon University and CMKL University

About Me

I am a software developer who is obsessed with database systems and high-performance computing. Currently, I am conducting research on high-performance graph query engines.

My other obsession is teaching. I can teach for several hours without tiring.

Two of my ambitions are (1) to create a frontier data systems group in Thailand and (2) to develop a world-class Computer Science program in Thailand.

Education

Ph.D. Candidate (Electrical and Computer Engineering) @ Carnegie Mellon University and CMKL University
Dual Degree
Teaching Assistant: CMU 18-746 Storage Systems (F22), CMKL 18-613 Foundation of Computer Systems (S24)
Selected Coursework: CMU 15-799 Special Topics in Databases: Query Optimization (S25), CMU 15-721 Advanced Database Systems (S23), CMU 18-746 Storage Systems (F21), CMU 15-745 Optimizing Compilers for Modern Architectures (S23), CMU 18-740 Modern Computer Architecture and Design (F22)

B.Sc. (Computer Science) @ KMITL
First Class Distinction, Gold Medal (GPA 3.98/4.00)
Co-Founder of CS-INFINITE, a student group that initiates passion-based activities for undergraduate students, including Code Arcade and Programming Bootcamp.
Selected Coursework: Database Systems, Data Warehousing, Operating Systems, Automata Theory

Research

Linear Algebra Approaches for Directed Triad Counting and Enumeration @ IA3 2024
Yuttapichai Kerdcharoen, Upasana Sridhar, Orathai Sangpetch, Tze Meng Low
Workshop Paper
Abstract Triangle counting and enumeration are commonly used in real-world applications on directed graphs. However, the performance of triangle counting algorithms is usually benchmarked on undirected graphs. As such, many of these algorithms and formulations are not suitable for identifying the types of directed triangles in directed graphs. In this work, we show how algorithms for counting each type of directed triad (directed triangle) can be formulated using linear algebra. Leveraging the FLAME methodology, we show that provably correct counting and enumeration algorithms for directed triads can be derived from the linear algebraic formulation. These algorithms can be used to either count individual triads or together to count all possible triads. We show that despite being designed for individual use, the combined use of these algorithms yields a speedup of 16.77x to 1122.2x over the implementation in NetworkX, and 0.37x to 33.49x over GraphBLAS implementations using SuiteSparse 7.2 on various workloads from real-world directed graphs.

Exploiting Fusion Opportunities in Linear Algebraic Graph Query Engines @ IEEE HPEC 2023
Yuttapichai Kerdcharoen, Upasana Sridhar, Tze Meng Low
Conference Paper
Abstract Queries in a graph database are often converted into a sequence of graph operations by a graph query engine. In recent years, it has been recognized that the query engine benefits from using high-performance graph libraries via the GraphBLAS interface to implement time-consuming operations such as graph traversal. However, using GraphBLAS requires explicitly casting data into linear algebra objects and decomposing the query into multiple operations, some of which are expressible by the GraphBLAS. The combination of these two requirements translates into increased memory footprints and additional execution times. In this paper, we show that fusing different stages of the query engines into GraphBLAS calls can reduce the size of the intermediate data generated during the query. Furthermore, by relaxing the semi-ring constraints imposed by GraphBLAS, more aggressive fusions of the stages can be performed. We show a speedup of up to 1235.89x (8.82x on geometric average) relative to an open-source graph query engine using GraphBLAS (i.e. RedisGraph) for processing undirected subgraph enumeration queries.

Opportunities for Linear Algebraic Graph Databases @ ARRAY 2023
Yuttapichai Kerdcharoen, Upasana Sridhar, Tze Meng Low
Extended Abstract
Abstract In recent years, there has been renewed interest in casting graph algorithms in the language of linear algebra. By replacing the computations with appropriate operations over different semi-rings, different graph algorithms can be cast as a sequence of linear algebra operations. In this work, we study the use of the linear algebraic approach to graph algorithms within the context of graph database systems. Specifically, we identify the issues with using existing linear algebraic graph libraries, such as SuiteSparse, which conform to the GraphBLAS specifications. We also highlight gaps between the GraphBLAS specification and computations that are required by the graph query algorithms utilized in graph databases. We show that overcoming these challenges in using a linear algebraic approach within a graph database system can lead to significant performance improvements to an open-source graph database system.

Teaching

Database Programming in Practice (Spring 2025) @ K-DAI, KMITL
Instructor
Teach and develop the course. Topics include data models, query languages (focusing on SQL), physical data organization, storage models (NSM, DSM, PAX), query engines, query planning and optimization, concurrency control, and recovery control. DuckDB is used to demonstrate the concepts.


Distributed Data Storage (Fall 2024) @ CMKL University
Instructor
Teach and develop the course. Topics include multi-disk systems (RAID), internal consistency mechanisms, caching, distributed system basics, and distributed file systems (multi-client, multi-server).


Parallel Computing (Fall 2024) @ CMKL University
Instructor
Teach and develop the course. Topics include performance, parallelism basics, modern computer architecture (pipelining, superscalar, out-of-order processing, multicore, GPU), instruction-level parallelism (unrolling, separate accumulators, vectorization), and shared-memory parallelism (memory consistency, cache coherency). SIMD, POSIX threads (pthreads), and OpenMP are used to demonstrate the concepts.


Storage and File System Fundamentals (Fall 2024) @ CMKL University
Instructor
Teach and develop the course. Topics include device management, persistent storage (HDD, NAND Flash SSD), and file system design and implementation techniques.


Database Management (Fall 2024) @ SUIC
Instructor
Teach and develop the course. Topics include SQL (single-table, multiple-table, DDL, constraints) and logical database design. MySQL and DuckDB are used to demonstrate the concepts.


Special Topic in Computer Science 2: Computer Systems (Fall 2024) @ Computer Science, KMITL
Co-Instructor
Teach and develop the course. Topics include data abstraction, machine language, memory hierarchy, code optimization, virtual memory, dynamic memory allocation, network programming, concurrent programming, and thread-level parallelism.


Foundation of Programming (Summer 2024) @ K-DAI, KMITL
Instructor
Teach and develop the course. Topics include problem solving, algorithmic thinking, abstractions, structured programming (sequential, selective, iterative), subprograms, object-oriented programming (object, message passing, inheritance), file I/O, and well-documented code.


Database Programming in Practice (Spring 2024) @ K-DAI, KMITL
Co-Instructor
Co-teach and co-develop the course. Topics include SQL (single-table, multiple-table, DDL, constraints) and logical database design. DuckDB is used to demonstrate the concepts.

Project

dbt's DAG Rewriter
Course Project
A DAG rewriter that allows for optimizing multiple SQL queries through traditional optimization heuristics (e.g., predicate/projection pushdown).

Calcite Query Optimizer
Course Project
A two-phase query optimizer (rule-based, then cost-based) built on Apache Calcite (Volcano-style). The optimizer applies built-in projection/filter pushdown and query decorrelation. I (re-)wrote some additional rules (e.g., FilterDistributiveRule) and the NestedLoopJoin's custom cost computation to improve query plans.

db4b
Side Project
A "just-workable" database management system for visually impaired persons.

pgfixeypointy - Fast Fixed-Point Decimals
Course Project
A PostgreSQL extension for libfixeypointy, an open-source fast fixed-point decimal library by CMU-DB.

Article on RedisGraph @ dbdb.io
Course Project

Foreign Data Wrapper for Columnar File Formats
Course Project
A PostgreSQL foreign data wrapper for accessing columnar file formats (similar to Apache Parquet). The foreign data wrapper includes query optimizations such as projection and predicate pushdown.

Compiling Neural Network with Dynamic Resource Adaptation
Course Project
An approach to improve the elasticity of FlexGen to support multi-tenant environments.

Hybrid Cloud/Local File System
Course Project
A hybrid cloud/local file system, implemented via FUSE, that aims to minimize cloud expenses. Key features are deduplication, snapshots, and write-back caching.

ounglang
Side Project
An interpreted, Turing-incomplete, esoteric programming language for seals, written in C. Due to its overwhelming stupidity, it won an award at SHT5.

PEaRLS - Programming Evaluation and Rapid Learning System
Side Project
A (possibly legacy) learning management system for CS-KMITL students.

Competition

ASEAN Data Science Explorers 2020 (Thailand National Final)
2nd Place

Thailand's Agoda Programming Competition
Top 50

AI & Big data Challenge for Data Engineers (ABCDE) in Bangkok
2nd Place

ACM-ICPC 2019 Asia Bangkok Regional Contest
Participant

ACM-ICPC 2018 Asia Nakhon Pathom Regional Contest
Participant

Thapster TV Battle
1st Place

TH4K Tournament 5th (Under Top 50)
2nd Place