Cornell Tech

ECE 5290/7290 & ORIE 5290: Distributed Optimization for Machine Learning and AI

Instructor: Tianyi Chen  |  tianyi.chen@cornell.edu  |  Semester: Fall 2025

Class time: MoWe 1:25 PM – 2:40 PM  |  Location: Cornell Tech, Bloomberg Center 161

Overview

This is a graduate-level course about theory, algorithms, and applications of distributed optimization and machine learning. The course covers the basics of distributed optimization and learning algorithms, and their performance analyses when they are used to solve large-scale distributed problems arising in AI, machine learning, signal processing, communication networks, and power systems.

Public resources: The lecture slides and assignments will be posted online as the course progresses. We are happy for anyone to learn from these resources, but we cannot grade the work of unenrolled students.

Cornell students: Students should ask all course-related questions on Ed Discussion, submit homework on Gradescope, and find all the announcements on Canvas.

Instructor, TAs, and Graders

Tianyi Chen (Instructor)

Yuheng Wang (Teaching Assistant)

Zhaoxian Wu (Student Volunteer)

Modules

Course surveys: To help tailor the course and improve your learning experience, please complete the surveys when they are posted.

Week 1 Aug 25 – Aug 27 2 sessions
  • Mon, Aug 25
    Introduction & Motivation
    Overview of course goals, applications of distributed optimization in modern AI and ML systems, and motivating examples from large-scale model training and federated learning.
    Slides | Welcome to the course! We’ll set the stage for why distributed optimization is now central to AI at scale.
  • Wed, Aug 27
    ML fundamentals for optimization (ERM, supervised learning)
    Review of empirical risk minimization (ERM), supervised learning frameworks, convexity basics, and gradient-based learning methods.
Week 2 Sep 01 – Sep 03 2 sessions
  • Mon, Sep 01
    No class - Labor Day
  • Wed, Sep 03
    ML Fundamentals Continued: Optimization Viewpoint
    From linear and nonlinear regression to general loss functions; connecting optimization to machine learning tasks and understanding challenges in non-convex learning.
Week 3 Sep 08 – Sep 10 2 sessions
  • Mon, Sep 08
    Optimization Basics I: Gradient Descent on Quadratic Problems
    Derivation and intuition of gradient descent, convergence on quadratic objectives, and the geometry of step sizes and conditioning.
  • Wed, Sep 10
    Optimization Basics II: Convergence and Complexity
    Convergence analysis for smooth and strongly convex functions; understanding sublinear and linear rates of convergence for gradient descent (a short numerical sketch follows this week’s entries).
    Slides | HW 1 due (9/12)
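
A minimal numerical sketch of this week’s material, assuming an illustrative two-dimensional quadratic objective f(x) = 0.5 x^T A x - b^T x (the matrix A, vector b, step size, and iteration count below are examples, not values from the lecture): gradient descent with step size 1/L contracts the error at a linear rate governed by the condition number L/mu.

    import numpy as np

    # Illustrative quadratic: f(x) = 0.5 * x^T A x - b^T x, with A symmetric positive definite.
    A = np.array([[3.0, 0.5],
                  [0.5, 1.0]])
    b = np.array([1.0, -2.0])

    L = np.linalg.eigvalsh(A).max()    # smoothness constant (largest eigenvalue)
    mu = np.linalg.eigvalsh(A).min()   # strong convexity constant (smallest eigenvalue)
    x_star = np.linalg.solve(A, b)     # closed-form minimizer

    x = np.zeros(2)
    eta = 1.0 / L                      # classic 1/L step size
    for k in range(50):
        grad = A @ x - b               # gradient of the quadratic
        x = x - eta * grad

    # The error contracts roughly like (1 - mu/L)^k, i.e. at a linear rate.
    print("condition number   :", L / mu)
    print("distance to optimum:", np.linalg.norm(x - x_star))
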
Week 4 Sep 15 – Sep 17 2 sessions
  • Mon, Sep 15
    Beyond Convexity: Non-Convex Landscapes and Smoothness
    Extending gradient-based methods to non-convex settings; smoothness assumptions, saddle points, and convergence guarantees.
  • Wed, Sep 17
    Gradient Methods for Constrained Optimization
    Projected gradient methods, constraint handling in distributed settings, and convergence rates under constraints.
Week 5 Sep 22 – Sep 24 2 sessions
  • Mon, Sep 22
    Stochastic Optimization: SGD, Minibatching, and Convergence
    Fundamentals of stochastic gradient descent, convergence properties under noise, and trade-offs between batch size and computation (see the sketch after this week’s entries).
  • Wed, Sep 24
    Variance Reduction and Momentum
    Modern SGD enhancements — variance-reduced methods (SVRG) and momentum techniques for accelerating convergence.
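
As a companion to this week’s sessions, here is a minimal minibatch SGD sketch on a synthetic least-squares problem; the data, batch size, and step size are illustrative choices rather than values from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic least-squares problem: minimize (1/n) * sum_i 0.5 * (a_i^T x - y_i)^2.
    n, d = 1000, 10
    A = rng.normal(size=(n, d))
    x_true = rng.normal(size=d)
    y = A @ x_true + 0.1 * rng.normal(size=n)

    x = np.zeros(d)
    batch_size = 32
    eta = 0.01                           # constant step size (illustrative)

    for step in range(2000):
        idx = rng.choice(n, size=batch_size, replace=False)
        # Stochastic gradient estimated on the sampled minibatch.
        grad = A[idx].T @ (A[idx] @ x - y[idx]) / batch_size
        x = x - eta * grad

    print("final training loss:", 0.5 * np.mean((A @ x - y) ** 2))
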
Week 6 Sep 29 – Oct 01 2 sessions
  • Mon, Sep 29
    Variance Reduction and Momentum in Practice
    Finite-sum optimization, adaptive methods (Adam, AdaGrad), and a deeper look at the variance–bias trade-off in stochastic learning.
    Slides | HW 2 due (9/26)
  • Wed, Oct 01
    Consensus and Spectrum of Graphs
    Fundamentals of consensus algorithms, properties of doubly-stochastic matrices, and spectral connectivity measures in networks (see the sketch below).
    Slides | Project announced
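
A minimal sketch of average consensus on a ring graph, assuming an illustrative doubly stochastic mixing matrix with uniform weights: each node repeatedly averages with its two neighbors, and the consensus error decays at a rate set by the second-largest eigenvalue modulus of the mixing matrix.

    import numpy as np

    n = 8                                  # number of nodes on a ring (illustrative)
    W = np.zeros((n, n))
    for i in range(n):
        # Each node mixes equally with itself and its two ring neighbors.
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    # W is symmetric and doubly stochastic: every row and column sums to one.

    x = np.arange(n, dtype=float)          # initial local values
    avg = x.mean()                         # the value all nodes should agree on

    slem = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]
    print("second-largest eigenvalue modulus:", slem)

    for _ in range(100):
        x = W @ x                          # one round of synchronous averaging
    print("consensus error after 100 rounds:", np.max(np.abs(x - avg)))
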
Week 7 Oct 06 – Oct 08 2 sessions
  • Mon, Oct 06
    Gossip and Random Walks
    Randomized gossip algorithms, asynchronous communication, and relationships between random walks and averaging in networks.
  • Wed, Oct 08
    Data and Model Parallelism in Distributed Training
    Local SGD, synchronization intervals, and communication–computation balancing in data- and model-parallel training (see the sketch below).
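
A rough illustration of data-parallel local SGD with periodic model averaging; the number of simulated workers, local steps per round, step size, and synthetic local datasets are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    num_workers, d = 4, 5
    local_steps, rounds, eta = 10, 20, 0.1   # communicate every `local_steps` iterations

    # Each worker holds its own least-squares data, simulating a data-parallel split.
    data = []
    for _ in range(num_workers):
        A = rng.normal(size=(100, d))
        y = A @ np.ones(d) + 0.1 * rng.normal(size=100)
        data.append((A, y))

    x_global = np.zeros(d)
    for r in range(rounds):
        local_models = []
        for A, y in data:
            x = x_global.copy()
            for _ in range(local_steps):
                i = rng.integers(len(y))          # sample one local data point
                grad = (A[i] @ x - y[i]) * A[i]   # stochastic gradient
                x = x - eta * grad
            local_models.append(x)
        # Communication happens only here: average the locally updated models.
        x_global = np.mean(local_models, axis=0)

    print("distance to the shared true model:", np.linalg.norm(x_global - np.ones(d)))
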
Week 8 Oct 13 – Oct 15 2 sessions
  • Mon, Oct 13
    No class - Fall break
  • Wed, Oct 15
    Communication-Efficient Distributed Methods
    Quantization and local updates; techniques to reduce bandwidth while preserving convergence in distributed learning (see the sketch below).
    Slides | Practice problems announced
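
To make the bandwidth-saving idea concrete, here is a sketch of simple unbiased stochastic quantization of a gradient vector before it is communicated; the number of quantization levels and the quantizer itself are illustrative and not necessarily the specific schemes covered in class.

    import numpy as np

    rng = np.random.default_rng(2)

    def stochastic_quantize(v, levels=4):
        """Unbiased stochastic quantization of v onto `levels` uniform levels per coordinate.

        Each coordinate is scaled by the vector norm and rounded up or down at random
        so that the quantized vector equals v in expectation.
        """
        norm = np.linalg.norm(v)
        if norm == 0:
            return v
        scaled = np.abs(v) / norm * levels
        lower = np.floor(scaled)
        prob_up = scaled - lower                     # probability of rounding up
        rounded = lower + (rng.random(v.shape) < prob_up)
        return np.sign(v) * rounded * norm / levels

    g = rng.normal(size=10)                          # a "gradient" to be communicated
    q = stochastic_quantize(g)
    print("relative quantization error:", np.linalg.norm(q - g) / np.linalg.norm(g))
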
Week 9 Oct 20 – Oct 22 2 sessions
  • Mon, Oct 20
    Communication-Efficient Distributed Methods
    Sparsification, and worker selections; techniques to reduce bandwidth while preserving convergence in distributed learning.
    Slides | HW 3 due (10/20)
  • Wed, Oct 22
    Decentralized Algorithms: Consensus GD and Its Convergence
    Gradient tracking algorithms and convergence under directed and time-varying graphs (a consensus-GD sketch follows below).
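
A small sketch of decentralized (consensus) gradient descent over a ring, assuming illustrative quadratic local objectives and a fixed doubly stochastic mixing matrix; with a constant step size the local iterates settle only in a neighborhood of the optimum, which is one motivation for gradient tracking.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 6, 3                              # 6 agents, 3-dimensional decision variable

    # Local objectives f_i(x) = 0.5 * ||x - c_i||^2; the global optimum is the mean of the c_i.
    C = rng.normal(size=(n, d))
    x_star = C.mean(axis=0)

    # Ring mixing matrix (symmetric, doubly stochastic).
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

    X = np.zeros((n, d))                     # row i is agent i's local iterate
    eta = 0.05                               # constant step size (illustrative)
    for _ in range(500):
        grads = X - C                        # local gradients, one row per agent
        X = W @ X - eta * grads              # mix with neighbors, then take a local step

    print("error of the network-average iterate:", np.linalg.norm(X.mean(axis=0) - x_star))
    print("consensus spread across agents      :", np.linalg.norm(X - X.mean(axis=0)))
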
Week 10 Oct 27 – Oct 29 2 sessions
  • Mon, Oct 27
    No class - Asynchronous office Q&A on Ed Discussion and email
    No Class | Q&A welcome on HWs 1-3 and the practice problems
  • Wed, Oct 29
    In-person Exam
    Exam | Project idea due (11/1)
Week 11 Nov 03 – Nov 05 2 sessions
  • Mon, Nov 03
    Robust distributed optimization: adversaries, attacks and defenses
    Robustness against adversarial clients, Byzantine attacks, and noisy updates; algorithmic defenses and aggregation strategies (a small aggregation sketch follows this week’s entries).
    Slides | Please make every effort to attend (attendance bonus)
  • Wed, Nov 05
    The curse of data heterogeneity in distributed learning
    Personalization techniques and handling heterogeneous data distributions in federated settings.
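
To make the aggregation-defense idea concrete (see Monday’s entry), here is a sketch comparing plain averaging with a coordinate-wise median when a few workers send corrupted gradients; the number of workers, the corruption model, and the noise levels are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)
    num_workers, d, num_byzantine = 10, 5, 2

    true_grad = rng.normal(size=d)
    # Honest workers report noisy versions of the true gradient.
    grads = true_grad + 0.1 * rng.normal(size=(num_workers, d))
    # Byzantine workers report arbitrary (here, hugely scaled) vectors.
    grads[:num_byzantine] = 100.0 * rng.normal(size=(num_byzantine, d))

    mean_agg = grads.mean(axis=0)            # plain averaging: easily corrupted
    median_agg = np.median(grads, axis=0)    # coordinate-wise median: far more robust

    print("error of the mean  :", np.linalg.norm(mean_agg - true_grad))
    print("error of the median:", np.linalg.norm(median_agg - true_grad))
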
Week 12 Nov 10 – Nov 12 2 sessions
  • Mon, Nov 10
    Transformers - Architecture, Parameters, and Memories
    Transformer architectures, parameter scaling laws, and memory components relevant to efficient and distributed training.
  • Wed, Nov 12
    Memory footprint of GPT and mixed precision training
    Memory bottlenecks in GPT training, covering parameters, activations, optimizer states, and how mixed-precision methods enable efficient distributed optimization (see the sketch below).
    Slides | HW 4 due (11/14)
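
As a back-of-the-envelope companion to Wednesday’s session, here is the standard per-parameter memory accounting for mixed-precision training with Adam (fp16 weights and gradients plus fp32 master weights and two fp32 optimizer moments, about 16 bytes per parameter, activations excluded); the 7B-parameter model size below is just an example.

    # Rough per-parameter memory accounting for mixed-precision Adam training.
    # Activations, communication buffers, and framework overhead are ignored.

    num_params = 7e9                 # example model size: 7B parameters (illustrative)

    bytes_per_param = (
        2    # fp16 model weights
        + 2  # fp16 gradients
        + 4  # fp32 master copy of the weights
        + 4  # fp32 Adam first moment
        + 4  # fp32 Adam second moment
    )                                # = 16 bytes per parameter

    total_gb = num_params * bytes_per_param / 1e9
    print(f"approximate training-state memory: {total_gb:.0f} GB")   # ~112 GB for 7B params
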
Week 13 Nov 17 – Nov 19 2 sessions
  • Mon, Nov 17
    Analog computing for energy-efficient AI: Part I
    Principles of analog computing for neural network inference, focusing on in-memory computation, and hardware–algorithm co-design.
  • Wed, Nov 19
    Analog computing for energy-efficient AI: Part II
    Analog computing techniques for neural network training, highlighting gradient computation and hardware–algorithm co-design under device nonidealities.
Week 14 Nov 24 – Nov 26 2 sessions
  • Mon, Nov 24
    Pre-training and fine-tuning LLMs
    Optimization methods for pre-training and fine-tuning LLMs, including data scaling, parameter-efficient adaptation, and distributed training tradeoffs.
  • Wed, Nov 26
    No class - Thanksgiving
    No Class | HW 5 due (12/1)
Week 15 Dec 01 – Dec 03 2 sessions
  • Mon, Dec 01
    Project presentation - Part I
    ECE/ORIE 5290 student projects: educational presentations highlighting key papers, methods, and open challenges from the recent literature in distributed optimization.
  • Wed, Dec 03
    Project presentation - Part II
    ECE/ORIE 5290 and ECE 7290 student project presentations: showcasing research topics, key findings, and proposed extensions in distributed optimization.
Week 16 Dec 08 1 session
  • Mon, Dec 08
    Project presentation - Part III
    ECE 7290 student project presentations: showcasing research topics, key findings, and proposed extensions in distributed optimization.
    Project Report due (12/13)

Assignments & Project


Synergy with other ECE Courses

Due to space limitations, the current list highlights selected ECE courses offered at Cornell Tech that are open to Master’s students and have strong thematic connections with ECE 5290.

Suggested Readings

These papers offer background and provide deeper insights into distributed optimization and its applications in machine learning.

Advances and Open Problems in Federated Learning

P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, et al.; arXiv preprint, 2021

The definitive survey of federated learning—covering foundations, system challenges, privacy, personalization, and open research directions. Essential background for advanced topics in this course.

DeepSpeed: System Optimizations Enable Training Beyond 100 Billion Parameters

J. Rasley, S. Rajbhandari, O. Ruwase, Y. He; Proceedings of the International Conference for High Performance Computing (SC), 2020

System-level innovations enabling efficient large-scale model training—essential reading for project work.

Local SGD Converges Fast and Communicates Little

S. U. Stich; ICLR, 2019

Formal analysis of local SGD showing near-linear speedup with limited communication.

LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning

T. Chen, G. Giannakis, T. Sun, W. Yin; NeurIPS, 2018

Introduces LAG, which adaptively skips redundant gradient updates to reduce communication cost without harming convergence.

Communication Compression for Decentralized Training

H. Tang, X. Lian, M. Yan, C. Zhang, J. Liu; NeurIPS, 2018

Shows how gradient compression techniques accelerate decentralized training while maintaining convergence.

Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs

A. Nedić, A. Olshevsky, W. Shi; SIAM Journal on Optimization, 2017

A pioneering paper introducing gradient tracking, enabling linear convergence over time-varying graphs.

Communication-Efficient Learning of Deep Networks from Decentralized Data

H. B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. Arcas; AISTATS, 2017

Introduces FedAvg, the foundational algorithm for federated learning.

On the Convergence of Decentralized Gradient Descent

K. Yuan, Q. Ling, W. Yin; SIAM Journal on Optimization, vol. 26, no. 3, pp. 1835–1854, 2016

A seminal theoretical study of decentralized gradient descent (DGD), providing a convergence rate analysis for both diminishing and fixed step sizes over static networks.

EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization

W. Shi, Q. Ling, K. Yuan, G. Wu, W. Yin; SIAM Journal on Optimization, 2015

A breakthrough decentralized algorithm achieving exact convergence to the global optimum using local updates.

Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers

S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein; Foundations and Trends in Machine Learning, 2011

The classic monograph on ADMM—still the most cited reference for distributed convex optimization.


Resources


Acknowledgement


Course Policies

Academic integrity, late submission, and collaboration policies will follow Cornell Tech standards.