Instructor: Tianyi Chen | tianyi.chen@cornell.edu | Semester: Fall 2025
Class time: MoWe 1:25 PM – 2:40 PM | Location: Cornell Tech, Bloomberg Center 161
This is a graduate-level course on the theory, algorithms, and applications of distributed optimization and machine learning. It covers the basics of distributed optimization and learning algorithms, together with their performance analysis when applied to large-scale distributed problems arising in AI, machine learning, signal processing, communication networks, and power systems.
Public resources: The lecture slides and assignments will be posted online as the course progresses. We are happy for anyone to learn from these resources, but we cannot grade the work of unenrolled students.
Cornell students: Students should ask all course-related questions on Ed Discussion, submit homework on Gradescope, and find all the announcements on Canvas.
Course staff:
Tianyi Chen (Instructor)
Yuheng Wang (Teaching Assistant)
Anisha Kechaphna (Grader)
Zhaoxian Wu (Student Volunteer)
Course surveys: To help tailor the course and improve your learning experience, please complete:
Due to space limitations, the current list highlights selected ECE courses offered at Cornell Tech that are open to Master’s students and have strong thematic connections with ECE 5290.
These papers offer background and provide deeper insights into distributed optimization and its applications in machine learning.
Advances and Open Problems in Federated Learning
P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, et al.; arXiv preprint, 2021
The definitive survey of federated learning—covering foundations, system challenges, privacy, personalization, and open research directions. Essential background for advanced topics in this course.
DeepSpeed: System Optimizations Enable Training Beyond 100 Billion Parameters
J. Rasley, S. Rajbhandari, O. Ruwase, Y. He; Proceedings of the International Conference for High Performance Computing (SC), 2020
System-level innovations enabling efficient large-scale model training—essential reading for project work.
Local SGD Converges Fast and Communicates Little
S. U. Stich; ICLR, 2019
Formal analysis of local SGD showing near-linear speedup with limited communication.
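To make the local-update pattern concrete, here is a minimal NumPy sketch of local SGD on toy quadratic losses; the problem data, step size, and synchronization period H are illustrative assumptions, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    M, d, H, lr, rounds = 4, 5, 10, 0.05, 20              # workers, dimension, local steps, step size, rounds
    A = [rng.standard_normal((20, d)) for _ in range(M)]  # worker i holds f_i(x) = 0.5*||A_i x - b_i||^2
    b = [rng.standard_normal(20) for _ in range(M)]

    x = np.zeros(d)                                        # shared model after each synchronization
    for r in range(rounds):
        local = [x.copy() for _ in range(M)]
        for i in range(M):                                 # H local SGD steps per worker, no communication
            for _ in range(H):
                j = rng.integers(len(b[i]))                # one sampled row = one stochastic gradient
                g = A[i][j] * (A[i][j] @ local[i] - b[i][j])
                local[i] -= lr * g
        x = np.mean(local, axis=0)                         # a single averaging step = one communication round
    print("final parameters:", x)

Only one vector per worker is communicated every H gradient evaluations, which is the communication saving the paper quantifies.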
LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
T. Chen, G. Giannakis, T. Sun, W. Yin; NeurIPS, 2018
Introduces LAG, which adaptively skips redundant gradient updates to reduce communication cost without harming convergence.
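The communication-skipping idea can be sketched as follows; the trigger below is a simplified stand-in for the LAG condition, and the quadratic losses, step size, and trigger constant are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    M, d, lr, T, c = 4, 5, 0.005, 200, 10.0               # workers, dimension, step size, iterations, trigger constant
    A = [rng.standard_normal((20, d)) for _ in range(M)]
    b = [rng.standard_normal(20) for _ in range(M)]
    grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])         # gradient of 0.5*||A_i x - b_i||^2

    x = np.zeros(d)
    x_prev = x.copy()
    stale = [grad(i, x) for i in range(M)]                 # last gradient each worker uploaded
    uploads = 0
    for k in range(T):
        for i in range(M):
            fresh = grad(i, x)
            # simplified trigger: upload only if the gradient changed more than a
            # constant times the recent progress of the server iterate
            if np.linalg.norm(fresh - stale[i]) > c * np.linalg.norm(x - x_prev):
                stale[i] = fresh
                uploads += 1
        x_prev = x.copy()
        x = x - lr * np.sum(stale, axis=0)                 # server descends along the (possibly stale) aggregate
    print(f"gradient uploads used: {uploads} of {M * T} possible")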
Communication Compression for Decentralized Training
H. Tang, X. Lian, M. Yan, C. Zhang, J. Liu; NeurIPS, 2018
Shows how gradient compression techniques accelerate decentralized training while maintaining convergence.
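As a toy illustration of the compression idea (not the paper's specific difference/extrapolation schemes), the sketch below applies top-k sparsification to a vector before it would be exchanged with neighbors; the operator and the value of k are illustrative assumptions.

    import numpy as np

    def top_k(v, k):
        """Keep only the k largest-magnitude entries of v; the rest are not transmitted."""
        out = np.zeros_like(v)
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(2)
    g = rng.standard_normal(50)
    compressed = top_k(g, 5)
    print("entries sent:", np.count_nonzero(compressed), "of", g.size)
    print("relative compression error:", np.linalg.norm(g - compressed) / np.linalg.norm(g))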
Achieving Geometric Convergence for Distributed Optimization over Time-Varying Graphs
A. Nedić, A. Olshevsky, W. Shi; SIAM Journal on Optimization, 2017
A pioneering paper introducing gradient tracking, which enables linear (geometric) convergence over time-varying graphs.
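A minimal sketch of the gradient-tracking update on a static 4-node ring is given below; the static graph, quadratic losses, and step size are illustrative simplifications of the paper's time-varying setting.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d, lr, T = 4, 3, 0.01, 600
    A = [rng.standard_normal((10, d)) for _ in range(n)]
    b = [rng.standard_normal(10) for _ in range(n)]
    grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

    # doubly stochastic mixing matrix for a 4-node ring (illustrative choice)
    W = np.array([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])

    X = np.zeros((n, d))                                   # row i is node i's iterate
    G = np.array([grad(i, X[i]) for i in range(n)])
    Y = G.copy()                                           # trackers start at the local gradients
    for _ in range(T):
        X_new = W @ X - lr * Y                             # consensus step plus descent along the tracker
        G_new = np.array([grad(i, X_new[i]) for i in range(n)])
        Y = W @ Y + G_new - G                              # tracker follows the network-average gradient
        X, G = X_new, G_new
    print("disagreement across nodes:", np.linalg.norm(X - X.mean(axis=0)))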
Communication-Efficient Learning of Deep Networks from Decentralized Data
H. B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. Arcas; AISTATS, 2017
Introduces FedAvg, the foundational algorithm for federated learning.
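A minimal sketch of the FedAvg round structure is shown below, assuming full-batch local steps on toy quadratic client objectives (the paper's algorithm uses minibatch SGD and client sampling); the data sizes, step size, and epoch count are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(4)
    d, lr, E, rounds = 5, 0.05, 5, 30
    sizes = [30, 10, 60]                                   # unequal client dataset sizes (illustrative)
    A = [rng.standard_normal((m, d)) for m in sizes]       # client k holds 0.5/m_k * ||A_k w - b_k||^2
    b = [rng.standard_normal(m) for m in sizes]

    w = np.zeros(d)                                        # global model
    for _ in range(rounds):
        updates = []
        for k in range(len(sizes)):
            wk = w.copy()
            for _ in range(E):                             # E local epochs, here full-batch gradient steps
                wk -= lr / sizes[k] * (A[k].T @ (A[k] @ wk - b[k]))
            updates.append(wk)
        # server aggregation: average weighted by local dataset size
        w = sum(sizes[k] * updates[k] for k in range(len(sizes))) / sum(sizes)
    print("global model:", w)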
On the Convergence of Decentralized Gradient Descent
K. Yuan, Q. Ling, W. Yin; SIAM Journal on Optimization, vol. 26, no. 3, pp. 1835–1854, 2016
A seminal theoretical study of decentralized gradient descent (DGD), providing a convergence rate analysis for both diminishing and fixed step sizes over static networks.
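The DGD update analyzed in the paper can be sketched in a few lines; the 4-node ring, quadratic losses, and diminishing step-size schedule below are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(5)
    n, d, T = 4, 3, 500
    A = [rng.standard_normal((10, d)) for _ in range(n)]
    b = [rng.standard_normal(10) for _ in range(n)]
    grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

    # symmetric doubly stochastic mixing matrix for a 4-node ring (illustrative)
    W = np.array([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])

    X = np.zeros((n, d))                                   # row i is node i's iterate
    for k in range(T):
        alpha = 0.02 / np.sqrt(k + 1)                      # diminishing step: exact convergence; a fixed step
                                                           # converges only to an O(alpha) neighborhood
        X = W @ X - alpha * np.array([grad(i, X[i]) for i in range(n)])
    print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))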
EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization
W. Shi, Q. Ling, G. Wu, W. Yin; SIAM Journal on Optimization, 2015
A breakthrough decentralized algorithm that reaches the exact global optimum with a constant step size, using only local computation and neighbor communication.
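A minimal sketch of the EXTRA recursion with the canonical choice (I + W)/2 for the second mixing matrix is given below; the 4-node ring, quadratic losses, and step size are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(6)
    n, d, lr, T = 4, 3, 0.02, 300
    A = [rng.standard_normal((10, d)) for _ in range(n)]
    b = [rng.standard_normal(10) for _ in range(n)]
    G = lambda X: np.array([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(n)])

    W = np.array([[0.50, 0.25, 0.00, 0.25],                # 4-node ring mixing matrix (illustrative)
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])
    W_tilde = 0.5 * (np.eye(n) + W)                        # canonical second mixing matrix

    X0 = np.zeros((n, d))
    X1 = W @ X0 - lr * G(X0)                               # first iteration is a plain DGD step
    for _ in range(T):
        X2 = (np.eye(n) + W) @ X1 - W_tilde @ X0 - lr * (G(X1) - G(X0))
        X0, X1 = X1, X2
    print("distance to consensus:", np.linalg.norm(X1 - X1.mean(axis=0)))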
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein; Foundations and Trends in Machine Learning, 2011
The classic monograph on ADMM—still the most cited reference for distributed convex optimization.
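The global-variable consensus form of ADMM from the monograph can be sketched as follows; the toy least-squares objectives and the penalty parameter rho are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(7)
    n, d, rho, T = 4, 3, 1.0, 100
    A = [rng.standard_normal((10, d)) for _ in range(n)]
    b = [rng.standard_normal(10) for _ in range(n)]

    z = np.zeros(d)                                        # global consensus variable
    x = [np.zeros(d) for _ in range(n)]                    # local primal variables
    u = [np.zeros(d) for _ in range(n)]                    # scaled dual variables
    for _ in range(T):
        for i in range(n):                                 # x-update: closed form for least squares
            x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(d),
                                   A[i].T @ b[i] + rho * (z - u[i]))
        z = np.mean([x[i] + u[i] for i in range(n)], axis=0)   # z-update: averaging (the only communication)
        u = [u[i] + x[i] - z for i in range(n)]            # dual update on the consensus constraint
    print("max consensus residual:", max(np.linalg.norm(x[i] - z) for i in range(n)))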
Academic integrity, late submission, and collaboration policies will follow Cornell Tech standards.