LLM Inference @ NVIDIA
Lucas Liebenwein
Currently, I am a tech lead with the TensorRT-LLM team at NVIDIA, where I co-lead the development of AutoDeploy, a novel compiler-driven approach to high-performance LLM inference.
I joined NVIDIA through the acquisition of OmniML, where I was the founding engineer and chief architect. Prior to that, I was a Ph.D. researcher at MIT CSAIL, advised by Prof. Daniela Rus, where my research focused on efficient deep learning algorithms and autonomous driving.
Throughout my professional career, I have been passionate about making ML more accessible to individuals and organizations alike by bridging the gap between ML research and user-friendly, scalable AI tools and platforms.
Education
PhD, Computer Science @ Massachusetts Institute of Technology | 2018 – 2021
GPA: 5.0/5.0
Thesis: “Efficient Deep Learning: From Theory to Practice”
Advisor: Prof. Daniela Rus
Minor: Math (High-dimensional Probability)
SM, Electrical Engineering & Computer Science @ Massachusetts Institute of Technology | 2016 – 2018
GPA: 5.0/5.0
Thesis: “Contract-Based Safety Verification for Autonomous Driving”
Advisor: Prof. Daniela Rus
Major: Machine Learning
BSc, Mechanical Engineering @ ETH Zurich | 2012 – 2015
GPA: 5.86/6.0 (Valedictorian)
Thesis: “Autonomous Pairing Of Distributed Flight Array Modules” (Advisor: Prof. Raffaello D’Andrea)
Major: Robotics, Control
Experience
Tech Lead @ NVIDIA | May 2025 – Present
Working on TensorRT-LLM AutoDeploy, a compiler-driven workflow for converting off-the-shelf PyTorch models into inference-optimized graphs. Check out the latest docs: https://nvidia.github.io/TensorRT-LLM/latest/torch/auto_deploy/auto-deploy.html
Engineering Manager @ NVIDIA | Feb 2023 – May 2025
Led work on algorithmic model optimizations (quantization, pruning, distillation, speculative decoding, …) for LLMs and diffusion models, now open-sourced as Model Optimizer: https://github.com/NVIDIA/TensorRT-Model-Optimizer
Chief Architect & Founding Engineer @ OmniML (acquired) | Oct 2021 – Feb 2023
We built a scalable, accessible machine learning platform that rethought how deep neural networks are created and deployed in production. Our product was based on years of research into optimizing models for efficient deployment across a wide range of systems.
In my role at OmniML, I led the design and implementation of Omnimizer, our scalable platform for efficient ML training and deployment, while exploring state-of-the-art model optimization research for future product iterations.
Machine Learning Consultant @ Neural Magic | Jul 2021 – Oct 2021
Doctoral Researcher @ MIT Computer Science and Artificial Intelligence Lab (CSAIL) | Sep 2016 – Aug 2021
My research focused on optimizing deep neural networks for resource-constrained applications, such as robotics and cloud computing. I developed novel techniques in model compression and pruning that improve the speed-accuracy trade-off and provide theoretical insights into network design and training.
Before that, I worked on verification algorithms for safe autonomous driving and contributed to our AV research platforms (self-driving cars and wheelchairs) for testing and validation.
Visiting Researcher @ TU Vienna | Jul 2020 – May 2021
Autopilot Software Intern @ Tesla | Jun 2019 – Sep 2019
Visiting Researcher @ Singapore-MIT Alliance for Research & Technology Centre | Jan 2017 – Feb 2017
Autonomous Car Intern @ nuTonomy | Dec 2015 – Feb 2016
I designed and implemented an automated velocity controller that combined traditional control techniques with machine learning.
Undergraduate Researcher @ ETH Zurich | Sep 2014 – Oct 2015
Under the supervision of Prof. Raffaello D’Andrea, I co-led the implementation of a real-time operating system for the Distributed Flight Array (DFA), developed autonomous sensing and decision-making capabilities, and conducted research on self-assembly algorithms.