LLM Inference @ NVIDIA
Lucas Liebenwein
Currently, I am a tech lead with the TensorRT-LLM team at NVIDIA, where I co-lead the development of AutoDeploy, a novel compiler-driven approach to high-performance LLM inference.
I joined NVIDIA through the acquisition of OmniML, where I was the founding engineer and chief architect. Prior to that, I was a Ph.D. researcher at MIT CSAIL, advised by Prof. Daniela Rus, where my research focused on efficient deep learning algorithms and autonomous driving.
Throughout my professional career, I have been passionate about making ML more accessible to individuals and organizations alike by bridging the gap between ML research and user-friendly, scalable AI tools and platforms.
Education
PhD, Computer Science @ Massachusetts Institute of Technology | 2018 – 2021
GPA: 5.0/5.0
Thesis: “Efficient Deep Learning: From Theory to Practice”
Advisor: Prof. Daniela Rus
Minor: Math (High-dimensional Probability)
SM, Electrical Engineering & Computer Science @ Massachusetts Institute of Technology | 2016 – 2018
GPA: 5.0/5.0
Thesis: “Contract-Based Safety Verification for Autonomous Driving”
Advisor: Prof. Daniela Rus
Major: Machine Learning
BSc, Mechanical Engineering @ ETH Zurich | 2012 – 2015
GPA: 5.86/6.0 (Valedictorian)
Thesis: “Autonomous Pairing Of Distributed Flight Array Modules” (Advisor: Prof. Raffaello D’Andrea)
Major: Robotics, Control
Experience
Tech Lead @ NVIDIA | May 2025 – Present
Working on TensorRT-LLM AutoDeploy, a compiler-driven workflow for converting off-the-shelf PyTorch models into inference-optimized graphs. Check out the latest docs: https://nvidia.github.io/TensorRT-LLM/latest/torch/auto_deploy/auto-deploy.html
Engineering Manager @ NVIDIA | Feb 2023 – May 2025
Led work on algorithmic model optimizations (quantization, pruning, distillation, speculative decoding, …) for LLMs and diffusion models, now open-sourced as Model Optimizer: https://github.com/NVIDIA/TensorRT-Model-Optimizer
Chief Architect & Founding Engineer @ OmniML (acquired) | Oct 2021 – Feb 2023
We built a scalable, accessible machine learning platform that rethought how deep neural networks are created and deployed in production. Our product was based on years of research into optimizing models for efficient deployment across a wide range of systems.
In my role at OmniML, I led the design and implementation of Omnimizer, our scalable platform for efficient ML training and deployment, while exploring state-of-the-art model optimization research for future product iterations.
Machine Learning Consultant @ Neural Magic | Jul 2021 – Oct 2021
Doctoral Researcher @ MIT Computer Science and Artificial Intelligence Lab (CSAIL) | Sep 2016 – Aug 2021
My research focused on optimizing deep neural networks for resource-constrained applications, such as robotics and cloud computing. I developed novel techniques in model compression and pruning that improve the speed-accuracy trade-off and provide theoretical insights into network design and training.
Before that, I worked on verification algorithms for safe autonomous driving and contributed to our AV research platforms (self-driving cars and wheelchairs) for testing and validation.
Visiting Researcher @ TU Vienna | Jul 2020 – May 2021
Autopilot Software Intern @ Tesla | Jun 2019 – Sep 2019
Visiting Researcher @ Singapore-MIT Alliance for Research & Technology Centre | Jan 2017 – Feb 2017
Autonomous Car Intern @ nuTonomy | Dec 2015 – Feb 2016
I designed and implemented an automated velocity controller that combined traditional control techniques with machine learning.
Undergraduate Researcher @ ETH Zurich | Sep 2014 – Oct 2015
Under the supervision of Prof. Raffaello D’Andrea, I co-led the implementation of a real-time operating system for the Distributed Flight Array (DFA), developed autonomous sensing and decision-making capabilities, and conducted research on self-assembly algorithms.