Lucas Liebenwein

LLM Inference @ NVIDIA

About Me

Currently, I am a tech lead with the TensorRT-LLM team at NVIDIA, where I co-lead the development of AutoDeploy, a novel compiler-driven approach to high-performance LLM inference.

I joined NVIDIA through the acquisition of OmniML, where I was the founding engineer and chief architect. Prior to that, I was a Ph.D. researcher at MIT CSAIL, advised by Prof. Daniela Rus, where my research focused on efficient deep learning algorithms and autonomous driving.

Throughout my professional career, I have been passionate about making ML more accessible to individuals and organizations alike by bridging the gap between ML research and user-friendly, scalable AI tools and platforms.

Recent Experience

Tech Lead @ NVIDIA | Feb 2023 – Present

Working on TensorRT-LLM AutoDeploy, a compiler-driven workflow for converting off-the-shelf PyTorch models into inference-optimized graphs. Check out the latest docs: https://nvidia.github.io/TensorRT-LLM/latest/torch/auto_deploy/auto-deploy.html
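
To give a flavor of what this looks like from the user's side, here is a minimal sketch built on TensorRT-LLM's high-level LLM API; the model name is just a placeholder, and the AutoDeploy-specific entry points and options are covered in the docs linked above.

    # Minimal sketch (not a full recipe): running an off-the-shelf Hugging Face
    # checkpoint through TensorRT-LLM's high-level LLM API. The model name is a
    # placeholder; AutoDeploy-specific entry points are in the docs linked above.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any HF model card or local dir

    prompts = ["Compiler-driven LLM inference means"]
    for output in llm.generate(prompts, SamplingParams(max_tokens=64)):
        print(output.outputs[0].text)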

Prior to that, I worked on algorithmic model optimizations (quantization, pruning, distillation, speculative decoding, …) for LLMs and diffusion models. This work is now open-sourced as Model Optimizer: https://github.com/NVIDIA/Model-Optimizer
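
As an illustration, here is a rough sketch of what post-training quantization with Model Optimizer can look like; the model, config choice, and calibration data below are assumptions for illustration, so see the repository for the actual APIs and supported configs.

    # Rough sketch of post-training quantization with Model Optimizer (nvidia-modelopt).
    # The model, config, and calibration texts are illustrative placeholders.
    import torch
    import modelopt.torch.quantization as mtq
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    def forward_loop(model):
        # Calibration: run a few representative samples so activation ranges can be collected.
        for text in ["Hello world.", "Quantization trades precision for throughput."]:
            inputs = tokenizer(text, return_tensors="pt").to(model.device)
            with torch.no_grad():
                model(**inputs)

    # Quantize the model in place with an INT8 configuration, then deploy as usual.
    model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)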

Founding Engineer & Chief Architect @ OmniML

At OmniML, we built a scalable, accessible machine learning platform by redefining how deep neural networks are created and deployed in production. Our product was based on years of research into optimizing models for efficient deployment across a wide range of systems.

In my role at OmniML, I led the design and implementation of Omnimizer, our scalable platform for efficient ML training and deployment, while exploring state-of-the-art model optimization research for future product iterations.

Education

PhD & SM, Computer Science @ MIT | Sep 2016 – Aug 2021

My research focused on optimizing deep neural networks for resource-constrained applications, such as robotics and cloud computing. I developed novel techniques in model compression and pruning that improve the speed-accuracy trade-off and provide theoretical insights into network design and training.

Before that, I worked on verification algorithms for safe autonomous driving and contributed to our AV research platforms (self-driving cars and wheelchairs) for testing and validation.