Deploying a Model for Inference at Production Scale (NDMIPS-OD)

At scale machine learning models can interact with up to millions of users in a day. As usage grows, the cost of both money and engineering time can prevent models from reaching their full potential. It’s these types of challenges that inspired creation of Machine Learning Operations (MLOps).

Learning Objectives

Practice Machine Learning Operations by:

  • Deploying neural networks from a variety of frameworks onto a live Triton Server
  • Measuring GPU usage and other metrics with Prometheus
  • Sending asynchronous requests to maximize throughput

Upon completion, learners will be able to deploy their own machine learning models on a GPU server


  • Familiarity with at least one Machine Learning framework such as:
    • PyTorch
    • TensorFlow *
    • ONNX
    • TensorRT

* Covered in Getting Started with Deep Learning

  • Familiarity with Docker recommended but not required.

Tools, libraries, frameworks used: NVIDIA Triton