This course takes you from ad-hoc Jupyter notebooks to production-style MLOps workflows tailored for AIOps use cases such as anomaly detection and incident prediction. You will learn how to make models reproducible, trackable, and deployable at scale using MLflow for experiment tracking and model packaging, and Kubeflow Pipelines for end-to-end orchestration on Kubernetes. Through hands-on labs, you will convert exploratory code into parameterized scripts, track experiments with MLflow and MinIO, deploy inference services to Kubernetes, and automate train → register → validate → deploy pipelines for AIOps workloads.
Prerequisites
- Python Basics
- Kubernetes Basics
- Machine Learning Fundamentals
Course Highlights
- Why MLOps Is Critical for AIOps
This module explains the shift from exploratory, notebook-based experimentation to production-grade ML workflows for AIOps, showing how MLOps practices improve reliability, observability, and governance for models used in incident management and anomaly detection.
- Lab 1.1 – From Notebook to Production
- Convert an anomaly detection Jupyter notebook into a train.py script.
- Introduce CLI arguments, random seeds, and reproducibility best practices (see the sketch below).
- Run multiple configurations manually to prepare for later automation.
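As a rough illustration of where Lab 1.1 ends up, here is a minimal train.py sketch; the flag names, the Isolation Forest model, and the synthetic data are assumptions standing in for the course's actual notebook:

```python
# train.py -- illustrative sketch only; flag names and model choice are assumptions.
import argparse
import random

import numpy as np
from sklearn.ensemble import IsolationForest


def main():
    parser = argparse.ArgumentParser(description="Train an anomaly detection model.")
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--contamination", type=float, default=0.05)
    parser.add_argument("--seed", type=int, default=42)
    args = parser.parse_args()

    # Fix the relevant random seeds so repeated runs are reproducible.
    random.seed(args.seed)
    np.random.seed(args.seed)

    # Stand-in for the course's metric dataset: synthetic CPU/latency-style features.
    rng = np.random.default_rng(args.seed)
    X = rng.normal(size=(1000, 4))

    model = IsolationForest(
        n_estimators=args.n_estimators,
        contamination=args.contamination,
        random_state=args.seed,
    ).fit(X)

    anomaly_rate = float((model.predict(X) == -1).mean())
    print(f"anomaly_rate={anomaly_rate:.4f}")


if __name__ == "__main__":
    main()
```

Running it twice with the same --seed should produce identical output, which is the reproducibility property the lab is after.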
- Experiment Tracking & Model Packaging with MLflow
This module introduces MLflow Tracking for capturing parameters, metrics, and artifacts across experiments, and shows how to configure MinIO as an S3-compatible artifact store integrated with an MLflow Tracking Server for reproducible AIOps workflows.
- Lab 2.1 – Setting Up MLflow & MinIO
- Deploy MinIO and MLflow Tracking Server.
- Verify UI access and connectivity between MLflow and MinIO as an artifact backend.
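A quick connectivity check in the spirit of Lab 2.1 might look like this, assuming MLflow and MinIO are reachable at the placeholder endpoints shown; MLFLOW_S3_ENDPOINT_URL is how MLflow clients are pointed at an S3-compatible store such as MinIO:

```python
import os

import mlflow

# Placeholder endpoints and credentials -- substitute whatever your deployment exposes.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://minio.example.local:9000"
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"

mlflow.set_tracking_uri("http://mlflow.example.local:5000")

# Logging a tiny artifact exercises both the tracking server and the MinIO backend.
with mlflow.start_run(run_name="connectivity-check"):
    mlflow.log_param("smoke_test", True)
    mlflow.log_text("hello from the connectivity check", "check.txt")
```

If the run and its check.txt artifact show up in the MLflow UI, the artifact store is wired up correctly.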
- Lab 2.2 – Logging Parameters, Metrics, and Artifacts
- Instrument train.py with MLflow Tracking API calls (sketched below).
- Log parameters, metrics, and model artifacts, then explore runs in the MLflow UI.
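Lab 2.2's instrumentation might look roughly like the following, reusing the illustrative Isolation Forest setup from earlier; the parameter and metric names are assumptions, not the course's exact schema:

```python
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import IsolationForest


def train_and_log(n_estimators: int = 100, contamination: float = 0.05, seed: int = 42):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(1000, 4))  # stand-in for real AIOps metrics

    with mlflow.start_run():
        # Parameters: everything needed to reproduce the run.
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_param("contamination", contamination)
        mlflow.log_param("seed", seed)

        model = IsolationForest(
            n_estimators=n_estimators,
            contamination=contamination,
            random_state=seed,
        ).fit(X)

        # Metrics: quantities you want to compare across runs in the UI.
        anomaly_rate = float((model.predict(X) == -1).mean())
        mlflow.log_metric("anomaly_rate", anomaly_rate)

        # Artifacts: the fitted model itself, so later labs can serve it.
        mlflow.sklearn.log_model(model, artifact_path="model")


if __name__ == "__main__":
    train_and_log()
```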
- Lab 2.3 – Packaging Models for Reproducibility
- Create MLproject and conda.yaml files for reproducible runs.
- Define entry points and re-run experiments using the MLflow CLI (see the example below).
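With an MLproject file declaring a main entry point for train.py, a run can be reproduced from the CLI (mlflow run .) or programmatically; a hedged sketch of the programmatic form, with an illustrative parameter name:

```python
import mlflow.projects

# Programmatic equivalent of `mlflow run . -P n_estimators=200`;
# assumes the MLproject file declares a "main" entry point that accepts
# an n_estimators parameter (names are illustrative, not the course's).
submitted = mlflow.projects.run(
    uri=".",
    entry_point="main",
    parameters={"n_estimators": 200},
)
print(submitted.run_id, submitted.get_status())
```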
- Deploying & Serving AIOps Models
This module focuses on moving from trained models to live inference endpoints suitable for real-time anomaly detection, covering different ways to serve MLflow models and expose them via REST APIs for consumption by AIOps systems.
- Lab 3.1 – Serving Models with MLflow
- Serve the trained model locally from MLflow runs.
- Register the model in the MLflow Model Registry and serve it from there.
- Test predictions using curl and Python requests (see the sketch below).
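A Python requests test in the spirit of Lab 3.1, assuming the model is served locally on a placeholder port; the exact JSON payload key (inputs vs. dataframe_split) depends on the MLflow version and the model's input signature:

```python
import requests

# Assumes something like `mlflow models serve -m models:/aiops-anomaly-detector/1 -p 5001`
# is already running; the model name and port are placeholders.
url = "http://127.0.0.1:5001/invocations"

# Four numeric features per row, matching the illustrative training sketch.
payload = {"inputs": [[0.1, 0.2, 0.3, 0.4], [5.0, 5.0, 5.0, 5.0]]}

response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # e.g. predictions where -1 marks an anomaly
```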
- Lab 3.2 – Containerizing and Deploying to Kubernetes
- Package the serving application into a Docker image.
- Deploy the model-serving service to Kubernetes using Deployment and Service resources (sketched below).
- Verify access to predictions through a REST API endpoint.
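The lab most likely applies plain YAML manifests, but the same Deployment and Service objects can be sketched with the official kubernetes Python client; the image name, labels, namespace, and ports below are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

labels = {"app": "anomaly-detector"}                       # placeholder label
image = "registry.example.local/anomaly-detector:latest"  # placeholder image

container = client.V1Container(
    name="model-server",
    image=image,
    ports=[client.V1ContainerPort(container_port=5001)],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="anomaly-detector"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="anomaly-detector"),
    spec=client.V1ServiceSpec(
        selector=labels,
        ports=[client.V1ServicePort(port=80, target_port=5001)],
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```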
- Orchestrating AIOps Pipelines with Kubeflow
This module teaches how to automate the full ML lifecycle, from training through validation to deployment, using Kubeflow Pipelines, and how to connect Kubeflow components with MLflow to build traceable, production-style AIOps pipelines.
- Lab 4.1 – Exploring Kubeflow Pipelines
- Access the Kubeflow UI and inspect pre-built sample pipelines.
- Run sample pipelines and observe execution graphs and artifacts.
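The same exploration can also be scripted with the KFP SDK; a tentative sketch, assuming the Pipelines API is reachable at the placeholder host below and noting that response attribute names differ between SDK major versions:

```python
import kfp

# Placeholder endpoint -- use the host your Kubeflow installation exposes.
client = kfp.Client(host="http://kubeflow.example.local/pipeline")

# List the pre-built sample pipelines that ship with the installation.
pipelines = client.list_pipelines()
for p in pipelines.pipelines or []:
    # Attribute name differs between KFP SDK v1 (name) and v2 (display_name).
    print(getattr(p, "display_name", None) or getattr(p, "name", None))
```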
- Lab 4.2 – Building the Training & Registration Components
- Create a train component that uses the existing train.py script.
- Create a register component that pushes the trained model to the MLflow Model Registry.
- Run a two-step Kubeflow pipeline to train and log a model to MLflow (see the sketch below).
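A rough shape for the two-step pipeline using the KFP v2 SDK; the component bodies, base image, MLflow URI, and registered model name are simplified placeholders rather than the course's actual components:

```python
from kfp import dsl


@dsl.component(base_image="python:3.11", packages_to_install=["mlflow", "scikit-learn"])
def train(n_estimators: int) -> str:
    """Train a model, log it to MLflow, and return the run ID (placeholder body)."""
    import mlflow
    import mlflow.sklearn
    import numpy as np
    from sklearn.ensemble import IsolationForest

    mlflow.set_tracking_uri("http://mlflow.example.local:5000")  # placeholder URI
    with mlflow.start_run() as run:
        X = np.random.default_rng(42).normal(size=(1000, 4))
        model = IsolationForest(n_estimators=n_estimators, random_state=42).fit(X)
        mlflow.sklearn.log_model(model, artifact_path="model")
    return run.info.run_id


@dsl.component(base_image="python:3.11", packages_to_install=["mlflow"])
def register(run_id: str) -> str:
    """Register the logged model in the MLflow Model Registry (placeholder name)."""
    import mlflow

    mlflow.set_tracking_uri("http://mlflow.example.local:5000")
    result = mlflow.register_model(f"runs:/{run_id}/model", "aiops-anomaly-detector")
    return result.version


@dsl.pipeline(name="train-and-register")
def pipeline(n_estimators: int = 100):
    train_task = train(n_estimators=n_estimators)
    register(run_id=train_task.output)
```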
- Lab 4.3 – Building the Full Train → Validate → Deploy Pipeline
- Add a validate component that checks whether the model’s anomaly rate is within acceptable limits.
- Add a deploy component that consumes the trained model and deploys it to production.
- Compile, upload, and trigger a four-step Kubeflow pipeline to automate model training and serving end to end (compile-and-submit sketch below).
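Compiling and triggering the finished pipeline from the SDK could look roughly like this; the pipeline module, host, and argument names are hypothetical and carried over from the earlier sketches:

```python
import kfp
from kfp import compiler

# `pipeline` is the four-step train / register / validate / deploy function,
# i.e. the two-step sketch above extended with the validate and deploy components.
from aiops_pipeline import pipeline  # hypothetical module name

compiler.Compiler().compile(pipeline_func=pipeline, package_path="aiops_pipeline.yaml")

client = kfp.Client(host="http://kubeflow.example.local/pipeline")  # placeholder host
client.create_run_from_pipeline_package(
    "aiops_pipeline.yaml",
    arguments={"n_estimators": 200},
    run_name="train-validate-deploy",
)
```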