End state first: a running MLflow 3.12 tracking server on a Linux VM, reachable from any teammate’s laptop, with a Postgres backend storing experiment metadata and an S3-style bucket holding artifacts. Anyone on the team can call mlflow.set_tracking_uri() and start logging runs. That’s what this deployment looks like in production, and this guide walks through the steps to get there.
MLflow is the open-source piece of the MLOps stack that records runs, parameters, metrics, and model artifacts so you can compare experiments later. MLflow 3 reorganized the data model around a new LoggedModel entity – a first-class citizen that moves beyond the traditional run-centric approach, enabling better organization and comparison of GenAI agents, deep learning checkpoints, and model variants across experiments. If you’ve used 2.x, the CLI feels identical; the schema underneath does not.
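To make the new entity concrete, here’s a minimal sketch of logging a model under MLflow 3 – assuming 3.x’s name parameter and the model_id field on the returned ModelInfo; the experiment name and model are placeholders:

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X, y)

mlflow.set_experiment("logged-model-demo")  # placeholder name
with mlflow.start_run():
    # In 3.x the model becomes a LoggedModel with its own ID,
    # not just an artifact hanging off the run.
    info = mlflow.sklearn.log_model(clf, name="iris-classifier")
    print(info.model_id)
```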
System requirements
MLflow is a Python package plus a small web server. The hardware floor is genuinely low – what kills people is the storage backend, not CPU.
| Component | Minimum | Recommended (team server) |
|---|---|---|
| Python | 3.10 | 3.11 or 3.12 |
| OS | Linux / macOS / Windows | Ubuntu 22.04 LTS |
| CPU | 1 vCPU | 2-4 vCPU |
| RAM | 1 GB | 4 GB+ |
| Disk | 2 GB free | 50 GB+ (artifact growth) |
| Backend store | SQLite file | Postgres / MySQL |
MLflow requires Python 3.10 or newer. Older 3.8 / 3.9 environments will fail at install. If you’re stuck on 3.9, pin MLflow to 2.x – but you’ll miss the GenAI features.
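If you want to fail fast in CI before pip resolves half a dependency tree, a one-line guard works – nothing MLflow-specific here, just a version gate:

```python
import sys

# MLflow 3.x needs Python >= 3.10; abort early with a readable message.
assert sys.version_info >= (3, 10), f"Python {sys.version.split()[0]} is too old for MLflow 3"
```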
Install MLflow 3.12 (the recommended path)
As of May 2026, MLflow 3.12.0 is the latest stable release on PyPI. Use a virtualenv. Always.
python3 -m venv mlflow-env
source mlflow-env/bin/activate
pip install --upgrade pip
pip install mlflow==3.12.0
Three install variants exist and they are not interchangeable:
- `pip install mlflow` – full package, the default choice
- `pip install mlflow[extras]` – adds scikit-learn, boto3, mysqlclient, and friends
- `pip install mlflow-skinny` – minimal dependencies for client-side logging in containers
The skinny variant is meant for production training jobs where you only need to log to a remote server, not host one. PyPI explicitly warns that co-installing the skinny package with the full MLflow package may cause version mismatch issues – so pick one per environment and stick with it.
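In practice, a training container with mlflow-skinny only needs a tracking URI and the standard logging calls – a sketch, with a hypothetical internal hostname:

```python
import mlflow

# Point at the shared server; setting MLFLOW_TRACKING_URI in the
# environment works just as well as calling set_tracking_uri().
mlflow.set_tracking_uri("http://mlflow.internal.example.com:5000")  # hypothetical host
mlflow.set_experiment("nightly-training")

with mlflow.start_run():
    mlflow.log_param("batch_size", 64)
    mlflow.log_metric("loss", 0.12)
```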
First-time configuration
The local-only mode (file-based tracking in ./mlruns) is fine for a laptop but useless for a team. Here’s the real config you want.
Start a tracking server with a SQLite backend and a local artifact directory:
mkdir -p ~/mlflow-data/artifacts
mlflow server \
  --backend-store-uri sqlite:////home/$USER/mlflow-data/mlflow.db \
  --default-artifact-root file:///home/$USER/mlflow-data/artifacts \
  --host 0.0.0.0 \
  --port 5000
Note the four slashes in the SQLite URI – that makes the path absolute, so the server finds its database no matter which directory you launch it from.
That --host 0.0.0.0 is non-negotiable for a shared server. By default the MLflow server listens on http://localhost:5000 and only accepts connections from the local machine; passing --host 0.0.0.0 makes it listen on all network interfaces. Every guide repeats this, but the failure mode is silent: teammates get connection refused with no log entry on the server side, because the request never reaches the MLflow process.
For real production, swap SQLite for Postgres: --backend-store-uri postgresql://user:pass@host:5432/mlflow. SQLite locks under concurrent writes; you’ll feel it around 5-10 simultaneous users.
Verify it works
From a separate terminal (still inside the venv):
mlflow --version
# Should print: mlflow, version 3.12.0
# Log a tiny test run
python -c "
import mlflow
mlflow.set_tracking_uri('http://localhost:5000')
mlflow.set_experiment('smoke-test')
with mlflow.start_run():
    mlflow.log_param('lr', 0.01)
    mlflow.log_metric('acc', 0.97)
print('OK')
"
Open http://your-server-ip:5000 in a browser. The smoke-test experiment should appear with one run logged. If it doesn’t, jump to the next section.
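For a scriptable version of the same check (replace the placeholder host):

```python
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://your-server-ip:5000")  # placeholder host
exp = client.get_experiment_by_name("smoke-test")
runs = client.search_runs([exp.experiment_id])
print(f"{len(runs)} run(s) logged in '{exp.name}'")  # expect 1 after the test above
```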
The uvicorn timeout trap nobody mentions
This is the gotcha that breaks MLflow deployments at scale, and almost no tutorial covers the 3.x version of the fix.
Pro tip: If you’re upgrading from MLflow 2.x and your old `GUNICORN_CMD_ARGS="--timeout 600"` trick stopped working – it’s not broken. MLflow 3.x runs the server on uvicorn, not gunicorn, by default. Use `--uvicorn-opts "--timeout-keep-alive=120"` instead.
When the server returns `WARNING: Request timeout exceeded` or `ERROR: Exception in ASGI application` on large artifact uploads or long-running LLM evaluations, the official fix is to start the server like this:
mlflow server --uvicorn-opts "--timeout-keep-alive=120" \
  --backend-store-uri sqlite:////home/$USER/mlflow-data/mlflow.db \
  --host 0.0.0.0
For users still running gunicorn via --gunicorn-opts, the equivalent flag is different, so check which worker class your install runs before pasting random Stack Overflow answers. The MLflow Tracking Server docs have the current syntax.
Common install errors and fixes
Real errors from GitHub issues, in rough order of frequency:
- `[CRITICAL] WORKER TIMEOUT` with exit code 137 – the container got OOM-killed. Reported in issue #14465 on a Kubernetes deploy with MLflow 2.18 and Python 3.10. Fix: raise the pod memory limit to at least 2 GB and tune liveness probe timeouts. Exit 137 = SIGKILL from the kernel, not an MLflow bug.
- `llvmlite requires python <3.11` – an old issue from the 2.x era when numba pinned llvmlite. On MLflow 3.x this is gone, but if you’re installing into a frozen Python 3.10 env with pinned legacy deps, you may still hit it. Fix: upgrade Python or use `mlflow-skinny`, which doesn’t pull numba.
- `sqlite3.OperationalError: database is locked` – too many concurrent writers on the SQLite backend. Switch to Postgres. There is no other real fix.
- `MaxRetryError: HTTPSConnectionPool ... SSLEOFError` – TLS handshake failing between client and tracking server. Almost always a proxy or load balancer dropping idle connections. Set `MLFLOW_HTTP_REQUEST_TIMEOUT=300` on the client (see the snippet below) and check your reverse proxy’s keep-alive.
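Setting that timeout from Python looks like this – the variable is read when requests are made, so set it before any logging calls (exporting it in your shell works just as well):

```python
import os

# Give slow uploads and long evaluations 300 seconds before the client gives up.
os.environ["MLFLOW_HTTP_REQUEST_TIMEOUT"] = "300"

import mlflow  # logging calls made after this point use the longer timeout
```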
Upgrading from MLflow 2.x
The schema changed in 3.0. Don’t just pip install --upgrade on a shared server with months of run history – you’ll get an alembic migration error mid-startup.
- Stop the running server
- Back up the backend store: `pg_dump mlflow > mlflow_backup.sql` (or copy the SQLite file)
- Snapshot the artifact directory
- Upgrade: `pip install --upgrade mlflow==3.12.0`
- Run `mlflow db upgrade <backend-uri>` to apply migrations
- Start the server and verify a known-good experiment still loads (see the sketch below)
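That last verification step can be scripted – a sketch assuming the smoke-test experiment from earlier still exists:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5000")
exp = client.get_experiment_by_name("smoke-test")  # any known-good experiment
assert exp is not None, "experiment missing after migration"
print(len(client.search_runs([exp.experiment_id])), "run(s) survived the upgrade")
```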
Multi-workspace support, added in MLflow 3.10, gives you a coarser level of grouping: experiments, models, and prompts can be logically isolated within a single tracking server. Enable it by adding --enable-workspaces to the server command – useful if multiple teams share one MLflow instance and you’re tired of seeing everyone’s runs in one list.
Uninstall and cleanup
If you need to remove MLflow entirely:
# Stop the server (Ctrl+C or systemctl stop)
pip uninstall mlflow
rm -rf ~/mlflow-data # backend + artifacts
rm -rf ./mlruns # any local mlruns folders
deactivate && rm -rf mlflow-env
That removes the package and the data. If you used Postgres, drop the database manually. The Kubernetes community Helm chart (v1.7.1) has its own helm uninstall path – don’t try to clean up Helm-managed resources by hand.
FAQ
Should I use mlflow ui or mlflow server?
mlflow ui is for local solo work – it points at ./mlruns in the current directory. mlflow server is for everything else.
Why does my client time out when uploading a 2 GB model artifact?
Two layers can drop the request: the MLflow server’s own timeout (fixed with --uvicorn-opts as shown above) and whatever reverse proxy sits in front of it. If you’re running behind nginx or an ALB, also raise proxy_read_timeout there. A common pattern: bump both to 300 seconds, then profile whether the bottleneck is network or disk I/O on the artifact store.
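To find which layer is dropping the request, time the upload from the client – if it dies at a suspiciously round number of seconds (30, 60), a proxy in the middle has a fixed timeout. A sketch with a placeholder host and file:

```python
import time
import mlflow

mlflow.set_tracking_uri("http://your-server-ip:5000")  # placeholder host
with mlflow.start_run():
    t0 = time.time()
    mlflow.log_artifact("model.bin")  # placeholder: your large artifact
    print(f"upload took {time.time() - t0:.1f}s")
```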
Can I run MLflow without a tracking server at all?
Yes – set MLFLOW_TRACKING_URI to a local path (or file: URI), or omit it entirely, and MLflow writes runs to ./mlruns/ on disk. Works fine for single-developer projects and notebooks. The moment a second person needs to see your runs, deploy the server.
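The whole local-only setup is two lines – no server process at all:

```python
import mlflow

mlflow.set_tracking_uri("./mlruns")  # optional: this is already the default
with mlflow.start_run():
    mlflow.log_metric("rmse", 1.23)  # lands in ./mlruns/ on local disk
```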
Next step: harden the deploy. Put the server behind nginx with TLS, point the artifact root at S3 (or MinIO if you’re on-prem), and add --enable-workspaces before the second team joins. The official docs on backend stores have the S3 credentials format.
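On the client side, an S3/MinIO artifact root just needs the standard boto3 credentials in the environment – a sketch; MLFLOW_S3_ENDPOINT_URL is only required for non-AWS endpoints like MinIO, and all values below are placeholders:

```python
import os

os.environ["AWS_ACCESS_KEY_ID"] = "minio-user"          # placeholder
os.environ["AWS_SECRET_ACCESS_KEY"] = "minio-secret"    # placeholder
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://minio.internal:9000"  # MinIO only
```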