Install Ray 2.55.1: A Practical ML Cluster Setup Guide

Install Ray 2.55.1 for distributed ML workloads. Real commands, the shm-size trap, dashboard port gotchas, and verification steps that actually work.

8 min read · Intermediate

There are two ways most people install Ray for ML workloads. One works for about five minutes. The other actually survives a real workload.

Path A: pip install ray. Done. Path B: pip install "ray[default]" inside a clean virtual environment, with shared memory configured if you’re in Docker. Path B is the right answer – not because Path A is broken, but because Path A quietly omits the dashboard and cluster launcher, which you’ll need within ten minutes of starting any non-trivial Ray ML job.

This guide walks through Ray 2.55.1 (the current stable as of early 2026) using the install path that won’t waste your afternoon.

System requirements for Ray ML

Ray is more sensitive to RAM and shared memory than CPU. The official wheels target Linux (x86_64 and aarch64), macOS (including Apple Silicon), and Windows – though per the Ray installation docs, Windows support is still alpha and not recommended for production.

| Spec | Minimum | Recommended |
| --- | --- | --- |
| Python | 3.9 | 3.11 or 3.12 (3.13 wheels are beta as of early 2026) |
| RAM | 4 GB | 16 GB+ for any real ML workload |
| /dev/shm (Linux/Docker) | 2 GB | 30%+ of total RAM |
| Disk | 2 GB free | 10 GB+ if pulling Docker images |
| OS | Linux, macOS 12+ | Ubuntu 22.04 LTS |

The shared-memory number isn’t arbitrary. Ray’s memory management docs explain that Ray reserves 30% of available memory for the object store by default, and allocates it to /dev/shm on Linux. If shm is too small, Ray falls back to /tmp – which still works, but everything gets slower in a way that’s hard to debug after the fact.
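
Before starting Ray, it’s worth checking what you actually have. The df line is standard Linux tooling; the ray start flag is only needed if you want to cap the object store explicitly (the 2 GB value is illustrative):

# Check the shared-memory mount - compare Size against the 30% guideline above
df -h /dev/shm

# Optional: cap the object store explicitly (value in bytes)
ray start --head --object-store-memory=2000000000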

Pick the right install command

The ray package on PyPI ships several install “extras.” Most tutorials only show pip install ray, which is the bare-minimum option. That’s almost never what you actually want.

  • pip install ray – Ray Core only. No dashboard, no cluster launcher. Smallest install.
  • pip install "ray[default]" – Adds the dashboard (port 8265) and the cluster launcher CLI. This is the one you want.
  • pip install "ray[data]" / [train] / [tune] / [serve] / [rllib] – Specific ML libraries. Pull only what you need.
  • pip install "ray[all]" – Everything. Heavy. Use only if you genuinely don’t know which libraries you’ll need.

The conda note is easy to miss – and has a large blast radius if you’re reproducing a teammate’s environment: Ray’s conda packages are community-maintained, not maintained by the Ray team (as confirmed by the Ray installation docs). Even inside a conda environment, Ray’s own docs recommend installing via pip from PyPI rather than conda-forge.

Install Ray 2.55.1 step by step

This assumes Linux or macOS. For Windows, use WSL2 unless you have a specific reason not to.

# 1. Create an isolated environment
python3.11 -m venv ray-env
source ray-env/bin/activate

# 2. Upgrade pip first - old pip versions sometimes resolve Ray to a stale release
pip install --upgrade pip

# 3. Install Ray with the dashboard and cluster launcher
pip install -U "ray[default]==2.55.1"

# 4. (Optional) Add specific ML libraries
pip install -U "ray[train,tune,serve]==2.55.1"

Version pinning deserves a moment. The ==2.55.1 isn’t just defensive habit – there’s a documented historical pattern where pip install "ray[all]" resolved to an older release than the latest. Pinning short-circuits that. It also matters because Ray 2.56.0 drops Pydantic V1 support (more on this below), so an unpinned upgrade at the wrong time can break Ray Serve silently. The few extra characters are worth it.
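
If you want the pin enforced at runtime, not just at install time, a one-liner does it (a minimal sketch using the version from this guide):

python -c "import ray; assert ray.__version__ == '2.55.1', ray.__version__"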

If you prefer Docker, the official image is rayproject/ray on Docker Hub, with tags shaped like {version}-{python}-{platform} – for example rayproject/ray:2.55.1-py311-cpu. Don’t run it bare:

# Wrong - will run, but slow and unstable
docker run -it rayproject/ray:2.55.1-py311-cpu

# Right
docker run -it --shm-size=4g -p 8265:8265 \
  rayproject/ray:2.55.1-py311-cpu \
  bash -c "ray start --head --dashboard-host=0.0.0.0 --block"

KubeRay note: If you’re running Ray in Kubernetes via KubeRay, mount an emptyDir volume with medium: Memory at /dev/shm. Without it, every pod silently falls back to /tmp for the object store and training jobs get slower as data grows – with no obvious error message pointing at shm.
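
For reference, the relevant pod-spec fragment looks roughly like this – a sketch, not a full RayCluster manifest; the volume name and 4Gi size limit are illustrative:

volumes:
  - name: dshm
    emptyDir:
      medium: Memory
      sizeLimit: 4Gi
containers:
  - name: ray-worker
    volumeMounts:
      - name: dshm
        mountPath: /dev/shm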

First-time configuration and verification

Start a single-node head cluster – the simplest valid config for local ML work:

ray start --head --port=6379 --dashboard-host=0.0.0.0

Port 6379 is the default for the head node’s GCS server (the process that holds cluster metadata); 8265 is where the dashboard lands. The --dashboard-host=0.0.0.0 flag is required if you want the dashboard reachable from outside localhost – it defaults to binding only locally, which surprises people running Ray on a remote box.

Three verification steps, in order:

  1. ray --version should print ray, version 2.55.1.
  2. ray status should show CPU and memory resources – not “No cluster status” (a real error from Ray’s GitHub tracker when the head wasn’t actually running).
  3. Open http://localhost:8265. One node listed = good install.

Then the one-line sanity check:

python -c "import ray; ray.init(address='auto'); print(ray.cluster_resources())"

You should see something like {'CPU': 8.0, 'memory': ..., 'object_store_memory': ..., 'node:127.0.0.1': 1.0}. Empty dictionary or a hang means the cluster isn’t up.
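
If you want one step past cluster_resources(), a minimal end-to-end round-trip confirms tasks actually schedule and return (the square function is illustrative):

import ray

ray.init(address="auto")  # attach to the cluster started with ray start --head

@ray.remote
def square(x):
    return x * x

# Eight trivial tasks scheduled on the cluster; expect [0, 1, 4, ..., 49]
print(ray.get([square.remote(i) for i in range(8)]))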

The four errors that eat the most time

Pulled from real GitHub issues and Ray’s own troubleshooting docs – not invented scenarios.

1. “WARNING: The object store is using /tmp instead of /dev/shm” – Ray didn’t crash. It demoted itself. Docker’s default /dev/shm is 64 MB; Ray needs at least 2 GB or it silently falls back. Fix: add --shm-size=4g to docker run, or mount a memory-backed volume in Kubernetes.

2. Dashboard unreachable, gRPC “failed to connect to all addresses” – Ray’s configuration docs are direct about this: --include-dashboard=true with a closed port 8265 on the head node produces repeated StatusCode.UNAVAILABLE warnings. Open 8265 in your firewall, or pass --include-dashboard=false if you don’t need it.

3. Worker exits with UNEXPECTED_SYSTEM_EXIT – The Linux OOM killer sent SIGKILL to a worker process. Ray cannot intercept SIGKILL, so all you see is an exit code. The fix is reducing concurrency: raise num_cpus per task in @ray.remote so fewer tasks compete for memory at once (sketched just after this list).

4. pip install ray appears to install but no dashboard – Not a bug. Minimal install doesn’t include it. Run pip install -U "ray[default]" instead.
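
A sketch of the num_cpus fix from error 3 – the function body and the value 2 are illustrative; size num_cpus to your task’s real memory footprint:

import numpy as np
import ray

ray.init(address="auto")

# With num_cpus=2, an 8-CPU node schedules at most 4 of these tasks
# at once instead of 8, roughly halving peak memory pressure.
@ray.remote(num_cpus=2)
def preprocess(batch):
    return float(batch.sum())

print(ray.get(preprocess.remote(np.ones((1000, 1000)))))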

Upgrading and uninstalling cleanly

Stop the cluster before touching the package. Skip this step and pip sometimes upgrades a running Ray, which leaves stale sockets in /tmp/ray that corrupt the next startup.

ray stop --force
pip uninstall -y ray
pip install -U "ray[default]==<new_version>"

Before any upgrade: Ray plans to drop Pydantic V1 support starting in version 2.56.0, per the project’s GitHub release notes (this may change – check releases before upgrading). If your codebase pins pydantic<2, the next minor Ray bump will break Ray Serve. Worth auditing before you upgrade.
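
Auditing takes one line – pydantic exposes VERSION in both major versions:

python -c "import pydantic; print(pydantic.VERSION)"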

Also from the recent changelog: Ray patched CVE-2025-62593, a dashboard vulnerability related to browser header rejection logic. If you’ve been running Ray 2.54 or earlier with the dashboard exposed, this upgrade is a security fix, not optional.

Clean uninstall:

ray stop --force
pip uninstall -y ray
rm -rf /tmp/ray # session logs and stale sockets
deactivate && rm -rf ray-env # if you used a venv

The catch: ray.shutdown() from inside a Python script does not terminate a remote cluster – it only disconnects the client. If you connected via ray.init(address="auto") or ray.init(address="ray://..."), you still need ray stop on the head node to actually kill processes. Assuming otherwise leaves orphaned workers running until the machine restarts.

FAQ

Do I need a GPU to run Ray?

No. Ray Core, Ray Data, and Ray Tune work fine on CPU-only machines. GPUs only matter if you’re running Ray Train or Ray Serve with deep learning models.

Can I run Ray inside a Jupyter notebook on the same machine as the cluster?

A common mistake here: calling plain ray.init() from a notebook. That spins up a brand new local cluster every time the kernel restarts – fragmenting your object store and leaking worker processes between sessions. Start the cluster in a terminal with ray start --head, then in the notebook use ray.init(address="auto"). That reuses the existing cluster instead of spawning a fresh one.
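
In notebook form (a minimal sketch – ignore_reinit_error keeps a re-run of the cell from raising):

# In a terminal first: ray start --head
# Then in the notebook:
import ray
ray.init(address="auto", ignore_reinit_error=True)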

What’s the difference between ray[default] and ray[all]?

ray[default] gives you Ray Core plus the dashboard and cluster launcher – the tools you need to actually run and monitor a cluster. ray[all] layers on every ML library (Train, Tune, Serve, RLlib, Data) and their full dependency trees, which can add hundreds of megabytes. In practice, most teams install ray[default] first, confirm the cluster runs, then add specific extras like ray[train,serve] only for the services they deploy. This keeps dependency conflicts predictable and build times short – especially in CI where ray[all] can meaningfully slow down image builds.

Once ray status shows a healthy node, the Ray getting-started guide has a Ray Train PyTorch example that’s worth running next. It’s the fastest confirmation that your install can do distributed training, not just start.