
Install Great Expectations 1.17: ML Data Quality Guide

Deploy Great Expectations 1.17.0 for ML data quality checks - real commands, the Windows trap, and the breaking change from the pre-1.0 CLI world.

8 min read · Intermediate

Two ways to install Great Expectations for ML data quality checks. Only one of them won’t waste your afternoon.

Path A: pip install great_expectations. Fast. Works. Breaks the minute you connect to Snowflake, Postgres, or S3.

Path B: pip install 'great_expectations[snowflake,s3]' (or whatever you actually use). Installs the matching driver extras up front. No ImportError surprises at 11pm.

Path B wins. The PyPI package (as of mid-2025) lists extras for redshift, excel, vertica, spark, hive, spark-connect, databricks, trino, cloud, postgresql, dremio, azure, singlestore, mysql, snowflake, teradata, sql-server, athena, clickhouse, bigquery, fabric, gcp, and s3 – pick yours now instead of patching later. This guide installs GX Core 1.17.0 the right way.

System requirements

GX is a Python library, not a daemon or a server. You’re not standing up infrastructure – you’re adding a validation layer to a pipeline that already runs. That distinction matters when you’re deciding whether to install it in an existing project venv or spin up a dedicated one.

  • Python: 3.10-3.13 (as of GX 1.17.0); experimental 3.14 support available
  • OS: macOS or Linux – Windows officially unsupported (see below)

According to the GX Core GitHub readme, supported versions are Python 3.10 through 3.13 (as of 1.17.0). Python 3.14 can be force-enabled via the GX_PYTHON_EXPERIMENTAL environment variable – that’s only documented in the readme, not in the main docs site.

Windows users: the official docs state that support for the open source Python version is currently unavailable and that you might experience errors or performance issues. That’s a polite way of saying don’t bother. Use WSL2 – the Linux subsystem works fine.

Download source and install

PyPI is the right source. One thing to note: the GitHub releases page lags behind PyPI – as of April 2025 the latest tagged release there is 1.16.1, while PyPI already has 1.17.0. Install from PyPI, not from GitHub tarballs.

Create a virtualenv first. GX pulls in a significant number of transitive dependencies and you don’t want that in your system Python.

python3 -m venv gx-env
source gx-env/bin/activate
pip install --upgrade pip
pip install 'great_expectations[postgresql,s3]'

Swap the extras for what you’ll actually validate against. For a pure pandas/CSV workflow, plain pip install great_expectations is fine. For a data warehouse setup, add the driver now – retrofitting extras after the fact works but often drags in version conflicts.

Pro tip: Pin the version in your requirements.txt as great_expectations==1.17.0. The 0.x → 1.x jump broke a lot of existing projects, and the ecosystem has learned to pin hard.
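
In a requirements.txt, the pin and the extras travel together. The extras below are examples; swap in whatever your stack needs:

great_expectations[postgresql,s3]==1.17.0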

How heavy is the install, really? It depends on which extras you pull in. A plain pip install great_expectations lands a meaningful dependency tree; add [spark] and you’re also pulling in PySpark. There’s no official published install size in the docs – so profile your own venv with pip list and du -sh gx-env/lib after installing. Don’t trust the numbers floating around in blog posts from 2022.

First-time configuration with GX Core

Here’s where older tutorials fail you. The whole great_expectations init CLI flow that dominates Google results? Gone in 1.0 (released August 2024).

The current workflow is Python-first. Turns out the new entry point is just two lines: import great_expectations as gx then context = gx.get_context(), per the GX Core docs. That single call either loads an existing project folder or spins up an ephemeral in-memory context.

import great_expectations as gx

# Ephemeral context - lives in memory, nothing written to disk
context = gx.get_context(mode="ephemeral")
print(context)
# <great_expectations.data_context.data_context.EphemeralDataContext object ...>

# File-backed context - creates ./gx/ project directory on first run
context = gx.get_context(mode="file", project_root_dir="./my_gx_project")

For ML pipelines I default to mode="file". It writes a gx/ folder with expectations/, checkpoints/, validation_definitions/, and uncommitted/. Commit the first three, gitignore the last one.
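
If you manage that from the repo root, one .gitignore line covers it (assuming the default gx/ location):

gx/uncommitted/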

The V1 checkpoint layout is also worth flagging: configs are JSON now, living in gx/checkpoints/<CHECKPOINT_NAME> and gx/validation_definitions/<VALIDATION_DEFINITION_NAME>, per the V0-to-V1 migration guide. If you’re reading a 2023-era tutorial that talks about YAML checkpoints, close that tab.
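
To see where those JSON files come from, here is a minimal sketch of creating a checkpoint through the 1.x Python API. Every name in it (pipeline, training_checks, and so on) is a placeholder:

import great_expectations as gx

context = gx.get_context(mode="file", project_root_dir="./my_gx_project")

# A checkpoint needs a batch definition, a suite, and a validation definition
batch_def = (
    context.data_sources.add_pandas("pipeline")
    .add_dataframe_asset("training_df")
    .add_batch_definition_whole_dataframe("daily")
)
suite = context.suites.add(gx.ExpectationSuite(name="training_checks"))

validation_def = context.validation_definitions.add(
    gx.ValidationDefinition(name="training_vd", data=batch_def, suite=suite)
)
checkpoint = context.checkpoints.add(
    gx.Checkpoint(name="training_checkpoint", validation_definitions=[validation_def])
)
# On a file-backed context, the add() calls persist JSON under
# gx/validation_definitions/ and gx/checkpoints/ in project_root_dir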

Verify the install

One-liner sanity check:

python -c "import great_expectations as gx; print(gx.__version__); print(gx.get_context())"

Expected output: 1.17.0 followed by an EphemeralDataContext repr. If you get 1.17.0 but no context line, your Python found the package but something upstream (usually jsonschema or marshmallow) pulled an incompatible pin.
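
A quick way to see which versions pip actually resolved. The package list below is the usual suspects, not GX's full dependency tree:

from importlib.metadata import PackageNotFoundError, version

for pkg in ("great_expectations", "jsonschema", "marshmallow", "pydantic"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")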

A tighter end-to-end smoke test – validates one row against one expectation:

import pandas as pd
import great_expectations as gx
from great_expectations.expectations import ExpectColumnValuesToNotBeNull

context = gx.get_context()
df = pd.DataFrame({"user_id": [1, 2, 3, None]})

ds = context.data_sources.add_pandas("smoke")
asset = ds.add_dataframe_asset("df")
batch_def = asset.add_batch_definition_whole_dataframe("b")
batch = batch_def.get_batch(batch_parameters={"dataframe": df})

result = batch.validate(ExpectColumnValuesToNotBeNull(column="user_id"))
print(result.success) # False - one null in user_id

If that prints False, your install is healthy and the API contract for 1.x is wired up correctly. A False result here is success. Which raises a question worth sitting with: if your real pipeline runs this same check against production data and also prints False – is your install broken, or is your data? That ambiguity is exactly why a known-bad test DataFrame is worth keeping around.
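
One way to keep that known-bad DataFrame around is as a test. Here is a sketch, runnable under pytest or plain Python; the function and datasource names are placeholders:

import pandas as pd
import great_expectations as gx
from great_expectations.expectations import ExpectColumnValuesToNotBeNull

def test_gx_canary():
    # Known-bad by construction: a healthy install must return False here.
    # If this assertion ever fails, suspect the install, not the data.
    context = gx.get_context(mode="ephemeral")
    bad_df = pd.DataFrame({"user_id": [1, None]})
    batch = (
        context.data_sources.add_pandas("canary")
        .add_dataframe_asset("bad_df")
        .add_batch_definition_whole_dataframe("b")
        .get_batch(batch_parameters={"dataframe": bad_df})
    )
    result = batch.validate(ExpectColumnValuesToNotBeNull(column="user_id"))
    assert result.success is False

if __name__ == "__main__":
    test_gx_canary()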

Common errors and fixes

Three install-stage failures that keep showing up in the community tracker.

  • 'great_expectations' is not recognized as an internal or external command – After a successful pip install, great_expectations --version fails on both macOS and Windows (tracked in GitHub issue #4990). The CLI entry point isn’t on PATH – and on 1.x, the CLI is gutted anyway. Fix: skip the CLI entirely. Use python -c "import great_expectations; print(great_expectations.__version__)".
  • The command datasource does not exist – Reported on GX 0.18.0: great_expectations datasource new returns that error even though ‘datasource’ appears in the help text. The CLI signatures shifted across 0.15 → 0.18 → 1.x. On 1.17, stop using the CLI for datasource creation – do it in Python via context.data_sources.add_pandas(...) (see the sketch after this list).
  • Slow pip resolve / pip hangs for minutes – GX pulls heavy transitive deps and the pip resolver can thrash. Fix: upgrade pip before installing, and pass extras explicitly ([postgresql]) rather than letting the resolver guess.
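
Here is that Python replacement for the warehouse case, a sketch assuming the [postgresql] extra is installed. The connection string and names are placeholders:

import great_expectations as gx

context = gx.get_context(mode="file", project_root_dir="./my_gx_project")

# What `great_expectations datasource new` used to do, done in code
ds = context.data_sources.add_postgres(
    name="warehouse",
    connection_string="postgresql+psycopg2://user:pass@host:5432/db",
)
asset = ds.add_table_asset(name="events", table_name="events")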

Upgrading from 0.x and uninstalling

Already have a pre-1.0 project? The risk isn’t pip – it’s what happens after. Old YAML checkpoints, BaseDataContext, RuntimeBatchRequest, and the CLI init flow all changed shape in 1.0. Blindly running pip install -U will get you a working package that can’t read any of your existing config files.

Follow the official V0-to-V1 migration guide (linked above) before upgrading. At minimum: freeze your 0.x expectation suites, install 1.17 in a fresh venv, and re-create suites using the Python API. Most 0.x tutorials – including half the Stack Overflow answers – now show code that won’t run on 1.x.

To remove cleanly:

pip uninstall great_expectations
rm -rf ./gx # the project directory, if you used mode="file"
deactivate && rm -rf gx-env # nuke the venv

GX doesn’t install system services, cron jobs, or daemons – it’s pure Python, so uninstall is pip plus directory cleanup.

FAQ

Is the old `great_expectations init` CLI gone for good?

Yes. gx.get_context() replaced it in 1.0.

Should I install GX Core locally or use GX Cloud?

If your pipeline lives in Airflow, dbt, or a self-hosted orchestrator and you want expectation suites in git alongside your DAGs – GX Core (the pip package this guide installs) is the right call. GX Cloud is a separate managed product that requires an account, not a pip flag. It exists for teams that want a hosted UI where non-engineers author expectations and don’t want to maintain stores or Data Docs hosting themselves. For most ML engineering workflows, Core is the answer.

Can I install GX on an Apple Silicon Mac without issues?

For Python 3.11+, yes – native arm64 wheels for common dependencies like pyarrow have been available for a while now. If you hit a wheel-build error, it’s almost always a sub-dependency, not GX itself. Install the offending package (usually pyarrow) separately first, then retry.

Next: write your first expectation suite. Pick one table from your ML training data, add three expectations (ExpectColumnValuesToNotBeNull, ExpectColumnValuesToBeBetween, ExpectColumnValuesToBeInSet), wire them into a Checkpoint, and run it against last week’s data versus this week’s. That’s the shortest path to catching a real data-quality regression before it hits model training.
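
As a parting sketch, here is what those three expectations look like wired together, extending the checkpoint pattern from earlier. The column names and value sets are invented placeholders; match them to your own table:

import pandas as pd
import great_expectations as gx

context = gx.get_context(mode="file", project_root_dir="./my_gx_project")

batch_def = (
    context.data_sources.add_pandas("training")
    .add_dataframe_asset("features")
    .add_batch_definition_whole_dataframe("weekly")
)

# Three expectations on hypothetical columns
suite = context.suites.add(gx.ExpectationSuite(name="feature_checks"))
suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column="user_id"))
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(column="age", min_value=0, max_value=120)
)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeInSet(column="plan", value_set=["free", "pro"])
)

validation_def = context.validation_definitions.add(
    gx.ValidationDefinition(name="weekly_features", data=batch_def, suite=suite)
)
checkpoint = context.checkpoints.add(
    gx.Checkpoint(name="weekly_checkpoint", validation_definitions=[validation_def])
)

# Run against this week's pull; swap in last week's frame to compare
this_week = pd.DataFrame(
    {"user_id": [1, 2, 3], "age": [34, 29, 41], "plan": ["free", "pro", "pro"]}
)
result = checkpoint.run(batch_parameters={"dataframe": this_week})
print(result.success)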