Two ways to install Great Expectations for ML data quality checks. Only one of them won’t waste your afternoon.
Path A: pip install great_expectations. Fast. Works. Breaks the minute you connect to Snowflake, Postgres, or S3.
Path B: pip install 'great_expectations[snowflake,s3]' (or whatever you actually use). Installs the matching driver extras up front. No ImportError surprises at 11pm.
Path B wins. The PyPI package (as of mid-2025) lists extras for redshift, excel, vertica, spark, hive, spark-connect, databricks, trino, cloud, postgresql, dremio, azure, singlestore, mysql, snowflake, teradata, sql-server, athena, clickhouse, bigquery, fabric, gcp, and s3 – pick yours now instead of patching later. This guide installs GX Core 1.17.0 the right way.
System requirements
GX is a Python library, not a daemon or a server. You’re not standing up infrastructure – you’re adding a validation layer to a pipeline that already runs. That distinction matters when you’re deciding whether to install it in an existing project venv or spin up a dedicated one.
| Requirement | Details |
|---|---|
| Python | 3.10-3.13 (as of GX 1.17.0); experimental 3.14 support available |
| OS | macOS or Linux – Windows officially unsupported (see below) |
According to the GX Core GitHub readme, supported versions are Python 3.10 through 3.13 (as of 1.17.0). Python 3.14 can be force-enabled via the GX_PYTHON_EXPERIMENTAL environment variable – that’s only documented in the readme, not in the main docs site.
Windows users: the official docs state that support for the open source Python version is currently unavailable and that you might experience errors or performance issues. That’s a polite way of saying don’t bother. Use WSL2 – the Linux subsystem works fine.
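The version gate from the table can be encoded as a pre-install guard in a few lines of stdlib Python. This is a sketch, not anything GX ships: the range comes from the readme claim above, and the helper name is my own.

```python
import sys

def python_supported(major: int, minor: int) -> bool:
    """True if (major, minor) falls in GX 1.17.0's supported range, 3.10-3.13."""
    return major == 3 and 10 <= minor <= 13

# Example: gate a setup script on the running interpreter before pip runs.
ok = python_supported(sys.version_info.major, sys.version_info.minor)
```

Remember that 3.14 is experimental only, behind GX_PYTHON_EXPERIMENTAL, so it deliberately fails this check.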
Download source and install
PyPI is the right source. One thing to note: the GitHub releases page lags behind PyPI – as of April 2025 the latest tagged release there is 1.16.1, while PyPI already has 1.17.0. Install from PyPI, not from GitHub tarballs.
Create a virtualenv first. GX pulls in a significant number of transitive dependencies and you don’t want that in your system Python.
python3 -m venv gx-env
source gx-env/bin/activate
pip install --upgrade pip
pip install 'great_expectations[postgresql,s3]'
Swap the extras for what you’ll actually validate against. For a pure pandas/CSV workflow, plain pip install great_expectations is fine. For a data warehouse setup, add the driver now – retrofitting extras after the fact works but often drags in version conflicts.
Pro tip: Pin the version in your requirements.txt as great_expectations==1.17.0. The 0.x → 1.x jump broke a lot of existing projects, and the ecosystem has learned to pin hard.
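A minimal requirements.txt for the setup above might look like this (pin and extras taken from this guide; swap the extras for your own stack):

```text
# requirements.txt - pin hard; the 0.x -> 1.x jump broke unpinned projects
great_expectations[postgresql,s3]==1.17.0
```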
How heavy is the install, really? It depends on which extras you pull in. A plain pip install great_expectations lands a meaningful dependency tree; add [spark] and you’re also pulling in PySpark. There’s no official published install size in the docs – so profile your own venv with pip list and du -sh gx-env/lib after installing. Don’t trust the numbers floating around in blog posts from 2022.
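If you want a number rather than eyeballing du -sh, the same measurement is a few lines of stdlib Python. A sketch only - the helper name is mine:

```python
import os

def dir_size_bytes(path: str) -> int:
    """Sum file sizes under path, like `du -sb` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

# Example: dir_size_bytes("gx-env/lib") after installing, to compare
# a plain install against one with [spark] or other heavy extras.
```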
First-time configuration with GX Core
Here’s where older tutorials fail you. The whole great_expectations init CLI flow that dominates Google results? Gone in 1.0 (released August 2024).
The current workflow is Python-first. Turns out the new entry point is just two lines: import great_expectations as gx then context = gx.get_context(), per the GX Core docs. That single call either loads an existing project folder or spins up an ephemeral in-memory context.
import great_expectations as gx
# Ephemeral context - lives in memory, nothing written to disk
context = gx.get_context(mode="ephemeral")
print(context)
# <great_expectations.data_context.data_context.EphemeralDataContext object ...>
# File-backed context - creates ./gx/ project directory on first run
context = gx.get_context(mode="file", project_root_dir="./my_gx_project")
For ML pipelines I default to mode="file". It writes a gx/ folder with expectations/, checkpoints/, validation_definitions/, and uncommitted/. Commit the first three, gitignore the last one.
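The commit/gitignore split above amounts to a one-line .gitignore entry (directory names per the V1 layout; adjust the prefix to wherever your project_root_dir points):

```text
# GX scratch space - local validation results, data docs, credentials
gx/uncommitted/
```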
The V1 checkpoint layout is also worth flagging: configs are JSON now, living in gx/checkpoints/<CHECKPOINT_NAME> and gx/validation_definitions/<VALIDATION_DEFINITION_NAME>, per the V0-to-V1 migration guide. If you’re reading a 2023-era tutorial that talks about YAML checkpoints, close that tab.
Verify the install
One-liner sanity check:
python -c "import great_expectations as gx; print(gx.__version__); print(gx.get_context())"
Expected output: 1.17.0 followed by an EphemeralDataContext repr. If you get 1.17.0 but no context line, your Python found the package but something upstream (usually jsonschema or marshmallow) pulled an incompatible pin.
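For CI, the same pin check can be done without importing GX at all, via importlib.metadata. A hedged sketch - the function name is mine, not part of any GX API:

```python
import importlib.metadata

def check_pin(package: str, expected: str) -> tuple[bool, str]:
    """Compare the installed version of `package` against an expected pin.

    Returns (matches, installed_version_or_reason).
    """
    try:
        installed = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return False, "not installed"
    return installed == expected, installed

# Example: ok, got = check_pin("great_expectations", "1.17.0")
```

This catches both failure modes from the one-liner: package missing entirely, and package present at the wrong version.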
A tighter end-to-end smoke test – validates one row against one expectation:
import pandas as pd
import great_expectations as gx
from great_expectations.expectations import ExpectColumnValuesToNotBeNull
context = gx.get_context()
df = pd.DataFrame({"user_id": [1, 2, 3, None]})
ds = context.data_sources.add_pandas("smoke")
asset = ds.add_dataframe_asset("df")
batch_def = asset.add_batch_definition_whole_dataframe("b")
batch = batch_def.get_batch(batch_parameters={"dataframe": df})
result = batch.validate(ExpectColumnValuesToNotBeNull(column="user_id"))
print(result.success) # False - one null in user_id
If that prints False, your install is healthy and the API contract for 1.x is wired up correctly. A False result here is success. Which raises a question worth sitting with: if your real pipeline runs this same check against production data and also prints False – is your install broken, or is your data? That ambiguity is exactly why a known-bad test DataFrame is worth keeping around.
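The known-bad-fixture idea doesn't need GX to demonstrate. Here is a stdlib stand-in for the same null condition ExpectColumnValuesToNotBeNull flags - names are mine, not GX API - worth keeping as a unit test that should always fail validation:

```python
def count_nulls(rows: list[dict], column: str) -> int:
    """Count None values in one column of row-dicts - the same condition
    ExpectColumnValuesToNotBeNull flags, minus the GX machinery."""
    return sum(1 for row in rows if row.get(column) is None)

# A deliberately bad fixture: validation *should* fail on it.
KNOWN_BAD_ROWS = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}, {"user_id": None}]
```

If your pipeline ever reports this fixture as clean, the problem is the install or the wiring, not the data.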
Common errors and fixes
Three install-stage failures that keep showing up in the community tracker.
- 'great_expectations' is not recognized as an internal or external command – After a successful pip install, great_expectations --version fails on both macOS and Windows (tracked in GitHub issue #4990). The CLI entry point isn’t on PATH – and on 1.x, the CLI is gutted anyway. Fix: skip the CLI entirely. Use python -c "import great_expectations; print(great_expectations.__version__)".
- The command datasource does not exist – Reported on GX 0.18.0: great_expectations datasource new returns that error even though ‘datasource’ appears in the help text. The CLI signatures shifted across 0.15 → 0.18 → 1.x. On 1.17, stop using the CLI for datasource creation – do it in Python via context.data_sources.add_pandas(...).
- Slow pip resolve / pip hangs for minutes – GX pulls heavy transitive deps and the pip resolver can thrash. Fix: upgrade pip before installing, and pass extras explicitly ([postgresql]) rather than letting the resolver guess.
Upgrading from 0.x and uninstalling
Already have a pre-1.0 project? The risk isn’t pip – it’s what happens after. Old YAML checkpoints, BaseDataContext, RuntimeBatchRequest, and the CLI init flow all changed shape in 1.0. Blindly running pip install -U will get you a working package that can’t read any of your existing config files.
Follow the official V0-to-V1 migration guide (linked above) before upgrading. At minimum: freeze your 0.x expectation suites, install 1.17 in a fresh venv, and re-create suites using the Python API. Most 0.x tutorials – including half the Stack Overflow answers – are now wrong code.
To remove cleanly:
pip uninstall great_expectations
rm -rf ./gx # the project directory, if you used mode="file"
deactivate && rm -rf gx-env # nuke the venv
GX doesn’t install system services, cron jobs, or daemons – it’s pure Python, so uninstall is pip plus directory cleanup.
FAQ
Is the old `great_expectations init` CLI gone for good?
Yes. gx.get_context() replaced it in 1.0.
Should I install GX Core locally or use GX Cloud?
If your pipeline lives in Airflow, dbt, or a self-hosted orchestrator and you want expectation suites in git alongside your DAGs – GX Core (the pip package this guide installs) is the right call. GX Cloud is a separate managed product that requires an account, not a pip flag. It exists for teams that want a hosted UI where non-engineers author expectations and don’t want to maintain stores or Data Docs hosting themselves. For most ML engineering workflows, Core is the answer.
Can I install GX on an Apple Silicon Mac without issues?
For Python 3.11+, yes – native arm64 wheels for common dependencies like pyarrow have been available since early 2025. If you hit a wheel-build error, it’s almost always a sub-dependency, not GX itself. Install the offending package (usually pyarrow) separately first, then retry.
Next: write your first expectation suite. Pick one table from your ML training data, add three expectations (ExpectColumnValuesToNotBeNull, ExpectColumnValuesToBeBetween, ExpectColumnValuesToBeInSet), wire them into a Checkpoint, and run it against last week’s data versus this week’s. That’s the shortest path to catching a real data-quality regression before it hits model training.