Best AI Tools for Time Series Data Forecasting (2026)

A working engineer's guide to the best AI tools for time series data forecasting - foundation models, classical libraries, and when each one actually wins.

Riley Brooks | 2026-05-20 | 8 min read | Advanced

Picture this: you’ve got 18 months of hourly sales data across 4,000 SKUs, three known promotional events coming up next quarter, and a Monday deadline. You want forecasts with confidence intervals, not a single point estimate. Six years ago, this meant fitting SARIMA per series, hand-engineering features, and praying. In 2026, you can get a probabilistic forecast in roughly fifteen lines of Python – if you pick the right tool. That’s what this guide is about.

The honest answer to “what are the best AI tools for time series data forecasting?” depends on one question almost nobody leads with: do you have exogenous variables? Everything downstream of that splits the field in half.

The scenario that drives the tool choice

Let’s stay with the 4,000-SKU example. You have:

Multiple series – 4,000 of them, often called panel data
Known future events – promotions you’ve already scheduled (exogenous, known-in-advance)
Probabilistic output needed – inventory planning cares about the 90th percentile, not the mean
No time to fine-tune – Monday deadline, remember

That fourth point is what makes foundation models interesting. They’re pretrained on huge corpora of time series, so they can zero-shot forecast a series they’ve never seen. The first three points narrow the candidate list dramatically.
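To make the 90th-percentile point concrete: quantile forecasts are usually scored with pinball (quantile) loss, which charges more for under-forecasting than for over-forecasting when the target quantile is high. Here is a minimal sketch in plain Python; the function name and the numbers are illustrative, not taken from any library.

```python
def pinball_loss(actual, predicted, quantile):
    """Average pinball (quantile) loss for one quantile level between 0 and 1."""
    total = 0.0
    for a, p in zip(actual, predicted):
        error = a - p
        # Under-forecasts (error > 0) are weighted by q, over-forecasts by (1 - q).
        total += max(quantile * error, (quantile - 1) * error)
    return total / len(actual)


# Illustrative numbers: demand came in at 100 units.
under = pinball_loss([100], [90], quantile=0.9)   # forecast was 10 too low
over = pinball_loss([100], [110], quantile=0.9)   # forecast was 10 too high
print(under, over)  # the under-forecast costs roughly nine times more
```

That asymmetry is why a mean forecast is the wrong target for inventory planning: running out of stock and overstocking do not cost the same.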

Foundation models in 2025-2026: four names, very different tools

The wave that hit in 2024-2025 is what most tutorials are still catching up on. Four names matter most: TimeGPT (Nixtla), Chronos (Amazon), Moirai (Salesforce), and TimesFM (Google). They all do zero-shot forecasting on unseen series. They are not interchangeable.

Here’s the table I wish I’d had when I started:

Model	Open source?	Univariate	Multivariate	Exogenous vars
TimeGPT-1	No (API only)	Yes	Limited	Yes
Chronos-Bolt	Yes	Yes	No	No
Chronos-2	Yes	Yes	Yes	Yes
Moirai	Yes	Yes	Yes	Yes
TimesFM	Yes	Yes	No	No

That single column “exogenous vars” rules out Chronos-Bolt and TimesFM for our 4,000-SKU scenario the moment you remember the promotions. Per the Manning livebook on foundation models, Moirai was specifically built as one of the first open-source TSFMs to support exogenous features out of the box.

What changed in late 2025 (and most tutorials missed)

For most of 2024, Chronos was the open-source darling – but it was strictly univariate. If you wanted multivariate plus open weights, your only real option was Moirai. Then in October 2025, Amazon released Chronos-2, which changed that. According to Amazon’s official announcement, Chronos-2 offers zero-shot support for univariate, multivariate, and covariate-informed forecasting tasks, and delivers state-of-the-art performance across benchmarks including GIFT-Eval, with the largest improvements specifically on tasks that include exogenous features.

The numbers behind it are loud. On the GIFT-Eval benchmark, Chronos-2 ranks first among pretrained models, with a win rate of over 90% against its predecessor Chronos-Bolt in head-to-head comparisons. And the older Chronos-Bolt itself wasn’t slow: Chronos-Bolt models are up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size, as benchmarked on 1,024 time series with a context length of 512 and a prediction horizon of 64 steps.

The practical translation: if you’re starting a project today, Chronos-2 is the default starting point for open-source multivariate work. If you want closed-source API simplicity, TimeGPT. If you want fine-grained control over patch sizes, Moirai. The rest is preference.

Setting up Chronos-2 the actual way

The 4,000-SKU example, end to end. First, your data needs to be a continuous, date-indexed structure without gaps or duplicate timestamps – that’s the part where most people lose half a day.

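from chronos import Chronos2Pipeline
import pandas as pd

# Historical panel data: columns = [item_id, timestamp, target, promo_flag]
context_df = pd.read_parquet("sku_sales.parquet")

# Known future promotions for the forecast horizon: columns = [item_id, timestamp, promo_flag]
# (hypothetical file name; build this frame from your promo calendar)
future_df = pd.read_parquet("sku_promos_next_week.parquet")

# Chronos-2 has its own pipeline class; ChronosPipeline is for the original Chronos models
pipeline = Chronos2Pipeline.from_pretrained("amazon/chronos-2", device_map="cuda")

# Predict 168 hours (one week) ahead with the promo flag as a known-future covariate.
# Argument names follow the chronos-forecasting README; check them against your installed version.
forecast = pipeline.predict_df(
 context_df,
 future_df=future_df,
 prediction_length=168,
 quantile_levels=[0.1, 0.5, 0.9], # 10th, 50th, and 90th percentiles
 id_column="item_id",
 timestamp_column="timestamp",
 target="target",
)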

That’s it. No feature engineering, no train loop, no scaling code. The model handles input scaling and encoding internally.

For Chronos-Bolt specifically, models come in four sizes on Hugging Face (as of late 2024): Tiny (9M parameters), Mini (21M), Small (48M), and Base (205M) – and can run on CPU. Tiny on a laptop is genuinely usable for prototyping. Chronos-Bolt was also added to Amazon SageMaker JumpStart in February 2025; as of this writing, check the SageMaker JumpStart catalog for current Chronos-2 availability there.

Pro tip: Don’t assume a transformer beats a simple baseline. Always run a Seasonal Naive forecast first. If your fancy model can’t beat last-week’s-value-this-week, your problem is data, not architecture.
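For reference, a Seasonal Naive baseline is only a few lines. The sketch below uses plain Python lists and toy numbers; the function names are my own, and in practice you would set `season_length` to 168 for hourly data with a weekly cycle.

```python
def seasonal_naive(history, horizon, season_length):
    """Forecast by repeating the last full season of `history` for `horizon` steps."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Plain MAE over two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy series with a repeating 4-step pattern (season_length=4).
history = [10, 20, 30, 40] * 3
actual_next = [11, 19, 31, 41, 9, 21]

baseline = seasonal_naive(history, horizon=6, season_length=4)
print(baseline)                                    # [10, 20, 30, 40, 10, 20]
print(mean_absolute_error(actual_next, baseline))  # 1.0
```

Score your foundation model's forecast with the same error function on the same holdout window. If it does not beat this number, stop and look at the data.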

When you actually want TimeGPT instead

TimeGPT is the odd one out – it’s an API, not weights you download. As of early 2026, TimeGPT-1 is a pre-trained foundation model for forecasting and anomaly detection trained on over 100 billion data points, accessible through Nixtla’s Python client. Setup is brutally simple:

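import pandas as pd
from nixtla import NixtlaClient

# Same panel as above: columns = [item_id, timestamp, target, promo_flag]
df = pd.read_parquet("sku_sales.parquet")

client = NixtlaClient(api_key="your_key")
forecast = client.forecast(
 df=df,
 h=168,
 X_df=future_promo_df, # exogenous variables for the forecast horizon (a frame you build: item_id, timestamp, promo_flag)
 level=[80, 95], # prediction intervals
 id_col="item_id", # the client defaults to unique_id / ds / y,
 time_col="timestamp", # so name your columns explicitly
 target_col="target",
)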
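A note on `level=[80, 95]`: these are central prediction intervals in percent, not quantiles. The 90th percentile that inventory planning cares about is the upper bound of the 80% interval. A small helper (my own, not part of the Nixtla client) makes the mapping explicit:

```python
def level_to_quantiles(level):
    """Map a central prediction-interval level (in percent) to its lower and upper quantiles."""
    if not 0 < level < 100:
        raise ValueError("level must be strictly between 0 and 100")
    tail = (100 - level) / 200  # probability mass left in each tail
    return tail, 1 - tail


for level in (80, 95):
    lower, upper = level_to_quantiles(level)
    print(f"{level}% interval -> quantiles {lower:.3f} and {upper:.3f}")
```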

Three things make TimeGPT attractive: anomaly detection is built in, exogenous variables work cleanly, and there’s no GPU bill. As of early 2026, when you create an account you get a 30-day free trial with no credit card required; access expires after 30 days unless you upgrade to a paid plan (enterprise pricing is custom – contact sales).

Two things give me pause. First, Nixtla’s terms forbid using output from TimeGPT to develop models that compete with Nixtla – a clause that matters if you’re building a forecasting product. Second, G2 reviewers note that TimeGPT doesn’t yet provide feature importance or interpretability diagnostics, making it hard to identify which variables drive the forecasts. For regulated industries, that’s a real blocker.

The classical libraries still earn their keep

Foundation models are not always the right answer. For a single well-behaved series with strong seasonality and a clear trend, a SARIMAX from statsmodels will train in 200 milliseconds and explain itself to your boss in plain English. A transformer won’t.

The Python stack worth knowing, ranked by how often I reach for them:

Darts – high-level API, classical and deep learning under one roof, painless N-BEATS
statsmodels – SARIMAX, explicit control over seasonality and exogenous regressors
NeuralForecast (Nixtla) – NHITS, TFT, well-tuned implementations
GluonTS – DeepAR for probabilistic forecasting at scale
sktime – if you live in scikit-learn-land already

One quick warning while we’re listing tools: as of 2025, Amazon Forecast is no longer available to new customers; existing customers can continue using it as normal. Half the 2025 “best of” lists still recommend it. They’re wrong.

The honest limitations nobody puts in the marketing

Foundation models look magical on the demos. In production you’ll hit four walls.

Wall 1: data quality is still the bottleneck. Missing timestamps, duplicate rows, mixed frequencies – the model doesn’t fix any of it. Most failed pilots I’ve seen failed at preprocessing, not modeling.
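A preprocessing pass for this usually looks something like the sketch below. It assumes pandas, hourly data, and the hypothetical column names `item_id`, `timestamp`, and `target`; it deduplicates and puts each series on a regular grid, but deliberately leaves the gaps as NaN so you decide how to fill them.

```python
import pandas as pd


def regularize_panel(df, freq=pd.Timedelta(hours=1)):
    """Deduplicate and reindex each series onto a regular time grid.

    Assumes columns item_id, timestamp, target (hypothetical names).
    Missing steps come back as NaN rather than being filled.
    """
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    # Keep the last row when the same (item_id, timestamp) pair appears twice.
    df = df.drop_duplicates(subset=["item_id", "timestamp"], keep="last")

    pieces = []
    for item_id, group in df.groupby("item_id", sort=True):
        series = group.set_index("timestamp")["target"].sort_index()
        full_index = pd.date_range(series.index.min(), series.index.max(), freq=freq)
        series = series.reindex(full_index)  # inserts NaN at missing timestamps
        piece = series.rename("target").rename_axis("timestamp").reset_index()
        piece.insert(0, "item_id", item_id)
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)


# Toy input: one duplicated timestamp (01:00) and one missing hour (02:00).
raw = pd.DataFrame({
    "item_id": ["A", "A", "A", "A"],
    "timestamp": ["2026-01-01 00:00", "2026-01-01 01:00",
                  "2026-01-01 01:00", "2026-01-01 03:00"],
    "target": [5.0, 6.0, 7.0, 9.0],
})
clean = regularize_panel(raw)
print(clean)
print("missing steps:", int(clean["target"].isna().sum()))
```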

Wall 2: zero-shot ≠ correct. Zero-shot means “the model gives you an answer without training.” It does not mean “the answer is good.” On highly idiosyncratic series – niche chemicals, rare-event finance – a fine-tuned classical model still wins. The community guidance is consistent: always benchmark against Seasonal Naive and AutoARIMA before trusting the transformer.
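One way to run that benchmark fairly is a rolling-origin backtest: forecast from several cutoff points and average the error, so a single lucky window cannot flatter any model. The sketch below is plain Python with names of my own; `forecast_fn` stands in for whatever model you are testing, whether a baseline or a wrapper around a foundation model.

```python
def rolling_origin_splits(n_obs, horizon, n_folds):
    """Return (train_end, test_end) index pairs, oldest fold first."""
    splits = []
    for fold in range(n_folds, 0, -1):
        test_end = n_obs - (fold - 1) * horizon
        train_end = test_end - horizon
        if train_end <= 0:
            raise ValueError("not enough history for this many folds")
        splits.append((train_end, test_end))
    return splits


def backtest_mae(series, forecast_fn, horizon, n_folds):
    """Average MAE of forecast_fn(history, horizon) across rolling-origin folds."""
    fold_errors = []
    for train_end, test_end in rolling_origin_splits(len(series), horizon, n_folds):
        history, actual = series[:train_end], series[train_end:test_end]
        predicted = forecast_fn(history, horizon)
        fold_errors.append(sum(abs(a - p) for a, p in zip(actual, predicted)) / horizon)
    return sum(fold_errors) / len(fold_errors)


def last_value(history, horizon):
    """Simplest possible baseline: repeat the last observed value."""
    return [history[-1]] * horizon


series = list(range(10))  # toy upward trend
print(rolling_origin_splits(len(series), horizon=2, n_folds=3))  # [(4, 6), (6, 8), (8, 10)]
print(backtest_mae(series, last_value, horizon=2, n_folds=3))    # 1.5
```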

Wall 3: API rate limits and batching. Real reviewers report having to split large forecasting jobs into batches because of API limits. If you’re forecasting hundreds of thousands of series, factor that into your architecture from day one – open-weight models on your own GPU don’t have this problem.
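The batching itself is simple. The part worth designing early is that every call site goes through it. A minimal sketch, where `forecast_batch` is a hypothetical callable wrapping your API client and the batch size is a placeholder you would tune to the provider's actual limits:

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def forecast_in_batches(series_ids, forecast_batch, batch_size):
    """Call forecast_batch once per batch and merge the per-series results."""
    results = {}
    for batch in batched(series_ids, batch_size):
        results.update(forecast_batch(batch))
    return results


def fake_forecast_batch(ids):
    """Stand-in for a real API call: returns a dummy forecast per series id."""
    return {series_id: [0.0] for series_id in ids}


sku_ids = [f"sku_{i}" for i in range(10)]
print([len(b) for b in batched(sku_ids, 4)])                                 # [4, 4, 2]
print(len(forecast_in_batches(sku_ids, fake_forecast_batch, batch_size=4)))  # 10
```

A real version would also want retries with backoff and a way to resume from the last completed batch.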

Wall 4: interpretability gaps. Most TSFMs can tell you what they predicted but not why. If your auditor or supply chain manager needs the “why,” you’ll be bolting on SHAP-like analysis after the fact, or staying with classical models.

FAQ

Is Prophet still worth using in 2026?

For quick business forecasts on a single series with strong seasonality and holiday effects, yes. For anything involving panels, multivariate dependencies, or modern accuracy expectations, no – pick a foundation model.

Which foundation model should I try first if I have exogenous variables and open-source matters?

Chronos-2, released by Amazon in October 2025. It’s the first in the Chronos family to natively handle covariates, it tops GIFT-Eval among pretrained models, and you can deploy it directly from Hugging Face. The only reason to start with Moirai instead is if you specifically want its variable patch sizes for unusual seasonal periods (e.g., a 17-day biological cycle), or if you’re already on the Salesforce stack.

Can I use a general LLM like GPT-4 or Claude for forecasting?

You can, but you shouldn’t for anything serious. LLMs hallucinate trends and don’t produce calibrated probabilistic forecasts. There’s a niche research direction that borrows from LLMs (Time-LLM reprograms a frozen LLM for forecasting; Lag-Llama reuses the LLaMA architecture but is trained on time series data), and it’s interesting – but for production, a dedicated TSFM will be cheaper, faster, and more accurate.

Your next move

Pick the smallest version of Chronos-Bolt – the 9M-parameter Tiny model – load it on your laptop CPU, and run a zero-shot forecast against one of your real datasets this afternoon. Compare it to a Seasonal Naive baseline. Whichever wins tells you whether the foundation model wave applies to your problem or not. That’s a 30-minute experiment that will change how you approach the next forecasting project.