Rio-3.5 LLM Drama: What the Merge Means for You

Rio de Janeiro's Rio-3.5-Open-397B turned out to be a Nex + Qwen merge. Here's how to spot merged models and use them honestly.

The #1 mistake people are making with the Rio de Janeiro LLM story: treating it as a political scandal instead of a practical lesson. Yes, Nex-AGI’s GitHub issue blew up last week showing Rio-3.5-Open-397B is a weight merge, not original training. But if you’re a developer or AI user, the interesting question isn’t “who lied?” – it’s “how do I tell the next time, and should I still use the model?”

This post walks through what actually happened, how to verify a merged model yourself in about ten minutes, and how to reproduce a similar merge with open tools. The Rio de Janeiro LLM is a useful teaching example precisely because the evidence is all public.

What Rio-3.5 actually is

IplanRIO uploaded Rio-3.5-Open-397B to Hugging Face on June 13, 2026 under the MIT license. The initial pitch: a Brazilian municipal IT company shipping a frontier-class open model. The Hugging Face card now tells a different story.

The current README – updated after the community flagged the issue – states the model is built via a merge of Nex-N2-Pro and Qwen3.5-397B-A17B, followed by on-policy distillation from a stronger model, with an apology that the base merged version was uploaded instead of the final distilled model. The architecture itself is borrowed wholesale: qwen3_5_moe_text, 60 layers, 512 experts, 10 selected experts per token, 397B total parameters with roughly 17B active at inference time, and a 1,010,000-token context window.

Translation: it’s a Qwen-shaped MoE with most of its behavior coming from Nex.

How the community caught it (and how you can repeat the test)

Two pieces of evidence carried the case. Both are reproducible.

1. The weight similarity check. Nex-AGI found that every weight tensor in the model matches a blend of approximately 60% Nex-N2-Pro and 40% Alibaba’s Qwen3.5-397B-A17B across all 60 layers, with no anomalies. If you have the safetensors files for both parents and the suspect model, you can compute this yourself in about a dozen lines of Python:

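import torch  # safetensors.torch returns torch tensors
from safetensors.torch import load_file

# Each call loads a single shard; repeat for every shard in the checkpoint.
rio = load_file("rio-3.5/model-00001.safetensors")
nex = load_file("nex-n2-pro/model-00001.safetensors")
qwen = load_file("qwen3.5-397b/model-00001.safetensors")

for key in rio:
    # Skip tensors that are missing from either parent's shard.
    if key not in nex or key not in qwen:
        continue
    # Cast to float32 so the comparison is not done in reduced precision.
    predicted = 0.6 * nex[key].float() + 0.4 * qwen[key].float()
    diff = (rio[key].float() - predicted).abs().mean().item()
    print(f"{key}: mean abs diff = {diff:.6f}")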

If the diff is near zero for every tensor in every layer (small leftovers on the order of bfloat16 rounding are expected), you’re looking at a linear merge. No clever statistics required.
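The check above hard-codes the 60/40 split. If you don’t know the ratio in advance, one option is to fit it per tensor. What follows is a minimal sketch, assuming the suspect tensor is a pure two-parent linear blend (suspect = alpha * parent_a + (1 - alpha) * parent_b). The helper fit_blend_ratio is hypothetical, not something from Nex-AGI’s issue, and random NumPy arrays stand in for real weights; a torch tensor loaded from safetensors can be converted with .float().numpy().

```python
import numpy as np


def fit_blend_ratio(suspect, parent_a, parent_b):
    """Least-squares fit of alpha in: suspect ~ alpha * parent_a + (1 - alpha) * parent_b.

    Returns (alpha, residual), where residual is the mean absolute error
    left over after applying the fitted alpha.
    """
    s = np.asarray(suspect, dtype=np.float64).ravel()
    a = np.asarray(parent_a, dtype=np.float64).ravel()
    b = np.asarray(parent_b, dtype=np.float64).ravel()

    direction = a - b   # what separates the two parents
    offset = s - b      # how far the suspect sits from parent_b
    denom = direction @ direction
    if denom == 0.0:
        raise ValueError("parents are identical for this tensor; alpha is undefined")

    alpha = float(offset @ direction / denom)
    residual = float(np.abs(s - (alpha * a + (1.0 - alpha) * b)).mean())
    return alpha, residual


# Synthetic demo: random arrays stand in for real weight tensors.
rng = np.random.default_rng(0)
nex_w = rng.normal(size=(64, 64))
qwen_w = rng.normal(size=(64, 64))
rio_w = 0.6 * nex_w + 0.4 * qwen_w

alpha, residual = fit_blend_ratio(rio_w, nex_w, qwen_w)
print(f"alpha = {alpha:.3f}, residual = {residual:.2e}")
```

A fitted alpha that stays roughly constant across tensors, with a residual near zero, points to a single global linear merge. A large residual means the two-parent assumption does not hold for that tensor.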

What’s strange about the Rio case is that the math was always going to surface. Safetensors files are just dictionaries of tensors – anyone with disk space and a Python session could run this check. Which raises an honest question: was the plan always to disclose after launch, or did no one expect the open-weights community to look this closely this fast? There’s no good answer publicly available yet.

2. The identity probe. With Rio’s hard-coded “You are Rio” system prompt removed, IplanRIO’s own deployed model identified itself as “Nex, from Nex-AGI” 79% of the time – and as “Rio” 0% of the time (per Nex-AGI’s GitHub issue #4, published June 2026). The responses closely mirrored Nex’s trained persona, including backstory details that couldn’t have been independently generated.

Use this with caution though. Identity prompts alone are not proof – base models routinely impersonate each other because of training-data contamination. The weight math is what closes the argument. The identity probe just tells you where to look.
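If you want to run an identity probe of your own, the tallying step can be sketched like this. It assumes you have already sampled many replies to a “who are you?” prompt with no system prompt; how you sample them depends on your serving stack and is left out. The helper tally_identities and the sample replies are made up for illustration and are not real Rio transcripts.

```python
import re
from collections import Counter


def tally_identities(responses, known_names):
    """Share of responses whose first-mentioned known name is each label.

    Responses that mention none of the names are counted as "other".
    """
    counts = Counter()
    for text in responses:
        first_hit = {}
        for name in known_names:
            # Whole-word match, so "Rio" does not fire on words like "prior".
            match = re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE)
            if match:
                first_hit[name] = match.start()
        label = min(first_hit, key=first_hit.get) if first_hit else "other"
        counts[label] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Made-up replies standing in for sampled model output.
sample_replies = [
    "I am Nex, an assistant built by Nex-AGI.",
    "I'm Nex. How can I help?",
    "I'm Qwen, developed by Alibaba Cloud.",
    "I am a large language model.",
]
shares = tally_identities(sample_replies, ["Nex", "Qwen", "Rio"])
print(shares)
```

Keyword matching like this is crude, which is one more reason to treat the result as a pointer toward the weight check rather than as evidence on its own.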

Reproducing the merge yourself with mergekit

Here’s the part most write-ups skip. You can replicate the technique behind Rio-3.5 with arcee-ai/mergekit. The core idea: merging takes two trained models and arithmetically combines their weight tensors. No GPUs needed, no training loop, just disk I/O and addition. You get a new checkpoint in minutes – and as NVIDIA’s developer documentation on merge techniques notes, linear interpolation is one of several standard approaches alongside SLERP, TIES, and DARE.
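To make “arithmetically combines their weight tensors” concrete, here is a toy sketch of a linear merge over dictionaries of arrays. This is not mergekit’s implementation: linear_merge is a hypothetical helper, the two-tensor “checkpoints” are invented, and normalizing the weights so they sum to 1 is a choice made for this sketch.

```python
import numpy as np


def linear_merge(state_dicts, weights):
    """Weighted average of matching tensors across checkpoints.

    state_dicts: list of {tensor_name: array} mappings, one per parent.
    weights: one float per parent; normalized here so they sum to 1.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    total = float(sum(weights))
    if total == 0.0:
        raise ValueError("weights must not sum to zero")

    # Only tensors present in every parent can be averaged.
    shared = set(state_dicts[0])
    for sd in state_dicts[1:]:
        shared &= set(sd)

    merged = {}
    for name in sorted(shared):
        shapes = {sd[name].shape for sd in state_dicts}
        if len(shapes) != 1:
            raise ValueError(f"shape mismatch for {name}: {shapes}")
        merged[name] = sum((w / total) * sd[name] for w, sd in zip(weights, state_dicts))
    return merged


# Toy "checkpoints" with two tensors each.
parent_a = {"layer.0.weight": np.full((2, 2), 1.0), "layer.0.bias": np.zeros(2)}
parent_b = {"layer.0.weight": np.full((2, 2), 2.0), "layer.0.bias": np.ones(2)}

merged = linear_merge([parent_a, parent_b], [0.6, 0.4])
print(merged["layer.0.weight"])
```

The shape check matters in practice: a plain weighted average only makes sense when both parents share an architecture, which is why a Nex + Qwen blend is possible at all.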

Install:

git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

A linear-merge config that mirrors the Rio recipe. The model IDs below are the full-size parents; to try this on a workstation, substitute two smaller models that share an architecture:

models:
  - model: nex-agi/Nex-N2-Pro
    parameters:
      weight: 0.6
  - model: Qwen/Qwen3.5-397B-A17B
    parameters:
      weight: 0.4
merge_method: linear
dtype: bfloat16

Run with mergekit-yaml config.yml ./output and you get a Rio-3.5-style checkpoint. The technique itself is legitimate – published, widely used, and entirely within the license terms of both parent models. The problem in Rio’s case was the marketing, not the math.

Pro tip: When merging models with different tokenizers, set tokenizer_source in the YAML – otherwise mergekit truncates the lm_head and embed_tokens matrices to the smallest vocabulary, which silently kills performance on the dropped tokens.

Common pitfalls when consuming or making merges

A few things the takedown threads gloss over but matter if you actually want to use a model like this.

  • Merges rarely beat their best parent. As of June 2026, Decrypt’s testing showed Nex-N2-Pro scoring 75.3% on Terminal-Bench 2.1 versus 70.8% for Rio-3.5, and 1585 vs 1533 on GDPval. If you have a choice, run the parent.
  • “Open” doesn’t mean “runnable.” GGUF quants of Rio-3.5-Open-397B (as listed in the foxipanda repo at time of writing) clock in at 215 GB for IQ4_XS and 331 GB for Q6_K. Citizens of Rio are not running this on a laptop. The sovereignty argument quietly assumes you own a small cluster.
  • The “wrong upload” claim can’t be verified until a corrected file appears. The README has been updated, but as of this writing no replacement has been pushed, and the weights on the repo are still the merged ones.
  • Identity probes can mislead. If you’re auditing a model, never rely on a single “who are you?” question. Combine it with weight inspection or behavioral fingerprinting on niche tasks the parent was known for.
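The file sizes in the “runnable” bullet can be sanity-checked with simple arithmetic. The sketch below backs out the average bits stored per parameter from the quoted GGUF sizes and the 397B total parameter count. It assumes 1 GB means 10^9 bytes (the listing may use GiB instead) and ignores file metadata; both function names are hypothetical.

```python
def implied_bits_per_weight(file_size_gb, total_params_billion):
    """Average bits stored per parameter, assuming 1 GB = 10**9 bytes."""
    total_bits = file_size_gb * 1e9 * 8
    return total_bits / (total_params_billion * 1e9)


def weights_only_gb(total_params_billion, bits_per_weight):
    """Weights-only footprint in GB; ignores KV cache and runtime overhead."""
    return total_params_billion * bits_per_weight / 8


# Sizes quoted above for the Rio-3.5-Open-397B GGUF quants.
for quant, size_gb in [("IQ4_XS", 215), ("Q6_K", 331)]:
    bpw = implied_bits_per_weight(size_gb, 397)
    print(f"{quant}: {size_gb} GB -> about {bpw:.2f} bits per weight")

print(f"bfloat16 weights alone: about {weights_only_gb(397, 16):.0f} GB")
```

Note that the roughly 17B active parameters per token reduce compute, not storage: every expert still has to live somewhere, so the full 397B sets the footprint.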

Merging vs. fine-tuning vs. training from scratch

Quick map so you know what you’re actually looking at when someone announces a model:

Approach           | Compute                | Time          | Result
Train from scratch | Millions of GPU-hours  | Months        | Genuinely new model
Full fine-tune     | Thousands of GPU-hours | Days to weeks | Specialized variant
LoRA / PEFT        | Single GPU possible    | Hours         | Lightweight adapter
Weight merge       | No GPU required        | Minutes       | Blend of parents

What Rio shipped is the bottom row. Presenting it as the top row was the issue. Both Qwen3.5 and Nex-N2-Pro are openly licensed – the problem was disclosure, not legality.

So should you use Rio-3.5?

Honestly? If you need a Qwen-derivative with mild Portuguese tuning and you have the hardware, sure. If you want the strongest behavior in the blend, run Nex-N2-Pro directly. If you want a real Brazilian-Portuguese model, neither is the answer – both are general-purpose models with broad multilingual coverage, not purpose-built Portuguese systems. The Rio episode is more interesting as a template for how to check these claims going forward than as a model to actually deploy.

FAQ

Is the Rio-3.5 model now safe to download from Hugging Face?

The file is intact and MIT-licensed. Nothing dangerous about it. The model card now discloses the merge – so using it is fine, just don’t cite it as a from-scratch Brazilian model.

Why did the merged version slightly underperform Nex-N2-Pro on benchmarks?

Linear merges trade peak capability for averaging. When you blend 60% Nex with 40% Qwen, you’re moving the weights partway back toward Qwen, which dilutes whatever post-training pushed Nex above Qwen in the first place. That fits the Terminal-Bench 2.1 numbers, 75.3% for Nex versus 70.8% for Rio: the kind of regression you’d expect from plain interpolation, although a benchmark gap alone doesn’t prove the cause. The distillation step IplanRIO claims to have added later could in principle recover some of that gap, but the published weights don’t include it.

How do I disclose a merge correctly if I publish one?

List every parent model in your README front matter under base_model, state the merge method and weights, and link to the source repositories. Mergekit will even auto-generate a model card section for you. That’s the whole bar – and it’s the bar Rio’s initial release didn’t clear.
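As a rough illustration of that disclosure, here is a small sketch that renders README text with the parents listed under base_model, plus a plain statement of the method and weights. The helper render_merge_card is hypothetical, and the exact layout is one reasonable choice, not a required format; the repository IDs are the ones used in the config earlier in this post.

```python
def render_merge_card(parents, license_id, merge_method):
    """Render README text: YAML front matter plus a short merge-details section.

    parents: list of (repo_id, weight) pairs, e.g. ("org/model", 0.6).
    """
    lines = ["---", "base_model:"]
    lines += [f"- {repo_id}" for repo_id, _ in parents]
    lines += [f"license: {license_id}", "tags:", "- merge", "---", ""]
    lines.append(f"This model is a {merge_method} merge of:")
    for repo_id, weight in parents:
        # Link each parent so readers can find the source repository.
        lines.append(f"- https://huggingface.co/{repo_id} (weight {weight})")
    return "\n".join(lines)


card = render_merge_card(
    parents=[("nex-agi/Nex-N2-Pro", 0.6), ("Qwen/Qwen3.5-397B-A17B", 0.4)],
    license_id="mit",
    merge_method="linear",
)
print(card)
```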

Next: Pick any “homegrown” open model announced in the last six months. Clone the safetensors index, find the most likely base, and run the diff script above. You’ll be surprised how often the math is already telling.
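One practical wrinkle for that exercise: large checkpoints are split into shards, and the same tensor may sit in a different shard file in each repo. Sharded Hugging Face checkpoints normally ship a model.safetensors.index.json whose weight_map field maps tensor names to shard files. The sketch below uses that to plan which shards to open side by side. The function names are hypothetical, and the demo writes tiny fake index files rather than touching real models.

```python
import json
import tempfile
from pathlib import Path


def load_weight_map(index_path):
    """Return {tensor_name: shard_filename} from a safetensors index JSON file."""
    with open(index_path, encoding="utf-8") as fh:
        return json.load(fh)["weight_map"]


def plan_comparison(suspect_index, parent_indexes):
    """For each tensor name shared by all models, list the shard holding it in each.

    Returns (plan, coverage), where coverage is the fraction of the suspect's
    tensors that also exist, by name, in every parent.
    """
    suspect_map = load_weight_map(suspect_index)
    parent_maps = [load_weight_map(p) for p in parent_indexes]
    shared = set(suspect_map)
    for pm in parent_maps:
        shared &= set(pm)
    plan = {
        name: (suspect_map[name], *(pm[name] for pm in parent_maps))
        for name in sorted(shared)
    }
    coverage = len(shared) / len(suspect_map) if suspect_map else 0.0
    return plan, coverage


# Demo with fake index files; real ones come from the model repositories.
with tempfile.TemporaryDirectory() as tmp:
    def write_index(name, weight_map):
        path = Path(tmp) / f"{name}.index.json"
        path.write_text(json.dumps({"metadata": {}, "weight_map": weight_map}), encoding="utf-8")
        return path

    suspect = write_index("suspect", {
        "a.weight": "model-00001.safetensors",
        "b.weight": "model-00002.safetensors",
    })
    parent = write_index("parent", {
        "a.weight": "model-00003.safetensors",
        "c.weight": "model-00001.safetensors",
    })
    plan, coverage = plan_comparison(suspect, [parent])

print(plan)
print(f"coverage = {coverage:.0%}")
```

Low coverage is itself informative: if few tensor names line up, the candidate base probably has a different architecture and is the wrong parent to test.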