AI Is Just Plagiarism at Scale? Here’s What to Actually Do

A viral Hacker News post calls AI 'unauthorised plagiarism at a bigger scale.' Here's a practical playbook to check if your work was scraped and what to do about it.

Casey Morgan · 2026-05-21 · 8 min read · Beginner

A blog post titled “AI is just unauthorised plagiarism at a bigger scale” went viral on Hacker News recently. The author’s argument is blunt: AI takes in all the input, whether the original authors have consented or not, does some “learning,” and AI companies sell that learned result back to humans without compensating the people who wrote the source material. The post struck a nerve not for the argument itself – that debate is years old – but for how the author discovered he’d been ripped off. That part you can actually act on.

The takeaway upfront

If you publish tutorials, articles, or any long-form content online, someone may already be running it through ChatGPT, lightly rewording it, and republishing it under their name. The fastest way to catch this isn’t an AI detector – it’s checking whether your own backlinks show up verbatim in their article. That’s literally how the viral post’s author caught his thief.

Skip the philosophy. The practical question is: how do I find scraped versions of my work and get them taken down?

Where the law stands right now

Courts handed down two significant rulings in 2025. Federal judges in Bartz v Anthropic and Kadrey v Meta both ruled that training AI models is highly transformative and protected by fair use – back-to-back wins for the AI companies. The Bartz case also has a $1.5 billion class action settlement preliminarily approved by Judge Alsup, which muddies the “we won on fair use” narrative somewhat.

On the other side: Judge Sidney Stein allowed the NYT’s main copyright claims against OpenAI to proceed in March 2025. According to a status tracker covering all 51 active AI copyright suits, no further summary judgment decisions on AI training are expected before summer 2026 – 3 judges have now ruled on fair use (2 for, 1 against), and none of the decisions were clean.

So: training is currently mostly winning in court. But outputs that reproduce your actual published work are still squarely actionable. That’s your lever – not the training debate.

Which raises a genuinely open question nobody has answered cleanly yet: at what point does an AI-paraphrased article stop being “transformative” and start being a derivative work? Right now, that line is drawn by whoever files first.

Method A vs Method B: which actually catches AI plagiarism

Two competing approaches dominate the advice you’ll find. Only one of them works when the scraper has laundered your content through an LLM.

Approach	Method A: AI Detectors	Method B: Origin Tracing
What it checks	Statistical fingerprints of AI text	Whether the actual ideas and links came from you
Tools	Originality.ai, Copyleaks, Quillbot	Google Alerts, Wayback Machine, Search Console
Cost	$10.99/mo (Copyleaks, 25k words) or $14.95/mo (Originality.ai) – as of late 2024; check current pricing	Free
Works against light rewording?	Often fails	Yes – backlinks survive paraphrasing
Admissible as DMCA evidence?	No	Yes (timestamps, archive snapshots)

Method A tells you a document was probably written by AI. That’s not theft on its own. Turns out Quillbot publishes a warning right on its own detector page: never rely on AI detection alone to make decisions that could impact someone’s career or academic standing – and that disclaimer exists because false positives are routine, not rare.

Method B answers the question that actually matters: did this person steal from me specifically? Winner: Method B.

The 4-step Method B walkthrough

Step 1: Set Google Alerts for body sentences, not titles

Titles get rewritten. Body sentences – especially technical ones mid-article – are harder to rephrase without breaking their meaning. Go to Google Alerts, paste 1-2 distinctive sentences from the middle of your article in quotes, and set frequency to “as-it-happens.” Repeat for each new post you publish.
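
Picking those sentences by hand gets tedious across a back catalogue. Here is a minimal Python sketch of one way to shortlist candidates. Everything in it is an assumption for illustration: the function name `pick_alert_sentences`, the word-count limits, and the "distinctiveness" score (long words and numbers count as distinctive) are arbitrary heuristics, and the sentence splitter is naive.

```python
import re


def pick_alert_sentences(article_text, count=2, min_words=8, max_words=30):
    """Return up to `count` quoted sentences taken from the middle of an article."""
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article_text) if s.strip()]

    # Drop the first and last quarter; intros and outros are rewritten most often.
    quarter = len(sentences) // 4
    middle = sentences[quarter:len(sentences) - quarter]

    def score(sentence):
        # Heuristic only: long words and tokens with digits are treated as distinctive.
        words = sentence.split()
        return sum(1 for w in words if len(w) >= 8 or any(c.isdigit() for c in w))

    candidates = [s for s in middle if min_words <= len(s.split()) <= max_words]
    best = sorted(candidates, key=score, reverse=True)[:count]

    # Strip end punctuation and wrap in quotes for an exact-match alert.
    return ['"' + s.rstrip(".!?").replace('"', "") + '"' for s in best]
```

Paste each returned string into Google Alerts as-is. Treat the output as a shortlist and read it before you use it; a sentence that scores well can still be a generic one.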

Step 2: Use the “backlink trap”

This is the trick from the viral post. When someone tells ChatGPT to rewrite your article, the model often preserves inline links verbatim – anchor text and all. Axel, the post’s author, found his thief because the stolen article still contained links pointing to his own website with the exact same anchor text. The person who ran it through ChatGPT didn’t bother to check.

So: bake at least 2-3 internal links into every article using anchor text that’s uniquely yours. Not “click here.” Something like “our 2025 Shopify checkout teardown.” Then search Google for that exact phrase in quotes periodically. Anyone using it who isn’t you is either citing you (good) or impersonating you (DMCA time).

One extra layer: Add one slightly unusual word combination to each major article – a phrase only you’d write. Search for it monthly. This is your canary. If it shows up somewhere else, you know before the Google Alert fires.
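
Once a suspect page turns up, the backlink check itself can be automated. The sketch below is a rough illustration using only Python's standard library: it parses HTML you have already saved from the suspect page and reports links that point at your domain with one of your trap anchors. The names `LinkCollector` and `find_trap_hits`, the domain, and the anchor text are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so line-wrapped anchors still match.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_trap_hits(suspect_html, my_domain, trap_anchors):
    """Return links that point at my_domain and reuse one of my anchor texts."""
    parser = LinkCollector()
    parser.feed(suspect_html)
    wanted = {a.lower() for a in trap_anchors}
    hits = []
    for href, text in parser.links:
        host = (urlparse(href).hostname or "").lower()
        points_home = host == my_domain or host.endswith("." + my_domain)
        if points_home and text.lower() in wanted:
            hits.append((href, text))
    return hits
```

A hit is a reason to look closer, not proof on its own: a legitimate citation produces the same match, so read the page before deciding which case you are in.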

Step 3: Lock down the timestamp with Wayback Machine

Don’t wait until you need to file something: submit your original URL to web.archive.org/save on the day you publish. That snapshot is timestamped third-party evidence – not something you generated yourself.

Here’s where it gets annoying: scrapers can get their copy indexed by Google in seconds. Documented cases exist of Google selecting the scraped version as canonical and deindexing the original – meaning your article disappears from search results while their stolen copy ranks. Google’s own senior trends analyst John Mueller has confirmed that copied content can outrank the original. Wayback snapshots are how you prove who came first when that happens.
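
If you want to script the snapshot habit, here is a small Python sketch under stated assumptions. It only builds the URLs and parses a response; fetching them is left to you. The `web.archive.org/save/` prefix and the availability endpoint at `archive.org/wayback/available` are the Internet Archive's public interfaces as I understand them, the function names are hypothetical, and the API returns the snapshot closest to the timestamp you pass, which is not necessarily the earliest one.

```python
import json
from urllib.parse import urlencode


def save_url(page_url):
    """URL that asks the Wayback Machine to capture page_url when requested."""
    return "https://web.archive.org/save/" + page_url


def availability_url(page_url, timestamp=None):
    """URL for the Wayback availability API; timestamp looks like YYYYMMDD."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(api_response_text):
    """Return (timestamp, snapshot_url) from an availability response, or None."""
    data = json.loads(api_response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def published_first(my_timestamp, their_timestamp):
    """Wayback timestamps are YYYYMMDDhhmmss strings, so string order is time order."""
    return my_timestamp < their_timestamp
```

Run `closest_snapshot` on the response for your URL and for the copy's URL, then compare the two timestamps. Keep both snapshot URLs; they are what you paste into the takedown request.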

Step 4: File the DMCA via Search Console (not the random web form)

Most tutorials point you at Google’s generic legal form. Faster path: Search Console.

1. Make sure your site is verified in Google Search Console first – this is the step most people skip, and without it the Copyright Removal Tool won’t accept your submission.
2. Go to “Security & Manual Actions” → “Copyright removal” → “Create a new removal request” → choose “Web Search.”
3. Paste your original URL, the infringing URL, and your Wayback snapshot as supporting evidence.
4. Submit. Most reviews resolve within days.

The Search Console route is faster because domain ownership is already verified. The public form requires Google to re-establish that from scratch.
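
A takedown request is easier to review when the overlap is laid out side by side. The following is a rough Python sketch, not a legal tool: it pairs each of your paragraphs with the closest paragraph from the suspect page using `difflib.SequenceMatcher` from the standard library. The function name `side_by_side` and the 0.6 threshold are assumptions, and because the comparison is character-based, a heavily paraphrased copy will score low even when the ideas are plainly yours.

```python
from difflib import SequenceMatcher


def side_by_side(original_paragraphs, suspect_paragraphs, threshold=0.6):
    """Pair each original paragraph with its closest suspect paragraph.

    Returns a list of (similarity, original, suspect) tuples for pairs whose
    similarity ratio is at least `threshold`.
    """
    pairs = []
    for original in original_paragraphs:
        best_ratio, best_match = 0.0, None
        for suspect in suspect_paragraphs:
            ratio = SequenceMatcher(None, original.lower(), suspect.lower()).ratio()
            if ratio > best_ratio:
                best_ratio, best_match = ratio, suspect
        if best_match is not None and best_ratio >= threshold:
            pairs.append((round(best_ratio, 2), original, best_match))
    return pairs
```

Use the pairs it returns as the starting point for the comparison you attach, alongside the two URLs and the Wayback snapshot. Lightly reworded passages are where this helps most; for anything it misses, the preserved links and canary phrases remain the stronger evidence.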

Edge cases the standard advice skips

Hosting offshore breaks DMCA. A US DMCA notice is meaningless if the infringer is hosted in a non-compliant jurisdiction. Fallback: the domain registrar (WHOIS lookup) or Google deindexing – but neither removes the content from the actual server.

Detector scores sink DMCA requests. Don’t include “this is 87% AI per Originality.ai” in your takedown. That’s not infringement evidence. Google’s Spam Policies define scraping as taking content from other sites and republishing it to manipulate search rankings – even slight modifications count. Cite that policy, not a detector score.

Poisoning your own content is on the table. A popular argument from the Hacker News thread: if crawlers refuse to respect your robots.txt, you’re within your rights to poison their data. Planting subtle factual errors that only a human reader would notice – and that you can then search for as canaries – is controversial but increasingly discussed. Use carefully; it also misleads legitimate readers.

Training-data lawsuits won’t help you before summer 2026 at the earliest. Output-reproduction claims (DMCA, copyright on the actual published article) are still your tool. The fair use cases don’t affect what a third-party blogger does with ChatGPT output.

Three habits worth keeping

After running this process for a while, three things stick:

Every new article gets a Wayback snapshot the same hour it goes live.
Every article gets at least one Google Alert for a body sentence – not the title.
Every article gets a uniquely-worded internal link as the backlink trap.

None of this takes more than five minutes per post. It pays for itself the first time you catch a scraper.

FAQ

Is using ChatGPT to write a blog post plagiarism?

Not by itself. Plagiarism means passing off someone’s specific work as your own, and copyright infringement requires copying protected expression; generating text from a prompt is neither. But Google’s spam policies care less about the law and more about whether your page adds unique value, so AI-generated content with no original input is a separate problem from a different direction.

Do AI detectors hold up as DMCA evidence?

No, and don’t try. Say you flag someone’s article at 92% AI on Originality.ai and file a DMCA. The reviewer’s first question: “What original copyrighted work of yours did they reproduce?” If you can’t answer that with a side-by-side comparison plus timestamps showing who published first, the takedown fails. The detector score is at best a prompt to investigate – not the investigation itself.

If AI training is now fair use, isn’t this all pointless?

No. The training rulings cover whether AI companies can ingest your work to build a model. They say nothing about a third-party blogger prompting ChatGPT to rewrite your tutorial and republish it. That second act is ordinary copyright infringement – and it’s exactly what Method B catches.

Next action: Open Google Alerts right now, pick your three most-trafficked articles, and set a body-sentence alert for each. Two minutes. You’ll know within a week if you have a problem.