Scratch • How to rewrite duplicate supplier descriptions on Shopify with AI

Paste a sentence from one of your product descriptions into Google, in quotes. The results are the wall: forty other storefronts, give or take, dropshipping the same product under the same paragraph, because every store that imported the supplier feed got the same copy you did. Google has crawled all of them. For any search that matches that paragraph, it clusters the lookalikes, shows one page, and files the rest. There is no penalty in this. There is also no reason the page it shows would be yours.

The rewriting is the easy half of this job; the hard half comes first. You need to know which of your 5,000 products share copy and how much: exact duplicates, near-duplicates, the products quietly copying each other inside your own store. No Shopify report answers that, and no Admin API query scores it. Then the flagged products need new copy in your voice with every spec intact, at a scale no one-at-a-time tool survives. The fix that holds runs on local files: the duplication becomes an index your agent can script, the rewrites touch only what the index flags, and nothing reaches the storefront that you have not read as a diff.

Your options

Leave it and hope

It is the default, and it is cheaper than it sounds. Google's own guidance is blunt: there is no duplicate content penalty, and a store that lives on ads, social, or searches for its brand name loses little by carrying the supplier copy. The mechanics bite somewhere else. When a query matches text that exists only in the shared description, Google shows one of the stores carrying it and filters the rest, and you did not pick the odds. Pages can sit in "Crawled - currently not indexed" indefinitely, with duplicate copy one common cause. Standing pat means staying entered in a lottery you never chose, once per long-tail query.

Hand-rewriting the top sellers

For the products that earn it, nothing beats a writer. Real voice, claims a human checked, and a sensible 80/20 triage: your 50 heroes deserve hand-written pages. Past the heroes, the math stops working. An experienced writer produces roughly 10 short descriptions an hour, so 5,000 products is on the order of 500 writer-hours, at anywhere from a few dollars to a few hundred per description. Meanwhile the duplication problem lives in the long tail you will never reach, and Google's filter applies page by page. There is also a hidden prerequisite: knowing which products share copy in the first place, which by hand is its own project.

Spinner tools

Spinners and paraphrasers pitch exactly this catalog: feed in the supplier copy, get back text that is literally different, instantly and for almost nothing. The output passes a copy-paste test. It fails the policy that matters. Google's scaled content abuse rules name "automated transformations like synonymizing, translating, or other obfuscation techniques" as spam, "no matter how it's created", and the stated consequence is ranking lower or not appearing at all. A spinner trades a duplication problem, which Google merely filters, for a spam problem, which Google acts on. Your customers read the swapped synonyms too.

AI in a spreadsheet, via CSV

Export the catalog, run an AI formula down the description column, re-import. Credit where due: the model brings real judgment to each row, and the output sits in cells you can read before anything ships. Two things break. The spreadsheet cannot score near-duplicates or tell you which product copies which, so all 5,000 get rewritten whether they needed it or not, and spreadsheet AI formulas are prone to timing out on long prompts mid-column. Then the re-import: a blank cell in a non-required column overwrites the live value with blank, the import writes straight to the store, and the only undo is the backup you exported first. The flagship guide walks those edges in full.
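If you do go the CSV route, the blank-cell risk can be screened before re-import. The sketch below is an assumption-laden illustration, not a Shopify or Scratch tool: it compares the backup export with the edited file row by row and lists cells that held a value before and are blank now. The function name `newly_blank_cells` is hypothetical, and it assumes both files keep the same rows in the same order.

```python
from __future__ import annotations

import csv
from typing import TextIO


def newly_blank_cells(backup: TextIO, edited: TextIO) -> list[tuple[int, str]]:
    """List (data row number, column) where the backup had a value and the edit is blank.

    Assumes both CSVs contain the same rows in the same order (hypothetical setup).
    """
    found = []
    rows = zip(csv.DictReader(backup), csv.DictReader(edited))
    for number, (before, after) in enumerate(rows, start=1):
        for column, old in before.items():
            # Skip overflow fields and cells that were already blank in the backup.
            if column is None or not (old or "").strip():
                continue
            # A column missing from the edited file is flagged too, conservatively.
            if not (after.get(column) or "").strip():
                found.append((number, column))
    return found
```

Open both files with `newline=""` and pass the handles in; an empty result means no cell went from filled to blank. It does not judge the new copy, only whether a value disappeared.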

Scratch

Scratch pulls the catalog into local files and hands your own agent (Claude, Codex, Cursor, or Copilot) both halves of the job. The half no API offers: on a folder, the agent scripts the similarity pass itself (shared n-grams, near-duplicate clusters, which product copies which), then rewrites only what it flagged, in your voice, with the facts kept. The half a live store needs: every rewrite comes back as a word-level diff next to the original; prices, variants, inventory, and metafields are locked at the connector; and any published product reverts per row if you change your mind. The cost is stated plainly: the review step is you, reading. A few hundred flagged products is a real afternoon, on purpose.

Option	Finds the duplicates	Unique copy, facts kept	Review before live	Undo after publish
Leave it and hope	No	No	Nothing changes	Nothing to undo
Hand-rewriting top sellers	By hand, slowly	For the few you reach	You wrote it	No version history
Spinner tools	No	Different words, spam risk	No	No
AI in a spreadsheet	No, it rewrites everything	Per cell, unchecked	In cells, then a blind import	The export you remembered to make
Scratch	Yes, scripted across the catalog	Yes, only where needed	Every change, as a diff	Per product, even after publish

How the loop works on a duplicated catalog

1. Scratch pulls your catalog into files. The similarity pass needs all 5,000 descriptions in one place, and the pull puts them there: one JSON file per product, shaped the way the GraphQL Admin API returns it, sitting in a folder on your laptop. The supplier copy is the field this job edits, descriptionHtml; prices, variants, and inventory ride along locked, and metafields stay locked at the connector. Nothing has shipped yet, and nothing will until step 3.
2. Your agent maps the duplication, then rewrites it. This is the part no endpoint offers. Page through the catalog at 250 products a call and you still end up holding data the API cannot compare; Shopify's bulk export concedes the point by answering with a file. But no export answers the question. There is no API call that returns which products share copy, no matter how many calls you make. On files, it is a small script: tokenize 5,000 descriptions, count shared n-grams, cluster the near-duplicates. That is about 12.5 million pairwise comparisons, a few minutes on a laptop, with the answer saved as an index you keep. Then the brief: "Score every description for shared text and write the clusters to an index. Rewrite each flagged product in our voice: change the sentences, keep every spec, measurement, and material, and say who the product is for." The agent works through the flagged files and never holds your store credential; until you publish, the rewrites are just edits on disk. It does 99% of the job, the mapping and the rewriting. The other 1% is deciding what your storefront says, and that is step 3.
3. You review the diffs in batches and publish. A catalog-wide rewrite is a ranking bet, so do not place it as one bet. Approve a cluster at a time, watch Search Console between batches, and let optional validators screen the mechanical part first: length caps, banned phrases, a spec string that must survive. Scratch lays every rewrite next to the original, word by word, and writes back only what you approve, through the Admin API, with a log of what went out. A batch that lands wrong reverts per product, even after publish.
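For a sense of what the similarity pass in the loop above could look like, here is a minimal sketch. It is one plausible approach, not the script any particular agent will write: word 5-gram shingles, Jaccard overlap, and a union-find to group near-duplicates. The function names, the 5-word window, and the 0.5 threshold are all assumptions to tune against your own catalog.

```python
from __future__ import annotations

from itertools import combinations


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word n-grams (shingles) in a description."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of n-grams two descriptions have in common, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_duplicates(
    descriptions: dict[str, str], threshold: float = 0.5, n: int = 5
) -> list[list[str]]:
    """Group product handles whose descriptions overlap at or above the threshold."""
    grams = {handle: shingles(text, n) for handle, text in descriptions.items()}
    parent = {handle: handle for handle in descriptions}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each unordered pair is compared once: k * (k - 1) / 2 comparisons for k products.
    for a, b in combinations(descriptions, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for handle in descriptions:
        groups.setdefault(find(handle), []).append(handle)
    # Products that match nothing are not clusters, so singletons are dropped.
    return [sorted(members) for members in groups.values() if len(members) > 1]
```

The pair count is where the 12.5 million figure comes from: 5,000 × 4,999 / 2 = 12,497,500. Because clusters are joined transitively, a chain of near-duplicates lands in one cluster even when its two ends overlap less than the threshold, which is worth knowing when you read the index.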

Here is the pull, edit, review loop running on a live Shopify catalog:

[Video: two clips labeled Claude, 5:24 and 5:57]

The brief that keeps you out of the spam policy

Google's scaled content abuse policy does not key on whether AI wrote the text. It keys on whether many pages were generated without adding value. That is the line your brief has to hold, and it happens to be plain good copy sense.

Add what the supplier left out: who the product is for, how it fits, what it pairs with. The facts are the value; the voice is the differentiation.
Keep every spec, measurement, and material from the original. A rewrite that loses the data is worse than the duplicate it replaced.
Vary the openings. Bulk AI rewrites have a known failure mode: the same opening sentence on hundreds of pages, which recreates the duplication you paid to remove. Key each opening to a product fact and the pattern disappears.
Read the diffs for sameness, not just correctness. 10 rewrites in a row that start the same way is a brief problem, and far cheaper to fix in the brief than on the storefront.
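The sameness check in the list above is mechanical enough to script before you read a single diff. This is a rough sketch under stated assumptions: the rewrites are available as plain text keyed by product handle, and "same opening" means an identical first sentence after lowercasing and whitespace cleanup. The function names and the `min_count` default are hypothetical.

```python
from __future__ import annotations

import re


def opening_sentence(text: str) -> str:
    """Return the first sentence, lowercased with whitespace collapsed."""
    first = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]
    return " ".join(first.lower().split())


def repeated_openings(rewrites: dict[str, str], min_count: int = 3) -> dict[str, list[str]]:
    """Map each opening shared by at least min_count products to their handles."""
    by_opening: dict[str, list[str]] = {}
    for handle, text in rewrites.items():
        by_opening.setdefault(opening_sentence(text), []).append(handle)
    return {
        opening: handles
        for opening, handles in by_opening.items()
        if len(handles) >= min_count
    }
```

Exact matching only catches word-for-word repeats. Openings that differ by one adjective slip through, so treat an empty result as a first screen and still scan the batch by eye.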

Questions people ask

Does Google penalize duplicate product descriptions?

No. Google has said for years that there is no duplicate content penalty, and sharing a manufacturer description does not demote a store whose site otherwise differs. What Google does is filter: pages carrying the same text get clustered, one is shown, the rest are hidden for the queries that text would have matched. The cost is not a demotion. It is invisibility, which shows up in no report except as traffic that never arrives.

How do I find out which products share copy?

Script it. No Admin API query searches description text or scores similarity, and duplicate-finder apps in the app store generally match record fields like SKU, barcode, and handle, not prose. A desktop SEO crawler can flag near-duplicate pages on the rendered storefront, but it reads pages rather than catalog fields, and it rewrites nothing. With the catalog as local files, the same question is a few minutes of scripting for your agent, and the answer comes back as an index of clusters you keep.
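As a rough illustration of the first step of that script, the sketch below reads a folder of product files and reduces each description to plain text ready for comparison. The file layout is an assumption: one JSON file per product with a `descriptionHtml` key and, where present, a `handle` key. The function names are hypothetical, and your pulled files may be shaped differently.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML fragment and ignore the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags from a description and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def load_descriptions(folder: str) -> dict[str, str]:
    """Read every *.json product file in a folder into {handle: plain text}."""
    descriptions = {}
    for path in sorted(Path(folder).glob("*.json")):
        product = json.loads(path.read_text(encoding="utf-8"))
        # Assumed keys; fall back to the file name when no handle is present.
        html = product.get("descriptionHtml") or ""
        descriptions[product.get("handle", path.stem)] = html_to_text(html)
    return descriptions
```

Comparing plain text rather than raw HTML keeps two identical paragraphs from looking different just because one supplier feed wrapped them in different tags.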

Will Google treat 5,000 AI rewrites as spam too?

It can. The scaled content abuse policy applies no matter how the text is created, and generating many pages without adding value is on its list of examples. The policy keys on value, not on the tool. A rewrite that adds the facts a buyer needs and reads like your store is differentiation; 5,000 reshuffles of the same sentences is spinning with better grammar. The brief above is the difference, and the diff review is where you check that it held.

Will the rewrites all sound the same as each other?

They can. That failure recreates the problem you started with. Bulk AI rewrites have a known pattern: the same confident opening sentence surfacing on page after page. The fixes are mechanical. Key each opening to a product fact, hand the agent your banned-phrase list, and scan the diffs in batches, where 10 identical openings in a row are obvious in a way they never are in the admin.

Will the specs survive the rewrite?

Mostly. The brief locks the facts: keep every measurement, material, and compatibility claim, change only the sentences around them. Validators earn their keep twice on this job: one rule pins the spec string that must survive, another bans the supplier's stock phrasing so a lazy rewrite cannot smuggle the duplicate sentence back in. The word-level diff handles whatever else moves; a dropped measurement reads as a deletion, and you reject it before the cluster ships.
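To make the two validator rules concrete, here is a minimal sketch of the kind of check described above, plus a length cap. It is an illustration only and does not show Scratch's validator format: the function name, the parameters, and the 1,200-character default are all assumptions.

```python
from __future__ import annotations


def validate_rewrite(
    rewrite: str,
    required_specs: list[str],
    banned_phrases: list[str],
    max_chars: int = 1200,
) -> list[str]:
    """Return a list of problems; an empty list means the rewrite passes."""
    problems = []
    lowered = rewrite.lower()
    if len(rewrite) > max_chars:
        problems.append(f"too long: {len(rewrite)} > {max_chars} characters")
    # Every pinned spec string must still appear somewhere in the rewrite.
    for spec in required_specs:
        if spec.lower() not in lowered:
            problems.append(f"missing spec: {spec}")
    # Supplier stock phrasing must not reappear.
    for phrase in banned_phrases:
        if phrase.lower() in lowered:
            problems.append(f"banned phrase: {phrase}")
    return problems
```

For example, `validate_rewrite("A premium quality tee.", ["200 gsm"], ["premium quality"])` reports both a missing spec and a banned phrase. Substring matching is deliberately strict: a rewrite that turns "200 gsm" into "200gsm" fails, which is usually the behavior you want for a spec that must survive verbatim.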

Can I undo a batch if rankings dip?

Yes, and shipping in clusters is what makes the undo usable. Watch Search Console between batches; if a cluster lands wrong, walk that cluster back row by row in Scratch, which keeps the pre-rewrite description on file for every product it published. Do not look to the store for help: Shopify has no version history for descriptions, so the way back exists only on the Scratch side.

Will this touch prices, variants, or inventory?

No. They are locked at the connector level, with no write path back; the worst the similarity script can do is propose more prose. On products, Scratch edits descriptions, titles, handles, vendor, type, tags, and the SEO fields; articles, blogs, and pages are editable too. Need a field that is not on the list? Tell Curtis.

Is this the same as merging duplicate product records?

No. Records and prose are different problems. If the same item exists as 2 product entries, a duplicate-finder app that matches on SKU or handle is the right category, and merging or deleting records is not something Scratch does. This page is about 5,000 distinct products wearing the same paragraphs, which no record merge fixes.

Do I need to be technical to run the similarity pass?

No. The agent writes the script; that is what it is for. You write the brief in plain English, the agent builds the index and the rewrites, and your part is reading diffs and clicking approve or reject. The first cluster report is usually the moment the problem stops being abstract.

See it on your own catalog

The paste test takes 10 seconds; the fix takes an afternoon. Book a 30-minute demo on your duplicates, or try Scratch free, pull your catalog, and let your agent score the first 500 descriptions before you decide anything ships.
