Humanizer for German: Why AI Text Detection Is Language-Specific
AI-generated text has telltale patterns. But those patterns aren't universal.
When you read a text that feels wrong—too smooth, perfectly structured, every paragraph the same length—you can often sense that an LLM wrote it. But what you're sensing is mostly English. German AI text sounds different. It has its own tells.
The problem: most AI detection tools and guides are designed for English. They catch English patterns beautifully. They miss German ones entirely.
I built the Humanizer (Deutsch) to fix that. It's a free, open-source Claude Code skill that detects 45 German-specific AI writing patterns, ranks them by severity, and walks you through a structured 2-pass cleanup. New in v3: optional voice calibration that matches your personal writing style from samples.
Try it
github.com/marmbiz/humanizer-de — MIT licensed, free, works as a Claude Code skill.
```shell
git clone https://github.com/marmbiz/humanizer-de ~/.claude/skills/humanizer-de
```
Then in Claude Code: /humanizer — done.
What the Humanizer Does
You call it with /humanizer or just say "Humanize this text for me." It gives you:
- A draft rewrite with the obvious AI patterns removed
- A quick anti-AI audit flagging remaining tells
- A final version after the second pass
New in v3: Voice calibration. Provide a sample of your own writing and the skill analyzes your sentence rhythm, word choices, and quirks — then applies them to the rewrite. Instead of generic "clean" output, you get text that sounds like you.
Three modes adjust the correction to your context:
| Mode | When | What happens |
|---|---|---|
| Casual | Blog posts, social media, newsletters | Adds personality and rhythm |
| Neutral | Business reports, product docs, emails | Removes AI tells, keeps tone neutral |
| Formal | Academic papers, legal texts, technical docs | Only removes tells, preserves structure |
Default is Neutral when the context is unclear.
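For illustration, the mode logic maps onto a small lookup. This is a hypothetical Python sketch, not the skill's implementation (the skill encodes this as prompt instructions); the voice levels follow the v3.2 mode system: full in Casual, moderate in Neutral, none in Formal.

```python
# Illustrative mapping of mode -> correction behavior (mirrors the table above).
MODES = {
    "casual":  {"remove_tells": True, "add_voice": "full"},
    "neutral": {"remove_tells": True, "add_voice": "moderate"},
    "formal":  {"remove_tells": True, "add_voice": "none"},
}

def resolve_mode(requested=None):
    """Fall back to Neutral when the context is unclear or unknown."""
    return MODES.get((requested or "neutral").lower(), MODES["neutral"])
```

Defaulting to Neutral on missing or unrecognized input mirrors the skill's own fallback rule.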
New in v3.2: Four additional patterns and structural guardrails:
- Source incongruence (HIGH): The source exists, but doesn't actually support the claim in the text. Classic LLM hallucination signal.
- Hidden Unicode characters (HIGH): Zero-Width Space, Soft Hyphen, bidi controls — invisible artifacts from LLM output.
- Standard chapters without substance (MEDIUM): Generic headings like "Future perspectives" with unsourced filler. Don't shorten — concretize, integrate, or re-title.
- Anglicism structures (MEDIUM): Hard calques ("am Ende des Tages") and false friends ("eventuell" used to mean "eventually/finally", although in German it means "possibly").
Plus harmonized guardrails: "Never shorten substance" instead of "Never shorten" (artifacts may be removed), the 3+-clustering rule applies only to soft stylistic patterns, and mode-dependent voice calibration is now consistent throughout.
Severity ranking (HIGH / MEDIUM / LOW) for each pattern lets you focus on what matters most. HIGH patterns are almost always AI. LOW patterns only stand out when they cluster.
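The severity and clustering logic can be sketched in a few lines of Python. The pattern names and the threshold of three are illustrative; the real skill defines its 45 patterns as prompt rules, not code:

```python
from collections import Counter

# Illustrative severities for a handful of patterns (the skill defines 45).
SEVERITY = {
    "participle_i": "HIGH",
    "vague_authority": "HIGH",
    "em_dash_overuse": "MEDIUM",
    "tricolon": "LOW",
    "bold_overuse": "LOW",
}

CLUSTER_THRESHOLD = 3  # soft LOW patterns only count when they pile up

def flag_findings(hits):
    """hits: list of pattern names found in a text.
    HIGH and MEDIUM findings are always reported; LOW ones only
    when three or more LOW hits cluster in the same text."""
    counts = Counter(hits)
    low_total = sum(n for p, n in counts.items() if SEVERITY[p] == "LOW")
    flagged = []
    for pattern in counts:
        if SEVERITY[pattern] in ("HIGH", "MEDIUM"):
            flagged.append(pattern)
        elif low_total >= CLUSTER_THRESHOLD:
            flagged.append(pattern)
    return sorted(flagged)
```

A single "tricolon" hit alongside a HIGH finding reports only the HIGH pattern; three LOW hits in one text get flagged together.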
Why German AI Text Is Different
LLMs leave different fingerprints in English than in German. The same model that produces flawless English can betray itself immediately in German, through patterns that native English speakers don't notice.
Take these examples:
- Participle-I constructions like "gewährleistend" or "hervorhebend" (ensuring, highlighting). In English, "-ing" forms are natural everywhere. In German, this construction screams LLM.
- Overused transition phrases like "Darüber hinaus" (furthermore) appearing three times per paragraph. Native German writers vary their transitions. LLMs repeat the same mechanical connectors.
- Em-dashes everywhere — a punctuation habit from English that German doesn't share natively.
- Vague authorities like "Experten sagen" (experts say) with no sources attached.
- Symbolic overload like "steht als Zeugnis für" (stands as testimony to) — nobody writes like this.
- Promotional tone with "atemberaubend" (breathtaking) in contexts where it doesn't belong.
- Chatbot artifacts like "Stand Januar 2024" (as of January 2024) appearing in articles written months later.
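Some of these tells are mechanical enough to approximate with regular expressions. The patterns below are deliberately simplified illustrations that over- and under-match; they are not the skill's actual rules, which live in the prompt:

```python
import re

# Simplified, illustrative regexes for a few German AI tells.
TELLS = {
    # Participle-I at clause level, e.g. "gewährleistend," -- very rough,
    # also matches ordinary words ending in "end" in the same position.
    "participle_i": re.compile(r"\b\w+end\b(?=,|\s+dass)"),
    # The same mechanical connector, repeated.
    "darueber_hinaus": re.compile(r"Darüber hinaus"),
    # Vague authority with no source attached.
    "vague_authority": re.compile(r"Experten (sagen|zufolge|warnen)"),
    # Symbolic overload.
    "symbolic_overload": re.compile(r"steht als Zeugnis für"),
}

def scan(text):
    """Count hits per pattern; a single 'Darüber hinaus' is fine,
    only repetition counts as a tell."""
    findings = {}
    for name, rx in TELLS.items():
        hits = rx.findall(text)
        if name == "darueber_hinaus" and len(hits) < 2:
            continue
        if hits:
            findings[name] = len(hits)
    return findings
```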
Before (LLM):
Die atemberaubende Stadt mit ihrem reichen kulturellen Erbe steht als Zeugnis für die künstlerische Brillanz vergangener Generationen.
"The breathtaking city with its rich cultural heritage stands as testimony to the artistic brilliance of past generations."
After (human):
Die Stadt hat eine lange Geschichte. Ihre Denkmäler zeigen die Handwerkskunst des Mittelalters.
"The city has a long history. Its monuments show medieval craftsmanship."
Less decoration, more substance.
45 Patterns in 8 Categories
The Humanizer detects patterns across eight categories:
1. Language & Tone (12 patterns, mostly HIGH)
Symbolic overload, promotional language, editorial comments, mechanical conjunctions, section summaries, participle-I constructions, vague authorities, forced conclusions, negative parallelisms (now including clipped negation fragments like "kein Raten.", "keine Kompromisse."), tricolon overuse, false extensions, misplaced "Fazit" sections.
2. Style (4 patterns, MEDIUM/LOW)
Excessive bold text, false lists, emojis before headings, em-dash overuse (now with a replacement hierarchy: period > comma > colon > semicolon > parentheses > rephrase, plus detection of paired inserts and dash variants).
3. Communication (6 patterns, all HIGH)
Letter-style writing, collaborative chatbot phrases ("I hope this helps!"), knowledge cutoff references, prompt refusals, placeholder text, links to search queries.
4. Markup (6 patterns, all MEDIUM)
Markdown instead of wikitext, broken wikitext and AI tool artifacts (oaicite tags, contentReference spans, turn0search0 references), dead links, full citation fabrication (hallucinated publications, non-existent journals, utm_source parameters), incorrect reference formats, wrong categories.
5. Miscellaneous (3 patterns, LOW/MEDIUM)
Abrupt cutoffs, style shifts mid-text, first-person edit summaries.
6. Rhetoric & Structure (7 patterns)
| Pattern | Severity | Example |
|---|---|---|
| Persuasive authority phrases | MEDIUM | "Im Kern" (at its core), "In Wirklichkeit" (in reality) |
| Signposting | MEDIUM | "Schauen wir uns an" (let's look at), "Here's what you need to know" |
| Fragmented headings | LOW | Generic one-liner immediately after a heading |
| Rhetorical questions as fake engagement | MEDIUM | "Aber was bedeutet das?" (But what does this mean?) |
| Universal human experience opener | MEDIUM | "Seit jeher" (since time immemorial), "Seit Anbeginn der Zivilisation" |
| "In today's X world" framing | MEDIUM | "In der heutigen digitalen Welt" (in today's digital world) |
| Aspirational corporate closing | MEDIUM | "bestens aufgestellt" (well-positioned), "die Möglichkeiten sind grenzenlos" |
7. Argumentation & Evidence (3 patterns)
| Pattern | Severity | Example |
|---|---|---|
| Passive constructions and subjectless fragments | MEDIUM | "wurde durchgeführt" (was carried out), "Keine Konfiguration nötig." (No configuration needed.) — hides the actor |
| Conditional stacking | MEDIUM | Piled-up "wenn/falls/sofern" (if/in case/provided that) clauses instead of stating what the analysis found |
| Miscalibrated epistemic confidence | MEDIUM | Swings between over-assertion ("grundlegend verändert", "zweifellos") and over-hedging ("scheint möglicherweise", "könnte eventuell") |
LLMs hide the actor behind passive voice and subjectless sentences. They stack conditionals where a direct statement would do. Most telling: the swing between over-assertion ("revolutionary", "without doubt") and over-hedging ("seems possibly", "could perhaps") within the same paragraph.
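As a rough sketch of how the two most mechanical of these checks could be automated (the regexes are illustrative and will miss most real cases; the skill itself applies these rules as prompt instructions):

```python
import re

# Illustrative check for actor-hiding passives ("wurde durchgeführt")
# and clipped subjectless fragments ("Keine Konfiguration nötig.").
PASSIVE = re.compile(r"\bwurden?\s+\w+t\b")            # wurde/wurden + participle (rough)
FRAGMENT = re.compile(r"(?:^|\.\s+)Keine?\s+\w+\s+nötig\.")

def audit_sentence(text):
    notes = []
    if PASSIVE.search(text):
        notes.append("passive hides the actor: name who did it")
    if FRAGMENT.search(text):
        notes.append("subjectless fragment: say who needs to do what")
    return notes
```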
Patterns 32–34 were adapted from upstream PR #39. Patterns 35–38 were adapted from upstream PR #67. Patterns 39–41 are from v3.1, adapted from upstream PRs #79, #80, #84, #85, #94, #96. All have German-specific phrasing and examples.
8. Additions (4 patterns, new in v3.2)
Four patterns drawn from the German Wikipedia's Erkennung KI-Einsatz guideline and its Schnelltest KI companion:
| Pattern | Severity | Example |
|---|---|---|
| Source incongruence | HIGH | Source exists but doesn't support the claim |
| Hidden Unicode characters | HIGH | Zero-Width Space (U+200B), Soft Hyphen, BOM, bidi controls |
| Standard chapters without substance | MEDIUM | "Future perspectives" + unsourced filler; don't shorten — concretize/integrate |
| Anglicism structures | MEDIUM | Hard calques & false friends: "am Ende des Tages", "eventuell" = "eventually/finally" (not "possibly"), "aktuell" = "actually" (not "currently") |
Source incongruence is particularly tricky: the source exists, the DOI validates, the author did publish; only the paper doesn't actually support the claim. A classic LLM hallucination pattern that simple fact-checking tools miss. False friends like "eventuell" (German for "possibly", not English "eventually") are corrected regardless of mode because they are semantic errors, not style choices.
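Hidden Unicode characters are the one pattern that is trivial to check mechanically. A minimal Python scanner covering the characters named above (the list is not exhaustive):

```python
# Minimal scanner for invisible Unicode artifacts that LLM output
# sometimes carries over: zero-width space, soft hyphen, BOM, bidi controls.
HIDDEN = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u00ad": "SOFT HYPHEN",
    "\ufeff": "BYTE ORDER MARK",
    "\u200e": "LEFT-TO-RIGHT MARK",
    "\u200f": "RIGHT-TO-LEFT MARK",
    "\u202a": "LEFT-TO-RIGHT EMBEDDING",
    "\u202c": "POP DIRECTIONAL FORMATTING",
}

def find_hidden(text):
    """Return (index, name) for every hidden character in text."""
    return [(i, HIDDEN[ch]) for i, ch in enumerate(text) if ch in HIDDEN]

sample = "Wort\u200bTrennung und lei\u00adse"  # contains a ZWSP and a soft hyphen
```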
Why I Created the German Humanizer
I discovered Siqi Chen's original Humanizer and immediately saw the gap: it worked brilliantly for English, but German AI text has different patterns. Testing it on German text was like using an English spell-checker on German — not wrong, just missing the point.
The German Wikipedia maintains its own guide to AI-generated content indicators. The English Wikipedia has a comparable resource. Siqi's original pulls from the English one; the German version documents something different. I used both as the foundation.
The philosophy is the same as Siqi's tool — analysis, not auto-rewriting. But the patterns are German-specific.
Working with English content? Use Siqi Chen's original Humanizer. It's excellent for English text.
Working with German content? That's what the German adaptation is for.
If your goal is to disguise AI use, this is the wrong tool. The point is better writing, not camouflage.
Who Needs This
- German content creators using AI who want their writing to sound authentic
- Marketing teams reviewing copy for AI artifacts
- Wikipedia editors evaluating German submissions
- Bilingual teams where English editors need to catch German AI patterns
- Anyone learning how to recognize German AI-generated text
Credits and Open Source
The tool is MIT licensed and open source. It builds on:
- Pattern research: German Wikipedia's AI detection guide + English Wikipedia's AI writing signs
- Original concept & English Humanizer: Siqi Chen (blader)
- German adaptation: github.com/marmbiz/humanizer-de
I built the German version. Siqi built the original. Both Wikipedias documented the patterns.
Changelog
v3.2.4-de.1 (April 2026)
- 4 new patterns (42–45): Source incongruence, Hidden Unicode characters, Standard chapters without substance, Anglicism structures — new category "Additions"
- Additional sources: Now also builds on the Wikipedia guidelines Erkennung KI-Einsatz and Schnelltest KI
- Guardrails harmonized: "Never shorten substance" (instead of "Never shorten") with an explicit exception list for artifact cleanup
- 3+-clustering rule limited to soft stylistic patterns; HIGH patterns, structural findings, source-based findings, and false friends are corrected on every occurrence
- Mode system made consistent: "Add voice" full in Casual, moderate in Neutral, none in Formal
- Operational precisions in patterns 21, 22, 25, 26: External research is out of scope for the skill; mark instead
- 45 patterns across 8 categories
v3.1.0-de.1 (April 2026)
- 3 new patterns (39–41): Passive constructions, Conditional stacking, Miscalibrated epistemic confidence — new category "Argumentation & Evidence"
- 4 expanded patterns: Negative parallelisms (+clipped negation fragments), Dashes (replacement hierarchy), Broken wikitext (+AI artifacts), DOIs (→full citation fabrication)
- "Never shorten" rule: Output must cover everything the original contains
- Dash scan: Dedicated workflow step
- Quick checklist: 7-point pre-output audit
- 41 patterns in 7 categories
- Integrates 6 PRs from the English original (blader/humanizer): #79, #80, #84, #85, #94, #96
v3.0.0-de.1 (March 2026)
- Voice calibration: Match the user's personal writing style from samples
- 4 new patterns (35–38): Rhetorical fake questions, Universal human experience openers, "In today's X world" framing, Aspirational corporate closings (adapted from upstream PR #67)
- 38 patterns total
v2.3.0-de.1 (March 2026)
- 3 new patterns (32–34): Persuasive authority phrases, Signposting, Fragmented headings (adapted from upstream PR #39)
- Severity ranking (HIGH / MEDIUM / LOW) for all 34 patterns (inspired by upstream PR #51)
- Mode system: Casual / Neutral / Formal
- Quick reference table for fast scanning
- "Don't touch" rules and guardrails
v2.2.0-de.2 (February 2026)
- 2-pass workflow instead of one-shot cleanup: Draft → Quick audit → Final
- More emphasis on voice: rhythm, perspective, natural variation
- Cleaner review format: three separated output blocks
This post was written with AI assistance, but reviewed with language-specific awareness, because the patterns that reveal AI aren't just in what you write; they're in which language you write it in.