How-To Guide

How to Detect AI-Generated Text: Practitioner Checklist (2026)

6-step workflow to detect AI-generated text. Includes manual indicators, Python code for stylometric analysis, detection tool comparison, and decision framework.

Last updated: 2026-04-13

To detect AI-generated text, apply a six-step multi-signal evaluation: (1) establish context and consequences, (2) inspect for stylistic uniformity, hedging language, and factual fabrication, (3) run automated detection tools as supplementary signals, (4) compare stylometric fingerprints against author baselines, (5) verify factual claims against primary sources, and (6) assess the totality before making any consequential decision. No single signal is reliable — convergence across multiple indicators is the standard.

Who this is for: Educators evaluating student submissions, editors reviewing contributed content, compliance teams assessing documentation, and anyone who needs to determine whether specific text was AI-generated.

Critical caveat: No AI text detection method is reliable enough for high-stakes decisions in isolation. False positives disproportionately affect non-native English speakers and formal writers. This guide provides a multi-signal evaluation framework — never base consequential decisions on a single indicator or tool score.

What AI-Generated Text Is and Why Detection Matters

AI-generated text is content produced by large language models (LLMs) — systems like GPT-4, Claude, Gemini, and their derivatives. The detection challenge arises because LLM output is grammatically correct, topically relevant, and stylistically variable — it does not contain the “tells” of earlier machine-generated text.

Detection matters in specific contexts:

  • Academic integrity — evaluating whether student work is original
  • Content authenticity — verifying that published content was written by the attributed author
  • Scientific publishing — identifying AI-generated manuscripts that bypass peer review. The ‘vegetative electron microscopy’ incident demonstrated how AI-generated content contaminated at least 22 scientific papers
  • Disinformation — detecting AI-generated content in coordinated manipulation campaigns

For the underlying science — how detection methods work and where they fail — see the AI-Generated Text Detection Methods reference page.

Threat patterns this guide addresses

Step 1: Establish the Context

Before analyzing the text, understand what question you are actually asking:

Step 2: Manual Inspection Checklist

Examine the text for indicators of AI generation. Each is suggestive, not conclusive.

Stylistic indicators

  • Uniform sentence length and rhythm across paragraphs (low variance)
  • Formulaic hedging and transitions ("it is important to note", "furthermore", "in conclusion")
  • Low vocabulary diversity relative to the text's length
  • Complete absence of contractions in otherwise informal prose

Content indicators

Contextual mismatch indicators

Automate the manual checklist with Python

The stylistic and content indicators above can be partially automated. The following script computes measurable proxies for several manual inspection signals — sentence length uniformity, vocabulary diversity, hedging frequency, and contraction absence:

import re

HEDGING_PHRASES = [
    "it is important to note", "it is worth noting",
    "furthermore", "moreover", "in conclusion",
    "while there are various", "this is a complex issue",
    "it should be noted", "generally speaking",
    "there are several", "it is essential to",
]

def analyze_text_signals(text: str) -> dict:
    """Compute statistical signals that may indicate AI-generated text.
    These are suggestive indicators, not conclusive evidence."""
    sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
    words = text.lower().split()
    unique_words = set(words)

    # Sentence length variance — low variance suggests uniform AI structure
    lengths = [len(s.split()) for s in sentences]
    mean_len = sum(lengths) / len(lengths) if lengths else 0
    variance = (
        sum((l - mean_len) ** 2 for l in lengths) / len(lengths)
        if lengths else 0
    )

    hedging_count = sum(1 for p in HEDGING_PHRASES if p in text.lower())

    return {
        "sentence_count": len(sentences),
        "avg_sentence_length": round(mean_len, 1),
        "sentence_length_variance": round(variance, 1),
        "type_token_ratio": round(len(unique_words) / len(words), 3) if words else 0,
        "hedging_phrase_count": hedging_count,
        "contraction_count": len(re.findall(r"\b\w+'\w+\b", text)),
        "word_count": len(words),
    }

# Usage
with open("suspect_document.txt") as f:
    signals = analyze_text_signals(f.read())
print(signals)
# Low variance + low TTR + high hedging + zero contractions → investigate further

Interpreting the output: No single metric is diagnostic. Low sentence length variance (below ~15) combined with zero contractions and 3+ hedging phrases warrants further investigation, but each can also appear in legitimate formal writing.
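The thresholds in that note can be wrapped into a small triage helper. This is a sketch: the cutoffs (variance below 15, type-token ratio below 0.4, three or more hedging phrases) are illustrative assumptions, not validated thresholds, and should be calibrated against known-human writing from your own population before use:

```python
def triage(signals: dict) -> str:
    """Combine analyze_text_signals() output into a rough triage label.
    Cutoffs are illustrative assumptions; look for convergence of several
    weak indicators, never one alone."""
    flags = 0
    if signals["sentence_length_variance"] < 15:   # unusually uniform rhythm
        flags += 1
    if signals["type_token_ratio"] < 0.4:          # repetitive vocabulary
        flags += 1
    if signals["hedging_phrase_count"] >= 3:       # boilerplate hedging
        flags += 1
    if signals["contraction_count"] == 0 and signals["word_count"] > 300:
        flags += 1                                 # no contractions in long text
    if flags >= 3:
        return "investigate further"
    if flags == 2:
        return "weak signal - gather more evidence"
    return "no statistical signal"
```

Even an "investigate further" result only justifies moving to Steps 3–5, never a conclusion on its own.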

Step 3: Run Automated Detection Tools

Use one or more AI text detection tools as a supplementary signal. Never treat a tool score as a verdict.

Tool | Approach | Best for
GPTZero | Multi-feature (perplexity, burstiness) | Academic integrity
Originality.ai | Neural classifier + plagiarism | Content publishing
Turnitin AI Detection | Integrated with plagiarism infrastructure | Academic institutions
Copyleaks | Multi-lingual detection | Enterprise compliance
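When running more than one tool, combine the scores conservatively. The sketch below assumes a normalized 0.0–1.0 score per tool (real tools report on different scales, so normalization is your responsibility); disagreement between tools is treated as inconclusive rather than averaged away:

```python
def aggregate_tool_scores(scores: dict[str, float]) -> str:
    """Treat detector scores (normalized 0.0-1.0, illustrative scale) as one
    supplementary signal. Disagreement between tools is itself informative:
    it means the statistical evidence is weak."""
    if not scores:
        return "no tool evidence"
    high = [t for t, s in scores.items() if s >= 0.8]  # strong AI signal
    low = [t for t, s in scores.items() if s <= 0.2]   # strong human signal
    if len(high) == len(scores):
        return "all tools flag the text - corroborate with Steps 4-5"
    if len(low) == len(scores):
        return "no tool flags the text - still not proof of human authorship"
    return "tools disagree - treat as inconclusive"

print(aggregate_tool_scores({"gptzero": 0.91, "originality": 0.88}))
```

Note that even unanimous agreement only upgrades the tool signal to "worth corroborating" — it never substitutes for the stylometric and factual checks in Steps 4 and 5.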

Step 4: Stylometric Comparison (When Baseline Exists)

If you have authenticated writing samples from the purported author, compare:

Stylometric comparison is the most reliable detection method when a baseline exists. It is the least reliable when no baseline exists or when the author has limited prior writing.

Automate stylometric comparison with Python

When you have authenticated writing samples from the same author, use the following script to build a stylometric fingerprint and flag deviations:

import re
import statistics

def stylometric_profile(text: str) -> dict:
    """Build a stylometric fingerprint from a text sample."""
    sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
    words = text.lower().split()
    lengths = [len(s.split()) for s in sentences]

    return {
        "mean_sentence_length": round(statistics.mean(lengths), 2) if lengths else 0,
        "stdev_sentence_length": round(
            statistics.stdev(lengths) if len(lengths) > 1 else 0, 2
        ),
        "type_token_ratio": round(
            len(set(words)) / len(words) if words else 0, 3
        ),
        "contraction_rate": round(
            len(re.findall(r"\b\w+'\w+\b", text)) / len(words) if words else 0, 4
        ),
        "semicolon_rate": round(
            text.count(";") / len(sentences) if sentences else 0, 4
        ),
        "question_rate": round(
            text.count("?") / len(sentences) if sentences else 0, 4
        ),
    }

def compare_profiles(baseline: dict, suspect: dict) -> list[str]:
    """Flag metrics where suspect deviates from baseline by more than 50%."""
    flags = []
    for key in baseline:
        if baseline[key] == 0:
            continue
        relative_diff = abs(baseline[key] - suspect[key]) / baseline[key]
        if relative_diff > 0.5:
            flags.append(
                f"{key}: baseline={baseline[key]}, suspect={suspect[key]} "
                f"(diff={relative_diff:.0%})"
            )
    return flags

# Compare known author writing against suspect document
with open("known_author_samples.txt") as f:
    baseline = stylometric_profile(f.read())
with open("suspect_document.txt") as f:
    suspect = stylometric_profile(f.read())

flags = compare_profiles(baseline, suspect)
if flags:
    print("Stylometric deviations detected:")
    for f in flags:
        print(f"  - {f}")
else:
    print("No significant deviations from author baseline.")

Requirements: Python 3.10+ with only standard library modules. For more robust analysis, consider NLTK for tokenization or spaCy for part-of-speech distributions, which provide stronger stylometric features than word-level statistics.

Step 5: Verify Factual Claims

AI-generated text frequently contains fabricated facts that sound plausible:

Step 6: Make a Responsible Decision

After gathering evidence from Steps 2–5, assess the totality:
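One way to keep the totality assessment honest is to record each indicator with its provenance and require convergence across independent sources before acting. The structure and thresholds below are an illustrative policy sketch, not a validated decision rule; adapt the required number of sources to your own stakes and appeals process:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """One indicator gathered in Steps 2-5, with a short provenance note."""
    source: str       # e.g. "manual checklist", "stylometry", "fact check"
    supports_ai: bool
    note: str

def assess(evidence: list[Evidence], high_stakes: bool) -> str:
    """Totality assessment: require convergence across independent sources.
    The three-source threshold is an illustrative policy choice."""
    sources_for = {e.source for e in evidence if e.supports_ai}
    if high_stakes and len(sources_for) < 3:
        return "insufficient evidence for a consequential decision"
    if len(sources_for) >= 3:
        return "multiple independent signals converge - proceed per policy"
    if sources_for:
        return "isolated signal - do not act; gather more evidence"
    return "no signal of AI generation"
```

Counting distinct sources rather than raw indicators matters: five stylometric flags are still one line of evidence, not five.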

What This Guide Does Not Cover