text-similarity-checker

Text similarity measures how close two pieces of text are. Different metrics capture different aspects: Levenshtein is an edit distance (the number of single-character insertions, deletions, or substitutions needed to turn one text into the other); Jaccard is word-set overlap, independent of word order and repetition; cosine is the angle between the texts' term-frequency vectors, so repeated words count. These metrics are useful for plagiarism and duplicate detection, finding near-duplicate documents, and similarity-based search. The ZTools Text Similarity Checker computes all three in the browser, displays a score from 0 to 1 for each, and highlights the matching and differing parts.

Use cases

Detect near-duplicate articles

Two scraped articles — are they the same content with minor edits? Cosine similarity > 0.9 suggests yes.
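A minimal sketch of this check, assuming lowercase word tokens with punctuation dropped (the page does not describe the tool's tokenizer). The helper names `tokenize`, `cosine_similarity`, and `is_near_duplicate` are illustrative, not part of the tool.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens, punctuation dropped.
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of a and b."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    if not tf_a or not tf_b:
        # Two empty texts are treated as identical; one empty text matches nothing.
        return 1.0 if tf_a == tf_b else 0.0
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    return dot / (norm_a * norm_b)


def is_near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    # The 0.9 threshold comes from the text above; tune it for your own corpus.
    return cosine_similarity(a, b) > threshold
```

Two headlines that differ in a single word out of nine score 10/11 (about 0.91) under these assumptions, just above the threshold; shorter texts need proportionally fewer changes to fall below it.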

Compare draft revisions

How much did the revision change? Levenshtein gives a precise edit distance.
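For reference, the edit distance between two revisions can be computed with the classic dynamic-programming algorithm. This is a sketch of the textbook method, not necessarily the tool's own code; it keeps only two rows of the table to limit memory use.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]
```

Calling `levenshtein(draft_v1, draft_v2)` returns the raw number of character edits; dividing by the longer length gives the fraction of the text that changed.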

Plagiarism check at the paragraph level

Paragraph A vs paragraph B: a high Jaccard score reveals significant word overlap, even if the sentences were reordered.
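A minimal sketch of the word-set comparison, assuming lowercase word tokens with punctuation dropped. The function names are illustrative.

```python
import re


def word_set(text: str) -> set[str]:
    # Assumption: lowercase word tokens, punctuation dropped.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    words_a, words_b = word_set(a), word_set(b)
    union = words_a | words_b
    if not union:
        return 1.0  # two empty texts: treated as identical by convention
    return len(words_a & words_b) / len(union)


print(jaccard("the cat sat", "the cat ran"))  # 0.5
```

Because only the set of distinct words matters, shuffling the words of a paragraph leaves the score at 1.0.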

Find similar customer-support tickets

New ticket has cosine similarity > 0.7 with an old resolved one — surface the resolution.
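The lookup could be sketched as follows, assuming resolved tickets are held in a dictionary keyed by ticket ID. The names `tf_vector`, `cosine`, and `best_match`, and the ticket IDs, are hypothetical; the tool itself only compares one pair of texts at a time.

```python
import math
import re
from collections import Counter


def tf_vector(text: str) -> Counter:
    # Assumption: lowercase word tokens, punctuation dropped.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def best_match(new_ticket: str, resolved: dict[str, str], threshold: float = 0.7):
    """Return (ticket_id, score) of the closest resolved ticket above threshold, else None."""
    query = tf_vector(new_ticket)
    scored = [(cosine(query, tf_vector(text)), ticket_id) for ticket_id, text in resolved.items()]
    score, ticket_id = max(scored, default=(0.0, None))
    return (ticket_id, score) if score > threshold else None


resolved = {
    "T-1": "cannot reset my password from the login page",
    "T-2": "invoice shows the wrong billing address",
}
match = best_match("cannot reset password from login page", resolved)
```

Here `match` identifies `T-1`, whose stored resolution can then be shown to the agent. A linear scan like this is fine for a few thousand tickets; larger archives would need an index.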

How it works

Paste text A and text B: Two text areas. Any size up to several MB.
Compute metrics: Levenshtein (edit distance, normalized to 0-1); Jaccard (intersection / union of word sets); Cosine (cosine of the angle between term-frequency vectors).
Display: Each metric with score + interpretation. Highlighted diff showing matching tokens.
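Written out, the three scores are commonly defined as below. The notation is introduced here for illustration: d_lev is the edit distance, |A| the length of A in characters, W_A the set of distinct words in A, and a_t the number of times term t occurs in A. Dividing by the longer length is an assumption about how the tool normalizes Levenshtein; the page does not specify it, though it is consistent with the first example further down (1 edit over 12 characters gives about 0.92).

```latex
\begin{align*}
\mathrm{sim}_{\mathrm{lev}}(A, B) &= 1 - \frac{d_{\mathrm{lev}}(A, B)}{\max(|A|, |B|)} \\
J(A, B) &= \frac{|W_A \cap W_B|}{|W_A \cup W_B|} \\
\cos(A, B) &= \frac{\sum_t a_t\, b_t}{\sqrt{\sum_t a_t^2}\,\sqrt{\sum_t b_t^2}}
\end{align*}
```

All three equal 1 for identical texts. Jaccard and cosine fall to 0 when the texts share no words, and the Levenshtein score falls to 0 when every character must change.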

Examples

Input: "hello world" vs "hello world!"

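Output: Levenshtein 0.92 (1 edit over 12 characters). Jaccard 1.0 (same word set once punctuation is stripped). Cosine 1.0 (identical term counts once punctuation is stripped).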
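The word-based scores for this pair depend on how punctuation is handled. The sketch below compares two assumed tokenizers (neither is confirmed to be the one the tool uses) and reproduces the Levenshtein score from the single inserted character.

```python
import re


def jaccard(words_a: set[str], words_b: set[str]) -> float:
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


a, b = "hello world", "hello world!"

# Tokenizer 1 (assumed): keep only runs of word characters, so "!" disappears.
stripped = jaccard(set(re.findall(r"\w+", a)), set(re.findall(r"\w+", b)))

# Tokenizer 2 (assumed): split on whitespace, so "world!" is a different word from "world".
whitespace = jaccard(set(a.split()), set(b.split()))

# Normalized Levenshtein for the same pair: one inserted character over 12.
levenshtein_score = 1 - 1 / max(len(a), len(b))

print(stripped, whitespace, round(levenshtein_score, 2))  # 1.0 0.3333333333333333 0.92
```

A Jaccard score of 1.0 for this input therefore implies that punctuation is stripped before words are compared.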

Input: Two paraphrased paragraphs

Output: Levenshtein low (~0.4 — many edits). Jaccard moderate (~0.6 — some shared words). Cosine moderate (~0.7 — similar topic).

Input: Identical text

Output: All metrics = 1.0.

Frequently asked questions

Which metric for plagiarism?

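Cosine: it captures topic and word frequency. Levenshtein produces too many false negatives on paraphrased content, because rewording or reordering sentences inflates the edit distance even when the ideas are the same.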

Will it detect AI-generated text?

No. These are similarity metrics between two texts, not AI detection. Use a dedicated AI detector for that, bearing in mind that such detectors are themselves unreliable.

Maximum text size?

Inputs up to several MB are accepted, but Levenshtein is O(n*m) in the two text lengths, so it slows down sharply for million-character texts. Jaccard and cosine are O(n+m) and handle large inputs.
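To get a feel for the gap, the sketch below counts the table cells the classic Levenshtein algorithm fills against a rough linear step count for the word-based metrics. These are back-of-the-envelope counts under assumed cost models, not measurements of the tool.

```python
def levenshtein_cells(n_chars_a: int, n_chars_b: int) -> int:
    """Number of table cells the classic dynamic-programming algorithm fills."""
    return (n_chars_a + 1) * (n_chars_b + 1)


def linear_steps(n_chars_a: int, n_chars_b: int) -> int:
    """Rough step count for one pass over each text (tokenize and count)."""
    return n_chars_a + n_chars_b


for size in (1_000, 100_000, 1_000_000):
    cells = levenshtein_cells(size, size)
    steps = linear_steps(size, size)
    print(f"{size:>9} chars each: {cells:.1e} cells vs {steps:.1e} steps")
```

At a million characters per text, the quadratic method touches about 10^12 cells while the linear ones make about two million steps.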

Privacy?

All client-side.

Tips

For short text, Levenshtein is most informative.
For long documents, cosine + Jaccard scale better; ignore Levenshtein.
For semantic similarity (paraphrases), embedding-based tools (sentence-transformers) beat simple word-overlap metrics.
For plagiarism detection at scale, dedicated services (Turnitin, Grammarly) compare against vast corpora — this tool is for pairwise comparison only.

Try it now

The full text-similarity-checker runs in your browser at https://ztools.zaions.com/text-similarity-checker — no signup, no upload, no data leaves your device.


Last updated: 2026-05-06 · Author: Ahsan Mahmood
