
url-extractor

A URL extractor scans text or HTML and pulls out every URL it contains (http(s), ftp, mailto, tel, and custom schemes), producing a clean, deduplicated list ready for link-checking, archival, redirect testing, or competitive analysis. The ZTools URL Extractor runs entirely in the browser, recognises both bare URLs and HTML <a href="…"> links, deduplicates exact and normalised matches (trailing slash, scheme), and exports plain text or CSV with domain classification.

Use cases

Link audit on a long article

Paste the full HTML or markdown of a published article; extractor lists every outgoing link. Run them through a link-checker tool to catch 404s before readers do.

SEO research on link-heavy pages

A forum thread, comments page, or directory listing often references many other sites. Extract the links for SEO research or partner outreach.

Archiving research bookmarks

Long email thread with embedded URLs. Extract once for a flat reading list rather than scrolling back through quoted replies.

Migrating a mixed-format link list

Convert mixed [text](url) and bare URLs into a single normalised list for migration to a different format.

How it works

  1. Paste source: plain text, HTML, markdown, or JSON. The tool tokenises the input while preserving URL boundaries.
  2. Match URL patterns: a regex matches scheme://host[:port][/path][?query][#fragment], plus the mailto:, tel:, ftp:, ws://, and wss:// schemes.
  3. Normalise (optional): collapse trailing slashes, downcase the host, and strip default ports (:80, :443). Normalisation increases dedup hits.
  4. Classify by domain: each URL is tagged with its registrable domain (example.com from sub.example.com/path), useful for grouping.
  5. Export: a plain list, CSV with domain, path, and query columns, or JSON for downstream processing.
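The matching, normalisation, and dedup steps above can be sketched roughly as follows. This is an illustrative Python sketch under simplified assumptions, not the tool's exact regex or normalisation rules:

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Illustrative pattern: scheme-ful URLs plus mailto:/tel: (not the tool's
# exact regex). Stops at whitespace and common delimiters.
URL_RE = re.compile(
    r"""(?:https?|ftp|wss?)://[^\s<>"')\]]+|(?:mailto|tel):[^\s<>"')\]]+"""
)

def normalise(url: str) -> str:
    """Downcase host, strip default ports, collapse a trailing slash."""
    parts = urlsplit(url)
    if not parts.scheme.startswith("http"):
        return url  # only normalise web URLs in this sketch
    host = parts.hostname or ""          # hostname is already lowercased
    port = parts.port
    netloc = host if port in (None, 80, 443) else f"{host}:{port}"
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))

def extract(text: str) -> list[str]:
    """Return normalised URLs in order of first appearance, deduplicated."""
    seen: set[str] = set()
    out: list[str] = []
    for match in URL_RE.findall(text):
        url = normalise(match)
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out
```

With this sketch, `https://A.com:443/path/` and `https://a.com/path` normalise to the same string and collapse into one entry, which is why normalisation increases dedup hits.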

Examples

Input: "See https://a.com and visit b.com/page or read www.c.org/article"

Output: 3 URLs: https://a.com, b.com/page, www.c.org/article (with optional auto-prefix to https://).
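The optional auto-prefix for scheme-less matches could look like this hypothetical helper (the tool's actual rule may differ):

```python
def ensure_scheme(url: str) -> str:
    # Loose-mode matches like "b.com/page" or "www.c.org/article" carry no
    # scheme; prefix https:// so they become usable links. Non-web schemes
    # (mailto:, tel:) are left alone.
    if "://" in url or url.startswith(("mailto:", "tel:")):
        return url
    return "https://" + url
```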


Input: Markdown: "[click](https://x.com) or https://y.com"

Output: https://x.com, https://y.com.


Input: HTML: <a href="https://a.com">a</a> + plain "see https://a.com"

Output: https://a.com (deduplicated).

Frequently asked questions

Does it find bare domains without scheme?

It is optional: "strict" mode requires an explicit scheme, while "loose" mode also catches "example.com/path". Loose mode produces more false positives (e.g. file paths that happen to match the domain pattern).
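A sketch of the difference, using illustrative regexes rather than the tool's exact patterns:

```python
import re

# Strict mode: an explicit scheme is required.
STRICT = re.compile(r"""\bhttps?://[^\s<>"']+""")

# Loose mode (sketch): also accept bare host/path like example.com/path.
# The broader pattern invites false positives, e.g. "archive.tar" also
# parses as a domain-like token.
LOOSE = re.compile(
    r"""\b(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s<>"']*)?""", re.I
)

text = "visit example.com/path or https://a.com"
print(STRICT.findall(text))  # ['https://a.com']
print(LOOSE.findall(text))   # ['example.com/path', 'https://a.com']
```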

How are tracking parameters handled?

Optional UTM stripping canonicalises URLs by dropping utm_source, utm_medium, and similar tracking parameters. This is useful for dedup and clean archives.
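UTM stripping amounts to rebuilding the query string without the tracking keys. A minimal sketch, assuming the standard utm_* parameter names:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Common tracking parameters; the tool's actual list may differ.
TRACKING = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}

def strip_tracking(url: str) -> str:
    """Drop utm_* parameters, keeping the rest of the query intact."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```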

Is the input uploaded?

No: processing is client-side only. Privacy by design.

Can I extract from a live URL?

No. The extractor reads pasted text; to extract from a live page, save the page HTML and paste it.

Why do mailto: links show up?

They are URIs (mailto: is a URI scheme). Filter by scheme if you only want web URLs (http/https).
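Filtering by scheme is a one-liner; a hypothetical helper for keeping only web URLs might look like:

```python
from urllib.parse import urlsplit

def web_only(urls: list[str]) -> list[str]:
    # Keep only http/https; drops mailto:, tel:, ftp:, and similar schemes.
    return [u for u in urls if urlsplit(u).scheme in ("http", "https")]
```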

How do I sort by domain frequency?

CSV export includes domain column; sort in spreadsheet to find most-linked sites.
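The same frequency count can be done in code. This sketch groups by full hostname; true registrable-domain grouping (example.com from sub.example.com) would need a public-suffix list:

```python
from collections import Counter
from urllib.parse import urlsplit

def domain_frequency(urls: list[str]) -> list[tuple[str, int]]:
    """Most-linked hosts first, as (host, count) pairs."""
    hosts = [h for h in (urlsplit(u).hostname for u in urls) if h]
    return Counter(hosts).most_common()
```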

Tips

  • Run extracted lists through a link-checker before publishing; broken outbound links hurt SEO and reader trust.
  • Strip UTM parameters before archival; they obscure the canonical resource.
  • For competitive analysis, extract URLs from competitor blog posts and find common backlink targets.
  • When extracting from forums, dedupe before opening β€” many threads quote the same URL repeatedly.
  • Combine with domain-extractor for high-level domain frequency analysis.

Try it now

The full url-extractor runs in your browser at https://ztools.zaions.com/url-extractor. No signup, no upload, no data leaves your device.

Open the tool ↗


Last updated: 2026-05-05 · Author: Ahsan Mahmood