url-extractor
A URL extractor scans text or HTML and pulls every URL it contains (http(s), ftp, mailto, tel, and custom schemes), producing a clean, deduplicated list ready for link-checking, archival, redirect testing, or competitive analysis. The ZTools URL Extractor runs entirely in the browser, recognises both bare URLs and HTML <a href="…"> links, deduplicates exact matches and normalised matches (trailing slash, scheme), and exports plain text or CSV with domain classification.
Use cases
Link audit on a long article
Paste the full HTML or markdown of a published article; the extractor lists every outgoing link. Run them through a link-checker tool to catch 404s before readers do.
Backlink list assembly
A forum thread, comments page, or directory listing often references many other sites. Extract the links for SEO research or partner outreach.
Archiving research bookmarks
A long email thread with embedded URLs. Extract once for a flat reading list rather than scrolling back through quoted replies.
Cleaning markdown / converting to plain links
Convert mixed [text](url) and bare URLs into a single normalised list for migration to a different format.
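For the markdown-cleaning case, a minimal Python sketch (a simplification for illustration; the actual tool's matching is broader and runs in the browser):

```python
import re

def markdown_to_plain_links(text):
    """Collect URLs from [text](url) markdown links and bare http(s) URLs."""
    link_re = r"\[[^\]]*\]\((https?://[^)\s]+)\)"
    urls = [m.group(1) for m in re.finditer(link_re, text)]
    # Remove matched markdown links so their URLs aren't counted twice
    stripped = re.sub(link_re, "", text)
    urls += re.findall(r"https?://[^\s<>\"')\]]+", stripped)
    # Deduplicate while preserving first-seen order
    return list(dict.fromkeys(urls))
```

For example, `markdown_to_plain_links("[click](https://x.com) or https://y.com")` yields a two-item list with each URL once.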
How it works
- Paste source: plain text, HTML, markdown, or JSON. The tool tokenises the input while preserving URL boundaries.
- Match URL patterns: a regex matches scheme://host[:port][/path][?query][#fragment], plus the mailto:, tel:, ftp:, ws://, and wss:// schemes.
- Normalise (optional): collapse trailing slashes, downcase the host, and strip default ports (:80, :443). Normalisation increases dedup hits.
- Classify by domain: each URL is tagged with its registrable domain (example.com from sub.example.com/path), which is useful for grouping.
- Export: plain list, CSV with domain, path, and query columns, or JSON for downstream processing.
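The steps above can be sketched in Python; the regex, the normalisation rules, and the last-two-labels domain heuristic are all simplifications (a proper registrable-domain lookup needs the Public Suffix List):

```python
import re
from urllib.parse import urlsplit, urlunsplit

# One pattern for explicit-scheme URLs plus mailto:/tel: URIs
URL_RE = re.compile(r"(?:(?:https?|ftp|wss?)://|mailto:|tel:)[^\s<>\"']+")

def normalise(url):
    """Downcase the host, strip default ports, collapse a trailing slash."""
    parts = urlsplit(url)
    netloc = (parts.hostname or "").lower()
    if parts.port and not (
        (parts.scheme == "http" and parts.port == 80)
        or (parts.scheme == "https" and parts.port == 443)
    ):
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path.rstrip("/"),
                       parts.query, parts.fragment))

def registrable_domain(host):
    """Naive last-two-labels heuristic; real tools need the Public Suffix List."""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host

def extract(text):
    """Match, trim trailing punctuation, normalise, dedupe in order."""
    seen = {}
    for url in URL_RE.findall(text):
        seen.setdefault(normalise(url.rstrip(".,;)")), None)
    return list(seen)
```

With normalisation on, `https://A.com/` and `https://a.com:443` collapse to the same entry, which is why normalisation increases dedup hits.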
Examples
Input: "See https://a.com and visit b.com/page or read www.c.org/article"
Output: 3 URLs: https://a.com, b.com/page, www.c.org/article (with optional auto-prefix to https://).
Input: Markdown: "[click](https://x.com) or https://y.com"
Output: https://x.com, https://y.com.
Input: HTML: <a href="https://a.com">a</a> + plain "see https://a.com"
Output: https://a.com (deduplicated).
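The HTML example can be reproduced with Python's standard library (an illustration only, not the tool's implementation): collect href attributes, scan the raw text for bare URLs, then dedupe.

```python
import re
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)

def extract_html(source):
    collector = HrefCollector()
    collector.feed(source)
    # Also scan the raw text for bare URLs, then dedupe exact matches
    bare = re.findall(r"https?://[^\s<>\"']+", source)
    return list(dict.fromkeys(collector.urls + bare))
```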
Frequently asked questions
Does it find bare domains without scheme?
Optionally. "Strict" mode requires an explicit scheme; "loose" mode also catches "example.com/path". Loose mode produces more false positives (e.g. file names that happen to match the domain pattern).
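The strict/loose trade-off can be shown with two regexes (hypothetical patterns, simpler than whatever the tool actually uses):

```python
import re

# Strict: an explicit scheme is required
STRICT = re.compile(r"https?://[^\s<>\"']+")
# Loose: also match bare domains like example.com/path; prone to false
# positives, since "something.html" also has a dot-separated shape
LOOSE = re.compile(
    r"(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/[^\s<>\"']*)?",
    re.IGNORECASE,
)

text = "visit example.com/page or https://a.org"
strict_hits = STRICT.findall(text)  # explicit-scheme URL only
loose_hits = LOOSE.findall(text)    # bare domain as well
```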
How are tracking parameters handled?
Optional UTM stripping canonicalises URLs by dropping utm_source, utm_medium, and similar parameters. Useful for dedup and for clean archives.
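A sketch of UTM stripping with urllib.parse; the extra gclid/fbclid names are my assumption, beyond the utm_* parameters the tool mentions:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_tracking(url, prefixes=("utm_",), names=("gclid", "fbclid")):
    """Drop tracking parameters, keeping all other query pairs in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(prefixes) and k not in names]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))
```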
Is the input uploaded?
No. Processing is entirely client-side; privacy by design.
Can I extract from a live URL?
No. The extractor reads pasted text. To extract from a live page, save the page's HTML and paste it.
Why do mailto: links show up?
They are URIs (mailto: is a URI scheme). Filter by scheme if you only want web URLs (http/https).
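Scheme filtering is a one-liner in any language; a Python sketch:

```python
from urllib.parse import urlsplit

def web_only(urls):
    """Keep only http/https URLs, dropping mailto:, tel:, ftp: and friends."""
    return [u for u in urls if urlsplit(u).scheme in ("http", "https")]
```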
How do I sort by domain frequency?
CSV export includes domain column; sort in spreadsheet to find most-linked sites.
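The same frequency count can be done outside a spreadsheet; a Python sketch that counts hostnames (the CSV's domain column holds registrable domains, which is a coarser grouping):

```python
from collections import Counter
from urllib.parse import urlsplit

def domain_frequency(urls):
    """Return (hostname, count) pairs, most-linked first."""
    hosts = [urlsplit(u).hostname for u in urls if urlsplit(u).hostname]
    return Counter(hosts).most_common()
```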
Tips
- Run extracted lists through a link-checker before publishing; broken outbound links hurt SEO and reader trust.
- Strip UTM parameters before archival; they obscure the canonical resource.
- For competitive analysis, extract URLs from competitor blog posts and find common backlink targets.
- When extracting from forums, dedupe before opening links; many threads quote the same URL repeatedly.
- Combine with domain-extractor for high-level domain frequency analysis.
Try it now
The full url-extractor runs in your browser at https://ztools.zaions.com/url-extractor: no signup, no upload, no data leaves your device.
Last updated: 2026-05-05 · Author: Ahsan Mahmood