Extract URLs from Text: Free Online Tool
A URL extractor finds and pulls all web addresses from any block of text. Paste your text below to extract, deduplicate, and copy all URLs instantly.
What Is URL Extraction?
URL extraction scans a body of text and identifies all strings that match URL patterns: primarily strings prefixed with http:// or https://, plus ftp:// and bare www. patterns. The extracted URLs are collected into a deduplicated list. Good extractors handle query strings, fragments, paths with special characters, and URLs embedded in HTML attributes.
You would use URL extraction for link auditing (finding all external links in a document), site migration (building a complete URL inventory), SEO analysis (mapping internal link structure), web scraping (extracting resource URLs from HTML), and content review (verifying all referenced links still work).
Code Examples for URL Extraction
JavaScript
function extractUrls(text) {
  const regex = /https?:\/\/[^\s<>"')\]]+/gi;
  const matches = text.match(regex) || [];
  // Remove trailing punctuation
  const cleaned = matches.map(url => url.replace(/[.,;:!?)}\]]+$/, ''));
  // Deduplicate
  return [...new Set(cleaned)];
}

const sample = 'See https://example.com. Also http://other.org/page?q=test';
console.log(extractUrls(sample));
// ['https://example.com', 'http://other.org/page?q=test']

// Extract href attributes from HTML
function extractHrefs(html) {
  const regex = /href=["']([^"']+)["']/gi;
  const urls = [];
  let match;
  while ((match = regex.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return [...new Set(urls)];
}
Python
import re
from urllib.parse import urlparse

def extract_urls(text):
    pattern = r'https?://[^\s<>"\')\]]+'
    matches = re.findall(pattern, text)
    # Remove trailing punctuation
    cleaned = [re.sub(r'[.,;:!?)\]]+$', '', url) for url in matches]
    # Deduplicate preserving order
    seen = set()
    unique = []
    for url in cleaned:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique

text = "Visit https://example.com. API docs: https://api.example.com/v2?token=xyz"
urls = extract_urls(text)
for url in urls:
    parsed = urlparse(url)
    print(f'{parsed.netloc:30s} {url}')
Bash
# Extract URLs from a file
grep -oE 'https?://[^ <>]+' document.txt | sort -u
# Extract href values from HTML (-P requires GNU grep)
grep -oP 'href="\K[^"]+' page.html | sort -u
# Check HTTP status of each extracted URL (urls.txt holds one URL per line)
while IFS= read -r url; do
  status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
  echo "$status $url"
done < urls.txt
# Count unique domains
grep -oE 'https?://[^/]+' urls.txt | sort -u | wc -l
How to Use the URL Extractor
1. Paste your text. Drop any text containing URLs into the input box: emails, HTML source, documents, chat logs, or raw data. The character count updates live.
2. Review extracted URLs. All URLs found are listed in the output box immediately. The count shows how many were found.
3. Toggle deduplication. Enable the Deduplicate checkbox to remove duplicate URLs. The tool shows how many duplicates were removed.
4. Check domain breakdown. See which domains appear most frequently, sorted by count. Useful for analyzing link profiles or checking external references.
5. Copy as list. Click Copy as List to copy all extracted URLs to your clipboard, one per line. Paste them into a spreadsheet, document, or any other tool.
Why Extract URLs from Text?
SEO auditing: When reviewing a web page or blog post, extracting all URLs helps you identify every outbound link. You can check that links point to the right destinations, spot broken references, and analyze the link profile. Domain breakdown shows which sites you link to most frequently, which is valuable for SEO link equity analysis.
Content migration: Moving content between platforms often requires updating links. Extract all URLs from your existing content, check each one, and create a redirect map. This prevents broken links after migration and preserves the SEO value of your existing pages.
Research and bookmarking: When reading through long research documents, meeting notes, or email threads, extracting URLs saves you from manually scrolling through looking for links. Pull out every URL at once and organize them into your bookmark manager or reference list.
Data cleaning: Working with scraped data or log files often means dealing with text full of embedded URLs. Extract them for analysis, verification, or removal. The deduplication feature is especially useful when the same link appears dozens of times in a log file.
Security analysis: When reviewing phishing emails or suspicious documents, extracting all URLs lets you quickly identify potentially malicious links. The domain breakdown reveals unfamiliar or suspicious domains that deserve closer inspection before anyone clicks them.
Link validation: Technical writers, documentation teams, and web developers can extract all links from their content and verify each one still works. Combined with a link checker, this process catches 404 errors, moved pages, and expired resources before readers encounter them.
Frequently Asked Questions About Extracting URLs
What types of URLs does this tool extract?
The tool extracts URLs starting with http://, https://, ftp://, and www. It recognizes full URLs with paths, query strings, and fragments. URLs starting with www. are automatically prefixed with https:// in the output.
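The matching rules described in this answer can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the pattern, the trailing-punctuation cleanup, and the function name extract_all_urls are illustrative and are not taken from the tool's own source.

```python
import re

# Assumed pattern: scheme-prefixed URLs (http, https, ftp) or bare "www." hosts.
URL_PATTERN = re.compile(
    r'\b(?:(?:https?|ftp)://|www\.)[^\s<>"\')\]]+',
    re.IGNORECASE,
)

def extract_all_urls(text):
    """Return URLs in order of appearance; bare www. matches get an https:// prefix."""
    urls = []
    for match in URL_PATTERN.findall(text):
        # Strip punctuation that usually belongs to the sentence, not the URL.
        url = re.sub(r'[.,;:!?)}\]]+$', '', match)
        if url.lower().startswith('www.'):
            url = 'https://' + url
        urls.append(url)
    return urls

sample = ('Docs at www.example.com/guide, mirror at ftp://files.example.org/pub '
          'and https://example.com/a?b=1#top.')
print(extract_all_urls(sample))
# ['https://www.example.com/guide', 'ftp://files.example.org/pub', 'https://example.com/a?b=1#top']
```

Query strings and fragments survive because the pattern only stops at whitespace, quotes, angle brackets, and closing brackets.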
How does the deduplicate toggle work?
When deduplication is enabled, the tool removes duplicate URLs from the results. Comparison is case-insensitive, so https://Example.com and https://example.com are treated as the same URL. The count of removed duplicates is shown in the output header.
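Case-insensitive deduplication of this kind might look like the following Python sketch. It assumes the first spelling seen is the one kept, which the tool does not document; dedupe_case_insensitive is a hypothetical helper name.

```python
def dedupe_case_insensitive(urls):
    """Keep the first occurrence of each URL, comparing lowercased forms.

    Returns the unique URLs and the number of duplicates removed.
    """
    seen = set()
    unique = []
    for url in urls:
        key = url.lower()  # comparison key only; the original spelling is kept
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique, len(urls) - len(unique)

urls = ['https://Example.com', 'https://example.com', 'https://example.com/docs']
unique, removed = dedupe_case_insensitive(urls)
print(unique)   # ['https://Example.com', 'https://example.com/docs']
print(removed)  # 1
```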
What is the domain breakdown?
The domain breakdown shows every unique domain found in the extracted URLs along with how many URLs belong to each domain. Domains are sorted by count (most frequent first), making it easy to see which sites are linked most often in your text.
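A comparable breakdown can be sketched in Python with the standard library. The function name domain_breakdown is hypothetical, and counting by hostname (ignoring port and credentials) is an assumption about how domains are grouped.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_breakdown(urls):
    """Count URLs per hostname, most frequent first."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # lowercased hostname, or None if absent
        if host:
            counts[host] += 1
    return counts.most_common()

urls = [
    'https://example.com/a',
    'https://example.com/b',
    'http://other.org/page',
]
for domain, count in domain_breakdown(urls):
    print(f'{count:3d}  {domain}')
```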
Can I extract URLs from HTML source code?
Yes. The tool scans the raw text for URL patterns, so it will find URLs inside href attributes, src attributes, or anywhere else they appear in HTML source code. It extracts the URL itself, not the surrounding HTML markup.
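When you process HTML in your own scripts, an alternative to pattern matching is to read the attributes with a parser. The sketch below uses Python's built-in html.parser; it is a different technique from the tool's text scan, and LinkCollector and extract_html_links are hypothetical names. Note that it returns attribute values as written, so relative links come back unresolved.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        for name, value in attrs:
            if name in ('href', 'src') and value:
                self.links.append(value)

def extract_html_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

html = ('<a href="https://example.com/page">Page</a>'
        '<img src="https://cdn.example.com/logo.png">')
print(extract_html_links(html))
# ['https://example.com/page', 'https://cdn.example.com/logo.png']
```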
Does this tool handle shortened URLs?
Yes, shortened URLs like bit.ly/abc123 or t.co/xyz are extracted as long as they include a protocol prefix (http:// or https://). The tool does not expand shortened URLs; it returns them as found in the text.
Is my text sent to a server?
No. All processing runs entirely in your browser using JavaScript. Your text never leaves your device. Nothing is logged, stored, or transmitted to any server. The tool works offline once loaded.
How do I extract URLs from text?
Paste your text into the input area. The tool uses pattern matching to find all valid URLs starting with http://, https://, ftp://, or www. and outputs them as a clean, deduplicated list, one per line, ready to copy.
Can I extract only URLs from specific domains?
The tool extracts all URLs from the text. To filter for specific domains, copy the full list and use Find & Replace or a text editor to filter by domain name. The Regex Tester tool can help you build a domain-specific pattern for advanced filtering.
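If you prefer to filter the copied list in a script, a minimal Python sketch could look like this. The helper name filter_by_domain is hypothetical, and treating subdomains as part of the target domain is an assumption you may want to change.

```python
from urllib.parse import urlparse

def filter_by_domain(urls, domain):
    """Keep URLs whose hostname is the domain itself or one of its subdomains."""
    domain = domain.lower()
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ''
        if host == domain or host.endswith('.' + domain):
            kept.append(url)
    return kept

urls = [
    'https://example.com/a',
    'https://api.example.com/v2',
    'https://notexample.com/x',
    'http://other.org',
]
print(filter_by_domain(urls, 'example.com'))
# ['https://example.com/a', 'https://api.example.com/v2']
```

Comparing parsed hostnames rather than searching for the domain as a substring avoids false matches such as notexample.com.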
Related Free Online Tools
Extract URLs here, then clean, format, or analyze your content with our other free tools.
Remove HTML
Strip HTML and XML tags from text and decode entities.
Find & Replace
Find and replace text with regex support and live highlighting.
Duplicate Remover
Remove duplicate lines from any list with one click.
Text Cleaner
Remove extra spaces, line breaks, and hidden characters from messy text.
More Free Text Tools
A list of extracted URLs often contains the same link repeated from multiple paragraphs. The duplicate line remover clears those, and the text sorter groups them by domain when sorted alphabetically.