Extract URLs from Text — Free Online Tool

A URL extractor finds and pulls all web addresses from any block of text. Paste your text below to extract, deduplicate, and copy all URLs instantly.

Extracts http, https, ftp, and www URLs. All processing happens in your browser.

What Is URL Extraction?

URL extraction scans a body of text and identifies all strings that match URL patterns — primarily http:// and https:// prefixed strings, plus ftp:// and bare www. patterns. The extracted URLs are collected into a deduplicated list. Good extractors handle query strings, fragments, paths with special characters, and URLs embedded in HTML attributes.

You would use URL extraction for link auditing (finding all external links in a document), site migration (building a complete URL inventory), SEO analysis (mapping internal link structure), web scraping (extracting resource URLs from HTML), and content review (verifying all referenced links still work).

Code Examples for URL Extraction

JavaScript

function extractUrls(text) {
  const regex = /https?:\/\/[^\s<>"')\]]+/gi;
  const matches = text.match(regex) || [];
  // Remove trailing punctuation
  const cleaned = matches.map(url => url.replace(/[.,;:!?)}\]]+$/, ''));
  // Deduplicate
  return [...new Set(cleaned)];
}

const sample = 'See https://example.com. Also http://other.org/page?q=test';
console.log(extractUrls(sample));
// ['https://example.com', 'http://other.org/page?q=test']

// Extract href attributes from HTML
function extractHrefs(html) {
  const regex = /href=["']([^"']+)["']/gi;
  const urls = [];
  let match;
  while ((match = regex.exec(html)) !== null) {
    urls.push(match[1]);
  }
  return [...new Set(urls)];
}

Python

import re
from urllib.parse import urlparse

def extract_urls(text):
    pattern = r'https?://[^\s<>"\')\]]+'
    # URL schemes are case-insensitive, so match HTTP:// as well as http://
    matches = re.findall(pattern, text, re.IGNORECASE)
    # Remove trailing punctuation
    cleaned = [re.sub(r'[.,;:!?)\]]+$', '', url) for url in matches]
    # Deduplicate preserving order
    seen = set()
    unique = []
    for url in cleaned:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique

text = "Visit https://example.com. API docs: https://api.example.com/v2?token=xyz"
urls = extract_urls(text)
for url in urls:
    parsed = urlparse(url)
    print(f'{parsed.netloc:30s} {url}')

Bash

# Extract URLs from a file
grep -oE 'https?://[^[:space:]<>"]+' document.txt | sort -u

# Extract href values from HTML (-P requires GNU grep with PCRE support)
grep -oP 'href="\K[^"]+' page.html | sort -u

# Check HTTP status of each extracted URL
while IFS= read -r url; do
  status=$(curl -o /dev/null -s -w "%{http_code}" "$url")
  echo "$status $url"
done < urls.txt

# Count unique scheme-and-host prefixes (a close approximation of unique domains)
grep -oE 'https?://[^/]+' urls.txt | sort -u | wc -l

How to Use the URL Extractor

1. Paste your text. Drop any text containing URLs into the input box — emails, HTML source, documents, chat logs, or raw data. The character count updates live.

2. Review extracted URLs. All URLs found are listed in the output box immediately. The count shows how many were found.

3. Toggle deduplication. Enable the Deduplicate checkbox to remove duplicate URLs. The tool shows how many duplicates were removed.

4. Check domain breakdown. See which domains appear most frequently, sorted by count. Useful for analyzing link profiles or checking external references.

5. Copy as list. Click Copy as List to copy all extracted URLs to your clipboard, one per line. Paste them into a spreadsheet, document, or any other tool.

Why Extract URLs from Text?

SEO auditing: When reviewing a web page or blog post, extracting all URLs helps you identify every outbound link. You can check that links point to the right destinations, spot broken references, and analyze the link profile. Domain breakdown shows which sites you link to most frequently, which is valuable for SEO link equity analysis.

Content migration: Moving content between platforms often requires updating links. Extract all URLs from your existing content, check each one, and create a redirect map. This prevents broken links after migration and preserves the SEO value of your existing pages.

Research and bookmarking: When reading through long research documents, meeting notes, or email threads, extracting URLs saves you from manually scrolling through looking for links. Pull out every URL at once and organize them into your bookmark manager or reference list.

Data cleaning: Working with scraped data or log files often means dealing with text full of embedded URLs. Extract them for analysis, verification, or removal. The deduplication feature is especially useful when the same link appears dozens of times in a log file.

Security analysis: When reviewing phishing emails or suspicious documents, extracting all URLs lets you quickly identify potentially malicious links. The domain breakdown reveals unfamiliar or suspicious domains that deserve closer inspection before anyone clicks them.

Link validation: Technical writers, documentation teams, and web developers can extract all links from their content and verify each one still works. Combined with a link checker, this process catches 404 errors, moved pages, and expired resources before readers encounter them.
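A link-checking pass over an extracted list could look something like the following. This is a minimal sketch, not the tool's own behavior: the names http_status and find_broken are hypothetical, it assumes HEAD requests are acceptable to the servers involved (some reject them), and urllib follows redirects, so the status reported is that of the final destination.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions that carry the code
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, timeout, refused connection, or malformed URL
        return None

def find_broken(urls, get_status=http_status):
    """Return (url, status) pairs whose status is missing or >= 400."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

The get_status parameter is injectable so the filtering logic can be exercised without network access, for example by passing a dictionary lookup in place of the real request.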

Frequently Asked Questions About Extracting URLs

What types of URLs does this tool extract?

The tool extracts URLs starting with http://, https://, ftp://, and www. It recognizes full URLs with paths, query strings, and fragments. URLs starting with www. are automatically prefixed with https:// in the output.
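The behavior described here can be approximated in a few lines of Python. This is a sketch under assumptions: the pattern, the set of trailing punctuation stripped, and the name extract_all_urls are illustrative and are not taken from the tool's source.

```python
import re

# Assumed pattern: a scheme-prefixed URL or a bare "www." host,
# running until whitespace, an angle bracket, or a quote.
URL_PATTERN = re.compile(
    r'\b(?:(?:https?|ftp)://|www\.)[^\s<>"\']+',
    re.IGNORECASE,
)

def extract_all_urls(text):
    """Return http, https, ftp, and www URLs in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that trails the URL
        url = match.group(0).rstrip('.,;:!?)]}')
        # Bare www. hosts get an https:// prefix
        if url.lower().startswith('www.'):
            url = 'https://' + url
        urls.append(url)
    return urls
```

Because the scan moves left to right, a URL such as https://www.example.net is matched once from its scheme onward and is not matched a second time at the www. portion.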

How does the deduplicate toggle work?

When deduplication is enabled, the tool removes duplicate URLs from the results. Comparison is case-insensitive, so https://Example.com and https://example.com are treated as the same URL. The count of removed duplicates is shown in the output header.
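Case-insensitive deduplication with a removed-duplicates count might be sketched as follows. The function name dedupe_urls is hypothetical, and keeping the first spelling encountered is an assumption about how ties are resolved.

```python
def dedupe_urls(urls):
    """Return (unique_urls, removed_count), comparing URLs case-insensitively."""
    seen = set()
    unique = []
    for url in urls:
        key = url.lower()  # comparison key only; the original spelling is kept
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique, len(urls) - len(unique)
```

One consequence of folding the whole URL to lowercase is that two paths differing only in case are merged, even though some servers treat such paths as distinct resources.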

What is the domain breakdown?

The domain breakdown shows every unique domain found in the extracted URLs along with how many URLs belong to each domain. Domains are sorted by count (most frequent first), making it easy to see which sites are linked most often in your text.
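A domain breakdown of this kind can be sketched with the Python standard library. The name domain_breakdown is hypothetical; the sketch assumes each URL carries a scheme, since urlparse only recognizes the host when one is present.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_breakdown(urls):
    """Return (domain, count) pairs, most frequent first."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # lowercased by urlparse; None if absent
        if host:
            counts[host] += 1
    return counts.most_common()
```

Domains with equal counts come back in the order they were first seen.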

Can I extract URLs from HTML source code?

Yes. The tool scans the raw text for URL patterns, so it will find URLs inside href attributes, src attributes, or anywhere else they appear in HTML source code. It extracts the URL itself, not the surrounding HTML markup.
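When only attribute values are wanted, an HTML parser is an alternative to scanning raw text. The sketch below uses Python's html.parser rather than the pattern matching the tool describes; LinkCollector and extract_links are hypothetical names, and limiting collection to href and src is an assumption.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href and src attribute values in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name in ('href', 'src') and value:
                self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Unlike a regex over raw text, this also returns relative links such as /about, which would need to be resolved against a base URL before checking.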

Does this tool handle shortened URLs?

Yes. Shortened URLs such as https://bit.ly/abc123 or https://t.co/xyz are extracted as long as they include a protocol prefix (http:// or https://). The tool does not expand shortened URLs; it returns them exactly as they appear in the text.

Is my text sent to a server?

No. All processing runs entirely in your browser using JavaScript. Your text never leaves your device. Nothing is logged, stored, or transmitted to any server. The tool works offline once loaded.

How do I extract URLs from text?

Paste your text into the input area. The tool uses pattern matching to find all URLs starting with http://, https://, ftp://, or www. and outputs them as a clean list, one per line, ready to copy. Enable the Deduplicate checkbox to remove repeated URLs.

Can I extract only URLs from specific domains?

The tool extracts all URLs from the text. To filter for specific domains, copy the full list and use Find & Replace or a text editor to filter by domain name. The Regex Tester tool can help you build a domain-specific pattern for advanced filtering.
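For anyone filtering the copied list in a script instead, a domain filter could be sketched like this. The name filter_by_domain is hypothetical, and treating subdomains as matches is an assumption about what "specific domain" should mean.

```python
from urllib.parse import urlparse

def filter_by_domain(urls, domain):
    """Keep URLs whose host is domain or one of its subdomains."""
    domain = domain.lower()
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ''
        # Compare on the host, not the whole string, so that
        # notexample.com does not match example.com
        if host == domain or host.endswith('.' + domain):
            kept.append(url)
    return kept
```

Matching on the parsed host avoids the false positives a plain substring search would produce when the domain name appears in a path or query string.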

Related Free Online Tools

Extract URLs here, then clean, format, or analyze your content with our other free tools.

More Free Text Tools

A list of extracted URLs often contains the same link repeated from multiple paragraphs. The duplicate line remover clears those, and the text sorter groups them by domain when sorted alphabetically.
