Regex for Text Manipulation — Essential Patterns Every Developer Should Know

Q: What is the regex to extract all email addresses from text?

Use [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} with the global flag. This matches standard email formats including subdomains and plus-addressing. In JavaScript, use text.match(/pattern/g) to get an array of all matches.

Q: How do I convert camelCase to snake_case with regex?

Match uppercase letters with ([A-Z]) and replace with _$1, then lowercase the result. In JavaScript: str.replace(/([A-Z])/g, '_$1').toLowerCase(). This inserts an underscore before each capital letter and converts everything to lowercase.

Q: Can regex handle Unicode text?

Yes, but you need the Unicode flag. In JavaScript, use the u flag and Unicode property escapes like \p{Letter} to match letters from any script. Without the u flag, patterns like \w only match ASCII word characters. Python's re module handles Unicode by default.

Regular expressions are the Swiss army knife of text manipulation. A single pattern can extract every email address from a document, strip HTML tags from a page, or convert variable names between naming conventions. These are tasks that would take dozens of lines of procedural code otherwise.

This guide covers the regex patterns that handle the vast majority of real-world text manipulation work. Every pattern here is tested and ready to drop into your JavaScript, Python, or command-line workflow. If you want to try them interactively, the FlipMyCase Regex Tester lets you test patterns against your own text with real-time match highlighting.

Essential Regex Patterns for Text Work

These are the patterns you will reach for again and again.

Email Extraction

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

This matches the structure of standard email addresses: local part, @ symbol, domain, and top-level domain of at least two characters. Apply it with the global flag to pull every email from a block of text. It handles addresses like jane.doe+work@company.co.uk but intentionally skips malformed ones like user@.com.

URL Extraction

https?://[^\s<>"']+

Matches both http:// and https:// URLs, consuming everything up to the first whitespace or quote character. For stricter matching that captures only well-formed URLs with paths and query strings:

https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[^\s]*)?

Phone Numbers (US Format)

\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

Handles (555) 123-4567, 555-123-4567, 555.123.4567, and 555 123 4567. The optional parentheses and flexible separators cover the formats people actually use.

Whitespace Cleanup

Replace multiple spaces with a single space:

Pattern: \s+
Replace: (single space)

This collapses runs of spaces, tabs, and newlines into one space. To preserve line breaks but clean up horizontal whitespace, use [^\S\n]+ instead.

Duplicate Word Removal

\b(\w+)\s+\1\b

Catches repeated words like "the the" or "is is." The \1 backreference matches whatever the first capture group found. Replace with $1 to keep just one instance.

Case Conversion with Regex

Regex capture groups combined with replacement functions handle naming convention conversions that would otherwise require dedicated parsers.

snake_case to camelCase

const snakeStr = 'user_first_name';
const camelStr = snakeStr.replace(/_([a-z])/g, (_, c) => c.toUpperCase());
// Result: userFirstName

The pattern _([a-z]) captures the letter after each underscore. The replacement function uppercases it.

camelCase to snake_case

const camelStr = 'userFirstName';
const snakeStr = camelStr.replace(/([A-Z])/g, '_$1').toLowerCase();
// Result: user_first_name

Insert an underscore before every uppercase letter, then lowercase the entire string.

kebab-case to camelCase

const kebabStr = 'background-color-value';
const camelStr = kebabStr.replace(/-([a-z])/g, (_, c) => c.toUpperCase());
// Result: backgroundColorValue

Title Case with Regex

const title = 'the quick brown fox';
const titleCase = title.replace(/\b\w/g, c => c.toUpperCase());
// Result: The Quick Brown Fox

The \b\w pattern matches the first letter of every word. For proper title case that keeps articles and prepositions lowercase, FlipMyCase handles those rules automatically with its Title Case converter.

How FlipMyCase Helps

Writing regex patterns in isolation is error-prone. You need to see what matches, inspect capture groups, and test replacements before running them on real data.

The FlipMyCase Regex Tester provides real-time match highlighting as you type your pattern, capture group inspection with named group support, and a replace mode for testing find-and-replace operations with $1, $2 group references. It includes presets for common patterns like email, URL, date, and phone number so you do not have to write them from scratch.

For bulk text operations, the Find and Replace tool lets you apply regex patterns to transform entire documents. Toggle between literal and regex mode, preview every match before replacing, and copy the result.

Text Cleanup Patterns

These patterns handle the messy text that shows up in real projects.

Remove HTML Tags

<[^>]*>

Strips all HTML tags, leaving only the text content. Apply globally to clean pasted content from web pages. For nested or malformed HTML, this covers the common case. Use a proper HTML parser for security-critical stripping (preventing XSS).

Strip Leading and Trailing Whitespace Per Line

Pattern: ^\s+|\s+$
Flags: gm (global, multiline)

The multiline flag makes ^ and $ match line boundaries, so this trims every line independently.

Normalize Line Endings

Pattern: \r\n|\r
Replace: \n

Converts Windows (\r\n) and old Mac (\r) line endings to Unix (\n). Essential when processing files from mixed operating systems.

Remove Non-Printable Characters

Pattern: [^\x20-\x7E\n\r\t]
Replace: (empty)

Strips everything outside the printable ASCII range while preserving newlines and tabs. Useful for cleaning text pasted from PDFs or word processors that embed invisible formatting characters.

Extract Data from Structured Text

Pull key-value pairs from configuration-style text:

(\w+)\s*[:=]\s*(.+)

Matches lines like name: John Smith or timeout=30, capturing the key in group 1 and the value in group 2.

Real-World Examples

Messy CSV Cleanup

Before:

John  ,  Doe  ,  john@example.com  ,  555-1234
Jane  ,Doe,   jane@example.com,555-5678

Pattern: \s*,\s* Replace: ,

After:

John,Doe,john@example.com,555-1234
Jane,Doe,jane@example.com,555-5678

Log File Parsing

Input:

[2026-03-18 14:22:01] ERROR: Connection refused on port 5432
[2026-03-18 14:22:03] WARN: Retry attempt 1 of 3

Pattern: \[(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\]\s+(\w+):\s+(.+)

Captures:

Group 1: 2026-03-18 (date)
Group 2: 14:22:01 (time)
Group 3: ERROR (level)
Group 4: Connection refused on port 5432 (message)

Reformatting Dates

Before: 03/18/2026

Pattern: (\d{2})/(\d{2})/(\d{4}) Replace: $3-$1-$2

After: 2026-03-18

Extracting Data from Emails

Input:

Order #12345 confirmed. Total: $89.99. Shipping to: 90210.

Patterns:

Order number: Order #(\d+) captures 12345
Total: \$(\d+\.\d{2}) captures 89.99
ZIP code: \b(\d{5})\b captures 90210

Three patterns, three data points, zero manual parsing.

Frequently Asked Questions

What is the regex to extract all email addresses from text? Use [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} with the global flag. This matches standard email formats including subdomains and plus-addressing. In JavaScript, use text.match(/pattern/g) to get an array of all matches. For a no-code approach, the FlipMyCase Extract Emails tool does this automatically.

How do I convert camelCase to snake_case with regex? Match uppercase letters with ([A-Z]) and replace with _$1, then lowercase the result. In JavaScript: str.replace(/([A-Z])/g, '_$1').toLowerCase(). This inserts an underscore before each capital letter and converts everything to lowercase. It handles multi-word identifiers like getUserFirstName correctly.

Can regex handle Unicode text? Yes, but you need the Unicode flag. In JavaScript, use the u flag and Unicode property escapes like \p{Letter} to match letters from any script. Without the u flag, patterns like \w only match ASCII word characters. Python's re module handles Unicode by default. Always test Unicode patterns against actual multilingual text before deploying.

How do I test regex patterns before using them in code? Use an interactive regex tester with real-time highlighting. The FlipMyCase Regex Tester shows matches as you type, displays capture groups, and lets you test replacements. Paste your actual data into the test area, not just a simple example. Edge cases in real data are where patterns break.

Conclusion

These patterns cover the text manipulation tasks that come up constantly in development work: extracting structured data, cleaning messy input, converting naming conventions, and reformatting text. The key is testing patterns against real data before deploying them.

Try your patterns in the FlipMyCase Regex Tester to see matches in real time, inspect capture groups, and test replacements. For case conversions that do not require writing regex at all, the FlipMyCase text converters handle UPPERCASE, lowercase, Title Case, camelCase, snake_case, and kebab-case with a single click.