Regular Expressions Syntax Guide and Cheatsheet


In software engineering, data scrubbing, and web form design, a common challenge is: "Does this input string match the required format?" You can check whether an email address contains an @ sign, whether a phone number has the right length, or whether a password mixes numbers and symbols with plain if-else conditionals. That code, however, quickly becomes long and fragile.

A Regular Expression (Regex) addresses this by describing a complex search pattern in a single, compact line. Its syntax can look cryptic at first, but learning it pays off across validation, search, and text-processing tasks. This guide explains how Regex patterns match text, defines the core metacharacters, and provides a cheatsheet of common validation patterns.


1. Core Regex Syntax and Metacharacters Cheatsheet

Regex patterns combine literal characters, which match themselves, with special symbols called metacharacters that define the matching logic:

Anchors

Specify the position of the match within the text:

  • ^: Matches the start of the string, or of each line when the m flag is set (e.g., ^Hello requires the string to begin with "Hello").
  • $: Matches the end of the string, or of each line when the m flag is set (e.g., world$ requires the string to end with "world").
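The next example shows how the anchors change a match. It is a short JavaScript sketch that runs in a browser console or in Node.js and uses the built-in `RegExp.prototype.test()`. It checks a few illustrative strings against ^Hello and world$:

```javascript
// ^ ties the match to the start of the string; $ ties it to the end.
const startsWithHello = /^Hello/;
const endsWithWorld = /world$/;

const samples = ["Hello world", "Say Hello world", "Hello world!"];

for (const s of samples) {
  // test() returns true if the pattern matches at a position the anchors allow
  console.log(
    JSON.stringify(s),
    "starts:", startsWithHello.test(s),
    "ends:", endsWithWorld.test(s)
  );
}
```

"Say Hello world" fails ^Hello because "Hello" is not at position 0. "Hello world!" fails world$ because "!" follows "world".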

Character Classes

Identify the types of characters allowed at a position:

  • .: Matches any single character except newline.
  • \d: Matches any digit (equivalent to [0-9]).
  • \w: Matches any word character (letters, numbers, and underscores).
  • \s: Matches any whitespace (spaces, tabs, newlines).
  • [a-zA-Z]: Matches any uppercase or lowercase letter.
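You can try the classes above on a sample string with `String.prototype.match()` and the g flag, which returns every match as an array. This is a minimal JavaScript sketch, and the sample text is made up for illustration:

```javascript
const text = "id_7 = 42\tok";

// \d: single digits
const digits = text.match(/\d/g);
// \w+: runs of letters, digits, and underscores
const words = text.match(/\w+/g);
// \s: spaces, tabs, newlines, and other whitespace
const spaces = text.match(/\s/g);
// [a-zA-Z]: ASCII letters only
const letters = text.match(/[a-zA-Z]/g);
// . matches any character except a newline, so "\n" is skipped
const anyButNewline = "a\nb".match(/./g);

console.log(digits);        // [ '7', '4', '2' ]
console.log(words);         // [ 'id_7', '42', 'ok' ]
console.log(spaces.length); // 3
console.log(letters);       // [ 'i', 'd', 'o', 'k' ]
console.log(anyButNewline); // [ 'a', 'b' ]
```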

Quantifiers

Specify how many times the preceding character or group should repeat:

  • *: Matches 0 or more times.
  • +: Matches 1 or more times.
  • ?: Matches 0 or 1 time (makes the element optional).
  • {n}: Matches exactly n times.
  • {n,m}: Matches between n and m times, inclusive.
  • {n,}: Matches n or more times (the email pattern below uses {2,}).
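Each quantifier can be checked against a few inputs. In this JavaScript sketch, every pattern is wrapped in ^ and $ so the quantifier must account for the whole string. The `cases` list is purely illustrative:

```javascript
// Each entry pairs an anchored pattern with inputs to test against it.
const cases = [
  [/^ab*c$/, ["ac", "abc", "abbbc"]],          // * : zero or more b's
  [/^ab+c$/, ["ac", "abc", "abbbc"]],          // + : one or more b's
  [/^colou?r$/, ["color", "colour"]],          // ? : optional u
  [/^\d{3}$/, ["12", "123", "1234"]],          // {3} : exactly three digits
  [/^\d{2,4}$/, ["1", "12", "1234", "12345"]], // {2,4} : two to four digits
];

for (const [pattern, inputs] of cases) {
  for (const input of inputs) {
    console.log(`${pattern} ${JSON.stringify(input)} -> ${pattern.test(input)}`);
  }
}
```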

This cheatsheet summarizes key Regex metacharacters for daily reference:

  • Anchors (^ and $): Mark the start and end boundaries of the string, or of each line with the m flag. Example: ^test$ matches only the exact string "test".
  • Digit class (\d): Matches any single digit. Example: \d{3} matches "123" or "999".
  • Word class (\w): Matches any letter, digit, or underscore. Example: \w+ matches a run of word characters, such as a single word.
  • Quantifier (+): Matches one or more repetitions of the preceding element. Example: \d+ matches a run of digits such as "42".
  • Optional (?): Matches the preceding element zero or one time. Example: colou?r matches "color" and "colour".
  • Group (parentheses): Treats several characters as one unit so a quantifier applies to all of them. Example: (abc)+ matches "abc" and "abcabc".
  • Alternation (vertical bar |): Logical OR; matches either the pattern on its left or the one on its right. Example: cat|dog matches "cat" or "dog".
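Groups and alternation are easiest to understand together, because alternation has the lowest precedence of any regex operator. The JavaScript sketch below uses illustrative inputs only. It shows (abc)+ repeating a whole group, and it shows why ^cat|dog$ is not the same as ^(cat|dog)$:

```javascript
// (abc)+ repeats the whole group, so it consumes "abcabc" and stops at "x".
const repeated = "abcabcx".match(/(abc)+/);
console.log(repeated[0]); // abcabc
// A repeated capture group keeps only its last iteration.
console.log(repeated[1]); // abc

// Alternation binds loosest: this means (^cat) OR (dog$).
const ungrouped = /^cat|dog$/;
// Grouping the options keeps both anchors around the whole choice.
const grouped = /^(cat|dog)$/;

for (const word of ["cat", "dog", "catalog", "hotdog"]) {
  console.log(word, ungrouped.test(word), grouped.test(word));
}
```

"catalog" and "hotdog" slip through the ungrouped version. If you need grouping without capturing, the non-capturing form (?:cat|dog) matches the same strings.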

2. Common Validation Patterns

Here are three widely used validation patterns, broken down piece by piece:

① Email Validation

/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
  • ^[a-zA-Z0-9._%+-]+: Matches the username. Allows letters, numbers, dots, underscores, percents, pluses, and hyphens.
  • @: Requires a literal at sign.
  • [a-zA-Z0-9.-]+: Matches the domain host, permitting letters, numbers, dots, and hyphens.
  • \.[a-zA-Z]{2,}$: Requires a dot followed by a top-level domain (like com or org) of at least two letters at the end.
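Here is one way to apply the email pattern in JavaScript. The helper name `isPlausibleEmail` and the choice to trim whitespace first are assumptions for illustration. Neither is part of the pattern itself:

```javascript
// Email pattern from this section, as a JavaScript regex literal.
const EMAIL_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: trims surrounding whitespace, then tests the pattern.
function isPlausibleEmail(input) {
  return EMAIL_RE.test(input.trim());
}

const candidates = [
  "user.name+tag@example.com", // accepted
  "user@localhost",            // rejected: no dot and TLD after the host
  "@example.com",              // rejected: empty local part
  "a@b.c",                     // rejected: TLD shorter than two letters
  "a..b@example.com",          // accepted, although RFC 5322 forbids consecutive dots here
];

for (const c of candidates) {
  console.log(c, isPlausibleEmail(c));
}
```

The last sample shows that this pattern is a practical filter rather than a complete RFC 5322 validator. Sending a confirmation email is the usual way to confirm that an address actually works.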

② Phone Number (Hyphenated Format)

/^\d{3}-\d{3,4}-\d{4}$/
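  • Matches phone numbers written with hyphens, such as 010-1234-5678 or 555-123-4567. The pattern requires a 3-digit area code, a 3- or 4-digit middle section, and a 4-digit final section. It rejects numbers written with spaces, parentheses, or a country code.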
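If you also need the individual sections, you can add capture groups around each part. This JavaScript sketch is a variation on the pattern above, not part of the original. The function `splitPhone` and the field names `area`, `middle`, and `last` are hypothetical:

```javascript
// Same format as above, with a capture group around each section.
const PHONE_PARTS_RE = /^(\d{3})-(\d{3,4})-(\d{4})$/;

// Hypothetical helper: returns the three sections, or null if the format does not fit.
function splitPhone(input) {
  const m = PHONE_PARTS_RE.exec(input);
  // exec() returns null when the whole string does not match
  if (m === null) return null;
  const [, area, middle, last] = m;
  return { area, middle, last };
}

console.log(splitPhone("010-1234-5678")); // { area: '010', middle: '1234', last: '5678' }
console.log(splitPhone("555-123-4567"));  // { area: '555', middle: '123', last: '4567' }
console.log(splitPhone("5551234567"));    // null
```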

③ IPv4 Address

/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
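  • Ensures each octet is a number from 0 to 255. The alternatives 25[0-5], 2[0-4][0-9], and [01]?[0-9][0-9]? cover the ranges 250-255, 200-249, and 0-199. The non-capturing group followed by {3} requires three octet-and-dot repetitions, and a fourth octet with no trailing dot must end the string.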
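The following JavaScript sketch runs the IPv4 pattern against a handful of illustrative addresses, including some edge cases:

```javascript
// IPv4 pattern from this section, as a JavaScript regex literal.
const IPV4_RE =
  /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/;

const addresses = [
  "192.168.0.1",     // accepted
  "255.255.255.255", // accepted: upper bound of every octet
  "256.1.1.1",       // rejected: first octet is above 255
  "1.2.3",           // rejected: only three octets
  "1.2.3.4.5",       // rejected: five octets
  "192.168.001.001", // accepted: [01]? allows leading zeros
];

for (const a of addresses) {
  console.log(a, IPV4_RE.test(a));
}
```

The last sample is accepted because [01]? permits leading zeros. Some parsers, such as `inet_aton`, treat a leading zero as an octal prefix. If that matters in your context, add a separate check that rejects multi-digit octets starting with 0.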

3. Mastering Regex Flags

Flags are written after the closing slash of a pattern (e.g., /abc/gi) and modify its search behavior:

  • g (Global Match): Finds all matches in the text rather than stopping after the first match.
  • i (Ignore Case): Disables case sensitivity (e.g., /abc/i matches "ABC" and "aBc").
  • m (Multiline): Causes the ^ and $ anchors to match the start and end of individual lines rather than the entire string.
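The effect of each flag is easiest to see on multi-line text. This JavaScript sketch uses `String.prototype.match()` on an illustrative three-line string:

```javascript
const text = "Apple pie\napple tart\nAPPLE juice";

// Without g, match() stops at the first match and adds details such as index.
const firstOnly = text.match(/apple/i);
// g: every match, case-sensitive
const caseSensitive = text.match(/apple/g);
// g + i: every match, ignoring case
const anyCase = text.match(/apple/gi);
// Without m, ^ matches only at the very start of the string.
const startOfString = text.match(/^apple/gi);
// With m, ^ also matches right after each newline.
const startOfEachLine = text.match(/^apple/gim);

console.log(firstOnly[0], firstOnly.index); // Apple 0
console.log(caseSensitive);                 // [ 'apple' ]
console.log(anyCase);                       // [ 'Apple', 'apple', 'APPLE' ]
console.log(startOfString);                 // [ 'Apple' ]
console.log(startOfEachLine);               // [ 'Apple', 'apple', 'APPLE' ]
```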

4. Frequently Asked Questions (FAQ)

Q1. How do I match literal metacharacters like ? or *?
A1. Escape them with a backslash (\). For example, to match a literal question mark, write \?.
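The JavaScript sketch below first compares an unescaped ? with an escaped one. It then shows a common way to escape user-supplied text before building a pattern with `new RegExp()`. The helper `escapeRegExp` is hypothetical here, modeled on a widely used escaping snippet:

```javascript
// Unescaped "?" makes the preceding "y" optional.
const optionalY = /^why?$/;
// Escaped "\?" matches a literal question mark.
const literalQ = /^why\?$/;

console.log(optionalY.test("wh"), optionalY.test("why?")); // true false
console.log(literalQ.test("why?"), literalQ.test("why"));  // true false

// Hypothetical helper: prefix every regex metacharacter with a backslash.
// "$&" in the replacement inserts the matched character itself.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "1+1=2?";
// Unescaped, "+" and "?" act as quantifiers, so the literal text is not found.
console.log(new RegExp(needle).test("1+1=2?"));               // false
console.log(new RegExp(escapeRegExp(needle)).test("1+1=2?")); // true
```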

Q2. What is the difference between greedy and lazy matching?
A2. Quantifiers such as * and + are "greedy" by default: they match as many characters as possible. In <div>hello</div>, the pattern /<.+>/ matches the whole string. Appending ? to the quantifier makes it "lazy" (e.g., /<.+?>/), so it matches as few characters as possible. Here it stops at the first closing bracket and matches only <div>.
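Running the <div>hello</div> example in JavaScript makes the difference concrete. The negated-class version at the end is an extra alternative, not mentioned above, that avoids relying on laziness:

```javascript
const html = "<div>hello</div>";

// Greedy: .+ runs to the end of the string, then backs off only to the last ">".
const greedy = html.match(/<.+>/)[0];
// Lazy: .+? stops at the first ">" that lets the match succeed.
const lazy = html.match(/<.+?>/)[0];
// With g, the lazy pattern finds each tag separately.
const allTags = html.match(/<.+?>/g);
// Alternative: [^>] can never cross a ">", so greediness no longer matters.
const allTagsNegated = html.match(/<[^>]+>/g);

console.log(greedy);         // <div>hello</div>
console.log(lazy);           // <div>
console.log(allTags);        // [ '<div>', '</div>' ]
console.log(allTagsNegated); // [ '<div>', '</div>' ]
```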

Q3. Can complex Regex patterns cause performance problems?
A3. Yes. Patterns with nested quantifiers (like (a+)+) can cause exponential backtracking when evaluated against strings that almost match but ultimately fail. This can exhaust the CPU, a vulnerability known as ReDoS (Regular Expression Denial of Service).
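You can observe the exponential growth safely on short inputs. This JavaScript sketch assumes a global `performance.now()`, which is available in browsers and in recent Node.js versions. `timeTest` is a hypothetical helper, and the exact timings will vary by engine and machine:

```javascript
// Nested quantifiers: a run of n "a"s can be split among the groups in
// roughly 2^n ways, and a backtracking engine may try all of them on failure.
const risky = /^(a+)+$/;
// Accepts exactly the same strings; without nesting, the work grows roughly linearly.
const safe = /^a+$/;

// Hypothetical helper: runs one test() call and reports the elapsed milliseconds.
function timeTest(re, input) {
  const start = performance.now();
  const matched = re.test(input);
  return { matched, ms: Number((performance.now() - start).toFixed(2)) };
}

// The trailing "b" guarantees a failed match, the worst case for the risky pattern.
// n stays small so the demo finishes quickly.
for (const n of [10, 14, 18]) {
  const input = "a".repeat(n) + "b";
  console.log(`n=${n}`, "risky:", timeTest(risky, input), "safe:", timeTest(safe, input));
}
```

Rewriting the pattern so that no quantified group contains another quantifier over the same characters removes the exponential case, as /^a+$/ does here. Limiting input length before matching is another common safeguard.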


5. Test and Debug Your Patterns Locally

Writing Regex without feedback is error-prone.

Use our free Regex Tester to test your patterns in real time. It highlights matches, extracts capture groups, and runs entirely in your browser to protect your data privacy. If you are cleaning complex API data structures, pair it with our JSON Formatter, or use our Diff Checker to track text differences.
