Mastering Regular Expressions

Powerful Techniques for Perl and Other Tools

Jeffrey E. F. Friedl

Publisher: O'Reilly, 1997, 342 pages

ISBN: 1-56592-257-3

Keywords: Programming, System Administration

Last modified: April 8, 2021, 12:19 p.m.

Regular expressions are a powerful tool for manipulating text and data. If you don't use them yet, you will discover in this book a whole new world of mastery over your data. If you already use them, you'll appreciate this book's unprecedented detail and breadth of coverage, which will be an eye-opener even for the grizzled expert.

With regular expressions, you can save yourself time and aggravation while dealing with documents, mail messages, log files — you name it — any type of text or data. For example, regular expressions can play a vital role in constructing a World Wide Web CGI script, which can involve text and data of all sorts.
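As a small illustration of the log-file case mentioned above, here is a minimal Python sketch. The log format, the `LOG_LINE` pattern, and the `parse_log_line` helper are all assumptions made up for this example; they are not taken from the book.

```python
import re

# Hypothetical log format, invented for this illustration:
#   "<date> <time> <LEVEL> <message>"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "   # e.g. 2021-04-08
    r"(?P<time>\d{2}:\d{2}:\d{2}) "    # e.g. 12:19:00
    r"(?P<level>[A-Z]+) "              # e.g. ERROR
    r"(?P<message>.*)$"                # the rest of the line
)


def parse_log_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


sample = "2021-04-08 12:19:00 ERROR disk quota exceeded"
fields = parse_log_line(sample)
print(fields["level"], "-", fields["message"])
```

Named groups keep the pattern readable and let the caller pick out fields by name instead of by position.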

Regular expressions are not a tool in and of themselves, but are included as part of a larger utility. The classic example is grep. These days, regular expressions can be found everywhere, such as in:

  • Scripting languages (including Perl, Tcl, awk, and Python)
  • Editors (including Emacs, vi, and Nisus Writer)
  • Programming environments (including Delphi and Visual C++)
  • Specialized tools (including lex, Expect, and sed)

Using regular expressions well depends on certain subtle but valuable ways of thinking. In this book, Jeffrey Friedl leads you through the steps of crafting a regular expression that gets the job done.

Regular expressions are not used in a vacuum. In this book, a variety of tools are examined and used in an extensive array of examples. Perl, in particular, is very well represented throughout the book, with a major chapter dedicated entirely to it. Perl is extremely well endowed with rich and expressive regular expressions. Yet what is power in the hands of an expert can be fraught with peril for the unwary. This book will help you navigate the minefield to becoming an expert.

  1. Introduction to Regular Expressions
    • Solving Real Problems
    • Regular Expressions as a Language
      • The Filename Analogy
      • The Language Analogy
    • The Regular-Expression Frame of Mind
      • Searching Text Files: Egrep
    • Egrep Metacharacters
      • Start and End of the Line
      • Character Classes
      • Matching Any Character — Dot
      • Alternation
      • Word Boundaries
      • In a Nutshell
      • Optional Items
      • Other Quantifiers: Repetition
      • Ignoring Differences in Capitalization
      • Parentheses and Backreferences
      • The Great Escape
    • Expanding the Foundation
      • Linguistic Diversification
      • The Goal of a Regular Expression
      • A Few More Examples
      • Regular Expression Nomenclature
      • Improving on the Status Quo
      • Summary
    • Personal Glimpses
  2. Extended Introductory Examples
    • About the Examples
      • A Short Introduction to Perl
    • Matching Text with Regular Expressions
      • Toward a More Real-World Example
      • Side Effects of a Successful Match
      • Intertwined Regular Expressions
      • Intermission
    • Modifying Text with Regular Expressions
      • Automated Editing
      • A Small Mail Utility
      • Text-to-HTML Conversion
      • That Doubled-Word Thing
  3. Overview of Regular Expression Features and Flavors
    • A Casual Stroll Across the Regex Landscape
      • The World According to Grep
      • The Times They Are a Changin'
    • At a Glance
      • POSIX
    • Care and Handling of Regular Expressions
      • Identifying a Regex
      • Doing Something with the Matched Text
      • Other Examples
      • Care and Handling: Summary
    • Engines and Chrome Finish
      • Chrome and Appearances
      • Engines and Drivers
    • Common Metacharacters
      • Character Shorthand
      • Strings as Regular Expressions
      • Class Shorthands, Dot and Character Classes
      • Anchoring
      • Grouping and Retrieving
      • Quantifiers
      • Alternation
    • Guide to the Advanced Chapters
      • Tool-specific Information
  4. The Mechanics of Expression Processing
    • Start Your Engines!
      • Two Kinds of Engines
      • New Standards
      • Regex Engine Types
      • From the Department of Redundancy Department
    • Match Basics
      • About the Examples
      • Rule 1: The Match That Begins Earliest Wins
      • The "Transmission" and the Bump-Along
      • Engine Pieces and Parts
      • Rule 2: The Standard Quantifiers Are Greedy
    • Regex-Directed Versus Text-Directed
      • NFA Engine: Regex-Directed
      • DFA Engine: Text-Directed
      • The Mysteries of Life Revealed
    • Backtracking
      • A Really Crummy Analogy
      • Two Important Points on Backtracking
      • Saved States
      • Backtracking and Greediness
    • More About Greediness and Backtracking
      • Problems of Greediness
      • Multi-Character “Quotes”
      • Using Lazy Quantifiers
      • Greediness and Laziness Always Favor a Match
      • The Essence of Greediness, Laziness, and Backtracking
      • Possessive Quantifiers and Atomic Grouping
      • Possessive Quantifiers, ?+, *+, ++, and {m,n}+
      • The Backtracking of Lookaround
      • Is Alternation Greedy?
      • Taking Advantage of Ordered Alternation
    • NFA, DFA, and POSIX
      • "The Longest-Leftmost"
      • POSIX and the Longest-Leftmost Rule
      • Speed and Efficiency
      • Summary: NFA and DFA in Comparison
    • Summary
  5. Crafting an Efficient Expression
    • A Sobering Example
      • A Simple Change — Placing Your Best Foot Forward
      • Advancing Further — Localizing the Greediness
      • Reality Check
    • A Global View of Backtracking
      • More Work for a POSIX NFA
      • Work Required During a Non-Match
      • Being More Specific
      • Alternation Can Be Expensive
      • A Strong Lead
      • The Impact of Parentheses
    • Internal Optimization
      • First-Character Discrimination
      • Fixed-String Check
      • Simple Repetition
      • Needless Small Quantifiers
      • Length Cognizance
      • Match Cognizance
      • Need Cognizance
      • String/Line Anchors
      • Compile Caching
    • Testing the Engine Type
      • Basic NFA vs. DFA Testing
      • Traditional NFA vs. POSIX NFA Testing
    • Unrolling the Loop
      • Method 1: Building a Regex From Past Experiences
      • The Real "Unrolling-the-Loop" Pattern
      • Method 2: A Top-Down View
      • Method 3: A Quoted Internet Hostname
      • Observations
    • Unrolling C Comments
      • Regex Headaches
      • A Naïve View
      • Unrolling the C Loop
    • The Freeflowing Regex
      • A Helping Hand to Guide the Match
      • A Well-Guided Regex is a Fast Regex
      • Wrapup
    • Think!
      • The Many Twists and Turns of Optimizations
  6. Tool-Specific Information
    • Questions You Should Be Asking
      • Something as Simple as Grep
      • In This Chapter
    • Awk
      • Differences Among Awk Regex Flavors
      • Awk Regex Functions and Operators
    • Tcl
      • Tcl Regex Operands
      • Using Tcl Regular Expressions
      • Tcl Regex Optimizations
    • GNU Emacs
      • Emacs Strings as Regular Expressions
      • Emacs' Regex Flavor
      • Emacs Match Results
      • Benchmarking in Emacs
      • Emacs Regex Optimizations
  7. Perl Regular Expressions
    • The Perl Way
      • Regular Expressions as a Language Component
      • Perl's Greatest Strength
      • Perl's Greatest Weakness
      • An Introductory Example: Parsing CSV Text
      • Regular Expressions and the Perl Way
      • Perl Unleashed
    • Regex-Related Perlisms
      • Expression Context
      • Dynamic Scope and Regex Match Effects
      • Special Variables Modified by a Match
      • "Doublequotish Processing" and Variable Interpolation
    • Perl's Regex Flavor
      • Quantifiers — Greedy and Lazy
      • Grouping
      • String Anchors
      • Multi-Match Anchors
      • Word Anchors
      • Convenient Shorthands and Other Notations
      • Character Classes
      • Modification with \Q and Friends: True Lies
    • The Match Operator
      • Match-Operands Delimiters
      • Match Modifiers
      • Specifying the Match Target Operand
      • Other Side Effects of the Match Operator
      • Match Operator Return Value
      • Outside Influences on the Match Operator
    • The Substitution Operator
      • The Replacement Operand
      • The /e Modifier
      • Context and Return Value
      • Using /g with a Regex That Can Match Nothingness
    • The Split Operator
      • Basic Split
      • Advanced Split
      • Advanced Split's Match Operand
      • Scalar-Context Split
      • Split's Match Operand with Capturing Parentheses
    • Perl Efficiency Issues
      • "There's More Than One Way to Do It"
      • Regex Compilation, the /o Modifier and Efficiency
      • Unsociable $& and Friends
      • The Efficiency Penalty of the /i Modifier
      • Substitution Concerns
      • Benchmarking
      • Regex Debugging Information
      • The Study Function
    • Putting It All Together
      • Stripping Leading and Trailing Whitespace
      • Adding Commas to a Number
      • Removing C Comments
      • Matching an Email Address
    • Final Comments
      • Notes for Perl4
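
Two of the tasks named under "Putting It All Together" (stripping leading and trailing whitespace, and adding commas to a number) can be sketched in Python as follows. This is only one plausible approach, not the book's own Perl solutions, and the function names `strip_whitespace` and `add_commas` are hypothetical.

```python
import re


def strip_whitespace(text):
    # Two simple passes: drop leading whitespace, then trailing whitespace.
    text = re.sub(r"^\s+", "", text)
    return re.sub(r"\s+$", "", text)


def add_commas(number_text):
    # Insert a comma at each position that has a digit before it and is
    # followed by one or more complete groups of three digits and then
    # a non-digit or the end of the string.
    # Assumption: the input is an integer written as plain digits; digits
    # after a decimal point would also be grouped, which is usually unwanted.
    return re.sub(r"(?<=\d)(?=(?:\d{3})+(?!\d))", ",", number_text)


print(strip_whitespace("   hello world \n"))
print(add_commas("1234567"))
```

The comma insertion relies on lookbehind and lookahead, so the substitution matches positions rather than characters and no digits are consumed or rewritten.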


    Mastering Regular Expressions

    Reviewed by Roland Buresund

    Good ******* (7 out of 10)

    Last modified: May 21, 2007, 3:12 a.m.

    I wish I had had this book a bit earlier, like 20 years ago…

    Mandatory reading if you're into computers.

