Regex and Text Processing Tools for Developers
Text processing is the unglamorous core of developer work. Parsing logs, reshaping JSON, extracting fields from CSV, cleaning up data before import -- you do it constantly, and the right tool turns a 30-minute script into a 10-second one-liner.
This guide covers the tools that matter, the patterns you'll reach for most often, and clear recommendations for which tool to use when.
Regex Testing: regex101
regex101.com is the best regex testing tool. It supports PCRE2, Python, JavaScript, Go, Java, and .NET flavors. The features that make it indispensable:
- Real-time match highlighting as you type the pattern
- Explanation panel that breaks down your regex into plain English
- Substitution testing so you can verify replacements before running them
- Saved patterns with shareable permalinks (great for code reviews)
When you're building anything beyond a simple pattern, open regex101 first. Write and verify the regex there, then paste it into your code. RegExr (regexr.com) is a decent alternative with a community pattern library, and Debuggex generates railroad diagrams for visualizing complex patterns. But regex101 handles all the major flavors, and the explanation panel alone makes it the default.
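As a concrete example of that workflow, here's a hypothetical pattern for pulling ISO-style dates out of a log -- verify it on regex101 in the PCRE flavor, then drop it straight into GNU grep:
# Extract unique YYYY-MM-DD dates (requires GNU grep for -P)
grep -oP '\d{4}-\d{2}-\d{2}' app.log | sort -u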
jq: JSON Processing on the Command Line
jq is the single most useful text processing tool for modern development. APIs return JSON, config files are JSON, logs are often JSON.
# Pretty-print JSON
curl -s https://api.example.com/data | jq .
# Extract fields (nested or not)
jq '.name' data.json
jq '.config.database.host' settings.json
# Iterate over arrays and extract fields
jq '.users[] | .name' data.json
# Filter array elements
jq '.events[] | select(.type == "error")' logs.json
# Build new objects from existing data
jq '.users[] | {name: .name, email: .contact.email}' data.json
# Count, sort, unique
jq '.results | length' response.json
jq '[.logs[].level] | unique' app.json
jq '.items | sort_by(.date) | reverse' data.json
Real-World jq Recipes
# Parse AWS CLI output -- get running instance IDs
aws ec2 describe-instances | jq -r \
'.Reservations[].Instances[] | select(.State.Name == "running") | .InstanceId'
# Convert JSON array to CSV
jq -r '.users[] | [.name, .email, .role] | @csv' users.json
# Merge two JSON files
jq -s '.[0] * .[1]' base.json overrides.json
# Group and count by field
jq 'group_by(.status) | map({status: .[0].status, count: length})' orders.json
The -r flag (raw output) strips quotes from strings -- essential when piping jq output to other commands.
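A quick sketch of the difference, assuming a hypothetical names.json with a top-level users array:
# Without -r the string keeps its quotes: "alice"
jq '.users[0].name' names.json
# With -r you get the bare string alice, safe to feed into xargs or a while-read loop
jq -r '.users[0].name' names.json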
yq: jq but for YAML
yq applies jq-like syntax to YAML. Essential if you work with Kubernetes manifests, GitHub Actions workflows, or Docker Compose files.
There are two tools called yq -- Mike Farah's Go version and a Python wrapper. The Go version is the one you want. It's faster, standalone, and more actively maintained.
brew install yq # Mike Farah's Go version
yq '.metadata.name' deployment.yaml # Read a field
yq -i '.spec.replicas = 3' deployment.yaml # Update in-place
yq -o=json eval '.' config.yaml # Convert YAML to JSON
yq eval-all 'select(fileIndex == 0) * select(fileIndex == 1)' base.yaml overlay.yaml # Merge
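If you're not sure which yq is already on your machine, the version string is a rough tell: Mike Farah's Go build identifies itself with a github.com/mikefarah/yq URL, while the Python wrapper prints only a bare version number (exact output varies by release).
yq --version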
sed: Practical Patterns Only
sed has a reputation for being cryptic, but used for what it's good at -- find-and-replace across files -- it's straightforward.
# Replace all occurrences (in-place)
sed -i 's/old/new/g' file.txt
sed -i '' 's/old/new/g' file.txt # macOS (requires empty backup suffix)
# Delete lines matching a pattern
sed '/^#/d' config.txt # Remove comment lines
sed '/^$/d' file.txt # Remove blank lines
# Replace on specific lines
sed '10,20s/old/new/g' file.txt # Lines 10-20 only
# Multiple operations
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt
# Replace across all TypeScript files (fd + sed)
fd -e ts --exec sed -i 's/oldFunction/newFunction/g'
When to skip sed: If your replacement involves complex logic, conditionals, or multi-line patterns, switch to awk or a real script. Fighting sed to do things it wasn't designed for wastes time.
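For example, a conditional edit like "change the second column, but only on lines whose first column is timeout" is awkward in sed but a one-liner in awk (hypothetical config.txt with whitespace-separated columns):
# Rewrite column 2 only when column 1 matches; pass every other line through untouched
awk '$1 == "timeout" {$2 = 60} {print}' config.txt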
awk: One-Liners That Are Actually Useful
awk processes text line by line, splitting each line into fields ($1, $2, etc.) separated by whitespace. You only need 5% of the language.
# Print specific columns
awk '{print $1, $3}' data.txt
# Custom field separator
awk -F',' '{print $1, $2}' data.csv
awk -F':' '{print $1}' /etc/passwd
# Filter by condition
awk '$3 > 100 {print $1, $3}' sales.txt
# Sum a column
awk '{sum += $2} END {print sum}' numbers.txt
# Count matches
awk '/ERROR/ {count++} END {print count}' app.log
# Print unique values in a column (deduplicate)
awk '!seen[$1]++ {print $1}' data.txt
# Print lines longer than 80 characters
awk 'length > 80' code.txt
# Print last field on each line (useful for paths)
awk -F'/' '{print $NF}' paths.txt
# Summarize HTTP status codes from access log
awk '{print $9}' access.log | sort | uniq -c | sort -rn
xsv and csvkit: CSV Done Right
Parsing CSV with awk seems easy until you hit quoted fields containing commas. Use a proper CSV tool.
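To see the failure mode, take a hypothetical one-line contacts.csv; awk splits on every comma, including the one inside the quoted name:
# contacts.csv contains:  "Smith, John",jsmith@example.com
awk -F',' '{print $2}' contacts.csv   # prints ' John"' -- not the email address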
xsv (Rust, fast) handles large files and basic operations:
xsv table data.csv | head -20 # Aligned column view
xsv select name,email users.csv # Select columns
xsv search -s status "active" users.csv # Filter rows
xsv sort -s revenue -N -R sales.csv # Sort descending (numeric)
xsv stats data.csv | xsv table # Column statistics
xsv join id users.csv user_id orders.csv # Join on shared column
csvkit (Python) adds format conversion and SQL:
in2csv data.xlsx > data.csv # Excel to CSV
csvsql --query "SELECT name, SUM(amount) FROM orders GROUP BY name" orders.csv
csvlook data.csv # Pretty-print
Use xsv for speed. Use csvkit when you need SQL queries or format conversion.
Miller (mlr): Format-Aware Data Processing
Miller handles CSV, TSV, JSON, and other structured formats with a single tool. It's what awk would be if awk understood data formats natively.
mlr --icsv --ojson cat data.csv # CSV to JSON
mlr --ijson --ocsv cat data.json # JSON to CSV
mlr --csv filter '$age > 30' people.csv # Filter records
mlr --csv put '$total = $price * $quantity' orders.csv # Computed fields
mlr --csv stats1 -a sum -f revenue -g region sales.csv # Group-by aggregation
mlr --csv sort -nr revenue data.csv # Sort descending
Miller's real power is chaining operations and converting between formats in a single pipeline.
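A sketch of what that looks like, assuming a hypothetical orders.csv with status, price, and quantity columns:
# Filter, add a computed field, sort, and convert to JSON in one pass
mlr --icsv --ojson filter '$status == "paid"' then put '$total = $price * $quantity' then sort -nr total orders.csv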
fx and gron: JSON Exploration
When you don't know the structure of a JSON blob and need to explore it:
fx gives you an interactive terminal UI. Arrow keys expand and collapse nodes -- great for unfamiliar API responses.
curl -s https://api.example.com/data | fx
gron flattens JSON into discrete assignments, making it greppable:
gron data.json
# json.name = "test";
# json.items[0].id = 1;
gron data.json | grep "name" # Find fields by name
gron data.json | grep "items" | gron --ungron # Unflatten back to JSON
gron answers "where in this giant JSON blob is the field I'm looking for?" Flatten, grep, done.
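A handy follow-on (a workflow habit, not a gron feature): the path gron prints is, minus the leading json, usually a valid jq path, so you can grep for a field and paste the path straight into jq. Hypothetical field names below:
gron response.json | grep "access_token"
# json.data.credentials.access_token = "abc1234";
jq '.data.credentials.access_token' response.json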
Patterns You'll Actually Use
These come up weekly in real development work.
Log Analysis
# Count errors by type
grep -oP 'ERROR: \K\w+' app.log | sort | uniq -c | sort -rn
# Busiest hour in access logs
awk '{print $4}' access.log | cut -d: -f2 | sort | uniq -c | sort -rn
# All unique IP addresses
grep -oP '\d+\.\d+\.\d+\.\d+' access.log | sort -u
# Tail JSON logs with pretty-printing
tail -f app.log | jq .
Data Transformation
# JSON to CSV
jq -r '.records[] | [.id, .name, .email] | @csv' data.json > output.csv
# CSV to JSON
mlr --icsv --ojson cat data.csv > data.json
# TSV to CSV
mlr --itsv --ocsv cat data.tsv > data.csv
Quick Data Inspection
head -5 data.csv | column -t -s',' # Peek at structure
awk -F',' '{print $3}' data.csv | sort -u | wc -l # Unique values in col 3
awk -F',' '{print $3}' data.csv | sort | uniq -c | sort -rn | head -10 # Most common
Tool Recommendations by Use Case
| Task | Best Tool | Runner-Up |
|---|---|---|
| Test/debug a regex | regex101 | RegExr |
| Parse JSON from APIs | jq | fx (for exploration) |
| Edit YAML config files | yq | sed (simple replacements) |
| Find-and-replace across files | sed + fd | ripgrep --replace |
| Columnar text processing | awk | cut (trivial cases) |
| CSV operations | xsv | csvkit (SQL or conversions) |
| Format conversion (CSV/JSON/TSV) | miller | jq + csvkit |
| Explore unknown JSON | gron | fx |
| Log analysis | awk + grep | jq (JSON logs) |
The Bottom Line
Start with jq -- it covers the most common modern use case and the skills transfer to yq for YAML. Add sed and awk patterns to your muscle memory for general text wrangling. Pick up xsv or miller when you're doing serious CSV or data format work.
The key insight is knowing which tool to reach for. Don't write a Python script to extract a field from JSON when jq '.field' does it. Don't fight sed into doing multi-line transformations when awk handles it cleanly. Match the tool to the task and you'll spend less time processing text and more time on the work that actually matters.