.\" Automatically generated by Pandoc 2.11.3
.\"
.TH "BP" "1" "May 17 2021" "" ""
.hy
.SH NAME
.PP
bp - Bruce\[aq]s Parsing Expression Grammar tool
.SH SYNOPSIS
.PP
\f[B]bp\f[R] [\f[I]options\&...\f[R]] \f[I]pattern\f[R] [[\f[B]--\f[R]]
\f[I]files\&...\f[R]]
.SH DESCRIPTION
.PP
\f[B]bp\f[R] is a tool that matches parsing expression grammars using a
custom syntax.
.SH OPTIONS
.TP
\f[B]-v\f[R], \f[B]--verbose\f[R]
Print debugging information.
.TP
\f[B]-e\f[R], \f[B]--explain\f[R]
Print a visual explanation of the matches.
.TP
\f[B]-j\f[R], \f[B]--json\f[R]
Print a JSON list of the matches.
(Pairs with \f[B]--verbose\f[R] for more detail)
.TP
\f[B]-l\f[R], \f[B]--list-files\f[R]
Print only the names of files containing matches instead of the matches
themselves.
.TP
\f[B]-i\f[R], \f[B]--ignore-case\f[R]
Perform pattern matching case-insensitively.
.TP
\f[B]-I\f[R], \f[B]--inplace\f[R]
Perform filtering or replacement in-place (i.e.\ overwrite files with
new content).
.TP
\f[B]-C\f[R], \f[B]--confirm\f[R]
During in-place modification of a file, confirm before each
modification.
.TP
\f[B]-r\f[R], \f[B]--replace\f[R] \f[I]replacement\f[R]
Replace all occurrences of the main pattern with the given string.
.TP
\f[B]-s\f[R], \f[B]--skip\f[R] \f[I]pattern\f[R]
While looking for matches, skip over occurrences of \f[I]pattern\f[R].
This can be useful for behavior like \f[B]bp -s string\f[R] (avoiding
matches inside string literals).
.TP
\f[B]-g\f[R], \f[B]--grammar\f[R] \f[I]grammar-file\f[R]
Load the grammar from the given file.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]-G\f[R], \f[B]--git\f[R]
Use \f[B]git\f[R] to get a list of files.
Remaining file arguments (if any) are passed to \f[B]git ls-files\f[R]
instead of being treated as literal files.
.TP
\f[B]-c\f[R], \f[B]--context\f[R] \f[I]N\f[R]
The number of lines of context to print.
If \f[I]N\f[R] is 0, print only the exact text of the matches.
If \f[I]N\f[R] is \f[B]\[lq]all\[rq]\f[R], print the entire file.
Otherwise, if \f[I]N\f[R] is a positive integer, print the whole line on
which matches occur, as well as the \f[I]N-1\f[R] lines before and after
the match.
The default value for this argument is \f[B]1\f[R] (print whole lines
where matches occur).
.TP
\f[B]-f\f[R], \f[B]--format\f[R] \f[I]auto\f[R]|\f[I]fancy\f[R]|\f[I]plain\f[R]
Set the output format.
\f[I]fancy\f[R] includes colors and line numbers, \f[I]plain\f[R]
includes neither, and \f[I]auto\f[R] (the default) uses \f[I]fancy\f[R]
formatting only when the output is a TTY.
.TP
\f[B]--help\f[R]
Print the usage and exit.
.TP
\f[I]pattern\f[R]
The main pattern for bp to match.
By default, this pattern is a string pattern (see the \f[B]STRING
PATTERNS\f[R] section below).
.TP
\f[I]files\&...\f[R]
The input files to search.
If no input files are provided and data was piped in, that data will be
used instead.
If neither is provided, \f[B]bp\f[R] will search through all files in
the current directory and its subdirectories (recursively).
.SH STRING PATTERNS
.PP
One of the most common use cases for pattern matching tools is matching
plain, literal strings, or strings that are mostly literal with one or
two patterns mixed in.
\f[B]bp\f[R] is designed around this fact.
The default mode for bp patterns is \[lq]string pattern mode\[rq].
In string pattern mode, all characters are interpreted literally except
for the backslash (\f[B]\[rs]\f[R]), which may be followed by a bp
pattern (see the \f[B]PATTERNS\f[R] section below).
Optionally, the bp pattern may be terminated by a semicolon
(\f[B];\f[R]).
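.PP
As a sketch of how these rules combine (the file name
\f[I]notes.txt\f[R] is hypothetical, and this assumes the backslash
accepts a repetition of a character range as described in the
\f[B]PATTERNS\f[R] section), the following string pattern should match
the literal text \[lq]v\[rq], then one or more digits, then the literal
text \[lq]beta\[rq].
The semicolon marks where the bp pattern ends and literal text resumes:
.RS
.PP
\f[B]bp \[aq]v\[rs]+\[ga]0-9;beta\[aq] notes.txt\f[R]
.RE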
.SH PATTERNS
.PP
\f[B]bp\f[R] patterns are based on a combination of Parsing
Expression Grammars and regular expression syntax.
The syntax is designed to map closely to verbal descriptions of the
patterns, and prefix operators are preferred over the suffix operators
that are common in regex syntax.
Patterns are whitespace-agnostic, so they work the same regardless of
whether whitespace is present or not, except inside string literals
(\f[B]\[aq]...\[aq]\f[R] and \f[B]\[dq]...\[dq]\f[R]), character
literals (\f[B]\[ga]\f[R]), and escape sequences (\f[B]\[rs]\f[R]).
Whitespace between patterns or parts of a pattern should be used for
clarity, but it will not affect the meaning of the pattern.
.TP
\f[I]pat1 pat2\f[R]
A sequence: \f[I]pat1\f[R] followed by \f[I]pat2\f[R]
.TP
\f[I]pat1\f[R] \f[B]/\f[R] \f[I]pat2\f[R]
A choice: \f[I]pat1\f[R], or if it doesn\[aq]t match, then
\f[I]pat2\f[R]
.TP
\f[B].\f[R]
Any character (excluding newline)
.TP
\f[B]\[ha]\f[R]
Start of a line
.TP
\f[B]\[ha]\[ha]\f[R]
Start of the text
.TP
\f[B]$\f[R]
End of a line (does not include newline character)
.TP
\f[B]$$\f[R]
End of the text
.TP
\f[B]_\f[R]
Zero or more whitespace characters, including spaces and tabs, but not
newlines.
.TP
\f[B]__\f[R]
Zero or more whitespace characters, including spaces, tabs, newlines,
and comments.
Comments are undefined by default, but may be defined by a separate
grammar file.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]\[dq]foo\[dq]\f[R], \f[B]\[aq]foo\[aq]\f[R]
The literal string \f[B]\[lq]foo\[rq]\f[R].
Single and double quotes are treated the same.
Escape sequences are not allowed.
.TP
\f[B]{foo}\f[R]
The literal string \f[B]\[lq]foo\[rq]\f[R] with word boundaries on
either end.
Escape sequences are not allowed.
.TP
\f[B]\[ga]\f[R]\f[I]c\f[R]
The literal character \f[I]c\f[R] (e.g.\ \f[B]\[ga]\[at]\f[R] matches
the \[lq]\[at]\[rq] character)
.TP
\f[B]\[ga]\f[R]\f[I]c1\f[R]\f[B],\f[R]\f[I]c2\f[R]
The literal character \f[I]c1\f[R] or \f[I]c2\f[R]
(e.g.\ \f[B]\[ga]a,e,i,o,u\f[R])
.TP
\f[B]\[ga]\f[R]\f[I]c1\f[R]\f[B]-\f[R]\f[I]c2\f[R]
The character range \f[I]c1\f[R] to \f[I]c2\f[R]
(e.g.\ \f[B]\[ga]a-z\f[R]).
Multiple ranges can be combined with a comma
(e.g.\ \f[B]\[ga]a-z,A-Z\f[R]).
.TP
\f[B]\[rs]\f[R]\f[I]esc\f[R]
An escape sequence (e.g.\ \f[B]\[rs]n\f[R], \f[B]\[rs]x1F\f[R],
\f[B]\[rs]033\f[R], etc.)
.TP
\f[B]\[rs]\f[R]\f[I]esc1\f[R]\f[B]-\f[R]\f[I]esc2\f[R]
An escape sequence range from \f[I]esc1\f[R] to \f[I]esc2\f[R]
(e.g.\ \f[B]\[rs]x00-x1F\f[R])
.TP
\f[B]\[rs]N\f[R]
A special case escape that matches a \[lq]nodent\[rq]: one or more
newlines followed by the same indentation that occurs on the current
line.
.TP
\f[B]!\f[R] \f[I]pat\f[R]
Not \f[I]pat\f[R]
.TP
\f[B][\f[R] \f[I]pat\f[R] \f[B]]\f[R]
Maybe \f[I]pat\f[R]
.TP
\f[I]N\f[R] \f[I]pat\f[R]
Exactly \f[I]N\f[R] repetitions of \f[I]pat\f[R]
(e.g.\ \f[B]5 \[dq]x\[dq]\f[R] matches \f[B]\[lq]xxxxx\[rq]\f[R])
.TP
\f[I]N\f[R] \f[B]-\f[R] \f[I]M\f[R] \f[I]pat\f[R]
Between \f[I]N\f[R] and \f[I]M\f[R] repetitions of \f[I]pat\f[R]
(e.g.\ \f[B]2-3 \[dq]x\[dq]\f[R] matches \f[B]\[lq]xx\[rq]\f[R] or
\f[B]\[lq]xxx\[rq]\f[R])
.TP
\f[I]N\f[R]\f[B]+\f[R] \f[I]pat\f[R]
\f[I]N\f[R] or more repetitions of \f[I]pat\f[R]
(e.g.\ \f[B]2+ \[dq]x\[dq]\f[R] matches \f[B]\[lq]xx\[rq]\f[R],
\f[B]\[lq]xxx\[rq]\f[R], \f[B]\[lq]xxxx\[rq]\f[R], etc.)
.TP
\f[B]*\f[R] \f[I]pat\f[R]
Some \f[I]pat\f[R]s (zero or more, e.g.\ \f[B]* \[dq]x\[dq]\f[R] matches
\f[B]\[lq]\[rq]\f[R], \f[B]\[lq]x\[rq]\f[R], \f[B]\[lq]xx\[rq]\f[R],
etc.)
.TP
\f[B]+\f[R] \f[I]pat\f[R]
One or more \f[I]pat\f[R]s (e.g.\ \f[B]+ \[dq]x\[dq]\f[R] matches
\f[B]\[lq]x\[rq]\f[R], \f[B]\[lq]xx\[rq]\f[R], \f[B]\[lq]xxx\[rq]\f[R],
etc.)
.TP
\f[I]repeating-pat\f[R] \f[B]%\f[R] \f[I]sep\f[R]
\f[I]repeating-pat\f[R] (see the examples above) separated by
\f[I]sep\f[R] (e.g.\ \f[B]*word % \[dq],\[dq]\f[R] matches zero or more
comma-separated words)
.TP
\f[B]..\f[R] \f[I]pat\f[R]
Any text (except newlines) up to and including \f[I]pat\f[R]
.TP
\f[B].. %\f[R] \f[I]skip\f[R] \f[I]pat\f[R]
Any text (except newlines) up to and including \f[I]pat\f[R], skipping
over instances of \f[I]skip\f[R]
(e.g.\ \f[B]\[aq]\[dq]\[aq] ..%(\[aq]\[rs]\[aq] .) \[aq]\[dq]\[aq]\f[R]
matches an opening quote, up to a closing quote, skipping over any
backslash followed by a single character)
.TP
\f[B]<\f[R] \f[I]pat\f[R]
Matches at the current position if \f[I]pat\f[R] matches immediately
before the current position (lookbehind).
Conceptually, you can think of this as creating a file containing only
the \f[I]N\f[R] characters immediately before the current position and
attempting to match \f[I]pat\f[R] on that file, for all values of
\f[I]N\f[R] from the minimum number of characters \f[I]pat\f[R] can
match up to the maximum number of characters \f[I]pat\f[R] can match (or
the length of the current line up to the current position, whichever is
smaller).
\f[B]Note:\f[R] For fixed-length lookbehinds, this is quite efficient
(e.g.\ \f[B]<(100 \[dq]x\[dq])\f[R]), but it could cause
performance problems with variable-length lookbehinds
(e.g.\ \f[B]<(\[dq]x\[dq] 0-100\[dq]y\[dq])\f[R]).
Also, it is worth noting that \f[B]\[ha]\f[R], \f[B]\[ha]\[ha]\f[R],
\f[B]$\f[R], and \f[B]$$\f[R] all match against the edges of the slice,
which may give false positives if you were expecting them to match only
against the edges of the file or line.
.TP
\f[B]>\f[R] \f[I]pat\f[R]
Matches \f[I]pat\f[R], but does not consume any input (lookahead).
.TP
\f[B]\[at]\f[R] \f[I]pat\f[R]
Capture \f[I]pat\f[R]
.TP
\f[B]foo\f[R]
The named pattern whose name is \f[B]\[lq]foo\[rq]\f[R].
Pattern names come from definitions in grammar files or from named
captures.
Pattern names may contain dashes (\f[B]-\f[R]), but not underscores
(\f[B]_\f[R]), since the underscore is used to match whitespace.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]\[at]\f[R] \f[I]name\f[R] \f[B]=\f[R] \f[I]pat\f[R]
Let \f[I]name\f[R] equal \f[I]pat\f[R] (named capture).
Named captures can be used as backreferences like so:
\f[B]\[at]foo=word \[ga]( foo \[ga])\f[R] (matches
\f[B]\[lq]asdf(asdf)\[rq]\f[R] or \f[B]\[lq]baz(baz)\[rq]\f[R], but not
\f[B]\[lq]foo(baz)\[rq]\f[R])
.TP
\f[I]pat\f[R] \f[B]=>\f[R] \f[B]\[dq]\f[R]\f[I]replacement\f[R]\f[B]\[dq]\f[R]
Replace \f[I]pat\f[R] with \f[I]replacement\f[R].
Note: \f[I]replacement\f[R] should be a string (single or double
quoted), and it may contain escape sequences (e.g.\ \f[B]\[rs]n\f[R]) or
references to captured values: \f[B]\[at]0\f[R] (the whole of
\f[I]pat\f[R]), \f[B]\[at]1\f[R] (the first capture in \f[I]pat\f[R]),
\f[B]\[at]\f[R]\f[I]foo\f[R] (the capture named \f[I]foo\f[R] in
\f[I]pat\f[R]), etc.
For example,
\f[B]\[at]word _ \[at]rest=(*word % _) => \[dq]\[at]rest:\[rs]n\[rs]t\[at]1\[dq]\f[R]
matches a word followed by whitespace, followed by a series of words,
and replaces it with the series of words, a colon, a newline, a tab, and
then the first word.
.TP
\f[I]pat1\f[R] \f[B]\[ti]\f[R] \f[I]pat2\f[R]
Matches when \f[I]pat1\f[R] matches and \f[I]pat2\f[R] can be found
within the text of that match.
(e.g.\ \f[B]comment \[ti] {TODO}\f[R] matches comments that contain the
word \f[B]\[lq]TODO\[rq]\f[R])
.TP
\f[I]pat1\f[R] \f[B]!\[ti]\f[R] \f[I]pat2\f[R]
Matches when \f[I]pat1\f[R] matches, but \f[I]pat2\f[R] cannot be found
within the text of that match.
(e.g.\ \f[B]comment !\[ti] {IGNORE}\f[R] matches only comments that do
not contain the word \f[B]\[lq]IGNORE\[rq]\f[R])
.TP
\f[I]name\f[R]\f[B]:\f[R] \f[I]pat\f[R]
Define \f[I]name\f[R] to mean \f[I]pat\f[R] (pattern definition)
.TP
\f[B](!)\f[R] \f[I]error-pat\f[R]
If \f[I]error-pat\f[R] matches, \f[B]bp\f[R] will not print any results
for this file and will instead print an error message to
\f[B]STDERR\f[R] highlighting the matching position of
\f[I]error-pat\f[R] in the file and printing the text of
\f[I]error-pat\f[R] as an error message.
Then, \f[B]bp\f[R] will exit with a failure status and not process any
further files.
.TP
\f[B]#\f[R] \f[I]comment\f[R]
A line comment
.SH GRAMMAR FILES
.PP
\f[B]bp\f[R] allows loading extra grammar files, which define patterns
that may be used for matching.
The \f[B]builtins\f[R] grammar file is loaded by default, and it defines
a few useful general-purpose patterns.
For example, it defines the \f[B]parens\f[R] rule, which matches pairs
of matching parentheses, accounting for nested inner parentheses:
.RS
.PP
\f[B]bp -p \[aq]\[dq]my_func\[dq] parens\[aq]\f[R]
.RE
.PP
\f[B]bp\f[R] also comes with a few grammar files for common programming
languages, which may be loaded on demand.
These grammar files are not comprehensive syntax definitions; they
cover only some common patterns.
For example, the c++ grammar file contains definitions for
\f[B]//\f[R]-style line comments as well as \f[B]/*...*/\f[R]-style
block comments.
Thus, you can find all comments containing the word \[lq]TODO\[rq] with
the following command:
.RS
.PP
\f[B]bp -g c++ -p \[aq]comment \[ti] {TODO}\[aq] *.cpp\f[R]
.RE
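.PP
The same grammar can be paired with the \f[B]!\[ti]\f[R] operator from
the \f[B]PATTERNS\f[R] section.
As a sketch that assumes the \f[B]comment\f[R] rule behaves as
described above, the following should find only the comments that do
not contain the word \[lq]IGNORE\[rq]:
.RS
.PP
\f[B]bp -g c++ -p \[aq]comment !\[ti] {IGNORE}\[aq] *.cpp\f[R]
.RE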
.SH EXAMPLES
.PP
Find file names containing the string \[lq]foo\[rq] (a string pattern):
.RS
.PP
\f[B]ls | bp foo\f[R]
.RE
.PP
Find file names ending with \[lq].c\[rq] and print each name with the
\[lq].c\[rq] replaced with \[lq].h\[rq]:
.RS
.PP
\f[B]ls | bp \[aq].c\[rs]$\[aq] -r \[aq].h\[aq]\f[R]
.RE
.PP
Find the word \[lq]foobar\[rq], followed by a pair of matching
parentheses in the file \f[I]my_file.py\f[R]:
.RS
.PP
\f[B]bp -p \[aq]{foobar} parens\[aq] my_file.py\f[R]
.RE
.PP
Using the \f[I]html\f[R] grammar, find all \f[I]element\f[R]s matching
the tag \f[I]a\f[R] in the file \f[I]foo.html\f[R]:
.RS
.PP
\f[B]bp -g html -p \[aq]element \[ti] (\[ha]\[ha]\[dq]<a \[dq])\[aq] foo.html\f[R]
.RE
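.PP
The options from the \f[B]OPTIONS\f[R] section can be used in the same
way.
As a sketch that assumes the documented behavior (the search strings
and the file name \f[I]notes.txt\f[R] are hypothetical), the following
should print only the names of files in the current directory and its
subdirectories that contain the string \[lq]TODO\[rq]:
.RS
.PP
\f[B]bp -l TODO\f[R]
.RE
.PP
Similarly, assuming \f[B]-I\f[R], \f[B]-C\f[R], and \f[B]-r\f[R] can be
given together, this should replace \[lq]colour\[rq] with
\[lq]color\[rq] in \f[I]notes.txt\f[R], asking for confirmation before
each modification:
.RS
.PP
\f[B]bp -I -C -r color colour notes.txt\f[R]
.RE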
.SH AUTHORS
Bruce Hill (\f[I]bruce\[at]bruce-hill.com\f[R]).