.\" Automatically generated by Pandoc 2.11.3
.\"
.TH "BP" "1" "May 17 2021" "" ""
.hy
.SH NAME
.PP
bp - Bruce\[aq]s Parsing Expression Grammar tool
.SH SYNOPSIS
.PP
\f[B]bp\f[R] [\f[I]options\&...\f[R]] \f[I]pattern\f[R] [[--]
\f[I]files\&...\f[R]]
.SH DESCRIPTION
.PP
\f[B]bp\f[R] is a tool that matches parsing expression grammars using a
custom syntax.
.SH OPTIONS
.TP
\f[B]-v\f[R], \f[B]--verbose\f[R]
Print debugging information.
.TP
\f[B]-e\f[R], \f[B]--explain\f[R]
Print a visual explanation of the matches.
.TP
\f[B]-j\f[R], \f[B]--json\f[R]
Print a JSON list of the matches.
(Pairs with \f[B]--verbose\f[R] for more detail)
.TP
\f[B]-l\f[R], \f[B]--list-files\f[R]
Print only the names of files containing matches instead of the matches
themselves.
.TP
\f[B]-i\f[R], \f[B]--ignore-case\f[R]
Perform pattern matching case-insensitively.
.TP
\f[B]-I\f[R], \f[B]--inplace\f[R]
Perform filtering or replacement in-place (i.e.\ overwrite files with
new content).
.TP
\f[B]-C\f[R], \f[B]--confirm\f[R]
During in-place modification of a file, confirm before each
modification.
.TP
\f[B]-r\f[R], \f[B]--replace\f[R] \f[I]replacement\f[R]
Replace all occurrences of the main pattern with the given string.
.TP
\f[B]-s\f[R], \f[B]--skip\f[R] \f[I]pattern\f[R]
While looking for matches, skip over \f[I]pattern\f[R] occurrences.
This can be useful for behavior like \f[B]bp -s string\f[R] (avoiding
matches inside string literals).
.TP
\f[B]-g\f[R], \f[B]--grammar\f[R] \f[I]grammar-file\f[R]
Load the grammar from the given file.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]-G\f[R], \f[B]--git\f[R]
Use \f[B]git\f[R] to get a list of files.
Remaining file arguments (if any) are passed to \f[B]git ls-files\f[R]
instead of being treated as literal files.
.TP
\f[B]-c\f[R], \f[B]--context\f[R] \f[I]N\f[R]
The number of lines of context to print.
If \f[I]N\f[R] is 0, print only the exact text of the matches.
If \f[I]N\f[R] is \f[B]\f[CB]\[dq]all\[dq]\f[B]\f[R], print the entire
file.
Otherwise, if \f[I]N\f[R] is a positive integer, print the whole line on
which matches occur, as well as the \f[I]N-1\f[R] lines before and after
the match.
The default value for this argument is \f[B]1\f[R] (print whole lines
where matches occur).
.TP
\f[B]-f\f[R], \f[B]--format\f[R] \f[I]auto\f[R]|\f[I]fancy\f[R]|\f[I]plain\f[R]
Set the output format.
\f[I]fancy\f[R] includes colors and line numbers, \f[I]plain\f[R]
includes neither, and \f[I]auto\f[R] (the default) uses \f[I]fancy\f[R]
formatting only when the output is a TTY.
.TP
\f[B]--help\f[R]
Print the usage and exit.
.TP
\f[I]pattern\f[R]
The main pattern for bp to match.
By default, this pattern is a string pattern (see the \f[B]STRING
PATTERNS\f[R] section below).
.TP
\f[I]files\&...\f[R]
The input files to search.
If no input files are provided and data was piped in, that data will be
used instead.
If neither is provided, \f[B]bp\f[R] will search through all files in
the current directory and its subdirectories (recursively).
.SH STRING PATTERNS
.PP
One of the most common use cases for pattern matching tools is matching
plain, literal strings, or strings that are primarily plain strings,
with one or two patterns.
\f[B]bp\f[R] is designed around this fact.
The default mode for bp patterns is \[lq]string pattern mode\[rq].
In string pattern mode, all characters are interpreted literally except
for the backslash (\f[B]\[rs]\f[R]), which may be followed by a bp
pattern (see the \f[B]PATTERNS\f[R] section below).
Optionally, the bp pattern may be terminated by a semicolon
(\f[B];\f[R]).
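.PP
As a sketch of how this works in practice (the sample text and the file
name \f[I]style.css\f[R] are illustrative assumptions), the following string
pattern treats \[lq]width: \[rq] and \[lq]px\[rq] literally, while the backslash
introduces the bp pattern \f[B]+\[ga]0-9\f[R] (one or more digits), which is
terminated by the semicolon:
.IP
.nf
\f[C]
bp \[aq]width: \[rs]+\[ga]0-9;px\[aq] style.css
\f[R]
.fi
.PP
Under these assumptions, this would match text such as
\[lq]width: 120px\[rq].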
.SH PATTERNS
.PP
\f[B]bp\f[R] patterns are based on a combination of Parsing
Expression Grammars and regular expression syntax.
The syntax is designed to map closely to verbal descriptions of the
patterns, and prefix operators are preferred over the suffix operators
that are common in regex syntax.
.PP
Some patterns additionally have \[lq]multi-line\[rq] variants, which
means that they also match the newline character.
.TP
\f[I]pat1 pat2\f[R]
A sequence: \f[I]pat1\f[R] followed by \f[I]pat2\f[R]
.TP
\f[I]pat1\f[R] \f[B]/\f[R] \f[I]pat2\f[R]
A choice: \f[I]pat1\f[R], or if it doesn\[aq]t match, then
\f[I]pat2\f[R]
.TP
\f[B].\f[R]
Any character (excluding newline)
.TP
\f[B]\[ha]\f[R]
Start of a line
.TP
\f[B]\[ha]\[ha]\f[R]
Start of the text
.TP
\f[B]$\f[R]
End of a line (does not include newline character)
.TP
\f[B]$$\f[R]
End of the text
.TP
\f[B]_\f[R]
Zero or more whitespace characters, including spaces and tabs, but not
newlines.
.TP
\f[B]__\f[R]
Zero or more whitespace characters, including spaces, tabs, newlines,
and comments.
Comments are undefined by default, but may be defined by a separate
grammar file.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]\[dq]foo\[dq]\f[R], \f[B]\[aq]foo\[aq]\f[R]
The literal string \f[B]\[lq]foo\[rq]\f[R].
Single and double quotes are treated the same.
Escape sequences are not allowed.
.TP
\f[B]{foo}\f[R]
The literal string \f[B]\[lq]foo\[rq]\f[R] with word boundaries on
either end.
Escape sequences are not allowed.
.TP
\f[B]\[ga]\f[R]\f[I]c\f[R]
The literal character \f[I]c\f[R] (e.g.\ \f[B]\[ga]\[at]\f[R] matches the
\[lq]\[at]\[rq] character)
.TP
\f[B]\[ga]\f[R]\f[I]c1\f[R]\f[B],\f[R]\f[I]c2\f[R]
The literal character \f[I]c1\f[R] or \f[I]c2\f[R]
(e.g.\ \f[B]\[ga]a,e,i,o,u\f[R])
.TP
\f[B]\[ga]\f[R]\f[I]c1\f[R]\f[B]-\f[R]\f[I]c2\f[R]
The character range \f[I]c1\f[R] to \f[I]c2\f[R]
(e.g.\ \f[B]\[ga]a-z\f[R]).
Multiple ranges can be combined with a comma
(e.g.\ \f[B]\[ga]a-z,A-Z\f[R]).
.TP
\f[B]\[rs]\f[R]\f[I]esc\f[R]
An escape sequence (e.g.\ \f[B]\[rs]n\f[R], \f[B]\[rs]x1F\f[R],
\f[B]\[rs]033\f[R], etc.)
.TP
\f[B]\[rs]\f[R]\f[I]esc1\f[R]\f[B]-\f[R]\f[I]esc2\f[R]
An escape sequence range from \f[I]esc1\f[R] to \f[I]esc2\f[R]
(e.g.\ \f[B]\[rs]x00-x1F\f[R])
.TP
\f[B]\[rs]N\f[R]
A special case escape that matches a \[lq]nodent\[rq]: one or more
newlines followed by the same indentation that occurs on the current
line.
.TP
\f[B]!\f[R] \f[I]pat\f[R]
Not \f[I]pat\f[R]
.TP
\f[B][\f[R] \f[I]pat\f[R] \f[B]]\f[R]
Maybe \f[I]pat\f[R]
.TP
\f[I]N\f[R] \f[I]pat\f[R]
Exactly \f[I]N\f[R] repetitions of \f[I]pat\f[R] (e.g.\ \f[B]5
\[ga]x\f[R] matches \f[B]\[lq]xxxxx\[rq]\f[R])
.TP
\f[I]N\f[R] \f[B]-\f[R] \f[I]M\f[R] \f[I]pat\f[R]
Between \f[I]N\f[R] and \f[I]M\f[R] repetitions of \f[I]pat\f[R]
(e.g.\ \f[B]2-3 \[ga]x\f[R] matches \f[B]\[lq]xx\[rq]\f[R] or
\f[B]\[lq]xxx\[rq]\f[R])
.TP
\f[I]N\f[R]\f[B]+\f[R] \f[I]pat\f[R]
\f[I]N\f[R] or more repetitions of \f[I]pat\f[R] (e.g.\ \f[B]2+
\[ga]x\f[R] matches \f[B]\[lq]xx\[rq]\f[R], \f[B]\[lq]xxx\[rq]\f[R],
\f[B]\[lq]xxxx\[rq]\f[R], etc.)
.TP
\f[B]*\f[R] \f[I]pat\f[R]
Some \f[I]pat\f[R]s (zero or more, e.g.\ \f[B]* \[ga]x\f[R] matches
\f[B]\[lq]\[rq]\f[R], \f[B]\[lq]x\[rq]\f[R], \f[B]\[lq]xx\[rq]\f[R],
etc.)
.TP
\f[B]+\f[R] \f[I]pat\f[R]
One or more \f[I]pat\f[R]s (e.g.\ \f[B]+ \[ga]x\f[R] matches
\f[B]\[lq]x\[rq]\f[R], \f[B]\[lq]xx\[rq]\f[R], \f[B]\[lq]xxx\[rq]\f[R],
etc.)
.TP
\f[I]repeating-pat\f[R] \f[B]%\f[R] \f[I]sep\f[R]
\f[I]repeating-pat\f[R] separated by \f[I]sep\f[R] (e.g.\ \f[B]*word %
\[ga],\f[R] matches zero or more comma-separated words)
.TP
\f[B]..\f[R] \f[I]pat\f[R]
Any text (except newlines) up to and including \f[I]pat\f[R]
.TP
\f[B].. %\f[R] \f[I]skip\f[R] \f[I]pat\f[R]
Any text (except newlines) up to and including \f[I]pat\f[R], skipping
over instances of \f[I]skip\f[R] (e.g.\ \f[B]\[ga]\[dq]..\[ga]\[dq] %
(\[ga]\[rs].)\f[R])
.TP
\f[B]<\f[R] \f[I]pat\f[R]
Matches at the current position if \f[I]pat\f[R] matches immediately
before the current position (lookbehind).
Conceptually, you can think of this as creating a file containing only
the \f[I]N\f[R] characters immediately before the current position and
attempting to match \f[I]pat\f[R] on that file, for all values of
\f[I]N\f[R] from the minimum number of characters \f[I]pat\f[R] can
match up to the maximum number of characters \f[I]pat\f[R] can match (or
the length of the current line up to the current position, whichever is
smaller).
\f[B]Note:\f[R] For fixed-length lookbehinds, this is quite efficient
(e.g.\ \f[B]<(100\[ga]x)\f[R]); however, it could cause performance
problems with variable-length lookbehinds (e.g.\ \f[B]<(\[ga]x
0-100\[ga]y)\f[R]).
Also, it is not advised to use \f[B]\[ha]\f[R], \f[B]\[ha]\[ha]\f[R],
\f[B]$\f[R], or \f[B]$$\f[R]
inside a lookbehind, as they will match against the edges of the
lookbehind slice.
.TP
\f[B]>\f[R] \f[I]pat\f[R]
Matches \f[I]pat\f[R], but does not consume any input (lookahead).
.TP
\f[B]\[at]\f[R] \f[I]pat\f[R]
Capture \f[I]pat\f[R]
.TP
\f[B]foo\f[R]
The named pattern whose name is \f[B]\[lq]foo\[rq]\f[R].
Pattern names come from definitions in grammar files or from named
captures.
Pattern names may contain dashes (\f[B]-\f[R]), but not underscores
(\f[B]_\f[R]), since the underscore is used to match whitespace.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]\[at]\f[R] \f[I]name\f[R] \f[B]=\f[R] \f[I]pat\f[R]
Let \f[I]name\f[R] equal \f[I]pat\f[R] (named capture).
Named captures can be used as backreferences like so: \f[B]\[at]foo=word
\[ga]( foo \[ga])\f[R] (matches \f[B]\[lq]asdf(asdf)\[rq]\f[R] or
\f[B]\[lq]baz(baz)\[rq]\f[R], but not \f[B]\[lq]foo(baz)\[rq]\f[R])
.TP
\f[I]pat\f[R] \f[B]=> \[aq]\f[R]\f[I]replacement\f[R]\f[B]\[aq]\f[R]
Replace \f[I]pat\f[R] with \f[I]replacement\f[R].
Note: \f[I]replacement\f[R] should be a string, and it may contain
references to captured values: \f[B]\[at]0\f[R] (the whole of
\f[I]pat\f[R]), \f[B]\[at]1\f[R] (the first capture in \f[I]pat\f[R]),
\f[B]\[at]\f[R]\f[I]foo\f[R] (the capture named \f[I]foo\f[R] in
\f[I]pat\f[R]), etc.
For example, \f[B]\[at]word _ \[at]rest=(*word % _) => \[dq]\[at]rest
\[at]1\[dq]\f[R]
.TP
\f[I]pat1\f[R] \f[B]\[ti]\f[R] \f[I]pat2\f[R]
Matches when \f[I]pat1\f[R] matches and \f[I]pat2\f[R] can be found
within the text of that match.
(e.g.\ \f[B]comment \[ti] {TODO}\f[R] matches comments that contain the
word \f[B]\[lq]TODO\[rq]\f[R])
.TP
\f[I]pat1\f[R] \f[B]!\[ti]\f[R] \f[I]pat2\f[R]
Matches when \f[I]pat1\f[R] matches, but \f[I]pat2\f[R] cannot be found
within the text of that match.
(e.g.\ \f[B]comment !\[ti] {IGNORE}\f[R] matches only comments that do
not contain the word \f[B]\[lq]IGNORE\[rq]\f[R])
.TP
\f[I]name\f[R]\f[B]:\f[R] \f[I]pat\f[R]
Define \f[I]name\f[R] to mean \f[I]pat\f[R] (pattern definition)
.TP
\f[B](!)\f[R] \f[I]error-pat\f[R]
If \f[I]error-pat\f[R] matches, \f[B]bp\f[R] will not print any results
for this file and will instead print an error message to \f[B]STDERR\f[R]
highlighting the matching position of \f[I]error-pat\f[R] in the file
and printing the text of \f[I]error-pat\f[R] as an error message.
Then, \f[B]bp\f[R] will exit with a failure status and not process any
further files.
.TP
\f[B]#\f[R] \f[I]comment\f[R]
A line comment
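.PP
To illustrate how these building blocks compose, here is a sketch of a
pattern for hexadecimal literals.
The pattern itself and the file name \f[I]main.c\f[R] are illustrative
assumptions, and the \f[B]-p\f[R] flag is used the same way as in the
\f[B]EXAMPLES\f[R] section.
It combines a literal string, a bounded repetition, and a
comma-separated set of character ranges:
.IP
.nf
\f[C]
bp -p \[aq]\[dq]0x\[dq] 1-8 \[ga]0-9,a-f,A-F\[aq] main.c
\f[R]
.fi
.PP
Read left to right: the literal string \[lq]0x\[rq], followed by between
one and eight characters from the ranges 0-9, a-f, or A-F.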
.SH GRAMMAR FILES
.PP
\f[B]bp\f[R] allows loading extra grammar files, which define patterns
that may be used for matching.
The \f[B]builtins\f[R] grammar file is loaded by default, and it defines
a few useful general-purpose patterns.
For example, it defines the \f[B]parens\f[R] rule, which matches pairs
of matching parentheses, accounting for nested inner parentheses:
.IP
.nf
\f[C]
bp -p \[aq]\[dq]my_func\[dq] parens\[aq]
\f[R]
.fi
.PP
\f[B]bp\f[R] also comes with a few grammar files for common programming
languages, which may be loaded on demand.
These grammar files are not comprehensive syntax definitions, but only
some common patterns.
For example, the c++ grammar file contains definitions for
\f[B]//\f[R]-style line comments as well as \f[B]/*\&...*/\f[R]-style
block comments.
Thus, you can find all comments with the word \[lq]TODO\[rq] with the
following command:
.IP
.nf
\f[C]
bp -g c++ -p \[aq]comment\[ti]{TODO}\[aq] *.cpp
\f[R]
.fi
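.PP
As a further sketch, the same grammar can be combined with the options
described in the \f[B]OPTIONS\f[R] section.
Assuming the c++ grammar defines \f[B]comment\f[R] as described above,
the following would list only the names of the files that have a comment
containing the word \[lq]TODO\[rq], rather than printing the comments
themselves:
.IP
.nf
\f[C]
bp -l -g c++ -p \[aq]comment\[ti]{TODO}\[aq] *.cpp
\f[R]
.fi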
.SH EXAMPLES
.TP
\f[B]ls | bp foo\f[R]
Find file names containing the string \[dq]foo\[dq] (a string pattern)
.TP
\f[B]ls | bp \[aq].c\[rs]$\[aq] -r \[aq].h\[aq]\f[R]
Find file names ending with \[dq].c\[dq] and replace the extension with
\[dq].h\[dq]
.TP
\f[B]bp -p \[aq]{foobar} parens\[aq] my_file.py\f[R]
Find the word \f[B]\[dq]foobar\[dq]\f[R], followed by a pair of matching
parentheses in the file \f[I]my_file.py\f[R]
.TP
\f[B]bp -g html -p \[aq]element \[ti] (\[ha]\[ha]\[dq]<a \[dq])\[aq] foo.html\f[R]
Using the \f[I]html\f[R] grammar, find all \f[I]element\f[R]s matching
the tag \f[I]a\f[R] in the file \f[I]foo.html\f[R]
.TP
\f[B]bp -g python -p \[aq]comment\[ti]{TODO}\[aq] *.py\f[R]
Find all comments with the word \f[B]\[lq]TODO\[rq]\f[R] in local Python
files.
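.PP
The following additional invocations are sketches that combine options
from the \f[B]OPTIONS\f[R] section.
The file name \f[I]notes.txt\f[R] is a placeholder, and the exact output
depends on the input:
.IP
.nf
\f[C]
# Print each matching line plus two lines of context before and after
bp -c 3 foo notes.txt

# Replace \[dq]foo\[dq] with \[dq]bar\[dq] in place, confirming each change
bp -I -C -r bar foo notes.txt

# Search the files listed by git, skipping over string literals
bp -G -s string foo
\f[R]
.fi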
.SH AUTHORS
Bruce Hill (\f[I]bruce\[at]bruce-hill.com\f[R]).