author Bruce Hill <bruce@bruce-hill.com> 2023-11-25 14:57:19 -0500
committer Bruce Hill <bruce@bruce-hill.com> 2023-11-25 14:57:19 -0500
commit e6e482054de77f3fe5d65344da86065373cf5f23
tree a6876c73bccf490512be5ff93fa808f3ce8d1c95 /bp.1
parent e0a55ba6176df325b65b1768bba929805444bf88
Deprecate '-p' flag and replace backslash interpolation with curly brace
interpolation
Diffstat (limited to 'bp.1')
-rw-r--r-- bp.1 154
1 files changed, 83 insertions, 71 deletions
diff --git a/bp.1 b/bp.1
index 3e585a4..1e7f46e 100644
--- a/bp.1
+++ b/bp.1
@@ -1,41 +1,25 @@
-.\" Automatically generated by Pandoc 2.18
+.\" Automatically generated by Pandoc 3.1.8
.\"
-.\" Define V font for inline verbatim, using C font in formats
-.\" that render this, and otherwise B font.
-.ie "\f[CB]x\f[]"x" \{\
-. ftr V B
-. ftr VI BI
-. ftr VB B
-. ftr VBI BI
-.\}
-.el \{\
-. ftr V CR
-. ftr VI CI
-. ftr VB CB
-. ftr VBI CBI
-.\}
.TH "BP" "1" "May 17 2021" "" ""
-.hy
.SH NAME
-.PP
bp - Bruce\[aq]s Parsing Expression Grammar tool
.SH SYNOPSIS
-.PP
\f[B]bp\f[R] [\f[I]options\&...\f[R]] \f[I]pattern\f[R] [[\f[B]--\f[R]]
\f[I]files\&...\f[R]]
.SH DESCRIPTION
-.PP
\f[B]bp\f[R] is a tool that matches parsing expression grammars using a
custom syntax.
.SH OPTIONS
.TP
-\f[B]-p\f[R], \f[B]--pattern\f[R] \f[I]pat\f[R]
-Give a pattern in BP syntax instead of string syntax (equivalent to
-\f[B]bp \[aq]\[rs](pat)\[aq]\f[R]
+\f[I]pattern\f[R]
+The text to search for.
+The main argument for \f[B]bp\f[R] is a string literal which may
+contain BP syntax patterns.
+See the \f[B]STRING PATTERNS\f[R] section below.
.TP
\f[B]-w\f[R], \f[B]--word\f[R] \f[I]word\f[R]
Surround a string pattern with word boundaries (equivalent to \f[B]bp
-\[aq]\[rs]|word\[rs]|\[aq]\f[R])
+\[aq]{|}word{|}\[aq]\f[R])
.TP
\f[B]-e\f[R], \f[B]--explain\f[R]
Print a visual explanation of the matches.
@@ -101,11 +85,6 @@ formatting otherwise.
\f[B]-h\f[R], \f[B]--help\f[R]
Print the usage and exit.
.TP
-\f[I]pattern\f[R]
-The main pattern for bp to match.
-By default, this pattern is a string pattern (see the \f[B]STRING
-PATTERNS\f[R] section below).
-.TP
\f[I]files\&...\f[R]
The input files to search.
If no input files are provided and data was piped in, that data will be
@@ -113,19 +92,22 @@ used instead.
If neither are provided, \f[B]bp\f[R] will search through all files in
the current directory and its subdirectories (recursively).
.SH STRING PATTERNS
-.PP
One of the most common use cases for pattern matching tools is matching
plain, literal strings, or strings that are primarily plain strings,
with one or two patterns.
\f[B]bp\f[R] is designed around this fact.
The default mode for bp patterns is \[lq]string pattern mode\[rq].
In string pattern mode, all characters are interpreted literally except
-for the backslash (\f[B]\[rs]\f[R]), which may be followed by an escape
-or a bp pattern (see the \f[B]PATTERNS\f[R] section below).
-Optionally, the bp pattern may be terminated by a semicolon
-(\f[B];\f[R]).
+for curly braces \f[B]{}\f[R], which mark a region of BP syntax patterns
+(see the \f[B]PATTERNS\f[R] section below).
+In other words, when passing a search query to \f[B]bp\f[R], you do not
+need to escape periods, quotation marks, backslashes, or any other
+character, as long as it fits inside a shell string literal.
+In order to match a literal \f[B]{\f[R], you can either search for the
+character literal: \f[B]{\[ga]{}\f[R], the string literal:
+\f[B]{\[dq]{\[dq]}\f[R], or a pair of matching curly braces using the
+\f[B]braces\f[R] rule: \f[B]{braces}\f[R].
.SH PATTERNS
-.PP
\f[B]bp\f[R] patterns are based off of a combination of Parsing
Expression Grammars and regular expression syntax.
The syntax is designed to map closely to verbal descriptions of the
@@ -146,7 +128,7 @@ A choice: \f[I]pat1\f[R], or if it doesn\[aq]t match, then
\f[I]pat2\f[R]
.TP
\f[B].\f[R]
-Any character (excluding newline)
+The period pattern matches a single character (excluding newline)
.TP
\f[B]\[ha]\f[R]
Start of a line
@@ -227,11 +209,14 @@ A word boundary (i.e.\ the edge of a word).
\f[B]\[rs]b\f[R]
Alias for \f[B]|\f[R] (word boundary)
.TP
+\f[B](\f[R] \f[I]pat\f[R] \f[B])\f[R]
+Parentheses can be used to delineate patterns, as in most languages.
+.TP
\f[B]!\f[R] \f[I]pat\f[R]
-Not \f[I]pat\f[R]
+Not \f[I]pat\f[R] (don\[cq]t match if \f[I]pat\f[R] matches here)
.TP
\f[B][\f[R] \f[I]pat\f[R] \f[B]]\f[R]
-Maybe \f[I]pat\f[R]
+Maybe \f[I]pat\f[R] (match zero or one occurrences of \f[I]pat\f[R])
.TP
\f[I]N\f[R] \f[I]pat\f[R]
Exactly \f[I]N\f[R] repetitions of \f[I]pat\f[R] (e.g.\ \f[B]5
@@ -253,7 +238,7 @@ Any \f[I]pat\f[R]s (zero or more, e.g.\ \f[B]* \[dq]x\[dq]\f[R] matches
etc.)
.TP
\f[B]+\f[R] \f[I]pat\f[R]
-Some \f[I]pat\f[R]s (e.g.\ \f[B]+ \[dq]x\[dq]\f[R] matches
+Some \f[I]pat\f[R]s (one or more, e.g.\ \f[B]+ \[dq]x\[dq]\f[R] matches
\f[B]\[lq]x\[rq]\f[R], \f[B]\[lq]xx\[rq]\f[R], \f[B]\[lq]xxx\[rq]\f[R],
etc.)
.TP
@@ -263,14 +248,18 @@ etc.)
comma-separated words)
.TP
\f[B]..\f[R] \f[I]pat\f[R]
-Any text (except newlines) up to and including \f[I]pat\f[R]
+Any text (except newlines) up to and including \f[I]pat\f[R].
+This is a non-greedy match and does not span newlines.
.TP
\f[B].. %\f[R] \f[I]skip\f[R] \f[I]pat\f[R]
Any text (except newlines) up to and including \f[I]pat\f[R], skipping
over instances of \f[I]skip\f[R] (e.g.\ \f[B]\[aq]\[dq]\[aq]
\&..%(\[aq]\[rs]\[aq] .)
\[aq]\[dq]\[aq]\f[R] opening quote, up to closing quote, skipping over
-backslash followed by a single character)
+backslash followed by a single character).
+A useful application of the \f[B]%\f[R] operator is to skip over
+newlines to perform multi-line matches, e.g.\ \f[B]pat1 ..%\[rs]n
+pat2\f[R]
.TP
\f[B].. =\f[R] \f[I]only\f[R] \f[I]pat\f[R]
Any number of repetitions of the pattern \f[I]only\f[R] up to and
@@ -285,21 +274,14 @@ pat\f[R]
\f[B]<\f[R] \f[I]pat\f[R]
Matches at the current position if \f[I]pat\f[R] matches immediately
before the current position (lookbehind).
-Conceptually, you can think of this as creating a file containing only
-the \f[I]N\f[R] characters immediately before the current position and
-attempting to match \f[I]pat\f[R] on that file, for all values of
-\f[I]N\f[R] from the minimum number of characters \f[I]pat\f[R] can
-match up to maximum number of characters \f[I]pat\f[R] can match (or the
-length of the current line upto the current position, whichever is
-smaller).
\f[B]Note:\f[R] For fixed-length lookbehinds, this is quite efficient
-(e.g.\ \f[B]<(100 \[dq]x\[dq])\f[R]), however this could cause
-performance problems with variable-length lookbehinds
-(e.g.\ \f[B]<(\[dq]x\[dq] 0-100\[dq]y\[dq])\f[R]).
-Also, it is worth noting that \f[B]\[ha]\f[R], \f[B]\[ha]\[ha]\f[R],
-\f[B]$\f[R], and \f[B]$$\f[R] all match against the edges of the slice,
-which may give false positives if you were expecting them to match only
-against the edges file or line.
+(e.g.\ \f[B]<(100 \[dq]x\[dq])\f[R]), however this can cause performance
+problems with variable-length lookbehinds (e.g.\ \f[B]<(\[dq]x\[dq]
+0-100\[dq]y\[dq])\f[R]).
+Also, patterns like \f[B]\[ha]\f[R], \f[B]\[ha]\[ha]\f[R], \f[B]$\f[R],
+and \f[B]$$\f[R] that match against line/file edges will match against
+the edge of the lookbehind window, so they should generally be avoided
+in lookbehinds.
.TP
\f[B]>\f[R] \f[I]pat\f[R]
Matches \f[I]pat\f[R], but does not consume any input (lookahead).
@@ -319,7 +301,7 @@ See the \f[B]GRAMMAR FILES\f[R] section for more info.
\f[B]\[at]\f[R] \f[I]name\f[R] \f[B]:\f[R] \f[I]pat\f[R]
For the rest of the current chain, define \f[I]name\f[R] to match
whatever \f[I]pat\f[R] matches, i.e.\ a backreference.
-For example, \f[B]\[at]foo:word \[ga]( foo \[ga])\f[R] (matches
+For example, \f[B]\[at]my-word:word \[ga]( my-word \[ga])\f[R] (matches
\f[B]\[lq]asdf(asdf)\[rq]\f[R] or \f[B]\[lq]baz(baz)\[rq]\f[R], but not
\f[B]\[lq]foo(baz)\[rq]\f[R])
.TP
@@ -343,17 +325,21 @@ series of words, a colon, a newline, a tab, and then the first word.
\f[I]pat1\f[R] \f[B]\[ti]\f[R] \f[I]pat2\f[R]
Matches when \f[I]pat1\f[R] matches and \f[I]pat2\f[R] can be found
within the text of that match.
-(e.g.\ \f[B]comment \[ti] {TODO}\f[R] matches comments that contain the
-word \f[B]\[lq]TODO\[rq]\f[R])
+(e.g.\ \f[B]comment \[ti] \[dq]TODO\[dq]\f[R] matches comments that
+contain \f[B]\[lq]TODO\[rq]\f[R])
.TP
\f[I]pat1\f[R] \f[B]!\[ti]\f[R] \f[I]pat2\f[R]
Matches when \f[I]pat1\f[R] matches, but \f[I]pat2\f[R] can not be found
within the text of that match.
-(e.g.\ \f[B]comment \[ti] {IGNORE}\f[R] matches only comments that do
-not contain the word \f[B]\[lq]IGNORE\[rq]\f[R])
+(e.g.\ \f[B]comment !\[ti] \[dq]IGNORE\[dq]\f[R] matches only comments
+that do not contain \f[B]\[lq]IGNORE\[rq]\f[R])
.TP
-\f[I]name\f[R]\f[B]:\f[R] \f[I]pat\f[R]
-Define \f[I]name\f[R] to mean \f[I]pat\f[R] (pattern definition)
+\f[I]name\f[R]\f[B]:\f[R] \f[I]pat1\f[R]; \f[I]pat2\f[R]
+Define \f[I]name\f[R] to mean \f[I]pat1\f[R] (pattern definition) inside
+the pattern \f[I]pat2\f[R].
+For example, a recursive pattern can be defined and used like this:
+\f[B]paren-comment: \[dq](*\[dq] ..%paren-comment \[dq]*)\[dq];
+paren-comment\f[R]
.TP
\f[B]\[at]:\f[R]\f[I]name\f[R] \f[B]=\f[R] \f[I]pat\f[R]
Match \f[I]pat\f[R] and tag it with the given name as metadata.
@@ -364,9 +350,8 @@ Syntactic sugar for \f[I]name\f[R]\f[B]:\f[R]
that also attaches a metadata tag of the same name)
.TP
\f[B]#\f[R] \f[I]comment\f[R]
-A line comment
+A line comment, ignored by BP
.SH GRAMMAR FILES
-.PP
\f[B]bp\f[R] allows loading extra grammar files, which define patterns
which may be used for matching.
The \f[B]builtins\f[R] grammar file is loaded by default, and it defines
@@ -375,9 +360,36 @@ For example, it defines the \f[B]parens\f[R] rule, which matches pairs
of matching parentheses, accounting for nested inner parentheses:
.RS
.PP
-\f[B]bp -p \[aq]\[dq]my_func\[dq] parens\[aq]\f[R]
+\f[B]bp \[aq]my_func{parens}\[aq]\f[R]
.RE
.PP
+BP\[cq]s builtin grammar file defines a few other commonly used patterns
+such as:
+.IP \[bu] 2
+\f[B]braces\f[R] (matching \f[B]{}\f[R] pairs), \f[B]brackets\f[R]
+(matching \f[B][]\f[R] pairs), \f[B]anglebraces\f[R] (matching
+\f[B]<>\f[R] pairs)
+.IP \[bu] 2
+\f[B]string\f[R]: a single- or double-quote delimited string, including
+standard escape sequences
+.IP \[bu] 2
+\f[B]id\f[R] or \f[B]var\f[R]: an identifier (full UTF-8 support)
+.IP \[bu] 2
+\f[B]word\f[R]: similar to \f[B]id\f[R]/\f[B]var\f[R], but can start
+with a number
+.IP \[bu] 2
+\f[B]Hex\f[R], \f[B]hex\f[R], \f[B]HEX\f[R]: a mixed-case, lowercase, or
+uppercase hex digit
+.IP \[bu] 2
+\f[B]digit\f[R]: a digit from 0-9
+.IP \[bu] 2
+\f[B]int\f[R]: one or more digits
+.IP \[bu] 2
+\f[B]number\f[R]: an int or floating point literal
+.IP \[bu] 2
+\f[B]esc\f[R], \f[B]tab\f[R], \f[B]nl\f[R], \f[B]cr\f[R],
+\f[B]crlf\f[R], \f[B]lf\f[R]: Shorthand for escape sequences
+.PP
\f[B]bp\f[R] also comes with a few grammar files for common programming
languages, which may be loaded on demand.
These grammar files are not comprehensive syntax definitions, but only
@@ -389,35 +401,35 @@ Thus, you can find all comments with the word \[lq]TODO\[rq] with the
following command:
.RS
.PP
-\f[B]bp -g c++ -p \[aq]comment \[ti] {TODO}\[aq] *.cpp\f[R]
+\f[B]bp -g c++ \[aq]{comment \[ti] \[dq]TODO\[dq]}\[aq] *.cpp\f[R]
.RE
.SH EXAMPLES
-.PP
-Find files containing the string \[lq]foo\[rq] (a string pattern):
+Find files containing the literal string \[lq]foo.baz\[rq] (a string
+pattern):
.RS
.PP
-\f[B]ls | bp foo\f[R]
+\f[B]ls | bp foo.baz\f[R]
.RE
.PP
Find files ending with \[lq].c\[rq] and print the name with the
\[lq].c\[rq] replaced with \[lq].h\[rq]:
.RS
.PP
-\f[B]ls | bp \[aq].c\[rs]$\[aq] -r \[aq].h\[aq]\f[R]
+\f[B]ls | bp \[aq].c{$}\[aq] -r \[aq].h\[aq]\f[R]
.RE
.PP
Find the word \[lq]foobar\[rq], followed by a pair of matching
parentheses in the file \f[I]my_file.py\f[R]:
.RS
.PP
-\f[B]bp -p \[aq]{foobar} parens\[aq] my_file.py\f[R]
+\f[B]bp \[aq]foobar{parens}\[aq] my_file.py\f[R]
.RE
.PP
Using the \f[I]html\f[R] grammar, find all \f[I]element\f[R]s matching
the tag \f[I]a\f[R] in the file \f[I]foo.html\f[R]:
.RS
.PP
-\f[B]bp -g html -p \[aq]element \[ti] (\[ha]\[ha]\[dq]<a \[dq])\[aq]
+\f[B]bp -g html \[aq]{element \[ti] (\[ha]\[ha]\[dq]<a \[dq])}\[aq]
foo.html\f[R]
.RE
.SH AUTHORS