| author | Bruce Hill <bruce@bruce-hill.com> | 2023-11-25 14:57:19 -0500 |
|---|---|---|
| committer | Bruce Hill <bruce@bruce-hill.com> | 2023-11-25 14:57:19 -0500 |
| commit | e6e482054de77f3fe5d65344da86065373cf5f23 (patch) | |
| tree | a6876c73bccf490512be5ff93fa808f3ce8d1c95 /bp.1 | |
| parent | e0a55ba6176df325b65b1768bba929805444bf88 (diff) | |
Deprecate '-p' flag and replace backslash interpolation with curly brace
interpolation
Diffstat (limited to 'bp.1')
| -rw-r--r-- | bp.1 | 154 |
1 files changed, 83 insertions, 71 deletions
@@ -1,41 +1,25 @@ -.\" Automatically generated by Pandoc 2.18 +.\" Automatically generated by Pandoc 3.1.8 .\" -.\" Define V font for inline verbatim, using C font in formats -.\" that render this, and otherwise B font. -.ie "\f[CB]x\f[]"x" \{\ -. ftr V B -. ftr VI BI -. ftr VB B -. ftr VBI BI -.\} -.el \{\ -. ftr V CR -. ftr VI CI -. ftr VB CB -. ftr VBI CBI -.\} .TH "BP" "1" "May 17 2021" "" "" -.hy .SH NAME -.PP bp - Bruce\[aq]s Parsing Expression Grammar tool .SH SYNOPSIS -.PP \f[B]bp\f[R] [\f[I]options\&...\f[R]] \f[I]pattern\f[R] [[\f[B]--\f[R]] \f[I]files\&...\f[R]] .SH DESCRIPTION -.PP \f[B]bp\f[R] is a tool that matches parsing expression grammars using a custom syntax. .SH OPTIONS .TP -\f[B]-p\f[R], \f[B]--pattern\f[R] \f[I]pat\f[R] -Give a pattern in BP syntax instead of string syntax (equivalent to -\f[B]bp \[aq]\[rs](pat)\[aq]\f[R] +\f[I]pattern\f[R] +The text to search for. +The main argument for \f[B]bp\f[R] is a string literals which may +contain BP syntax patterns. +See the \f[B]STRING PATTERNS\f[R] section below. .TP \f[B]-w\f[R], \f[B]--word\f[R] \f[I]word\f[R] Surround a string pattern with word boundaries (equivalent to \f[B]bp -\[aq]\[rs]|word\[rs]|\[aq]\f[R]) +\[aq]{|}word{|}\[aq]\f[R]) .TP \f[B]-e\f[R], \f[B]--explain\f[R] Print a visual explanation of the matches. @@ -101,11 +85,6 @@ formatting otherwise. \f[B]-h\f[R], \f[B]--help\f[R] Print the usage and exit. .TP -\f[I]pattern\f[R] -The main pattern for bp to match. -By default, this pattern is a string pattern (see the \f[B]STRING -PATTERNS\f[R] section below). -.TP \f[I]files\&...\f[R] The input files to search. If no input files are provided and data was piped in, that data will be @@ -113,19 +92,22 @@ used instead. If neither are provided, \f[B]bp\f[R] will search through all files in the current directory and its subdirectories (recursively). 
.SH STRING PATTERNS -.PP One of the most common use cases for pattern matching tools is matching plain, literal strings, or strings that are primarily plain strings, with one or two patterns. \f[B]bp\f[R] is designed around this fact. The default mode for bp patterns is \[lq]string pattern mode\[rq]. In string pattern mode, all characters are interpreted literally except -for the backslash (\f[B]\[rs]\f[R]), which may be followed by an escape -or a bp pattern (see the \f[B]PATTERNS\f[R] section below). -Optionally, the bp pattern may be terminated by a semicolon -(\f[B];\f[R]). +for curly braces \f[B]{}\f[R], which mark a region of BP syntax patterns +(see the \f[B]PATTERNS\f[R] section below). +In other words, when passing a search query to \f[B]bp\f[R], you do not +need to escape periods, quotation marks, backslashes, or any other +character, as long as it fits inside a shell string literal. +In order to match a literal \f[B]{\f[R], you can either search for the +character literal: \f[B]{\[ga]{}\f[R], the string literal: +\f[B]{\[dq]{\[dq]}\f[R], or a pair of matching curly braces using the +\f[B]braces\f[R] rule: \f[B]{braces}\f[R]. .SH PATTERNS -.PP \f[B]bp\f[R] patterns are based off of a combination of Parsing Expression Grammars and regular expression syntax. The syntax is designed to map closely to verbal descriptions of the @@ -146,7 +128,7 @@ A choice: \f[I]pat1\f[R], or if it doesn\[aq]t match, then \f[I]pat2\f[R] .TP \f[B].\f[R] -Any character (excluding newline) +The period pattern matches single character (excluding newline) .TP \f[B]\[ha]\f[R] Start of a line @@ -227,11 +209,14 @@ A word boundary (i.e.\ the edge of a word). \f[B]\[rs]b\f[R] Alias for \f[B]|\f[R] (word boundary) .TP +\f[B](\f[R] \f[I]pat\f[R] \f[B])\f[R] +Parentheses can be used to delineate patterns, as in most languages. 
+.TP \f[B]!\f[R] \f[I]pat\f[R] -Not \f[I]pat\f[R] +Not \f[I]pat\f[R] (don\[cq]t match if \f[I]pat\f[R] matches here) .TP \f[B][\f[R] \f[I]pat\f[R] \f[B]]\f[R] -Maybe \f[I]pat\f[R] +Maybe \f[I]pat\f[R] (match zero or one occurrences of \f[I]pat\f[R]) .TP \f[I]N\f[R] \f[I]pat\f[R] Exactly \f[I]N\f[R] repetitions of \f[I]pat\f[R] (e.g.\ \f[B]5 @@ -253,7 +238,7 @@ Any \f[I]pat\f[R]s (zero or more, e.g.\ \f[B]* \[dq]x\[dq]\f[R] matches etc.) .TP \f[B]+\f[R] \f[I]pat\f[R] -Some \f[I]pat\f[R]s (e.g.\ \f[B]+ \[dq]x\[dq]\f[R] matches +Some \f[I]pat\f[R]s (one or more, e.g.\ \f[B]+ \[dq]x\[dq]\f[R] matches \f[B]\[lq]x\[rq]\f[R], \f[B]\[lq]xx\[rq]\f[R], \f[B]\[lq]xxx\[rq]\f[R], etc.) .TP @@ -263,14 +248,18 @@ etc.) comma-separated words) .TP \f[B]..\f[R] \f[I]pat\f[R] -Any text (except newlines) up to and including \f[I]pat\f[R] +Any text (except newlines) up to and including \f[I]pat\f[R]. +This is a non-greedy match and does not span newlines. .TP \f[B].. %\f[R] \f[I]skip\f[R] \f[I]pat\f[R] Any text (except newlines) up to and including \f[I]pat\f[R], skipping over instances of \f[I]skip\f[R] (e.g.\ \f[B]\[aq]\[dq]\[aq] \&..%(\[aq]\[rs]\[aq] .) \[aq]\[dq]\[aq]\f[R] opening quote, up to closing quote, skipping over -backslash followed by a single character) +backslash followed by a single character). +A useful application of the \f[B]%\f[R] operator is to skip over +newlines to perform multi-line matches, e.g.\ \f[B]pat1 ..%\[rs]n +pat2\f[R] .TP \f[B].. =\f[R] \f[I]only\f[R] \f[I]pat\f[R] Any number of repetitions of the pattern \f[I]only\f[R] up to and @@ -285,21 +274,14 @@ pat\f[R] \f[B]<\f[R] \f[I]pat\f[R] Matches at the current position if \f[I]pat\f[R] matches immediately before the current position (lookbehind). 
-Conceptually, you can think of this as creating a file containing only -the \f[I]N\f[R] characters immediately before the current position and -attempting to match \f[I]pat\f[R] on that file, for all values of -\f[I]N\f[R] from the minimum number of characters \f[I]pat\f[R] can -match up to maximum number of characters \f[I]pat\f[R] can match (or the -length of the current line upto the current position, whichever is -smaller). \f[B]Note:\f[R] For fixed-length lookbehinds, this is quite efficient -(e.g.\ \f[B]<(100 \[dq]x\[dq])\f[R]), however this could cause -performance problems with variable-length lookbehinds -(e.g.\ \f[B]<(\[dq]x\[dq] 0-100\[dq]y\[dq])\f[R]). -Also, it is worth noting that \f[B]\[ha]\f[R], \f[B]\[ha]\[ha]\f[R], -\f[B]$\f[R], and \f[B]$$\f[R] all match against the edges of the slice, -which may give false positives if you were expecting them to match only -against the edges file or line. +(e.g.\ \f[B]<(100 \[dq]x\[dq])\f[R]), however this can cause performance +problems with variable-length lookbehinds (e.g.\ \f[B]<(\[dq]x\[dq] +0-100\[dq]y\[dq])\f[R]). +Also, patterns like \f[B]\[ha]\f[R], \f[B]\[ha]\[ha]\f[R], \f[B]$\f[R], +and \f[B]$$\f[R] that match against line/file edges will match against +the edge of the lookbehind window, so they should generally be avoided +in lookbehinds. .TP \f[B]>\f[R] \f[I]pat\f[R] Matches \f[I]pat\f[R], but does not consume any input (lookahead). @@ -319,7 +301,7 @@ See the \f[B]GRAMMAR FILES\f[R] section for more info. \f[B]\[at]\f[R] \f[I]name\f[R] \f[B]:\f[R] \f[I]pat\f[R] For the rest of the current chain, define \f[I]name\f[R] to match whatever \f[I]pat\f[R] matches, i.e.\ a backreference. 
-For example, \f[B]\[at]foo:word \[ga]( foo \[ga])\f[R] (matches +For example, \f[B]\[at]my-word:word \[ga]( my-word \[ga])\f[R] (matches \f[B]\[lq]asdf(asdf)\[rq]\f[R] or \f[B]\[lq]baz(baz)\[rq]\f[R], but not \f[B]\[lq]foo(baz)\[rq]\f[R]) .TP @@ -343,17 +325,21 @@ series of words, a colon, a newline, a tab, and then the first word. \f[I]pat1\f[R] \f[B]\[ti]\f[R] \f[I]pat2\f[R] Matches when \f[I]pat1\f[R] matches and \f[I]pat2\f[R] can be found within the text of that match. -(e.g.\ \f[B]comment \[ti] {TODO}\f[R] matches comments that contain the -word \f[B]\[lq]TODO\[rq]\f[R]) +(e.g.\ \f[B]comment \[ti] \[dq]TODO\[dq]\f[R] matches comments that +contain \f[B]\[lq]TODO\[rq]\f[R]) .TP \f[I]pat1\f[R] \f[B]!\[ti]\f[R] \f[I]pat2\f[R] Matches when \f[I]pat1\f[R] matches, but \f[I]pat2\f[R] can not be found within the text of that match. -(e.g.\ \f[B]comment \[ti] {IGNORE}\f[R] matches only comments that do -not contain the word \f[B]\[lq]IGNORE\[rq]\f[R]) +(e.g.\ \f[B]comment \[ti] \[dq]IGNORE\[dq]\f[R] matches only comments +that do not contain \f[B]\[lq]IGNORE\[rq]\f[R]) .TP -\f[I]name\f[R]\f[B]:\f[R] \f[I]pat\f[R] -Define \f[I]name\f[R] to mean \f[I]pat\f[R] (pattern definition) +\f[I]name\f[R]\f[B]:\f[R] \f[I]pat1\f[R]; \f[I]pat2\f[R] +Define \f[I]name\f[R] to mean \f[I]pat1\f[R] (pattern definition) inside +the pattern \f[I]pat2\f[R]. +For example, a recursive pattern can be defined and used like this: +\f[B]paren-comment: \[dq](*\[dq] ..%paren-comment \[dq]*)\[dq]; +paren-comment\f[R] .TP \f[B]\[at]:\f[R]\f[I]name\f[R] \f[B]=\f[R] \f[I]pat\f[R] Match \f[I]pat\f[R] and tag it with the given name as metadata. @@ -364,9 +350,8 @@ Syntactic sugar for \f[I]name\f[R]\f[B]:\f[R] that also attaches a metadata tag of the same name) .TP \f[B]#\f[R] \f[I]comment\f[R] -A line comment +A line comment, ignored by BP .SH GRAMMAR FILES -.PP \f[B]bp\f[R] allows loading extra grammar files, which define patterns which may be used for matching. 
The \f[B]builtins\f[R] grammar file is loaded by default, and it defines @@ -375,9 +360,36 @@ For example, it defines the \f[B]parens\f[R] rule, which matches pairs of matching parentheses, accounting for nested inner parentheses: .RS .PP -\f[B]bp -p \[aq]\[dq]my_func\[dq] parens\[aq]\f[R] +\f[B]bp \[aq]my_func{parens}\[aq]\f[R] .RE .PP +BP\[cq]s builtin grammar file defines a few other commonly used patterns +such as: +.IP \[bu] 2 +\f[B]braces\f[R] (matching \f[B]{}\f[R] pairs), \f[B]brackets\f[R] +(matching \f[B][]\f[R] pairs), \f[B]anglebraces\f[R] (matching +\f[B]<>\f[R] pairs) +.IP \[bu] 2 +\f[B]string\f[R]: a single- or double-quote delimited string, including +standard escape sequences +.IP \[bu] 2 +\f[B]id\f[R] or \f[B]var\f[R]: an identifier (full UTF-8 support) +.IP \[bu] 2 +\f[B]word\f[R]: similar to \f[B]id\f[R]/\f[B]var\f[R], but can start +with a number +.IP \[bu] 2 +\f[B]Hex\f[R], \f[B]hex\f[R], \f[B]HEX\f[R]: a mixed-case, lowercase, or +uppercase hex digit +.IP \[bu] 2 +\f[B]digit\f[R]: a digit from 0-9 +.IP \[bu] 2 +\f[B]int\f[R]: one or more digits +.IP \[bu] 2 +\f[B]number\f[R]: an int or floating point literal +.IP \[bu] 2 +\f[B]esc\f[R], \f[B]tab\f[R], \f[B]nl\f[R], \f[B]cr\f[R], +\f[B]crlf\f[R], \f[B]lf\f[R]: Shorthand for escape sequences +.PP \f[B]bp\f[R] also comes with a few grammar files for common programming languages, which may be loaded on demand. 
These grammar files are not comprehensive syntax definitions, but only @@ -389,35 +401,35 @@ Thus, you can find all comments with the word \[lq]TODO\[rq] with the following command: .RS .PP -\f[B]bp -g c++ -p \[aq]comment \[ti] {TODO}\[aq] *.cpp\f[R] +\f[B]bp -g c++ \[aq]{comment \[ti] \[dq]TODO\[dq]}\[aq] *.cpp\f[R] .RE .SH EXAMPLES -.PP -Find files containing the string \[lq]foo\[rq] (a string pattern): +Find files containing the literal string \[lq]foo.baz\[rq] (a string +pattern): .RS .PP -\f[B]ls | bp foo\f[R] +\f[B]ls | bp foo.baz\f[R] .RE .PP Find files ending with \[lq].c\[rq] and print the name with the \[lq].c\[rq] replaced with \[lq].h\[rq]: .RS .PP -\f[B]ls | bp \[aq].c\[rs]$\[aq] -r \[aq].h\[aq]\f[R] +\f[B]ls | bp \[aq].c{$}\[aq] -r \[aq].h\[aq]\f[R] .RE .PP Find the word \[lq]foobar\[rq], followed by a pair of matching parentheses in the file \f[I]my_file.py\f[R]: .RS .PP -\f[B]bp -p \[aq]{foobar} parens\[aq] my_file.py\f[R] +\f[B]bp \[aq]foobar{parens}\[aq] my_file.py\f[R] .RE .PP Using the \f[I]html\f[R] grammar, find all \f[I]element\f[R]s matching the tag \f[I]a\f[R] in the file \f[I]foo.html\f[R]: .RS .PP -\f[B]bp -g html -p \[aq]element \[ti] (\[ha]\[ha]\[dq]<a \[dq])\[aq] +\f[B]bp -g html \[aq]{element \[ti] (\[ha]\[ha]\[dq]<a \[dq])}\[aq] foo.html\f[R] .RE .SH AUTHORS |
