.\" Automatically generated by Pandoc 3.1.8
.\"
.TH "BP" "1" "May 17 2021" "" ""
.SH NAME
bp - Bruce\[aq]s Parsing Expression Grammar tool
.SH SYNOPSIS
\f[B]bp\f[R] [\f[I]options\&...\f[R]] \f[I]pattern\f[R] [[\f[B]--\f[R]]
\f[I]files\&...\f[R]]
.SH DESCRIPTION
\f[B]bp\f[R] is a tool that matches parsing expression grammars using a
custom syntax.
.SH OPTIONS
.TP
\f[I]pattern\f[R]
The text to search for.
The main argument for \f[B]bp\f[R] is a string literal, which may
contain BP syntax patterns.
See the \f[B]STRING PATTERNS\f[R] section below.
.TP
\f[B]-w\f[R], \f[B]--word\f[R] \f[I]word\f[R]
Surround a string pattern with word boundaries (equivalent to \f[B]bp
\[aq]{|}word{|}\[aq]\f[R]).
.TP
\f[B]-e\f[R], \f[B]--explain\f[R]
Print a visual explanation of the matches.
.TP
\f[B]-l\f[R], \f[B]--list-files\f[R]
Print only the names of files containing matches instead of the matches
themselves.
.TP
\f[B]-c\f[R], \f[B]--case\f[R]
Perform pattern matching case-sensitively (the default is smart casing,
i.e.\ case-insensitive unless the pattern contains any uppercase
letters).
.TP
\f[B]-i\f[R], \f[B]--ignore-case\f[R]
Perform pattern matching case-insensitively.
.TP
\f[B]-I\f[R], \f[B]--inplace\f[R]
Perform filtering or replacement in-place (i.e.\ overwrite files with
new content).
.TP
\f[B]-r\f[R], \f[B]--replace\f[R] \f[I]replacement\f[R]
Replace all occurrences of the main pattern with the given string.
.TP
\f[B]-s\f[R], \f[B]--skip\f[R] \f[I]pattern\f[R]
While looking for matches, skip over \f[I]pattern\f[R] occurrences.
This can be useful for behavior like \f[B]bp -s string\f[R] (avoiding
matches inside string literals).
.TP
\f[B]-g\f[R], \f[B]--grammar\f[R] \f[I]grammar-file\f[R]
Load the grammar from the given file.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]-G\f[R], \f[B]--git\f[R]
Use \f[B]git\f[R] to get a list of files.
Remaining file
arguments (if any) are passed to \f[B]git ls-files\f[R]
instead of being treated as literal files.
.TP
\f[B]-B\f[R], \f[B]--context-before\f[R] \f[I]N\f[R]
The number of lines of context to print before each match (default: 0).
See \f[B]--context\f[R] below for details on \f[B]none\f[R] or
\f[B]all\f[R].
.TP
\f[B]-A\f[R], \f[B]--context-after\f[R] \f[I]N\f[R]
The number of lines of context to print after each match (default: 0).
See \f[B]--context\f[R] below for details on \f[B]none\f[R] or
\f[B]all\f[R].
.TP
\f[B]-C\f[R], \f[B]--context\f[R] \f[I]N\f[R]
The number of lines of context to print before and after each match
(default: 0).
If \f[I]N\f[R] is \f[B]none\f[R], print only the exact text of the
matches.
If \f[I]N\f[R] is \f[B]all\f[R], print all text before and after each
match.
.TP
\f[B]-f\f[R], \f[B]--format\f[R] \f[I]fancy\f[R]|\f[I]plain\f[R]|\f[I]bare\f[R]|\f[I]file:line\f[R]|\f[I]auto\f[R]
Set the output format.
\f[I]fancy\f[R] includes colors and line numbers, \f[I]plain\f[R] prints
line numbers with no coloring, \f[I]bare\f[R] prints only the match
text, \f[I]file:line\f[R] prints the filename and line number for each
match (grep-style), and \f[I]auto\f[R] (the default) uses
\f[I]fancy\f[R] formatting when the output is a TTY and \f[I]bare\f[R]
formatting otherwise.
.TP
\f[B]-h\f[R], \f[B]--help\f[R]
Print the usage and exit.
.TP
\f[I]files\&...\f[R]
The input files to search.
If no input files are provided and data was piped in, that data will be
used instead.
If neither is provided, \f[B]bp\f[R] will search through all files in
the current directory and its subdirectories (recursively).
.SH STRING PATTERNS
One of the most common use cases for pattern matching tools is matching
plain, literal strings, or strings that are primarily plain text with
one or two embedded patterns.
\f[B]bp\f[R] is designed around this fact.
The default mode for bp patterns is \[lq]string pattern
mode\[rq].
In string pattern mode, all characters are interpreted literally except
for curly braces \f[B]{}\f[R], which mark a region of BP syntax patterns
(see the \f[B]PATTERNS\f[R] section below).
In other words, when passing a search query to \f[B]bp\f[R], you do not
need to escape periods, quotation marks, backslashes, or any other
character, as long as it fits inside a shell string literal.
In order to match a literal \f[B]{\f[R], you can search for the
character literal \f[B]{\[ga]{}\f[R], the string literal
\f[B]{\[dq]{\[dq]}\f[R], or a pair of matching curly braces using the
\f[B]braces\f[R] rule: \f[B]{braces}\f[R].
.SH PATTERNS
\f[B]bp\f[R] patterns are based on a combination of Parsing
Expression Grammars and regular expression syntax.
The syntax is designed to map closely to verbal descriptions of the
patterns, so prefix operators are preferred over suffix operators
(which are common in regex syntax).
Patterns are whitespace-agnostic, so they work the same regardless of
whether whitespace is present or not, except inside string literals
(\f[B]\[aq]...\[aq]\f[R] and \f[B]\[dq]...\[dq]\f[R]), character
literals (\f[B]\[ga]\f[R]), and escape sequences (\f[B]\[rs]\f[R]).
Whitespace between patterns or parts of a pattern should be used for
clarity, but it does not affect the meaning of the pattern.
.TP
\f[I]pat1 pat2\f[R]
A sequence: \f[I]pat1\f[R] followed by \f[I]pat2\f[R]
.TP
\f[I]pat1\f[R] \f[B]/\f[R] \f[I]pat2\f[R]
A choice: \f[I]pat1\f[R], or if it doesn\[aq]t match, then
\f[I]pat2\f[R]
.TP
\f[B].\f[R]
The period pattern matches any single character (excluding newline)
.TP
\f[B]\[ha]\f[R]
Start of a line
.TP
\f[B]\[ha]\[ha]\f[R]
Start of the text
.TP
\f[B]$\f[R]
End of a line (does not include the newline character)
.TP
\f[B]$$\f[R]
End of the text
.TP
\f[B]_\f[R]
Zero or more whitespace characters, including
spaces and tabs, but not newlines.
.TP
\f[B]__\f[R]
Zero or more whitespace characters, including spaces, tabs, newlines,
and comments.
Comments are undefined by default, but may be defined by a separate
grammar file.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]\[dq]foo\[dq]\f[R], \f[B]\[aq]foo\[aq]\f[R]
The literal string \f[B]\[lq]foo\[rq]\f[R].
Single and double quotes are treated the same.
Escape sequences are not allowed.
.TP
\f[B]\[ga]\f[R]\f[I]c\f[R]
The literal character \f[I]c\f[R] (e.g.\ \f[B]\[ga]\[at]\f[R] matches
the \[lq]\[at]\[rq] character)
.TP
\f[B]\[ga]\f[R]\f[I]c1\f[R]\f[B]-\f[R]\f[I]c2\f[R]
The character range \f[I]c1\f[R] to \f[I]c2\f[R]
(e.g.\ \f[B]\[ga]a-z\f[R]).
Multiple ranges can be combined with a comma
(e.g.\ \f[B]\[ga]a-z,A-Z\f[R]).
.TP
\f[B]\[ga]\f[R]\f[I]c1\f[R]\f[B],\f[R]\f[I]c2\f[R]
Any one of the given characters or character ranges \f[I]c1\f[R] or
\f[I]c2\f[R] (e.g.\ \f[B]\[ga]a,e,i,o,u,0-9\f[R])
.TP
\f[B]\[rs]\f[R]\f[I]esc\f[R]
An escape sequence (e.g.\ \f[B]\[rs]n\f[R], \f[B]\[rs]x1F\f[R],
\f[B]\[rs]033\f[R], etc.)
.TP
\f[B]\[rs]\f[R]\f[I]esc1\f[R]\f[B]-\f[R]\f[I]esc2\f[R]
An escape sequence range from \f[I]esc1\f[R] to \f[I]esc2\f[R]
(e.g.\ \f[B]\[rs]x00-x1F\f[R])
.TP
\f[B]\[rs]\f[R]\f[I]esc1\f[R]\f[B],\f[R]\f[I]esc2\f[R]
Any one of the given escape sequences or ranges \f[I]esc1\f[R] or
\f[I]esc2\f[R] (e.g.\ \f[B]\[rs]r,n,x01-x04\f[R])
.TP
\f[B]\[rs]N\f[R]
A special escape that matches a \[lq]nodent\[rq]: one or more newlines
followed by the same indentation that occurs on the current line.
.TP
\f[B]\[rs]C\f[R]
A special escape that always matches the empty string and replaces it
with the indentation of the line on which it matched.
For example, this pattern would match Bash-style heredocs that start
with \[lq]<<-FOO\[rq] and end with a line containing only the
starting indentation and the string \[lq]FOO\[rq]: \f[B]\[dq]<<-\[dq]
\[at]end=(\[rs]C id) ..%\[rs]n (\[ha]end$)\f[R]
.TP
\f[B]\[rs]i\f[R]
An identifier character (e.g.\ alphanumeric characters or underscores).
.TP
\f[B]\[rs]I\f[R]
An identifier character, not including numbers (e.g.\ alphabetic
characters or underscores).
.TP
\f[B]|\f[R]
A word boundary (i.e.\ the edge of a word).
.TP
\f[B]\[rs]b\f[R]
Alias for \f[B]|\f[R] (word boundary)
.TP
\f[B](\f[R] \f[I]pat\f[R] \f[B])\f[R]
Parentheses can be used to group patterns, as in most languages.
.TP
\f[B]!\f[R] \f[I]pat\f[R]
Not \f[I]pat\f[R] (don\[cq]t match if \f[I]pat\f[R] matches here)
.TP
\f[B][\f[R] \f[I]pat\f[R] \f[B]]\f[R]
Maybe \f[I]pat\f[R] (match zero or one occurrence of \f[I]pat\f[R])
.TP
\f[I]N\f[R] \f[I]pat\f[R]
Exactly \f[I]N\f[R] repetitions of \f[I]pat\f[R] (e.g.\ \f[B]5
\[dq]x\[dq]\f[R] matches \f[B]\[lq]xxxxx\[rq]\f[R])
.TP
\f[I]N\f[R] \f[B]-\f[R] \f[I]M\f[R] \f[I]pat\f[R]
Between \f[I]N\f[R] and \f[I]M\f[R] repetitions of \f[I]pat\f[R]
(e.g.\ \f[B]2-3 \[dq]x\[dq]\f[R] matches \f[B]\[lq]xx\[rq]\f[R] or
\f[B]\[lq]xxx\[rq]\f[R])
.TP
\f[I]N\f[R]\f[B]+\f[R] \f[I]pat\f[R]
At least \f[I]N\f[R] repetitions of \f[I]pat\f[R] (e.g.\ \f[B]2+
\[dq]x\[dq]\f[R] matches \f[B]\[lq]xx\[rq]\f[R],
\f[B]\[lq]xxx\[rq]\f[R], \f[B]\[lq]xxxx\[rq]\f[R], etc.)
.TP
\f[B]*\f[R] \f[I]pat\f[R]
Any \f[I]pat\f[R]s (zero or more, e.g.\ \f[B]* \[dq]x\[dq]\f[R] matches
\f[B]\[lq]\[rq]\f[R], \f[B]\[lq]x\[rq]\f[R], \f[B]\[lq]xx\[rq]\f[R],
etc.)
.TP
\f[B]+\f[R] \f[I]pat\f[R]
Some \f[I]pat\f[R]s (one or more, e.g.\ \f[B]+ \[dq]x\[dq]\f[R] matches
\f[B]\[lq]x\[rq]\f[R], \f[B]\[lq]xx\[rq]\f[R], \f[B]\[lq]xxx\[rq]\f[R],
etc.)
.TP
\f[I]repeating-pat\f[R] \f[B]%\f[R] \f[I]sep\f[R]
\f[I]repeating-pat\f[R] (see the examples above) separated by
\f[I]sep\f[R] (e.g.\ \f[B]*word
% \[dq],\[dq]\f[R] matches zero or more
comma-separated words)
.TP
\f[B]..\f[R] \f[I]pat\f[R]
Any text (except newlines) up to and including \f[I]pat\f[R].
This is a non-greedy match and does not span newlines.
.TP
\f[B].. %\f[R] \f[I]skip\f[R] \f[I]pat\f[R]
Any text (except newlines) up to and including \f[I]pat\f[R], skipping
over instances of \f[I]skip\f[R] (e.g.\ \f[B]\[aq]\[dq]\[aq]
\&..%(\[aq]\[rs]\[aq] .)
\[aq]\[dq]\[aq]\f[R] matches an opening quote, then any text up to the
closing quote, skipping over a backslash followed by a single
character).
A useful application of the \f[B]%\f[R] operator is to skip over
newlines to perform multi-line matches, e.g.\ \f[B]pat1 ..%\[rs]n
pat2\f[R].
.TP
\f[B].. =\f[R] \f[I]only\f[R] \f[I]pat\f[R]
Any number of repetitions of the pattern \f[I]only\f[R] up to and
including \f[I]pat\f[R] (e.g.\ \f[B]\[dq]f\[dq] ..=abc \[dq]k\[dq]\f[R]
matches the letter \[lq]f\[rq], followed by some alphabetic characters,
and then a \[lq]k\[rq], so it would match \[lq]fork\[rq] but not
\[lq]free kit\[rq]).
This is essentially a \[lq]non-greedy\[rq] version of \f[B]*\f[R], and
\f[B].. pat\f[R] can be thought of as the special case
\f[B]..=. pat\f[R].
.TP
\f[B]<\f[R] \f[I]pat\f[R]
Matches at the current position if \f[I]pat\f[R] matches immediately
before the current position (lookbehind).
\f[B]Note:\f[R] For fixed-length lookbehinds, this is quite efficient
(e.g.\ \f[B]<(100 \[dq]x\[dq])\f[R]); however, this can cause
performance problems with variable-length lookbehinds
(e.g.\ \f[B]<(\[dq]x\[dq] 0-100 \[dq]y\[dq])\f[R]).
Also, patterns like \f[B]\[ha]\f[R], \f[B]\[ha]\[ha]\f[R], \f[B]$\f[R],
and \f[B]$$\f[R] that match against line/file edges will match against
the edge of the lookbehind window, so they should generally be avoided
in lookbehinds.
.TP
\f[B]>\f[R] \f[I]pat\f[R]
Matches \f[I]pat\f[R], but does not consume any input (lookahead).
.TP
\f[B]\[at]\f[R] \f[I]pat\f[R]
Capture \f[I]pat\f[R].
Captured patterns can be used in replacements.
.TP
\f[B]foo\f[R]
The named pattern whose name is \f[B]\[lq]foo\[rq]\f[R].
Pattern names come from definitions in grammar files or from named
captures.
Pattern names may contain dashes (\f[B]-\f[R]), but not underscores
(\f[B]_\f[R]), since the underscore is used to match whitespace.
See the \f[B]GRAMMAR FILES\f[R] section for more info.
.TP
\f[B]\[at]\f[R] \f[I]name\f[R] \f[B]:\f[R] \f[I]pat\f[R]
For the rest of the current chain, define \f[I]name\f[R] to match
whatever \f[I]pat\f[R] matches, i.e.\ a backreference.
For example, \f[B]\[at]my-word:word \[ga]( my-word \[ga])\f[R] (matches
\f[B]\[lq]asdf(asdf)\[rq]\f[R] or \f[B]\[lq]baz(baz)\[rq]\f[R], but not
\f[B]\[lq]foo(baz)\[rq]\f[R])
.TP
\f[B]\[at]\f[R] \f[I]name\f[R] \f[B]=\f[R] \f[I]pat\f[R]
Let \f[I]name\f[R] equal \f[I]pat\f[R] (named capture).
Named captures can be used in text replacements.
.TP
\f[I]pat\f[R] \f[B]=>\f[R] \f[B]\[dq]\f[R]\f[I]replacement\f[R]\f[B]\[dq]\f[R]
Replace \f[I]pat\f[R] with \f[I]replacement\f[R].
Note: \f[I]replacement\f[R] should be a string (single or double
quoted), and it may contain escape sequences (e.g.\ \f[B]\[rs]n\f[R]) or
references to captured values: \f[B]\[at]0\f[R] (the whole of
\f[I]pat\f[R]), \f[B]\[at]1\f[R] (the first capture in \f[I]pat\f[R]),
\f[B]\[at]\f[R]\f[I]foo\f[R] (the capture named \f[I]foo\f[R] in
\f[I]pat\f[R]), etc.
For example, \f[B]\[at]word _ \[at]rest=(*word % _) =>
\[dq]\[at]rest:\[rs]n\[rs]t\[at]1\[dq]\f[R] matches a word followed by
whitespace, followed by a series of words and replaces it with the
series of words, a colon, a newline, a tab, and then the first word.
.TP
\f[I]pat1\f[R] \f[B]\[ti]\f[R] \f[I]pat2\f[R]
Matches when \f[I]pat1\f[R] matches and \f[I]pat2\f[R] can be found
within the text of that match.
(e.g.\ \f[B]comment
\[ti] \[dq]TODO\[dq]\f[R] matches comments that
contain \f[B]\[lq]TODO\[rq]\f[R])
.TP
\f[I]pat1\f[R] \f[B]!\[ti]\f[R] \f[I]pat2\f[R]
Matches when \f[I]pat1\f[R] matches, but \f[I]pat2\f[R] cannot be found
within the text of that match.
(e.g.\ \f[B]comment !\[ti] \[dq]IGNORE\[dq]\f[R] matches only comments
that do not contain \f[B]\[lq]IGNORE\[rq]\f[R])
.TP
\f[I]name\f[R]\f[B]:\f[R] \f[I]pat1\f[R]; \f[I]pat2\f[R]
Define \f[I]name\f[R] to mean \f[I]pat1\f[R] (pattern definition) inside
the pattern \f[I]pat2\f[R].
For example, a recursive pattern can be defined and used like this:
\f[B]paren-comment: \[dq](*\[dq] ..%paren-comment \[dq]*)\[dq];
paren-comment\f[R]
.TP
\f[B]\[at]:\f[R]\f[I]name\f[R] \f[B]=\f[R] \f[I]pat\f[R]
Match \f[I]pat\f[R] and tag it with the given name as metadata.
.TP
\f[I]name\f[R]\f[B]::\f[R] \f[I]pat\f[R]
Syntactic sugar for \f[I]name\f[R]\f[B]:\f[R]
\f[B]\[at]:\f[R]\f[I]name\f[R]\f[B]=\f[R]\f[I]pat\f[R] (define a pattern
that also attaches a metadata tag of the same name)
.TP
\f[B]#\f[R] \f[I]comment\f[R]
A line comment, ignored by BP
.SH GRAMMAR FILES
\f[B]bp\f[R] allows loading extra grammar files, which define patterns
that may be used for matching.
The \f[B]builtins\f[R] grammar file is loaded by default, and it defines
a few useful general-purpose patterns.
For example, it defines the \f[B]parens\f[R] rule, which matches pairs
of matching parentheses, accounting for nested inner parentheses:
.RS
.PP
\f[B]bp \[aq]my_func{parens}\[aq]\f[R]
.RE
.PP
BP\[cq]s builtin grammar file defines a few other commonly used patterns
such as:
.IP \[bu] 2
\f[B]braces\f[R] (matching \f[B]{}\f[R] pairs), \f[B]brackets\f[R]
(matching \f[B][]\f[R] pairs), \f[B]anglebraces\f[R] (matching
\f[B]<>\f[R] pairs)
.IP \[bu] 2
\f[B]string\f[R]: a single- or double-quote delimited string, including
standard escape sequences
.IP \[bu] 2
\f[B]id\f[R] or \f[B]var\f[R]: an identifier (full UTF-8 support)
.IP \[bu] 2
\f[B]word\f[R]: similar to \f[B]id\f[R]/\f[B]var\f[R], but can start
with a number
.IP \[bu] 2
\f[B]Hex\f[R], \f[B]hex\f[R], \f[B]HEX\f[R]: a mixed-case, lowercase, or
uppercase hex digit
.IP \[bu] 2
\f[B]digit\f[R]: a digit from 0-9
.IP \[bu] 2
\f[B]int\f[R]: one or more digits
.IP \[bu] 2
\f[B]number\f[R]: an integer or floating-point literal
.IP \[bu] 2
\f[B]esc\f[R], \f[B]tab\f[R], \f[B]nl\f[R], \f[B]cr\f[R],
\f[B]crlf\f[R], \f[B]lf\f[R]: shorthand for escape sequences
.PP
\f[B]bp\f[R] also comes with a few grammar files for common programming
languages, which may be loaded on demand.
These grammar files are not comprehensive syntax definitions; they only
define some common patterns.
For example, the c++ grammar file contains definitions for
\f[B]//\f[R]-style line comments as well as \f[B]/*...*/\f[R]-style
block comments.
Thus, you can find all comments containing the word \[lq]TODO\[rq] with
the following command:
.RS
.PP
\f[B]bp -g c++ \[aq]{comment \[ti] \[dq]TODO\[dq]}\[aq] *.cpp\f[R]
.RE
.SH EXAMPLES
Find files whose names contain the literal string \[lq]foo.baz\[rq] (a
string pattern):
.RS
.PP
\f[B]ls | bp foo.baz\f[R]
.RE
.PP
Find files whose names end with \[lq].c\[rq] and print each name with
the \[lq].c\[rq] replaced with \[lq].h\[rq]:
.RS
.PP
\f[B]ls | bp \[aq].c{$}\[aq] -r \[aq].h\[aq]\f[R]
.RE
.PP
Find the word \[lq]foobar\[rq] followed by a pair of matching
parentheses in the file \f[I]my_file.py\f[R]:
.RS
.PP
\f[B]bp \[aq]foobar{parens}\[aq] my_file.py\f[R]
.RE
.PP
Using the \f[I]html\f[R] grammar, find all \f[I]element\f[R]s matching
the tag \f[I]a\f[R] in the file \f[I]foo.html\f[R]:
.RS
.PP
\f[B]bp -g html \[aq]{element \[ti] (\[ha]\[ha]\[dq]<a \[dq])}\[aq]
foo.html\f[R]
.RE
.SH AUTHORS
Bruce Hill
(\f[I]bruce\[at]bruce-hill.com\f[R]).