% BP(1)
% Bruce Hill (*bruce@bruce-hill.com*)
% May 17 2021

# NAME

bp - Bruce\'s Parsing Expression Grammar tool

# SYNOPSIS

`bp` \[*options...*\] *pattern* \[\[`--`\] *files...*\]

# DESCRIPTION

`bp` is a tool that matches parsing expression grammars using a custom
syntax.

# OPTIONS

`-p`, `--pattern` *pat*
: Give a pattern in BP syntax instead of string syntax (equivalent to `bp
'\(pat)'`)

`-w`, `--word` *word*
: Surround a string pattern with word boundaries (equivalent to `bp
'\|word\|'`)

`-e`, `--explain`
: Print a visual explanation of the matches.

`-j`, `--json`
: Print a JSON list of the matches. (Pairs with `--verbose` for more detail)

`-l`, `--list-files`
: Print only the names of files containing matches instead of the matches
themselves.

`-i`, `--ignore-case`
: Perform pattern matching case-insensitively.

`-I`, `--inplace`
: Perform filtering or replacement in-place (i.e. overwrite files with new
content).

`-r`, `--replace` *replacement*
: Replace all occurrences of the main pattern with the given string.

`-s`, `--skip` *pattern*
: While looking for matches, skip over *pattern* occurrences. This can be
useful for behavior like `bp -s string` (avoiding matches inside string
literals).

`-g`, `--grammar` *grammar-file*
: Load the grammar from the given file. See the **GRAMMAR FILES** section
for more info.

`-G`, `--git`
: Use `git` to get a list of files. Remaining file arguments (if any) are
passed to `git ls-files` instead of being treated as literal files.

`-B`, `--context-before` *N*
: The number of lines of context to print before each match (default: 0). See
`--context` below for details on `none` or `all`.

`-A`, `--context-after` *N*
: The number of lines of context to print after each match (default: 0). See
`--context` below for details on `none` or `all`.

`-C`, `--context` *N*
: The number of lines to print before and after each match (default: 0). If *N*
is `none`, print only the exact text of the matches. If *N* is `all`, print
all text before and after each match.

`-f`, `--format` *fancy*\|*plain*\|*bare*\|*file:line*\|*auto*
: Set the output format. *fancy* includes colors and line numbers, *plain*
prints line numbers with no coloring, *bare* prints only the match text,
*file:line* prints the filename and line number for each match (grep-style),
and *auto* (the default) uses *fancy* formatting when the output is a TTY and
*bare* formatting otherwise.

`-h`, `--help`
: Print the usage and exit.

*pattern*
: The main pattern for bp to match. By default, this pattern is a string
pattern (see the **STRING PATTERNS** section below).

*files...*
: The input files to search. If no input files are provided and data was piped
in, that data will be used instead. If neither is provided, `bp` will search
through all files in the current directory and its subdirectories
(recursively).

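The fallback order described for *files...* can be sketched in Python. This is
only a model of the documented behavior, not bp's implementation; the function
`choose_inputs` and its parameters are hypothetical names introduced for
illustration.

```python
import os

def choose_inputs(file_args, stdin_is_piped, root="."):
    """Pick the input source in the documented order of precedence.

    Returns ("files", [paths]) or ("stdin", None).
    """
    if file_args:
        # Explicit file arguments always win.
        return ("files", list(file_args))
    if stdin_is_piped:
        # No files given, but data was piped in: search that data.
        return ("stdin", None)
    # Neither: walk the current directory and its subdirectories.
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            found.append(os.path.join(dirpath, name))
    return ("files", found)
```

A caller might pass `not sys.stdin.isatty()` as `stdin_is_piped`; whether bp
detects piped input in this way is an assumption.
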
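The three kinds of `--context` value listed above (a line count, `none`, or
`all`) can be modeled as follows. This is a sketch under stated assumptions:
a match is represented by character offsets `start` and `end`, a count of 0 is
assumed to show the whole line(s) containing the match, and `context_slice` is
a hypothetical helper, not part of bp.

```python
def context_slice(text, start, end, context=0):
    """Return what would be shown for a match spanning text[start:end].

    `context` is a line count, "none", or "all" (mirroring the -C values).
    """
    if context == "none":
        return text[start:end]      # only the exact match text
    if context == "all":
        return text                 # everything before and after
    lines = text.splitlines(keepends=True)
    if not lines:
        return ""
    # Character offset at which each line begins.
    starts = [0]
    for line in lines[:-1]:
        starts.append(starts[-1] + len(line))
    # Indices of the lines holding the first and last matched characters.
    first = max(i for i, s in enumerate(starts) if s <= start)
    last = max(i for i, s in enumerate(starts) if s <= max(start, end - 1))
    lo = max(0, first - context)
    hi = min(len(lines), last + context + 1)
    return "".join(lines[lo:hi])
```

With `-B` and `-A`, the same idea would apply with separate counts for the
lines before and after the match.
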
# STRING PATTERNS

One of the most common use cases for pattern matching tools is matching plain,
literal strings, or strings that are mostly literal text with one or two
embedded patterns. `bp` is designed around this fact. The default mode for bp
patterns is "string pattern mode". In string pattern mode, all characters are
interpreted literally except for the backslash (`\`), which may be followed by
an escape or a bp pattern (see the **PATTERNS** section below). Optionally, the
bp pattern may be terminated by a semicolon (`;`).

# PATTERNS

`bp` patterns are based on a combination of Parsing Expression Grammars and
regular expression syntax. The syntax is designed to map closely to verbal
descriptions of the patterns, and prefix operators are preferred over the
suffix operators that are common in regex syntax. Patterns are
whitespace-agnostic, so they work the same regardless of whether whitespace is
present or not, except for string literals (`'...'` and `"..."`), character
literals (`` ` ``), and escape sequences (`\`). Whitespace between patterns or
parts of a pattern should be used for clarity, but it will not affect the
meaning of the pattern.

*pat1 pat2*
: A sequence: *pat1* followed by *pat2*

*pat1* `/` *pat2*
: A choice: *pat1*, or if it doesn\'t match, then *pat2*

`.`
: Any character (excluding newline)

`^`
: Start of a line

`^^`
: Start of the text

`$`
: End of a line (does not include newline character)

`$$`
: End of the text

`_`
: Zero or more whitespace characters, including spaces and tabs, but not
newlines.

`__`
: Zero or more whitespace characters, including spaces, tabs, newlines, and
comments. Comments are undefined by default, but may be defined by a separate
grammar file. See the **GRAMMAR FILES** section for more info.

`"foo"`, `'foo'`
: The literal string **"foo"**. Single and double quotes are treated the same.
Escape sequences are not allowed.

`` ` ``*c*
: The literal character *c* (e.g. `` `@ `` matches the "@" character)

`` ` ``*c1*`-`*c2*
: The character range *c1* to *c2* (e.g. `` `a-z ``). Multiple ranges
can be combined with a comma (e.g. `` `a-z,A-Z ``).

`` ` ``*c1*`,`*c2*
: Any one of the given characters or character ranges *c1* or *c2* (e.g. `` `a,e,i,o,u,0-9 ``)

`\`*esc*
: An escape sequence (e.g. `\n`, `\x1F`, `\033`, etc.)

`\`*esc1*`-`*esc2*
: An escape sequence range from *esc1* to *esc2* (e.g. `\x00-x1F`)

`\`*esc1*`,`*esc2*
: Any one of the given escape sequences or ranges *esc1* or *esc2* (e.g. `\r,n,x01-x04`)

`\N`
: A special escape that matches a "nodent": one or more newlines followed by
the same indentation that occurs on the current line.

`\C`
: A special escape that always matches the empty string and replaces it with
the indentation of the line on which it matched. For example, this pattern
would match Bash-style heredocs that start with "<<-FOO" and end with a line
containing only the starting indentation and the string "FOO":
`"<<-" @end=(\C id) ..%\n (^end$)`

`\i`
: An identifier character (e.g. alphanumeric characters or underscores).

`\I`
: An identifier character, not including numbers (e.g. alphabetic characters or underscores).

`|`
: A word boundary (i.e. the edge of a word).

`\b`
: Alias for `|` (word boundary)

`!` *pat*
: Not *pat*

`[` *pat* `]`
: Maybe *pat*

*N* *pat*
: Exactly *N* repetitions of *pat* (e.g. `5 "x"` matches **"xxxxx"**)

*N* `-` *M* *pat*
: Between *N* and *M* repetitions of *pat* (e.g. `2-3 "x"` matches **"xx"** or
**"xxx"**)

*N*`+` *pat*
: At least *N* repetitions of *pat* (e.g. `2+ "x"` matches
**"xx"**, **"xxx"**, **"xxxx"**, etc.)

`*` *pat*
: Any *pat*s (zero or more, e.g. `* "x"` matches **""**, **"x"**, **"xx"**,
etc.)

`+` *pat*
: Some *pat*s (one or more, e.g. `+ "x"` matches **"x"**, **"xx"**, **"xxx"**,
etc.)

*repeating-pat* `%` *sep*
: *repeating-pat* (see the examples above) separated by *sep* (e.g. `*word %
","` matches zero or more comma-separated words)

`..` *pat*
: Any text (except newlines) up to and including *pat*

`.. %` *skip* *pat*
: Any text (except newlines) up to and including *pat*, skipping over instances
of *skip* (e.g. `'"' ..%('\' .) '"'` matches an opening quote, up to a closing
quote, skipping over any backslash followed by a single character)

`.. =` *only* *pat*
: Any number of repetitions of the pattern *only* up to and including *pat*
(e.g. `"f" ..=abc "k"` matches the letter "f" followed by some alphabetic
characters and then a "k", which would match "fork", but not "free kit"). This
is essentially a "non-greedy" version of `*`, and `.. pat` can be thought of as
the special case of `..=. pat`.

`<` *pat*
: Matches at the current position if *pat* matches immediately before the
current position (lookbehind). Conceptually, you can think of this as creating
a file containing only the *N* characters immediately before the current
position and attempting to match *pat* on that file, for all values of *N* from
the minimum number of characters *pat* can match up to the maximum number of
characters *pat* can match (or the length of the current line up to the current
position, whichever is smaller). **Note:** For fixed-length lookbehinds, this
is quite efficient (e.g. `<(100 "x")`); however, this could cause performance
problems with variable-length lookbehinds (e.g. `<("x" 0-100"y")`). Also, it is
worth noting that `^`, `^^`, `$`, and `$$` all match against the edges of the
slice, which may give false positives if you were expecting them to match only
against the edges of the file or line.

`>` *pat*
: Matches *pat*, but does not consume any input (lookahead).

`@` *pat*
: Capture *pat*

`foo`
: The named pattern whose name is **"foo"**. Pattern names come from
definitions in grammar files or from named captures. Pattern names may contain
dashes (`-`), but not underscores (`_`), since the underscore is used to match
whitespace. See the **GRAMMAR FILES** section for more info.

`@` *name* `=` *pat*
: Let *name* equal *pat* (named capture). Named captures can be used as
backreferences like so: `` @foo=word `( foo `) `` (matches **"asdf(asdf)"** or
**"baz(baz)"**, but not **"foo(baz)"**)

*pat* `=>` `"`*replacement*`"`
: Replace *pat* with *replacement*. Note: *replacement* should be a string
(single or double quoted), and it may contain escape sequences (e.g. `\n`) or
references to captured values: `@0` (the whole of *pat*), `@1` (the first
capture in *pat*), `@`*foo* (the capture named *foo* in *pat*), etc. For
example, `@word _ @rest=(*word % _) => "@rest:\n\t@1"` matches a word followed
by whitespace, followed by a series of words and replaces it with the series
of words, a colon, a newline, a tab, and then the first word.

*pat1* `~` *pat2*
: Matches when *pat1* matches and *pat2* can be found within the text of that
match. (e.g. `comment ~ {TODO}` matches comments that contain the word
**"TODO"**)

*pat1* `!~` *pat2*
: Matches when *pat1* matches, but *pat2* cannot be found within the text of
that match. (e.g. `comment !~ {IGNORE}` matches only comments that do not
contain the word **"IGNORE"**)

*name*`:` *pat*
: Define *name* to mean *pat* (pattern definition)

*name*`::` *pat*
: Define *name* to be a special tagged pattern *pat*. This is the same as a
regular definition, except that a piece of metadata is attached to it
associating it with the specified name.

`#` *comment*
: A line comment

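The behavior of `..` *pat* and `.. %` *skip* *pat* described above can be
modeled with a few small matcher functions in Python. This is a sketch of the
documented semantics only, not bp's implementation: every name here
(`literal`, `any_char`, `seq`, `upto`) is hypothetical, and trying *pat*
before *skip* at each position is an assumption.

```python
def literal(s):
    """Matcher for a literal string; returns the end offset or None."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def any_char(text, pos):
    """Model of `.`: any single character except newline."""
    return pos + 1 if pos < len(text) and text[pos] != "\n" else None

def seq(*pats):
    """Model of a sequence: each pattern must match where the last one ended."""
    def match(text, pos):
        for p in pats:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return match

def upto(pat, skip=None):
    """Model of `.. pat` (skip=None) and `.. % skip pat`."""
    def match(text, pos):
        while True:
            end = pat(text, pos)
            if end is not None:
                return end                  # up to and including pat
            if skip is not None:
                skipped = skip(text, pos)
                if skipped is not None and skipped > pos:
                    pos = skipped           # jump over one skip instance
                    continue
            if pos >= len(text) or text[pos] == "\n":
                return None                 # never crosses a newline
            pos += 1
    return match

# Model of the quoted-string example: '"' ..%('\' .) '"'
quoted = seq(literal('"'), upto(literal('"'), skip=seq(literal("\\"), any_char)))
text = r'say "a \"b\" c" done'
print(text[4:quoted(text, 4)])  # prints "a \"b\" c"
```

Without the *skip* argument, the same search would stop at the first escaped
quote instead of the closing one.
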
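The conceptual description of lookbehind (`<` *pat*) given above can be
sketched as a loop over candidate slice lengths. In this model a Python
regular expression stands in for *pat*, and the caller supplies the minimum
and maximum match lengths; `lookbehind` and its parameters are hypothetical
names, and bp's real implementation may differ.

```python
import re

def lookbehind(pattern, text, pos, min_len, max_len):
    """Return True if `pattern` matches some slice ending exactly at `pos`.

    Each candidate slice is matched in isolation, as if it were its own file.
    """
    # Never look back past the start of the current line.
    line_start = text.rfind("\n", 0, pos) + 1
    max_len = min(max_len, pos - line_start)
    for n in range(min_len, max_len + 1):
        if re.fullmatch(pattern, text[pos - n:pos]):
            return True
    return False

# Fixed-length lookbehind: a single slice is tried.
print(lookbehind(r"x{3}", "abxxx", 5, 3, 3))   # True
# Anchors match at the edge of the slice, not the line: a false positive.
print(lookbehind(r"^x", "ax", 2, 1, 1))        # True
```

The second call shows the caveat from the description: the "x" is not at the
start of the line, but it is at the start of the one-character slice.
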
# GRAMMAR FILES

**bp** allows loading extra grammar files, which define patterns that may be
used for matching. The **builtins** grammar file is loaded by default, and it
defines a few useful general-purpose patterns. For example, it defines the
**parens** rule, which matches pairs of matching parentheses, accounting for
nested inner parentheses:

```
bp -p '"my_func" parens'
```

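To illustrate what "accounting for nested inner parentheses" means, here is a
minimal Python model of a balanced-parentheses matcher. It is not the
definition used in the **builtins** grammar file; `match_parens` is a
hypothetical helper, and this sketch ignores details such as parentheses
inside string literals.

```python
def match_parens(text, pos=0):
    """Return the end offset of the balanced group starting at text[pos].

    Returns None if text[pos] is not "(" or the group is never closed.
    """
    if pos >= len(text) or text[pos] != "(":
        return None
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == "(":
            depth += 1          # entering a nested group
        elif text[i] == ")":
            depth -= 1          # leaving a group
            if depth == 0:
                return i + 1    # the outermost group just closed
    return None

source = "my_func(a, (b + c), d) + 1"
start = source.index("(")
print(source[start:match_parens(source, start)])  # prints (a, (b + c), d)
```
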
**bp** also comes with a few grammar files for common programming languages,
which may be loaded on demand. These grammar files are not comprehensive syntax
definitions; they cover only some common patterns. For example, the c++ grammar
file contains definitions for `//`-style line comments as well as
`/*...*/`-style block comments. Thus, you can find all comments with the word
"TODO" with the following command:

```
bp -g c++ -p 'comment ~ {TODO}' *.cpp
```

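The idea behind `comment ~ {TODO}` (match one pattern, then keep the match
only if a second pattern is found inside it) can be modeled in Python. This
is a rough sketch: the `COMMENT` regular expression is a naive stand-in for
the c++ grammar file's *comment* definition, and `comments_containing` is a
hypothetical helper.

```python
import re

# Naive stand-in for "comment": // line comments and /* ... */ block comments.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

def comments_containing(source, word, negate=False):
    """Model of `comment ~ word`, or `comment !~ word` when negate is True."""
    inner = re.compile(r"\b" + re.escape(word) + r"\b")
    results = []
    for m in COMMENT.finditer(source):
        found = inner.search(m.group(0)) is not None
        if found != negate:
            results.append(m.group(0))
    return results

source = "int x; // TODO: fix\n/* done */\n/* TODO\n later */\nint TODO;\n"
print(comments_containing(source, "TODO"))
print(comments_containing(source, "TODO", negate=True))
```

The `TODO` identifier on the last line of `source` is not reported, because
the inner search only runs on text that the outer pattern already matched.
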
# EXAMPLES

Find file names containing the string "foo" (a string pattern):

```
ls | bp foo
```

Find file names ending with ".c" and print each name with the ".c" replaced with ".h":

```
ls | bp '.c\$' -r '.h'
```

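For readers more familiar with regular expressions, the following Python
sketch models the command above. Because the pattern is a string pattern, the
`.` is a literal dot and only `\$` is a bp pattern, so the closest regex is
`\.c$` rather than `.c$`. The helper `rename_c_to_h` is hypothetical, and
the assumption that names without a match are not printed is based on the
default context of 0.

```python
import re

def rename_c_to_h(names):
    """Model of `bp '.c\\$' -r '.h'` applied to a list of file names."""
    out = []
    for name in names:
        # Literal ".c" at the end of the line.
        if re.search(r"\.c$", name):
            out.append(re.sub(r"\.c$", ".h", name))
    return out

print(rename_c_to_h(["main.c", "abc", "x.cpp", "util.c"]))
# prints ['main.h', 'util.h']
```
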
Find the word "foobar", followed by a pair of matching parentheses in the file
*my_file.py*:

```
bp -p '{foobar} parens' my_file.py
```

Using the *html* grammar, find all *element*s matching the tag *a* in the file
*foo.html*:

```
bp -g html -p 'element ~ (^^"<a ")' foo.html
```