| author | Bruce Hill <bruce@bruce-hill.com> | 2021-05-19 22:02:45 -0700 |
|---|---|---|
| committer | Bruce Hill <bruce@bruce-hill.com> | 2021-05-19 22:02:45 -0700 |
| commit | f824d3f3e2e3d3f1d441b94e18d7991ff523cef8 | |
| tree | 99676777dd5550b4a0e6aa849f136769c0b6ff21 /bp.1.md | |
| parent | 5d5817c2a3d21e91db4eaaf607ff881d71b57638 | |
Added markdown manpage, which converts to roff using pandoc.
Diffstat (limited to 'bp.1.md')
| -rw-r--r-- | bp.1.md | 286 |
1 files changed, 286 insertions, 0 deletions
@@ -0,0 +1,286 @@
+% BP(1)
+% Bruce Hill (*bruce@bruce-hill.com*)
+% May 17 2021
+
+# NAME
+
+bp - Bruce\'s Parsing Expression Grammar tool
+
+# SYNOPSIS
+
+**bp**
+\[*options...*\]
+*pattern*
+\[\[\--\] *files...*\]
+
+# DESCRIPTION
+
+**bp** is a tool that matches parsing expression grammars using a custom
+syntax.
+
+# OPTIONS
+
+**-v**, **\--verbose**
+: Print debugging information.
+
+**-e**, **\--explain**
+: Print a visual explanation of the matches.
+
+**-j**, **\--json**
+: Print a JSON list of the matches. (Pairs with **\--verbose** for more detail)
+
+**-l**, **\--list-files**
+: Print only the names of files containing matches instead of the matches
+themselves.
+
+**-i**, **\--ignore-case**
+: Perform pattern matching case-insensitively.
+
+**-I**, **\--inplace**
+: Perform filtering or replacement in-place (i.e. overwrite files with new
+content).
+
+**-C**, **\--confirm**
+: During in-place modification of a file, confirm before each modification.
+
+**-r**, **\--replace** *replacement*
+: Replace all occurrences of the main pattern with the given string.
+
+**-s**, **\--skip** *pattern*
+: While looking for matches, skip over *pattern* occurrences. This can be
+useful for behavior like **bp -s string** (avoiding matches inside string
+literals).
+
+**-g**, **\--grammar** *grammar-file*
+: Load the grammar from the given file. See the **GRAMMAR FILES** section
+for more info.
+
+**-G**, **\--git**
+: Use **git** to get a list of files. Remaining file arguments (if any) are
+passed to **git ls-files** instead of being treated as literal files.
+
+**-c**, **\--context** *N*
+: The number of lines of context to print. If *N* is 0, print only the
+exact text of the matches. If *N* is **`"all"`**, print the entire file.
+Otherwise, if *N* is a positive integer, print the whole line on which
+matches occur, as well as the *N-1* lines before and after the match. The
+default value for this argument is **1** (print whole lines where matches
+occur).
+
+**-f**, **\--format** *auto*\|*fancy*\|*plain*
+: Set the output format. *fancy* includes colors and line numbers, *plain*
+includes neither, and *auto* (the default) uses *fancy* formatting only when
+the output is a TTY.
+
+**\--help**
+: Print the usage and exit.
+
+*pattern*
+: The main pattern for bp to match. By default, this pattern is a string
+pattern (see the **STRING PATTERNS** section below).
+
+*files...*
+: The input files to search. If no input files are provided and data was piped
+in, that data will be used instead. If neither is provided, **bp** will search
+through all files in the current directory and its subdirectories
+(recursively).
+
+
+# STRING PATTERNS
+
+One of the most common use cases for pattern matching tools is matching plain,
+literal strings, or strings that are primarily plain strings, with one or two
+patterns. **bp** is designed around this fact. The default mode for bp patterns
+is "string pattern mode". In string pattern mode, all characters are
+interpreted literally except for the backslash (**\\**), which may be followed
+by a bp pattern (see the **PATTERNS** section below). Optionally, the bp
+pattern may be terminated by a semicolon (**;**).
+
+
+# PATTERNS
+
+**bp** patterns are based on a combination of Parsing Expression Grammars
+and regular expression syntax. The syntax is designed to map closely to verbal
+descriptions of the patterns, and prefix operators are preferred over the suffix
+operators that are common in regex syntax.
+
+Some patterns additionally have "multi-line" variants, which means that they
+include the newline character.
+
+*pat1 pat2*
+: A sequence: *pat1* followed by *pat2*
+
+*pat1* **/** *pat2*
+: A choice: *pat1*, or if it doesn\'t match, then *pat2*
+
+**.**
+: Any character (excluding newline)
+
+**\^**
+: Start of a line
+
+**\^\^**
+: Start of the text
+
+**\$**
+: End of a line (does not include newline character)
+
+**\$\$**
+: End of the text
+
+**\_**
+: Zero or more whitespace characters, including spaces and tabs, but not
+newlines.
+
+**\_\_**
+: Zero or more whitespace characters, including spaces, tabs, newlines, and
+comments. Comments are undefined by default, but may be defined by a separate
+grammar file. See the **GRAMMAR FILES** section for more info.
+
+**\"foo\"**, **\'foo\'**
+: The literal string **"foo"**. Single and double quotes are treated the same.
+Escape sequences are not allowed.
+
+**{foo}**
+: The literal string **"foo"** with word boundaries on either end. Escape
+sequences are not allowed.
+
+**\`***c*
+: The literal character *c* (e.g. **\`@** matches the "@" character)
+
+**\`***c1***,***c2*
+: The literal character *c1* or *c2* (e.g. **\`a,e,i,o,u**)
+
+**\`***c1***-***c2*
+: The character range *c1* to *c2* (e.g. **\`a-z**). Multiple ranges
+can be combined with a comma (e.g. **\`a-z,A-Z**).
+
+**\\***esc*
+: An escape sequence (e.g. **\\n**, **\\x1F**, **\\033**, etc.)
+
+**\\***esc1***-***esc2*
+: An escape sequence range from *esc1* to *esc2* (e.g. **\\x00-x1F**)
+
+**\\N**
+: A special case escape that matches a "nodent": one or more newlines followed
+by the same indentation that occurs on the current line.
+
+**!** *pat*
+: Not *pat*
+
+**\[** *pat* **\]**
+: Maybe *pat*
+
+*N* *pat*
+: Exactly *N* repetitions of *pat* (e.g. **5 \`x** matches **"xxxxx"**)
+
+*N* **-** *M* *pat*
+: Between *N* and *M* repetitions of *pat* (e.g. **2-3 \`x**
+matches **"xx"** or **"xxx"**)
+
+*N***+** *pat*
+: *N* or more repetitions of *pat* (e.g. **2+ \`x** matches
+**"xx"**, **"xxx"**, **"xxxx"**, etc.)
+
+**\*** *pat*
+: Some *pat*s (zero or more, e.g. **\* \`x** matches **""**, **"x"**,
+**"xx"**, etc.)
+
+**+** *pat*
+: At least one *pat* (e.g. **\+ \`x** matches **"x"**, **"xx"**,
+**"xxx"**, etc.)
+
+*repeating-pat* **%** *sep*
+: *repeating-pat* separated by *sep* (e.g. **\*word % \`,** matches
+zero or more comma-separated words)
+
+**..** *pat*
+: Any text (except newlines) up to and including *pat*
+
+**.. %** *skip* *pat*
+: Any text (except newlines) up to and including *pat*, skipping over
+instances of *skip* (e.g. **\`\"..\`\" % (\`\\.)**)
+
+**\<** *pat*
+: Just after *pat* (lookbehind)
+
+**\>** *pat*
+: Just before *pat* (lookahead)
+
+**\@** *pat*
+: Capture *pat*
+
+**foo**
+: The named pattern whose name is **"foo"**. Pattern names come from definitions in
+grammar files or from named captures. Pattern names may contain dashes (**-**),
+but not underscores (**\_**), since the underscore is used to match whitespace.
+See the **GRAMMAR FILES** section for more info.
+
+**\@** *name* **=** *pat*
+: Let *name* equal *pat* (named capture). Named captures can be used as
+backreferences like so: **\@foo=word \`( foo \`)** (matches **"asdf(asdf)"** or
+**"baz(baz)"**, but not **"foo(baz)"**)
+
+*pat* **=\> \'***replacement***\'**
+: Replace *pat* with *replacement*. Note: *replacement* should be a
+string, and it may contain references to captured values: **\@0** (the whole of
+*pat*), **\@1** (the first capture in *pat*), **\@***foo* (the capture
+named *foo* in *pat*), etc. For example, **\@word \_ \@rest=(\*word % \_)
+=\> \"\@rest \@1\"**
+
+*pat1* **==** *pat2*
+: Matches *pat1*, if and only if *pat2* also matches the text of
+*pat1*\'s match. (e.g. **word == (\"foo\_\" \*.)** matches words that start
+with **"foo\_"**)
+
+*pat1* **!=** *pat2*
+: Matches *pat1*, if and only if *pat2* does not match the text of
+*pat1*\'s match. (e.g. **word != (\"foo\_\" \*.)** matches words that do
+not start with **"foo\_"**)
+
+*name***:** *pat*
+: Define *name* to mean *pat* (pattern definition)
+
+**\#** *comment*
+: A line comment
+
+
+# GRAMMAR FILES
+
+**bp** allows loading extra grammar files, which define patterns which may be
+used for matching. The **builtins** grammar file is loaded by default, and it
+defines a few useful general-purpose patterns. For example, it defines the
+**parens** rule, which matches pairs of matching parentheses, accounting for
+nested inner parentheses:
+
+```
+bp -p '"my_func" parens'
+```
+
+**bp** also comes with a few grammar files for common programming languages,
+which may be loaded on demand. These grammar files are not comprehensive syntax
+definitions, but only some common patterns. For example, the c++ grammar file
+contains definitions for **//**-style line comments as well as
+**/\*...\*/**-style block comments. Thus, you can find all comments containing the
+string "TODO" with the following command:
+
+```
+bp -g c++ -p 'comment==(..%\n "TODO" ..%\n$$)' *.cpp
+```
+
+
+# EXAMPLES
+
+**ls \| bp foo**
+: Find files containing the string \"foo\" (a string pattern)
+
+**ls \| bp \'.c\\\$\' -r \'.h\'**
+: Find files ending with \".c\" and replace the extension with \".h\"
+
+**bp -p \'{foobar} parens\' my_file.py**
+: Find the literal string **\"foobar\"**, assuming it\'s a complete word,
+followed by a pair of matching parentheses in the file *my_file.py*
+
+**bp -g html -p 'html-element==(\"\<a \"..%\\n\$\$)' foo.html**
+: Using the *html* grammar, find all *html-element*s matching the tag *a* in
+the file *foo.html*
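
A note on the **\--context** rule documented in the OPTIONS section of the added manpage: the three cases (0, **"all"**, positive *N*) can be sketched as a small line-range computation. This is only an illustration of the wording in the manpage, not code from **bp** itself. The function name `context_line_range` and its parameters are hypothetical, and clamping to the bounds of the file is an assumption that the manpage does not state.

```python
from typing import Optional, Tuple, Union


def context_line_range(
    match_first_line: int,
    match_last_line: int,
    total_lines: int,
    n: Union[int, str] = 1,
) -> Optional[Tuple[int, int]]:
    """Return the inclusive, 1-based range of lines to print for one match.

    Hypothetical helper that mirrors the manpage text for --context:
      * N == 0      -> None (only the exact matched text is printed)
      * N == "all"  -> the entire file
      * N >= 1      -> the matched lines plus N-1 lines before and after
    Clamping to [1, total_lines] is an assumption, not documented behavior.
    """
    if n == "all":
        return (1, total_lines)
    n = int(n)
    if n < 0:
        raise ValueError("context must be 0, a positive integer, or 'all'")
    if n == 0:
        return None
    first = max(1, match_first_line - (n - 1))
    last = min(total_lines, match_last_line + (n - 1))
    return (first, last)


if __name__ == "__main__":
    # Default of 1: just the line containing the match.
    print(context_line_range(5, 5, 10))
    # N = 3: two extra lines on each side of the match.
    print(context_line_range(5, 5, 10, 3))
```

Under this reading, the default of **1** prints exactly the lines the match touches, which agrees with the manpage's "print whole lines where matches occur".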
