(190 lines)

BP - Bruce's PEG Tool

BP is a parsing expression grammar (PEG) tool for the command line. It's written in pure C with no dependencies.

Image preview

Tutorial

Run make tutorial to run through the tutorial. It walks through some basic pattern matching.

Usage

bp [flags] <pattern> [<input files>...]

BP is optimized for matching literal strings, so the main pattern argument is interpreted as a string literal. BP pattern syntax is inserted using curly brace interpolations like bp 'foo{..}baz' (match the string literal "foo" up to and including the next occurrence of "baz" on the same line).

Flags

-h --help print the usage and quit
-v --verbose print verbose debugging info
-i --ignore-case perform a case-insensitive match
-I --inplace perform replacements or filtering in-place on files
-e --explain print an explanation of the matches
-l --list-files print only filenames containing matches
-r --replace <replacement> replace the input pattern with the given replacement
-s --skip <skip pattern> skip over the given pattern when looking for matches
-B --context-before <N> change how many lines of context are printed before each match
-B --context-after <N> change how many lines of context are printed after each match
-C --context <N> change how many lines of context are printed before and after each match
-g --grammar <grammar file> use the specified file as a grammar
-G --git get filenames from git
-f --format auto|plain|fancy set the output format (fancy includes colors and line numbers)

See man ./bp.1 for more details.

BP Patterns

BP patterns are a mixture of Parsing Expression Grammar and Regular Expression syntax, with a preference for prefix operators instead of suffix operators.

Pattern	Meaning
`"foo"`, `'foo'`	The literal string `foo`. There are no escape sequences within strings.
`pat1 pat2`	`pat1` followed by `pat2`
`pat1 / pat2`	`pat1` if it matches, otherwise `pat2`
`.. pat`	Any text up to and including `pat` (except newlines)
`.. % skip pat`	Any text up to and including `pat` (except newlines), skipping over instances of `skip`
`.. = repeat pat`	Any number of repetitions of `repeat` up to and including `pat`
`.`	Any single character (except newline)
`^^`	The start of the input
`^`	The start of a line
`$$`	The end of the input
`$`	The end of a line
`__`	Zero or more whitespace characters (including newlines)
`_`	Zero or more whitespace characters (excluding newlines)
`c	The literal character `c`
`a-z	The character range `a` through `z`
`a,b	The character `a` or the character `b`
`\n`, `\033`, `\x0A`, etc.	An escape sequence character
`\x00-xFF`	An escape sequence range (byte `0x00` through `0xFF` here)
`!pat`	`pat` does not match at the current position
`[pat]`	Zero or one occurrences of `pat` (optional pattern)
`5 pat`	Exactly 5 occurrences of `pat`
`2-4 pat`	Between 2 and 4 occurrences of `pat` (inclusive)
`5+ pat`	5 or more occurrences of `pat`
`5+ pat % sep`	5 or more occurrences of `pat`, separated by `sep` (e.g. `0+ int % ","` matches `1,2,3`)
`*pat`	0 or more occurrences of `pat` (shorthand for `0+pat`)
`+pat`	1 or more occurrences of `pat` (shorthand for `1+pat`)
`<pat`	`pat` matches just before the current position (lookbehind)
`>pat`	`pat` matches just in front of the current position (lookahead)
`@pat`	Capture `pat` (used for text replacement)
`@foo=pat`	Capture `pat` with the name `foo` attached (used for text replacement)
`@foo:pat`	Let `foo` be the text of `pat` (used for backreferences)
`pat => "replacement"`	Match `pat` and replace it with `replacement`
`(pat1 @keep=pat2) => "@keep"`	Match `pat1` followed by `pat2` and replace it with the text of `pat2`
`pat1~pat2`	`pat1` when `pat2` can be found within the result
`pat1!~pat2`	`pat1` when `pat2` can not be found within the result
`name: pat2`	`name` is defined to mean `pat`
`name:: pat2`	`name` is defined to mean `pat` and matches have `name` attached to the result as metadata
`# line comment`	A line comment

See man ./bp.1 for more details.

Grammar Files

BP comes packaged with some pattern definitions that can be useful when parsing code of different languages. Firstly, there are a handful of general-purpose patterns like:

Name	Meaning
`string`	A string (either single- or double-quoted)
`parens`	A matched pair of parentheses (`()`)
`braces`	A matched pair of curly braces (`{}`)
`brackets`	A matched pair of square brackets (`[]`)
`anglebraces`	A matched pair of angle braces (`<>`)
`_`	Zero or more whitespace characters (excluding newline)
`__`	Zero or more whitespace characters, including newlines and comments
`Abc`	The characters `a-z` and `A-Z`
`Abc123`	The characters `a-z`, `A-Z`, and `0-9`
`int`	1 or more numeric characters
`number`	An integer or floating point number
`Hex`	A hexadecimal character
`id`	An identifier

As well as these common definitions, BP also comes with a set of language-specific or domain-specific grammars. These are not full language grammars, but only implementation of some language-specific features, like identifier rules (id), string syntax, and comment syntax (which affects __ and other rules). Some of the languages supported are:

BP
C++
C
Go
HTML
Javascript
Lisp
Lua
Python
Rust
shell script

These grammar definitions can be found in grammars. To use a grammar file, use bp -g <path-to-file> or bp --grammar=<path-to-file>. Once BP is installed, however, you can use bp -g <grammar-name> directly, and BP will figure out which grammar you mean (e.g. bp -g lua ...). BP first searches ~/.config/bp/ for any grammar files you keep locally, then searches /etc/bp/ for system-wide grammar files.

Testing for these grammar files (other than builtins) is iffy at this point, so use at your own risk! These grammar files are only approximations of syntax.

Code Layout

File	Description
bp.c	The main program.
files.c	Loading files into memory.
match.c	Pattern matching code (find occurrences of a bp pattern within an input string).
pattern.c	Pattern compiling code (compile a bp pattern from an input string).
printmatch.c	Printing a visual explanation of a match.
utf8.c	UTF-8 helper code.
utils.c	Miscellaneous helper functions.

Lua Bindings

bp also comes with a set of Lua bindings, which can be found in the Lua/ directory. The bindings are currently a work in progress, but are fully usable at this point. Check the Lua bindings README for more details.

Performance

Currently, bp's speed is comparable to hyper-optimized regex tools like grep, ag, and ripgrep when it comes to simple patterns that begin with string literals, but bp's performance may be noticeably slower for complex patterns on large quantities of text. The aforementioned regular expression tools are usually implemented as efficient finite state machines, but bp is more expressive and capable of matching arbitrarily nested patterns, which precludes the possibility of using a finite state machine. Instead, bp uses a fairly simple recursive virtual machine implementation with memoization. bp also has a decent amount of overhead because of the metadata used for visualizing and explaining pattern matches, as well as performing string replacements. Overall, I would say that bp is a great drop-in replacement for common shell scripting tasks, but you may want to keep the other tools around in case you have to search through a truly massive codebase for something complex.

License

BP is provided under the MIT license with the Commons Clause (you can't sell this software without the developer's permission, but you're otherwise free to use, modify, and redistribute it free of charge). See LICENSE for details.

   1 # BP - Bruce's PEG Tool
   2 
   3 BP is a parsing expression grammar (PEG) tool for the command line.
   4 It's written in pure C with no dependencies.
   5 
   6 ![Image preview](bp.png)
   7 
   8 
   9 ## Tutorial
  10 
  11 Run `make tutorial` to run through the tutorial. It walks through some basic pattern matching.
  12 
  13 
  14 ## Usage
  15 
  16 ```
  17 bp [flags] <pattern> [<input files>...]
  18 ```
  19 
  20 BP is optimized for matching literal strings, so the main pattern argument is
  21 interpreted as a string literal. BP pattern syntax is inserted using curly
  22 brace interpolations like `bp 'foo{..}baz'` (match the string literal "foo" up
  23 to and including the next occurrence of "baz" on the same line).
  24 
  25 ### Flags
  26 
  27 * `-h` `--help` print the usage and quit
  28 * `-v` `--verbose` print verbose debugging info
  29 * `-i` `--ignore-case` perform a case-insensitive match
  30 * `-I` `--inplace` perform replacements or filtering in-place on files
  31 * `-e` `--explain` print an explanation of the matches
  32 * `-l` `--list-files` print only filenames containing matches
  33 * `-r` `--replace <replacement>` replace the input pattern with the given replacement
  34 * `-s` `--skip <skip pattern>` skip over the given pattern when looking for matches
  35 * `-B` `--context-before <N>` change how many lines of context are printed before each match
  36 * `-B` `--context-after <N>` change how many lines of context are printed after each match
  37 * `-C` `--context <N>` change how many lines of context are printed before and after each match
  38 * `-g` `--grammar <grammar file>` use the specified file as a grammar
  39 * `-G` `--git` get filenames from git
  40 * `-f` `--format` `auto|plain|fancy` set the output format (`fancy` includes colors and line numbers)
  41 
  42 See `man ./bp.1` for more details.
  43 
  44 
  45 ## BP Patterns
  46 
  47 BP patterns are a mixture of Parsing Expression Grammar and Regular
  48 Expression syntax, with a preference for prefix operators instead of
  49 suffix operators.
  50 
  51 Pattern            | Meaning
  52 -------------------|---------------------
  53 `"foo"`, `'foo'`   | The literal string `foo`. There are no escape sequences within strings.
  54 `pat1 pat2`        | `pat1` followed by `pat2`
  55 `pat1 / pat2`      | `pat1` if it matches, otherwise `pat2`
  56 `.. pat`           | Any text up to and including `pat` (except newlines)
  57 `.. % skip pat`    | Any text up to and including `pat` (except newlines), skipping over instances of `skip`
  58 `.. = repeat pat`  | Any number of repetitions of `repeat` up to and including `pat`
  59 `.`                | Any single character (except newline)
  60 `^^`               | The start of the input
  61 `^`                | The start of a line
  62 `$$`               | The end of the input
  63 `$`                | The end of a line
  64 `__`               | Zero or more whitespace characters (including newlines)
  65 `_`                | Zero or more whitespace characters (excluding newlines)
  66 `` `c ``           | The literal character `c`
  67 `` `a-z ``         | The character range `a` through `z`
  68 `` `a,b ``         | The character `a` or the character `b`
  69 `\n`, `\033`, `\x0A`, etc. | An escape sequence character
  70 `\x00-xFF`         | An escape sequence range (byte `0x00` through `0xFF` here)
  71 `!pat`             | `pat` does not match at the current position
  72 `[pat]`            | Zero or one occurrences of `pat` (optional pattern)
  73 `5 pat`            | Exactly 5 occurrences of `pat`
  74 `2-4 pat`          | Between 2 and 4 occurrences of `pat` (inclusive)
  75 `5+ pat`           | 5 or more occurrences of `pat`
  76 `5+ pat % sep`     | 5 or more occurrences of `pat`, separated by `sep` (e.g. `0+ int % ","` matches `1,2,3`)
  77 `*pat`             | 0 or more occurrences of `pat` (shorthand for `0+pat`)
  78 `+pat`             | 1 or more occurrences of `pat` (shorthand for `1+pat`)
  79 `<pat`             | `pat` matches just before the current position (lookbehind)
  80 `>pat`             | `pat` matches just in front of the current position (lookahead)
  81 `@pat`             | Capture `pat` (used for text replacement)
  82 `@foo=pat`         | Capture `pat` with the name `foo` attached (used for text replacement)
  83 `@foo:pat`         | Let `foo` be the text of `pat` (used for backreferences)
  84 `pat => "replacement"` | Match `pat` and replace it with `replacement`
  85 `(pat1 @keep=pat2) => "@keep"` | Match `pat1` followed by `pat2` and replace it with the text of `pat2`
  86 `pat1~pat2`        | `pat1` when `pat2` can be found within the result
  87 `pat1!~pat2`       | `pat1` when `pat2` can not be found within the result
  88 `name: pat2`       | `name` is defined to mean `pat`
  89 `name:: pat2`      | `name` is defined to mean `pat` and matches have `name` attached to the result as metadata
  90 `# line comment`   | A line comment
  91 
  92 See `man ./bp.1` for more details.
  93 
  94 
  95 ## Grammar Files
  96 
  97 BP comes packaged with some pattern definitions that can be useful when parsing
  98 code of different languages. Firstly, there are a handful of general-purpose
  99 patterns like:
 100 
 101 Name          | Meaning
 102 --------------|--------------------
 103 `string`      | A string (either single- or double-quoted)
 104 `parens`      | A matched pair of parentheses (`()`)
 105 `braces`      | A matched pair of curly braces (`{}`)
 106 `brackets`    | A matched pair of square brackets (`[]`)
 107 `anglebraces` | A matched pair of angle braces (`<>`)
 108 `_`           | Zero or more whitespace characters (excluding newline)
 109 `__`          | Zero or more whitespace characters, including newlines and comments
 110 `Abc`         | The characters `a-z` and `A-Z`
 111 `Abc123`      | The characters `a-z`, `A-Z`, and `0-9`
 112 `int`         | 1 or more numeric characters
 113 `number`      | An integer or floating point number
 114 `Hex`         | A hexadecimal character
 115 `id`          | An identifier
 116 
 117 As well as these common definitions, BP also comes with a set of
 118 language-specific or domain-specific grammars. These are not full language
 119 grammars, but only implementation of some language-specific features, like
 120 identifier rules (`id`), string syntax, and comment syntax (which affects `__`
 121 and other rules). Some of the languages supported are:
 122 
 123 - BP
 124 - C++
 125 - C
 126 - Go
 127 - HTML
 128 - Javascript
 129 - Lisp
 130 - Lua
 131 - Python
 132 - Rust
 133 - shell script
 134 
 135 These grammar definitions can be found in [grammars](/grammars). To use a
 136 grammar file, use `bp -g <path-to-file>` or `bp --grammar=<path-to-file>`. Once
 137 BP is installed, however, you can use `bp -g <grammar-name>` directly, and BP
 138 will figure out which grammar you mean (e.g. `bp -g lua ...`). BP first
 139 searches `~/.config/bp/` for any grammar files you keep locally, then searches
 140 `/etc/bp/` for system-wide grammar files.
 141 
 142 Testing for these grammar files (other than `builtins`) is iffy at this point,
 143 so use at your own risk! These grammar files are only approximations of syntax.
 144 
 145 
 146 ## Code Layout
 147 
 148 File                           | Description
 149 -------------------------------|-----------------------------------------------------
 150 [bp.c](bp.c)                   | The main program.
 151 [files.c](files.c)             | Loading files into memory.
 152 [match.c](match.c)             | Pattern matching code (find occurrences of a bp pattern within an input string).
 153 [pattern.c](pattern.c)         | Pattern compiling code (compile a bp pattern from an input string).
 154 [printmatch.c](printmatch.c)   | Printing a visual explanation of a match.
 155 [utf8.c](utf8.c)               | UTF-8 helper code.
 156 [utils.c](utils.c)             | Miscellaneous helper functions.
 157 
 158 
 159 ## Lua Bindings
 160 
 161 `bp` also comes with a set of Lua bindings, which can be found in the [Lua/
 162 directory](Lua). The bindings are currently a work in progress, but are fully
 163 usable at this point. Check [the Lua bindings README](Lua/README.md) for more
 164 details.
 165 
 166 
 167 ## Performance
 168 
 169 Currently, `bp`'s speed is comparable to hyper-optimized regex tools like
 170 `grep`, `ag`, and `ripgrep` when it comes to simple patterns that begin with
 171 string literals, but `bp`'s performance may be noticeably slower for complex
 172 patterns on large quantities of text. The aforementioned regular expression
 173 tools are usually implemented as efficient finite state machines, but `bp` is
 174 more expressive and capable of matching arbitrarily nested patterns, which
 175 precludes the possibility of using a finite state machine. Instead, `bp` uses a
 176 fairly simple recursive virtual machine implementation with memoization. `bp`
 177 also has a decent amount of overhead because of the metadata used for
 178 visualizing and explaining pattern matches, as well as performing string
 179 replacements. Overall, I would say that `bp` is a great drop-in replacement for
 180 common shell scripting tasks, but you may want to keep the other tools around
 181 in case you have to search through a truly massive codebase for something
 182 complex.
 183 
 184 
 185 ## License
 186 
 187 BP is provided under the MIT license with the [Commons Clause](https://commonsclause.com/)
 188 (you can't sell this software without the developer's permission, but you're
 189 otherwise free to use, modify, and redistribute it free of charge).
 190 See [LICENSE](LICENSE) for details.