# BP - Bruce's PEG Tool
BP is a parsing expression grammar (PEG) tool for the command line.
It's written in pure C with no dependencies.
## Usage
`bp [flags] <pattern> [<input files>...]`
### Flags
* `-h` `--help` print the usage and quit
* `-v` `--verbose` print verbose debugging info
* `-i` `--ignore-case` perform a case-insensitive match
* `-I` `--inplace` perform replacements or filtering in-place on files
* `-C` `--confirm` during replacement, confirm before each replacement
* `-e` `--explain` print an explanation of the matches
* `-j` `--json` print matches as JSON objects
* `-l` `--list-files` print only filenames containing matches
* `-p` `--pattern <pat>` provide a pattern (equivalent to `bp '\(<pat>)'`)
* `-r` `--replace <replacement>` replace the input pattern with the given replacement
* `-s` `--skip <skip pattern>` skip over the given pattern when looking for matches
* `-c` `--context <N>` change how many lines of context are printed (`0`: no context, `all`: the whole file, `<N>`: matching lines and `<N-1>` lines before/after)
* `-g` `--grammar <grammar file>` use the specified file as a grammar
* `-G` `--git` get filenames from git
* `--color yes|no|auto` force output to use or not use color

See `man ./bp.1` for more details.
## BP Patterns
BP patterns are a mixture of Parsing Expression Grammar and Regular
Expression syntax, with a preference for prefix operators instead of
suffix operators.

Pattern | Meaning
-------------------|---------------------
`"foo"`, `'foo'` | The literal string `foo`. There are no escape sequences within strings.
`pat1 pat2` | `pat1` followed by `pat2`
`pat1 / pat2` | `pat1` if it matches, otherwise `pat2`
`..pat` | Any text up to and including `pat` (except newlines)
`.. % skip pat` | Any text up to and including `pat` (except newlines), skipping over instances of `skip`
`.` | Any single character (except newline)
`^^` | The start of the input
`^` | The start of a line
`$$` | The end of the input
`$` | The end of a line
`__` | Zero or more whitespace characters (including newlines)
`_` | Zero or more whitespace characters (excluding newlines)
`{foo}` | The literal string `foo` with word boundaries on both ends
`` `c `` | The literal character `c`
`` `a-z `` | The character range `a` through `z`
`` `a,b `` | The character `a` or the character `b`
`\n`, `\033`, `\x0A`, etc. | An escape sequence character
`\x00-xFF` | An escape sequence range (byte `0x00` through `0xFF` here)
`!pat` | `pat` does not match at the current position
`[pat]` | Zero or one occurrences of `pat` (optional pattern)
`5 pat` | Exactly 5 occurrences of `pat`
`2-4 pat` | Between 2 and 4 occurrences of `pat` (inclusive)
`5+ pat` | 5 or more occurrences of `pat`
`5+ pat % sep` | 5 or more occurrences of `pat`, separated by `sep` (e.g. `0+ int % ","` matches `1,2,3`)
`*pat` | 0 or more occurrences of `pat` (shorthand for `0+pat`)
`+pat` | 1 or more occurrences of `pat` (shorthand for `1+pat`)
`<pat` | `pat` matches just before the current position (lookbehind)
`>pat` | `pat` matches just in front of the current position (lookahead)
`@pat` | Capture `pat` (used for text replacement and backreferences)
`@foo=pat` | Let `foo` be the text of `pat` (used for text replacement and backreferences)
`pat => "replacement"` | Match `pat` and replace it with `replacement`
`(pat1 @keep=pat2) => "@keep"` | Match `pat1` followed by `pat2` and replace it with the text of `pat2`
`pat1==pat2` | `pat1`, assuming `pat2` also matches with the same length
`pat1!=pat2` | `pat1`, unless `pat2` also matches with the same length
`name:pat` | `name` is defined to mean `pat`
`# line comment` | A line comment

See `man ./bp.1` for more details.
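
To make two of the operators above more concrete, here is a minimal Python sketch of how ordered choice (`pat1 / pat2`) and separated repetition (`N+ pat % sep`) behave. It is only an illustration of the semantics described in the table, not how `bp` is implemented; the helper names `lit`, `digits`, `choice`, and `repeat` are made up for this example, and `digits` is a simplified stand-in for an integer pattern.

```python
def lit(text):
    """Match a literal string, like "foo" in the pattern table."""
    def match(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return match


def digits(s, pos):
    """Match one or more ASCII digits (a simplified stand-in for `int`)."""
    end = pos
    while end < len(s) and s[end] in "0123456789":
        end += 1
    return end if end > pos else None


def choice(*pats):
    """pat1 / pat2: try each alternative in order; the first match wins."""
    def match(s, pos):
        for pat in pats:
            end = pat(s, pos)
            if end is not None:
                return end
        return None
    return match


def repeat(pat, minimum, sep=None):
    """N+ pat % sep: at least `minimum` occurrences of pat, separated by sep."""
    def match(s, pos):
        count = 0
        cur = pos
        while True:
            start = cur
            if count > 0 and sep is not None:
                start = sep(s, cur)
                if start is None:
                    break
            end = pat(s, start)
            if end is None:
                break
            count += 1
            if end == cur:
                # Nothing was consumed; stop so a zero-width pattern
                # cannot loop forever (a simplification).
                break
            cur = end
        return cur if count >= minimum else None
    return match


# Roughly `0+ int % ","` from the table.
int_list = repeat(digits, 0, sep=lit(","))
print(int_list("1,2,3", 0))                     # 5 (the whole string)
print(choice(lit("a"), lit("ab"))("ab", 0))     # 1 (first alternative wins)
```

Each matcher takes a string and a start position and returns the end position of the match, or `None` on failure. Note that a trailing separator is not consumed: `int_list("1,2,", 0)` stops after the `2`.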
## Grammar Files
BP comes packaged with some pattern definitions that can be useful when parsing
code of different languages. Firstly, there are a handful of general-purpose
patterns like:

Name | Meaning
--------------|--------------------
`string` | A string (either single- or double-quoted)
`parens` | A matched pair of parentheses (`()`)
`braces` | A matched pair of curly braces (`{}`)
`brackets` | A matched pair of square brackets (`[]`)
`anglebraces` | A matched pair of angle braces (`< >`)
`_` | Zero or more whitespace characters (excluding newline)
`__` | Zero or more whitespace characters, including newlines and comments
`Abc` | The characters `a-z` and `A-Z`
`Abc123` | The characters `a-z`, `A-Z`, and `0-9`
`int` | 1 or more numeric characters
`number` | An integer or floating point number
`Hex` | A hexadecimal character
`id` | An identifier

As well as these common definitions, BP also comes with a set of
language-specific or domain-specific grammars. These are not full language
grammars; they implement only some language-specific features, like
identifier rules (`id`), string syntax, and comment syntax (which affects `__`
and other rules). Some of the languages supported are:

- BP
- C++
- C
- Go
- HTML
- JavaScript
- Lisp
- Lua
- Python
- Rust
- shell script

These grammar definitions can be found in [grammars](/grammars). To use a
grammar file, use `bp -g <path-to-file>` or `bp --grammar=<path-to-file>`. Once
BP is installed, however, you can use `bp -g <grammar-name>` directly, and BP
will figure out which grammar you mean (e.g. `bp -g lua ...`). BP first
searches `~/.config/bp/` for any grammar files you keep locally, then searches
`/etc/xdg/bp/` for system-wide grammar files.

Testing for these grammar files (other than `builtins`) is iffy at this point,
so use at your own risk! These grammar files are only approximations of syntax.
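
The grammar lookup order described in this section can be sketched roughly as follows in Python. This is a hedged illustration rather than BP's actual lookup code: the function name `find_grammar`, the `.bp` file extension, and the choice to try the argument as a literal path first are all assumptions made for the example. Only the two directories and their order come from the text above.

```python
from pathlib import Path


def find_grammar(name, search_dirs=None, extension=".bp"):
    """Return the first grammar file found for `name`, or None.

    Assumptions (not confirmed by this README): grammar files end in
    `extension`, and a literal path is tried before the search directories.
    """
    if search_dirs is None:
        # Per-user grammars first, then system-wide ones.
        search_dirs = [Path.home() / ".config" / "bp", Path("/etc/xdg/bp")]

    direct = Path(name)
    if direct.is_file():
        return direct

    for directory in search_dirs:
        candidate = Path(directory) / f"{name}{extension}"
        if candidate.is_file():
            return candidate
    return None
```

With the default directories, `find_grammar("lua")` would return a user-local grammar if one exists and fall back to the system-wide one otherwise, which is why a local file can override a packaged grammar of the same name.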
## Performance
Currently, `bp` is much slower than hyper-optimized regex tools like
`grep` and `ripgrep`. `bp` is **not** matching regular expressions, so this is
not strictly a fair comparison. By definition, regular expressions can be
implemented using finite state machines, which are very efficient. Most regex
tools also impose the restriction that matches must be within a single
line. `bp`, on the other hand, uses parsing expression grammars, which can match
arbitrarily complicated or nested structures, requiring a dynamic call stack
and potentially unbounded memory use. This makes `bp` patterns much more
expressive, but harder to optimize. At this point in time, `bp`'s
implementation also uses a fairly naive virtual machine written in C, which is
not very heavily optimized. As a result, `bp` runs quite fast over thousands of
lines of code, reasonably fast over tens of thousands of lines of code, and
pretty slow over millions of lines of code.
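
To illustrate why nested structures need a call stack, here is a small Python sketch of a recursive matcher for balanced parentheses, similar in spirit to a pattern like `parens`. It is not `bp`'s virtual machine, and the function name `match_parens` is made up for this example; the point is only that each level of nesting adds a stack frame, which a fixed-size finite state machine cannot do.

```python
def match_parens(text, pos=0):
    """Return the index just past a balanced (...) group starting at pos,
    or None if there is no balanced group there."""
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    while pos < len(text):
        if text[pos] == ")":
            return pos + 1
        if text[pos] == "(":
            # Recurse for the nested group: the call stack (and memory use)
            # grows with the nesting depth of the input.
            inner_end = match_parens(text, pos)
            if inner_end is None:
                return None
            pos = inner_end
        else:
            pos += 1
    return None  # ran out of input before the closing parenthesis


print(match_parens("(a(b)c)d"))  # 7
print(match_parens("(a(b)c"))    # None
```

Because the recursion depth tracks the input's nesting depth, sufficiently deep nesting would exhaust Python's recursion limit here, which mirrors the "potentially unbounded memory use" mentioned above.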
## License
BP is provided under the MIT license with the [Commons Clause](https://commonsclause.com/)
(you can't sell this software without the developer's permission, but you're
otherwise free to use, modify, and redistribute it free of charge).
See [LICENSE](LICENSE) for details.