Added documentation for grammars

author: Bruce Hill <bruce@bruce-hill.com> 2021-01-17 22:35:34 -0800
committer: Bruce Hill <bruce@bruce-hill.com> 2021-01-17 22:35:34 -0800
commit: ff928690bbcf2e534a7cc3ddc7583f2ec3f4bb02 (patch)
tree: b97fab3b0648008622804b4dbd69fd9594c67720 /README.md
parent: 1b9e2957c0eeab592285f2cfcfd4e4ffe5586c97 (diff)
1 files changed, 64 insertions, 0 deletions
diff --git a/README.md b/README.md
index d11e534..d948321 100644
--- a/README.md
+++ b/README.md
@@ -3,10 +3,14 @@
 BP is a parsing expression grammar (PEG) tool for the command line.
 It's written in pure C with no dependencies.
 
+
 ## Usage
+
 `bp [flags] <pattern> [<input files>...]`
 
+
 ### Flags
+
 * `-h` `--help` print the usage and quit
 * `-v` `--verbose` print verbose debugging info
 * `-i` `--ignore-case` perform a case-insensitive match
@@ -23,7 +27,9 @@ It's written in pure C with no dependencies.
 
 See `man ./bp.1` for more details.
 
+
 ## BP Patterns
+
 BP patterns are a mixture of Parsing Expression Grammar and Regular
 Expression syntax, with a preference for prefix operators instead of
 suffix operators.
@@ -67,7 +73,65 @@ Pattern            | Meaning
 
 See `man ./bp.1` for more details.
 
+
+## Grammar Files
+
+BP comes packaged with some pattern definitions that can be useful when parsing
+code of different languages. Firstly, there are a handful of general-purpose
+patterns like:
+
+Name          | Meaning
+--------------|--------------------
+`string`      | A string (either single- or double-quoted)
+`parens`      | A matched pair of parentheses (`()`)
+`braces`      | A matched pair of curly braces (`{}`)
+`brackets`    | A matched pair of square brackets (`[]`)
+`anglebraces` | A matched pair of angle braces (`<>`)
+`_`           | Zero or more whitespace characters (excluding newline)
+`__`          | Zero or more whitespace characters, including newlines and comments
+`Abc`         | The characters `a-z` and `A-Z`
+`Abc123`      | The characters `a-z`, `A-Z`, and `0-9`
+`int`         | One or more numeric characters
+`number`      | An integer or floating point number
+`Hex`         | A hexadecimal character
+`id`          | An identifier
+`|`           | A word boundary
+`^`           | Start of a line
+`^^`          | Start of a file
+`$`           | End of a line
+`$$`          | End of a file
+
+As well as these common definitions, BP also comes with a set of
+language-specific or domain-specific grammars. These are not full language
+grammars, but only implementations of some language-specific features, like
+identifier rules (`id`), string syntax, and comment syntax (which affects `__`
+and other rules). Some of the languages supported are:
+
+- BP
+- C++
+- C
+- Go
+- HTML
+- JavaScript
+- Lisp
+- Lua
+- Python
+- Rust
+- shell script
+
+These grammar definitions can be found in [grammars](/grammars). To use a
+grammar file, use `bp -g <path-to-file>` or `bp --grammar=<path-to-file>`. Once
+BP is installed, however, you can use `bp -g <grammar-name>` directly, and BP
+will figure out which grammar you mean (e.g. `bp -g lua ...`). BP first
+searches `~/.config/bp/` for any grammar files you keep locally, then searches
+`/etc/xdg/bp/` for system-wide grammar files.
+
+Testing for these grammar files (other than `builtins`) is iffy at this point,
+so use at your own risk! These grammar files are only approximations of syntax.
+
+
 ## License
+
 BP is provided under the MIT license with the [Commons Clause](https://commonsclause.com/)
 (you can't sell this software without the developer's permission, but you're
 otherwise free to use, modify, and redistribute it free of charge).