aboutsummaryrefslogtreecommitdiff
path: root/left-recursion.md
diff options
context:
space:
mode:
authorBruce Hill <bruce@bruce-hill.com>2021-01-14 19:21:31 -0800
committerBruce Hill <bruce@bruce-hill.com>2021-01-14 19:21:31 -0800
commit3cd61e3abc79a80d38afa9c633c59585aeb0311a (patch)
treeebc97bfb416d0e9a0a486f754221b21a5a05fbc2 /left-recursion.md
parent9238ffea8819461687e24ea45e466403faa68681 (diff)
Overhaul of memory tracking and left recursion. Added explanation doc
for left recursion and fixed all visible memory leaks.
Diffstat (limited to 'left-recursion.md')
-rw-r--r--left-recursion.md68
1 files changed, 68 insertions, 0 deletions
diff --git a/left-recursion.md b/left-recursion.md
new file mode 100644
index 0000000..3e65e1c
--- /dev/null
+++ b/left-recursion.md
@@ -0,0 +1,68 @@
+# Left Recursion in BP
+
+Left recursion presents a special difficulty for parsing expression grammars.
+A regular recursive rule looks like this:
+
+```
+rec: "{" *rec "}"
+```
+
+This is a relatively simple case to handle, because we're always making forward
+progress. In other words, calling `match(<rec>, str)` might call
+`match(<rec>, str + 1)` but it will never recursively call the same function
+with the same arguments: `match(<rec>, str)`.
+
+Left-recursive rules, on the other hand, are rules that may attempt to match
+themselves at the same position. For example:
+
+```
+laugh: (laugh "ha") / "Ha"
+```
+
+The rule above is functionally equivalent to `laugh: "Ha" (*"ha")`, but it's
+written in a recursive style. It should be obvious by inspection that the
+string `Hahaha` matches both versions of the rule, but a naive implementation
+of `match()` would cause `match(<laugh>, str)` to call `match(<laugh>, str)`
+before doing anything else, which results in infinite recursion and a stack
+overflow. One option is to simply detect left recursion and cause a compilation
+error (this is what LPEG does). However, it is possible to actually match left
+recursive rules without overflowing the stack.
+
+## Matching Left Recursion
+
+The solution used in BP is the following:
+
+1. Whenever matching a named rule, `foo`, against string `str`, first get the
+ originally defined pattern for that rule. Call this
+ `foo-original-definition`.
+2. Temporarily define `match(<foo>, str) := signal left recursion and return
+ Fail` (In other words, `<foo>` is temporarily defined to signal left
+ recursion and then fail to match if matching against the string `str`.)
+3. Run `result := match(<foo-original-definition>, str)`.
+4. If the match failed, then return failure.
+5. If the match was successful and did not trigger left recursion, return `result`.
+6. If the match was successful, but did trigger left recursion, then:
+
+ a. If `result` is not longer than any previous `result`, stop looping and
+ return the longest `result` to avoid an infinite loop.
+ b. Otherwise: temporarily define `match(<foo>, str) := signal left recursion and return <result>`
+ c. Go to step 3.
+
+This handles left recursion as a simple loop and successfully matches left
+recursive patterns, even ones that are indirectly or nontrivially left
+recursive.
+
+## Example
+
+Consider the rule `laugh: laugh "ha" / "Ha"` being applied to the input text `"Hahaha!"`:
+
+|Definition of `match(<laugh>, "Hahaha!")` | Result of `match(<laugh "ha" / "Ha">, "Hahaha!")` |
+|------------------------------------------|---------------------------------------------------|
+|`Fail` | `Match{"Ha"}` |
+|`Match{"Ha"}` | `Match{"Haha"}` |
+|`Match{"Haha"}` | `Match{"Hahaha"}` |
+|`Match{"Hahaha"}` | `Match{"Ha"}` (Forward progress stops being made) |
+
+As you can see in this example, each successive iteration builds up a final
+match incrementally until progress is no longer being made. At that point,
+the longest match is returned: `Match{"Hahaha"}`.