code / bp

Lines4.3K C3.3K Markdown541 YAML273 make110 Shell77 Lua54
(68 lines)

Left Recursion in BP

Left recursion presents a special difficulty for parsing expression grammars. A regular recursive rule looks like this:

rec: "{" *rec "}"

This is a relatively simple case to handle, because we're always making forward progress. In other words, calling match(<rec>, str) might call match(<rec>, str + 1) but it will never recursively call the same function with the same arguments: match(<rec>, str).

Left-recursive rules, on the other hand, are rules that may attempt to match themselves at the same position. For example:

laugh: (laugh "ha") / "Ha"

The rule above is functionally equivalent to laugh: "Ha" (*"ha"), but it's written in a recursive style. It should be obvious by inspection that the string Hahaha matches both versions of the rule, but a naive implementation of match() would cause match(<laugh>, str) to call match(<laugh>, str) before doing anything else, which results in infinite recursion and a stack overflow. One option is to simply detect left recursion and cause a compilation error (this is what LPEG does). However, it is possible to actually match left recursive rules without overflowing the stack.

Matching Left Recursion

The solution used in BP is the following:

  1. Whenever matching a named rule, foo, against string str, first get the originally defined pattern for that rule. Call this foo-original-definition.

  2. Temporarily define match(<foo>, str) := signal left recursion and return Fail (In other words, <foo> is temporarily defined to signal left recursion and then fail to match if matching against the string str.)

  3. Run result := match(<foo-original-definition>, str).

  4. If the match failed, then return failure.

  5. If the match was successful and did not trigger left recursion, return result.

  6. If the match was successful, but did trigger left recursion, then:

    a. If result is not longer than any previous result, stop looping and return the longest result to avoid an infinite loop. b. Otherwise: temporarily define match(<foo>, str) := signal left recursion and return <result> c. Go to step 3.

This handles left recursion as a simple loop and successfully matches left recursive patterns, even ones that are indirectly or nontrivially left recursive.

Example

Consider the rule laugh: laugh "ha" / "Ha" being applied to the input text "Hahaha!":

Temp. definition of match(<laugh>, "Hahaha!") Result of match(<laugh "ha" / "Ha">, "Hahaha!")
Fail Match{"Ha"}
Match{"Ha"} Match{"Haha"}
Match{"Haha"} Match{"Hahaha"}
Match{"Hahaha"} Match{"Ha"} (Forward progress stops being made)

As you can see in this example, each successive iteration builds up a final match incrementally until progress is no longer being made. At that point, the longest match is returned: Match{"Hahaha"}.

1 # Left Recursion in BP
3 Left recursion presents a special difficulty for parsing expression grammars.
4 A regular recursive rule looks like this:
6 ```
7 rec: "{" *rec "}"
8 ```
10 This is a relatively simple case to handle, because we're always making forward
11 progress. In other words, calling `match(<rec>, str)` might call
12 `match(<rec>, str + 1)` but it will never recursively call the same function
13 with the same arguments: `match(<rec>, str)`.
15 Left-recursive rules, on the other hand, are rules that may attempt to match
16 themselves at the same position. For example:
18 ```
19 laugh: (laugh "ha") / "Ha"
20 ```
22 The rule above is functionally equivalent to `laugh: "Ha" (*"ha")`, but it's
23 written in a recursive style. It should be obvious by inspection that the
24 string `Hahaha` matches both versions of the rule, but a naive implementation
25 of `match()` would cause `match(<laugh>, str)` to call `match(<laugh>, str)`
26 before doing anything else, which results in infinite recursion and a stack
27 overflow. One option is to simply detect left recursion and cause a compilation
28 error (this is what LPEG does). However, it is possible to actually match left
29 recursive rules without overflowing the stack.
31 ## Matching Left Recursion
33 The solution used in BP is the following:
35 1. Whenever matching a named rule, `foo`, against string `str`, first get the
36 originally defined pattern for that rule. Call this
37 `foo-original-definition`.
38 2. Temporarily define `match(<foo>, str) := signal left recursion and return
39 Fail` (In other words, `<foo>` is temporarily defined to signal left
40 recursion and then fail to match if matching against the string `str`.)
41 3. Run `result := match(<foo-original-definition>, str)`.
42 4. If the match failed, then return failure.
43 5. If the match was successful and did not trigger left recursion, return `result`.
44 6. If the match was successful, but did trigger left recursion, then:
46 a. If `result` is not longer than any previous `result`, stop looping and
47 return the longest `result` to avoid an infinite loop.
48 b. Otherwise: temporarily define `match(<foo>, str) := signal left recursion and return <result>`
49 c. Go to step 3.
51 This handles left recursion as a simple loop and successfully matches left
52 recursive patterns, even ones that are indirectly or nontrivially left
53 recursive.
55 ## Example
57 Consider the rule `laugh: laugh "ha" / "Ha"` being applied to the input text `"Hahaha!"`:
59 | Temp. definition of `match(<laugh>, "Hahaha!")` | Result of `match(<laugh "ha" / "Ha">, "Hahaha!")` |
60 |------------------------------------------|---------------------------------------------------|
61 |`Fail` | `Match{"Ha"}` |
62 |`Match{"Ha"}` | `Match{"Haha"}` |
63 |`Match{"Haha"}` | `Match{"Hahaha"}` |
64 |`Match{"Hahaha"}` | `Match{"Ha"}` (Forward progress stops being made) |
66 As you can see in this example, each successive iteration builds up a final
67 match incrementally until progress is no longer being made. At that point,
68 the longest match is returned: `Match{"Hahaha"}`.