3.1 KiB
Left Recursion in BP
Left recursion presents a special difficulty for parsing expression grammars. A regular recursive rule looks like this:
rec: "{" *rec "}"
This is a relatively simple case to handle, because we're always making forward
progress. In other words, calling match(<rec>, str)
might call
match(<rec>, str + 1)
but it will never recursively call the same function
with the same arguments: match(<rec>, str)
.
Left-recursive rules, on the other hand, are rules that may attempt to match themselves at the same position. For example:
laugh: (laugh "ha") / "Ha"
The rule above is functionally equivalent to laugh: "Ha" (*"ha")
, but it's
written in a recursive style. It should be obvious by inspection that the
string Hahaha
matches both versions of the rule, but a naive implementation
of match()
would cause match(<laugh>, str)
to call match(<laugh>, str)
before doing anything else, which results in infinite recursion and a stack
overflow. One option is to simply detect left recursion and cause a compilation
error (this is what LPEG does). However, it is possible to actually match left
recursive rules without overflowing the stack.
Matching Left Recursion
The solution used in BP is the following:
-
Whenever matching a named rule,
foo
, against stringstr
, first get the originally defined pattern for that rule. Call thisfoo-original-definition
. -
Temporarily define
match(<foo>, str) := signal left recursion and return Fail
(In other words,<foo>
is temporarily defined to signal left recursion and then fail to match if matching against the stringstr
.) -
Run
result := match(<foo-original-definition>, str)
. -
If the match failed, then return failure.
-
If the match was successful and did not trigger left recursion, return
result
. -
If the match was successful, but did trigger left recursion, then:
a. If
result
is not longer than any previousresult
, stop looping and return the longestresult
to avoid an infinite loop. b. Otherwise: temporarily definematch(<foo>, str) := signal left recursion and return <result>
c. Go to step 3.
This handles left recursion as a simple loop and successfully matches left recursive patterns, even ones that are indirectly or nontrivially left recursive.
Example
Consider the rule laugh: laugh "ha" / "Ha"
being applied to the input text "Hahaha!"
:
Temp. definition of match(<laugh>, "Hahaha!") |
Result of match(<laugh "ha" / "Ha">, "Hahaha!") |
---|---|
Fail |
Match{"Ha"} |
Match{"Ha"} |
Match{"Haha"} |
Match{"Haha"} |
Match{"Hahaha"} |
Match{"Hahaha"} |
Match{"Ha"} (Forward progress stops being made) |
As you can see in this example, each successive iteration builds up a final
match incrementally until progress is no longer being made. At that point,
the longest match is returned: Match{"Hahaha"}
.