2010/09/21

Hyphen(-) should be put at the Left Bound in a Class of PEG

In PEG, a hyphen (-) in a class indicates a range of characters. For example, [A-Za-z] means all the ASCII letters. Thus, to indicate the character '-' itself, the '-' is put at a bound of the class, for example, [-A-Za-z]. The '-' should be put at the left bound, because a '-' at the right bound can be parsed in an unexpected way.

Regular expressions have a similar syntax, and some implementations allow the hyphen at both the left and right bounds, so [-A-Za-z] is equal to [A-Za-z-]. (However, I do not know whether such behavior is valid in strict regular expressions.)
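As a quick check of one such implementation, the sketch below uses Python's standard re module (chosen here only as an example; other engines may behave differently). It compiles both spellings of the class and compares them on a few sample strings.

```python
import re

# Hyphen at the left bound and at the right bound of the class.
left = re.compile(r"[-A-Za-z]+")
right = re.compile(r"[A-Za-z-]+")

# Sample inputs: the first two contain only letters and hyphens,
# the last one contains an underscore, which is outside the class.
for text in ["foo-bar", "-", "foo_bar"]:
    same = bool(left.fullmatch(text)) == bool(right.fullmatch(text))
    print(repr(text), bool(left.fullmatch(text)), bool(right.fullmatch(text)), same)
```

In Python, both patterns accept "foo-bar" and "-" and reject "foo_bar".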

However, at least the official PEG syntax does not reliably allow the right bound.

Literal <- ['] (!['] Char)* ['] Spacing
/ ["] (!["] Char)* ["] Spacing
Class <- '[' (!']' Range)* ']' Spacing
Range <- Char '-' Char / Char
Char <- '\\' [nrt'"\[\]\\]
/ '\\' [0-2][0-7][0-7]
/ '\\' [0-7][0-7]?
/ !'\\' .

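In the above grammar, although the Class expression excludes ']' at the start of each Range, the Range expression accepts any Char, including ']' and '-', as the Char that follows the '-'. Thus, when the trailing '-' directly follows a single character, as in [A-Za-z_-], the '_' matches the first Char and the ']' becomes the second Char of a Range. (In [A-Za-z-] itself, the last '-' follows the complete range a-z, so it is still read as a single Char; the problem appears as soon as a single character precedes the trailing '-'.) Then the class is not closed by the ']', which makes the parse fail or behave unexpectedly.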

On the other hand, a hyphen at the left bound is parsed not as a Range but as a single Char, because the character that follows it is not another '-'. So this problem does not occur.
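The behavior of the Class, Range and Char rules can be tried out with a small hand-written recognizer. The following is only a minimal sketch: the function names (parse_char, parse_range, is_class) are hypothetical, Spacing is omitted, and the whole input string is assumed to be exactly one class.

```python
OCTAL = "01234567"

def parse_char(s, i):
    """Char rule: return the index after one Char, or None on failure."""
    if i >= len(s):
        return None
    if s[i] != "\\":
        return i + 1                      # !'\\' .
    # '\\' [nrt'"\[\]\\]
    if i + 1 < len(s) and s[i + 1] in "nrt'\"[]\\":
        return i + 2
    # '\\' [0-2][0-7][0-7]
    if (i + 3 < len(s) and s[i + 1] in "012"
            and s[i + 2] in OCTAL and s[i + 3] in OCTAL):
        return i + 4
    # '\\' [0-7][0-7]?
    if i + 1 < len(s) and s[i + 1] in OCTAL:
        j = i + 2
        if j < len(s) and s[j] in OCTAL:
            j += 1
        return j
    return None

def parse_range(s, i):
    """Range rule: Char '-' Char / Char (ordered choice)."""
    j = parse_char(s, i)
    if j is None:
        return None
    if j < len(s) and s[j] == "-":
        k = parse_char(s, j + 1)          # accepts ']' as well
        if k is not None:
            return k
    return j                              # second alternative: a single Char

def is_class(s):
    """Class rule without Spacing: '[' (!']' Range)* ']' over the whole string."""
    if not s.startswith("["):
        return False
    i = 1
    while not (i < len(s) and s[i] == "]"):   # !']'
        j = parse_range(s, i)
        if j is None:
            break
        i = j
    return i < len(s) and s[i] == "]" and i + 1 == len(s)

for text in ["[-A-Za-z_]", "[A-Za-z_-]", "[-a]", "[a-]"]:
    print(text, is_class(text))
```

With this sketch, [-A-Za-z_] and [-a] are accepted, while [A-Za-z_-] and [a-] are rejected, because the closing ']' is consumed as the second Char of a Range.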

As a result, a hyphen (-) should be put at the left bound in a class of PEG.
