tail -f judofyr

2013-08-24

cat 21:11:44 UTC
Parsing

Why programming languages should use ambiguous grammars:


(1) Creating unambiguous grammars are hard

From "Pure and declarative syntax definition: paradise lost and
regained" [1]:

   A recent case study by Malloy et al. gives an interesting insight
   into the typical development process of a Bison-based LALR parser for
   the C# language. The case study started with a grammar from the C#
   language definition, and described how over the course of 19 revisions
   it was painstakingly factored to eliminate all 40 shift/reduce and 617
   reduce/reduce connflicts that were reported for it. According to Malloy
   et al., this is not a surprising number of conflicts for a grammar
   not specifically designed for the targeted grammar class. In their
   efforts to resolve all conflicts, they encountered various limitations
   of the parser: for example, using only 1 symbol for lookahead, their
   parser had problems distinguishing array types of the form int[][]
   and the form [,]. Another notable problem they ran into is the use
   of context-sensitive keywords: for example, the add keyword used for
   properties is not reserved in C#. In general, they had to make a large
   number of small changes for aspects of the original, natural grammar
   that were not supported by Bison.

An unambiguous grammar forces you to be so explicit that it no longer
communicates the real structure of your language.


(2) An ambiguous grammar puts you in control of handling ambiguities

There's many valid ways to handle ambiguities:

- Report syntax error
- Prioritize one choice
- Keep both versions and make a decision later


(3) There are decent algorithms available

GLL [2] parses in O(n) for LL(1) grammars and O(n^3) in worst-case.

Efficient tabular LR parsing [3] parses in O(n) for LR(0) grammars and
O(n^3) in worst-case.


--

Sources:

[1] Kats, Lennart CL, Eelco Visser, and Guido Wachsmuth. "Pure and declarative
syntax definition: paradise lost and regained." ACM Sigplan Notices. Vol. 45.
No. 10. ACM, 2010.

[2] Scott, Elizabeth, and Adrian Johnstone. "GLL parsing." Electronic
Notes in Theoretical Computer Science 253.7 (2010): 177-189.

[3] Nederhof, Mark-Jan, and Giorgio Satta. "Efficient tabular LR
parsing." Proceedings of the 34th annual meeting on Association for
Computational Linguistics. Association for Computational Linguistics, 1996.