Skip to content
This repository was archived by the owner on Oct 28, 2023. It is now read-only.

Patterns

Christophe VG edited this page Mar 2, 2017 · 2 revisions

On this page we collect examples of typical patterns that occur when designing grammars and investigate how to improve them using the possibilities of the Human Parser Generator.

Options

A common pattern in grammars is the encoding of options in the form of key-value pairs. Often the values are one out of a list of possibilities.

Starting from a Straightforward Standard Grammar

Imagine the following grammar...

grammar = { option } ;

option = "KEY1" ( "YES"| "NO" )
       | "KEY2" ( "SOME"| "ANY" | "NONE" )
       ;

Which can parse the example input below...

KEY1 YES
KEY2 ANY
KEY1 NO
KEY2 SOME

... into this AST:

new Grammar() {
  Options = new List<Option>() {
    new Option() {
      HasYes = True,
      HasNo = False,
      HasSome = False,
      HasAny = False,
      HasNone = False
    },new Option() {
      HasYes = False,
      HasNo = False,
      HasSome = False,
      HasAny = True,
      HasNone = False
    },new Option() {
      HasYes = False,
      HasNo = True,
      HasSome = False,
      HasAny = False,
      HasNone = False
    },new Option() {
      HasYes = False,
      HasNo = False,
      HasSome = True,
      HasAny = False,
      HasNone = False
    }
  }
}

All options here are fixed strings. In an alternatives construction, the ability to parse them will be implemented as boolean properties for each. Although straightforward, there are some issues with this result: We loose the link to the string prefixes and checking all possible outcomes doesn't help down the road.

Another problem, that is still hidden here is the fact that this approach doesn't allow for alternatives that only differ in capitalisation: "Yes" | "yes" would both generate a property HasYes, which will cause problems.

Introducing Extractors

With extractors we can extract the values:

grammar = { option } ;

option = "KEY1" ? /(YES|NO)/ ?
       | "KEY2" ? /(SOME|ANY|NONE)/ ?
       ;

This produces this AST:

new Grammar() {
  Options = new List<Option>() {
    new Option() {
      Option0 = "YES",
      Option1 = null
    },new Option() {
      Option0 = null,
      Option1 = "ANY"
    },new Option() {
      Option0 = "NO",
      Option1 = null
    },new Option() {
      Option0 = null,
      Option1 = "SOME"
    }
  }
}

Every option now contains two properties, one for each of the alternatives. The properties are named after the entity, Option, since the extractors don't have names. We can quickly improve this result by adding a naming annotation:

grammar = { option } ;

option = "KEY1" key1 @ ? /(YES|NO)/ ?
       | "KEY2" key2 @ ? /(SOME|ANY|NONE)/ ?
       ;

Which produces:

new Grammar() {
  Options = new List<Option>() {
    new Option() {
      Key1 = "YES",
      Key2 = null
    },new Option() {
      Key1 = null,
      Key2 = "ANY"
    },new Option() {
      Key1 = "NO",
      Key2 = null
    },new Option() {
      Key1 = null,
      Key2 = "SOME"
    }
  }
}

Not bad, but we don't really want all options still in one Option class.

Applying Polymorphism

By extracting the different options into rules of their own, inheritance and polymorphism are now possible:

grammar = { option } ;

option = key1 | key2 ;

key1 = "KEY1" value @ ? /(YES|NO)/ ? ;
key2 = "KEY2" value @ ? /(SOME|ANY|NONE)/ ? ;

This produces the following AST:

Options = new List<Option>() {
  new Key1() { Value = "YES"  },
  new Key2() { Value = "ANY"  },
  new Key1() { Value = "NO"   },
  new Key2() { Value = "SOME" }
}

Now, this looks like a nice way to represent a list of options. It also allows for mixing different styles of options.

When applying this grammar...

grammar = { option } ;

option = key1 | key2 | key3 | key4 ;

key1 = "KEY1" value @ ? /(YES|NO)/ ? ;
key2 = "KEY2" value @ ? /(SOME|ANY|NONE)/ ? ;
key3 = "KEY3" ;
key4 = "KEY4" name @ ? /"([^"]*)"/ ? index @ ? /([0-9]+)/ ?;

... to this input ...

KEY1 YES
KEY2 ANY
KEY1 NO
KEY2 SOME
KEY3
KEY4 "Christophe" 123

... produces this AST:

new Grammar() {
  Options = new List<Option>() {
    new Key1() { Value = "YES"},
    new Key2() { Value = "ANY"},
    new Key1() { Value = "NO"},
    new Key2() { Value = "SOME"},
    new Key3() { },
    new Key4() {
      Name = "Christophe",
      Index = "123"
    }
  }
}

Lists

Comma-(or whatever other string)-separated lists are sometimes a bit of a pain in BNF. The most normal way of writing them would be something like:

grammar       = { statement } ;
statement     = process ;
process       = "PROCESS" argument-list ;
argument-list = argument [ _ @ "," argument-list ] ;
argument      = ? /(\w+)/ ? ;

This would parse this example input ...

PROCESS a
PROCESS a, b
PROCESS a, b, c

... to ...

new Grammar() {
  Statements = new List<ArgumentList>() {
    new ArgumentList() {
      Argument = "a",
      NextArgumentList = null
    },new ArgumentList() {
      Argument = "a",
      NextArgumentList = new ArgumentList() {
        Argument = "b",
        NextArgumentList = null
      }
    },new ArgumentList() {
      Argument = "a",
      NextArgumentList = new ArgumentList() {
        Argument = "b",
        NextArgumentList = new ArgumentList() {
          Argument = "c",
          NextArgumentList = null
        }
      }
    }
  }
}

We already know we can clean that up a bit using name annotations.

In EBNF, we can rewrite that a bit and partially remove the recursive data structure to a first argument and more arguments:

grammar       = { statement } ;
statement     = process ;
process       = "PROCESS" first @ argument { "," more @ argument } ;
argument      = ? /(\w+)/ ? ;

Which produces the following AST:

new Grammar() {
  Statements = new List<Process>() {
    new Process() {
      First = "a",
      Mores = new List<string>() {}
    },new Process() {
      First = "a",
      Mores = new List<string>() { b }
    },new Process() {
      First = "a",
      Mores = new List<string>() { b, c }
    }
  }
}

Now, here we have already redesigned the grammar too much. By adding the name annotations, to get nice property names, First and More_s_, we have undermined an implicitly supported grammar refactoring implemented by the Human Parser Generator.

Removing the name annotations:

grammar       = { statement } ;
statement     = process ;
process       = "PROCESS" argument { "," argument } ;
argument      = ? /(\w+)/ ? ;

actually produces and AST like this:

new Grammar() {
  Statements = new List<Process>() {
    new Process() {
      Arguments = new List<string>() { a }
    },new Process() {
      Arguments = new List<string>() { a, b }
    },new Process() {
      Arguments = new List<string>() { a, b, c }
    }
  }
}

And isn't that what we wanted all along? ;-)

HPG looks for grammar patterns of the form entity { "..." entity }, removes the property for the former and redirects the parsing of that part to the property for the latter. And the string doesn't have to be a colon/comma , but can be anything separating the list items.

Clone this wiki locally