How to implement a list in a vipgrammar?

Posted: 5 Aug 2015 15:55
by B.Hooijenga
These are my first steps in using the VIPLALR parser.
As a kind of training I started with the following:

This is a very simplified example of a contact

begin:vcard
N:John
end:vcard

A contact must start with begin:vcard and must end with end:vcard.
In between are one or more contentlines. Like:

begin:vcard
VERSION:3.0
N:John
FN: John Neumann
end:vcard

I made a grammar for reading a contact having one contentline, like the first example. Parsing succeeds. The output from the parser is

Code: Select all

vcard("begin",":","vcard",contentline("N",":","John"),"end",":","vcard")
The grammar goes as follows:

Code: Select all

grammar vCardgrm
    open vCard, vCardgrmSem

nonterminals
    vcrd : vcard.
rules
    vcrd { mkVCard(Begin, Colon, VC, CL, End, Colon1, VC1) } ==>
        [t_prop] { Begin },
        [t_colon] { Colon },
        [t_prop] { VC },
        cntl { CL },
        [t_prop] { End },
        [t_colon] { Colon1 },
        [t_prop] { VC1 }.

nonterminals
    cntl : contentline.
rules
    cntl { mkCtl(N, Colon, NV) } ==>
        [t_prop] { N },
        [t_colon] { Colon },
        [t_prop] { NV }.

end grammar vCardgrm
In order to read the second example with more contentlines a list is needed in the grammar.
How to do that?
I could not find any documentation.
Perhaps the lrParserDomains have something to do with it?

Kind regards

Ben

Posted: 5 Aug 2015 21:21
by Thomas Linder Puls
vcards have a form which doesn't really need a "real" parser. But assuming that we want to do it anyway we will have to make a few changes.
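
To illustrate the remark that this format does not really need a parser, here is a minimal Python sketch that reads the simplified vcard line by line. The function name `read_vcard` and the `(name, value)` tuple result are assumptions made for this illustration; nothing here is part of vipLalr or PFC.

```python
def read_vcard(text):
    """Read one simplified vcard and return its content lines as (name, value) pairs."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("a vcard needs both begin:vcard and end:vcard")
    if lines[0].lower() != "begin:vcard" or lines[-1].lower() != "end:vcard":
        raise ValueError("a vcard must start with begin:vcard and end with end:vcard")

    content = []
    for line in lines[1:-1]:
        # Split at the first colon only, so values may themselves contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon in content line: {line!r}")
        content.append((name.strip(), value.strip()))
    return content
```

Every line has the same `name:value` shape, so splitting at the first colon is enough; a grammar only starts to pay off when the real format (RFC 2426) with its parameters and folding rules has to be handled.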

The main change is that all your lines have the grammatical form:

Code: Select all

[t_prop] [t_colon] [t_prop]
So there is no difference between "inner" lines and the surrounding vcard lines.

So I think we should change "begin", "end" and "vcard" into individual terminal symbols/keywords: [t_begin], [t_end], and [t_vcard].
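
As a rough model of what that change means on the lexer side, the following Python sketch classifies the words "begin", "end" and "vcard" as their own token kinds and everything else as `t_prop`. The lexer used with vipLalr is not shown in this thread, so the functions `classify` and `tokenize` and the `(kind, text)` token shape are hypothetical.

```python
# Hypothetical keyword table: these words get their own terminal symbols.
KEYWORDS = {"begin": "t_begin", "end": "t_end", "vcard": "t_vcard"}


def classify(word):
    """Return a (kind, text) token; anything that is not a keyword is a t_prop."""
    return (KEYWORDS.get(word.lower(), "t_prop"), word)


def tokenize(text):
    """Turn simplified vcard text into a flat list of (kind, text) tokens."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon in line: {line!r}")
        tokens.append(classify(name.strip()))
        tokens.append(("t_colon", ":"))
        tokens.append(classify(value.strip()))
    return tokens
```

With distinct token kinds the parser can tell the surrounding lines from the inner ones by looking at a single token, which is what the grammar needs in order to know where the repetition ends.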

A vcard then has the form:

Code: Select all

[t_begin] [t_colon] [t_vcard]
[t_prop] [t_colon] [t_prop]
[t_prop] [t_colon] [t_prop]
...
[t_prop] [t_colon] [t_prop]
[t_end] [t_colon] [t_vcard]
Let us first ignore collecting the result and just consider the recognition.

The repetition part in the middle (i.e. your question) will be handled using recursion. This is an obvious solution:

Code: Select all

rules
    vcrd ==>
        [t_begin], [t_colon], [t_vcard],
        cntlines,
        [t_end], [t_colon], [t_vcard].

rules
    cntlines ==> .

    cntlines ==> cntl, cntlines.

rules
    cntl ==> [t_prop], [t_colon], [t_prop].
In my opinion you have collected a lot of irrelevant information: the colons are not interesting, since there will always be a colon in those places, so they may just as well be skipped completely.

The result could look like this:

Code: Select all

grammar vCardgrm
    open vCard, vCardgrmSem

nonterminals
    vcrd : vcard.
rules
    vcrd { vcard(CL) } ==>
        [t_begin],
        [t_colon],
        [t_vcard],
        cntlines { CL },
        [t_end],
        [t_colon],
        [t_vcard].

nonterminals
    cntlines : contentline*.
rules
    cntlines { [] } ==>
        .

    cntlines { [C | CL] } ==>
        cntl { C },
        cntlines { CL }.

nonterminals
    cntl : contentline.
rules
    cntl { contentline(P, V) } ==>
        [t_prop] { P },
        [t_colon],
        [t_prop] { V }.

end grammar vCardgrm
Parsing your second vcard will produce this parse tree:

Code: Select all

vcard([contentline("VERSION", "3.0"), contentline("N", "John"), contentline("FN", "John Neumann")])
LALR grammars and right recursion do not go well together: right recursion will often cause so-called shift-reduce conflicts (though not in this case), and in any case the parse stack will grow with the length of the parsed sequence.
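
The stack-growth point can be made concrete with a small Python simulation. It is only a sketch: it counts grammar symbols for the list part, treats each content line as one already-reduced `cntl` symbol, and compares the right-recursive rule with a left-recursive formulation of the same list (`cntlines ==> cntlines, cntl`). It does not reflect how the vipLalr runtime is implemented, and both function names are made up for this example.

```python
def parse_right_recursive(items):
    """Simulate cntlines ==> . | cntl, cntlines. Returns (list, deepest stack)."""
    stack, deepest = [], 0
    # No list reduction is possible until the last cntl has been seen,
    # so every cntl stays on the stack.
    for item in items:
        stack.append(("cntl", item))
        deepest = max(deepest, len(stack))
    # The empty rule is reduced once the end of the list is in sight.
    stack.append(("cntlines", []))
    deepest = max(deepest, len(stack))
    # Now reduce "cntl, cntlines" repeatedly, from the right.
    while len(stack) > 1:
        _, tail = stack.pop()
        _, head = stack.pop()
        stack.append(("cntlines", [head] + tail))
    return stack[0][1], deepest


def parse_left_recursive(items):
    """Simulate cntlines ==> . | cntlines, cntl. Returns (list, deepest stack)."""
    stack = [("cntlines", [])]  # the empty rule is reduced first
    deepest = 1
    for item in items:
        stack.append(("cntl", item))
        deepest = max(deepest, len(stack))
        # "cntlines, cntl" can be reduced immediately.
        _, last = stack.pop()
        _, front = stack.pop()
        stack.append(("cntlines", front + [last]))
    return stack[0][1], deepest
```

For n content lines the right-recursive version holds n + 1 symbols at its deepest point, while the left-recursive version never holds more than two.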

For the sake of the recognized language we can easily make cntlines left recursive like this instead:

Code: Select all

rules
    cntlines ==> .

    cntlines ==> cntlines, cntl.
This change makes no difference to the recognized language, but now we find the lines in the opposite order of what we need for the list we want to collect.

The support class contains a reverse list domain (revList) to assist with this situation. The left-recursive solution could look like this:

Code: Select all

grammar vCardgrm
    open vCard, vCardgrmSem

nonterminals
    vcrd : vcard.
rules
    vcrd { vcard(unRevList(CL)) } ==>
        [t_begin],
        [t_colon],
        [t_vcard],
        cntlines { CL },
        [t_end],
        [t_colon],
        [t_vcard].

nonterminals
    cntlines : revList{contentline}.
rules
    cntlines { nil } ==>
        .

    cntlines { consRear(CL, C) } ==>
        cntlines { CL },
        cntl { C }.

nonterminals
    cntl : contentline.
rules
    cntl { contentline(P, V) } ==>
        [t_prop] { P },
        [t_colon],
        [t_prop] { V }.

end grammar vCardgrm
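
The idea behind a reverse list can be modelled in a few lines of Python. The names `NIL`, `cons_rear` and `un_rev_list` mirror the `nil`, `consRear` and `unRevList` used in the grammar, but the representation below (nested pairs) is only an assumption for illustration; the actual definition in the PFC support class is not shown in this thread.

```python
NIL = None  # model of the empty reverse list


def cons_rear(rev_list, element):
    """Add an element at the logical rear in constant time."""
    return (rev_list, element)


def un_rev_list(rev_list):
    """Convert the reverse list into an ordinary list in source order."""
    result = []
    while rev_list is not NIL:
        rev_list, element = rev_list
        result.append(element)  # elements come out last-added first
    result.reverse()
    return result


# Mimic the left-recursive rule: start from nil, add each line at the rear.
collected = NIL
for line in [("VERSION", "3.0"), ("N", "John"), ("FN", "John Neumann")]:
    collected = cons_rear(collected, line)
```

Each reduction of `cntlines, cntl` then costs constant time, and the single conversion in the `vcrd` rule restores the order in which the lines appeared in the input.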

Posted: 6 Aug 2015 18:27
by B.Hooijenga
Thomas, thanks very much for your clarification.

It is working.

Yes, I gathered a lot of irrelevant stuff. For my first try-out I wanted as much output from the parser as possible.
I also did not yet know how to prevent this.
I now see that this can be arranged in the grammar.

I am not sure about the fact that vcard and contentline can have the same structure.
My example is a stripped-down version of a real-world problem.
If someone is interested, here is a link: https://en.wikipedia.org/wiki/VCard
The formal grammar for the contact-language can be found here: https://tools.ietf.org/html/rfc2426

Thomas, thanks again.

Kind regards

Ben

Posted: 10 Aug 2015 17:06
by Peter Muraya
Thanks
I'm also new to the vipLalr parser and this example has been very helpful. I would like a few clarifications.

1 Which classes/interfaces is the open vCard,vCardGrm referring to?

2 If a terminal token is demarcated by the square brackets, why is it necessary to have the t_ prefix in its name? Without it we would have the neater recognition pattern [prop],[colon],[prop]

3 Is the following rule that uses the core::tuple definition valid?

Code: Select all

nonterminals
    cntl : core::tuple{string, string}.
rules
    cntl { core::tuple(P, V) } ==>
        [t_prop] { P },
        [t_colon],
        [t_prop] { V }.

Posted: 10 Aug 2015 20:17
by Thomas Linder Puls
The t_ is completely optional.

But some of the example grammars use that convention.

In the Prolog part of the code the terminals will be collected into a terminal domain:

Code: Select all

domains
    terminal = terminal1; terminal2; ...
And here they don't appear in brackets. But it is completely a matter of taste.

The Visual Prolog grammar is based on a YACC grammar specification, and that specification contained "Tokens" of the form T_XXXX, so we simply translated them to t_xxx.

Posted: 11 Aug 2015 6:25
by Peter Muraya
Thomas, thanks, especially for the reference to YACC grammar specification.

Posted: 11 Aug 2015 11:22
by Thomas Linder Puls
Also notice this wiki article: LALR Parser Generator.

And Wikipedia: LALR parser.

Posted: 11 Aug 2015 16:39
by Peter Muraya
Thank you.

Posted: 12 Aug 2015 9:54
by B.Hooijenga
Hello Peter,
1 Which classes/interfaces is the open vCard,vCardGrm referring to?
It is an instruction to the parser generator.
The generator adds it to vcardGrm.pro, as you can see:

Code: Select all

implement vcardGrm
    supports parserTable{terminal, sem_, state_, nonterminal_}
    inherits parser{terminal, sem_, state_, nonterminal_}
    open pfc\syntax\lrParser\, lrParserDomains
    open vCard, vCardgrmSem
class vCard contains this:

Code: Select all

class vCard
    open core

domains
    vcard = vcard(string Begin, string Colon, string VCard, contentlines Contentlines, string End, string Colon, string VCard).

domains
    contentlines = contentline*.

domains
    contentline = contentline(string Name, string StringValue).
And vCardgrmSem goes as follows:

Code: Select all

class vCardGrmSem
    open core, vCard, pfc\syntax\, pfc\syntax\syntax, pfc\syntax\lrParser\lrParserDomains

predicates
    mkVCard : (string Begin, string Colon, string VC, contentlines CLS, string End, string Colon, string VC) -> vCard.

predicates
    mkctl : (string Name, string NameValue) -> contentline.
All these code fragments already belong to the new grammar, which I wrote after Thomas helped me out.
This is the grammar:

Code: Select all

grammar vCardgrm
    open vCard, vCardgrmSem

nonterminals
    vcrd : vcard.
rules
    vcrd { mkVCard(Begin, Colon, VC, unrevlist(CL), End, Colon1, VC1) } ==>
        [t_prop] { Begin },
        [t_colon] { Colon },
        [t_prop] { VC },
        cntlines { CL },
        [t_prop] { End },
        [t_colon] { Colon1 },
        [t_prop] { VC1 }.

nonterminals
    cntlines : revList{contentline}.
rules
    cntlines { nil } ==>
        .

    cntlines { consRear(CL, C) } ==>
        cntlines { CL },
        cntl { C }.

nonterminals
    cntl : contentline.
rules
    cntl { contentline(P, V) } ==>
        [t_prop] { P },
        [t_colon],
        [t_prop] { V }.

end grammar vCardgrm
Kind regards

Ben

Posted: 12 Aug 2015 16:52
by Peter Muraya
Hello B.Hooijenga,
The impression I had was that, yes, open vCard, vCardGrm is indeed an instruction to the parser generator to open files that are yet to be produced. That is what I found strange.

I will do a few tests with different grammars in order to understand this generator much better; it's worth the effort, as it looks much more powerful than the earlier version that I have been using for my project.