
How to implement a list in a vipgrammar?

Post by B.Hooijenga

These are my first steps in using the VIPLALR parser.
As a kind of training I started with the following.

This is a very simplified example of a contact:

begin:vcard
N:John
end:vcard

A contact must start with begin:vcard and must end with end:vcard.
In between are one or more content lines, like:

begin:vcard
VERSION:3.0
N:John
FN: John Neumann
end:vcard

I made a grammar for reading a contact that has one content line, like the first example. Parsing succeeds. The output from the parser is:

Code: Select all

vcard("begin",":","vcard",contentline("N",":","John"),"end",":","vcard")
The grammar goes as follows:

Code: Select all

grammar vCardgrm
    open vCard, vCardgrmSem

nonterminals
    vcrd : vcard.
rules
    vcrd { mkVCard(Begin, Colon, VC, CL, End, Colon1, VC1) } ==>
        [t_prop] { Begin },
        [t_colon] { Colon },
        [t_prop] { VC },
        cntl { CL },
        [t_prop] { End },
        [t_colon] { Colon1 },
        [t_prop] { VC1 }.

nonterminals
    cntl : contentline.
rules
    cntl { mkCtl(N, Colon, NV) } ==>
        [t_prop] { N },
        [t_colon] { Colon },
        [t_prop] { NV }.

end grammar vCardgrm

To read the second example, with more content lines, the grammar needs a list.
How do I do that?
I could not find any documentation.
Perhaps the lrParserDomains have something to do with it?

Kind regards

Ben

Post by Thomas Linder Puls

vcards have a form which doesn't really need a "real" parser. But assuming that we want to do it anyway we will have to make a few changes.
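To illustrate the remark that vcards do not really need a "real" parser, here is a minimal sketch in Python (not Visual Prolog) that handles only the simplified format from this thread by splitting each line at its first colon. The name parse_vcard is made up for this illustration, and the sketch assumes that begin:vcard and end:vcard may be written in any letter case; it ignores everything else in the real vCard format.

```python
def parse_vcard(text):
    """Parse one simplified vcard into a list of (name, value) pairs."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: the first and last non-empty lines delimit the contact.
    if len(lines) < 2:
        raise ValueError("a vcard needs at least a begin and an end line")
    if lines[0].lower() != "begin:vcard" or lines[-1].lower() != "end:vcard":
        raise ValueError("a vcard must start with begin:vcard and end with end:vcard")

    content = []
    for line in lines[1:-1]:
        # Split at the first colon only, so values may contain colons.
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"missing colon in content line: {line!r}")
        content.append((name.strip(), value.strip()))
    return content


if __name__ == "__main__":
    example = "begin:vcard\nVERSION:3.0\nN:John\nFN: John Neumann\nend:vcard"
    for name, value in parse_vcard(example):
        print(name, "=", value)
```

The rest of this post assumes we want to use the parser generator anyway.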

The main issue is that all your lines have the same grammatical form:

Code: Select all

[t_prop] [t_colon] [t_prop]
So there is no difference between "inner" lines and the surrounding vcard lines.

So I think we should change "begin", "end" and "vcard" into individual terminal symbols/keywords: [t_begin], [t_end], and [t_vcard].
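The lexer itself is not shown in this thread, so the following is only a sketch, in Python rather than Visual Prolog, of how words could be classified into those terminals. The names KEYWORDS, classify and tokenize are invented for the illustration, and treating the keywords as case-insensitive is an assumption. Note that this naive version also turns a property value spelled "end" or "vcard" into a keyword.

```python
# Keyword spellings mapped to the terminal names suggested above.
KEYWORDS = {"begin": "t_begin", "end": "t_end", "vcard": "t_vcard"}


def classify(word):
    """Return the terminal name for one word; anything unknown is a t_prop."""
    return KEYWORDS.get(word.lower(), "t_prop")


def tokenize(text):
    """Turn simplified vcard text into a list of (terminal, text) pairs."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is expected to look like name:value.
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"missing colon in line: {line!r}")
        name, value = name.strip(), value.strip()
        tokens.append((classify(name), name))
        tokens.append(("t_colon", colon))
        tokens.append((classify(value), value))
    return tokens


if __name__ == "__main__":
    for terminal, word in tokenize("begin:vcard\nN:John\nend:vcard"):
        print(terminal, repr(word))
```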

Then a vcard has the form:

Code: Select all

[t_begin] [t_colon] [t_vcard]
[t_prop] [t_colon] [t_prop]
[t_prop] [t_colon] [t_prop]
...
[t_prop] [t_colon] [t_prop]
[t_end] [t_colon] [t_vcard]
Let us first ignore collecting the result and just consider the recognition.

The repetition part in the middle (i.e. your question) will be handled using recursion. This is an obvious solution:

Code: Select all

rules
    vcrd ==>
        [t_begin], [t_colon], [t_vcard],
        cntlines,
        [t_end], [t_colon], [t_vcard].

rules
    cntlines ==> .

    cntlines ==> cntl, cntlines.

rules
    cntl ==> [t_prop], [t_colon], [t_prop].

In my opinion you have collected a lot of irrelevant information: the colons are not interesting. There will always be a colon in those places, so they may just as well be skipped completely.

The result could look like this:

Code: Select all

grammar vCardgrm
    open vCard, vCardgrmSem

nonterminals
    vcrd : vcard.
rules
    vcrd { vcard(CL) } ==>
        [t_begin],
        [t_colon],
        [t_vcard],
        cntlines { CL },
        [t_end],
        [t_colon],
        [t_vcard].

nonterminals
    cntlines : contentline*.
rules
    cntlines { [] } ==>
        .

    cntlines { [C | CL] } ==>
        cntl { C },
        cntlines { CL }.

nonterminals
    cntl : contentline.
rules
    cntl { contentline(P, V) } ==>
        [t_prop] { P },
        [t_colon],
        [t_prop] { V }.

end grammar vCardgrm

Parsing your second vcard will produce this parse tree:

Code: Select all

vcard([contentline("VERSION", "3.0"), contentline("N", "John"), contentline("FN", "John Neumann")])
LALR grammars and right recursion do not go well together. Right recursion often causes so-called shift-reduce conflicts (though not in this case), and in any case the parse stack will grow with the length of the parsed sequence.

As far as the recognized language is concerned, we can easily make cntlines left recursive instead, like this:

Code: Select all

rules
    cntlines ==> .

    cntlines ==> cntlines, cntl.

This change makes no difference to the recognized language, but now we find the lines in the opposite order of what we need for the list we want to collect.
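The stack-growth argument can be made concrete with a small hand-written simulation. This is a sketch in Python, not output from the parser generator: it replays by hand the shift and reduce steps a bottom-up parser would take for the two versions of cntlines and records the deepest stack seen. The function names are invented, and "item" stands for one already reduced cntl.

```python
def max_stack_right_recursive(n):
    """Deepest stack for: lines ==> .   lines ==> item, lines."""
    stack, deepest = [], 0
    for _ in range(n):
        stack.append("item")        # shift: nothing can be reduced yet
        deepest = max(deepest, len(stack))
    stack.append("lines")           # reduce the empty rule at the end of the input
    deepest = max(deepest, len(stack))
    while len(stack) > 1:
        stack.pop()                 # pop "lines"
        stack.pop()                 # pop "item"
        stack.append("lines")       # reduce: lines ==> item, lines
    return deepest


def max_stack_left_recursive(n):
    """Deepest stack for: lines ==> .   lines ==> lines, item."""
    stack = ["lines"]               # reduce the empty rule before the first item
    deepest = len(stack)
    for _ in range(n):
        stack.append("item")        # shift one item
        deepest = max(deepest, len(stack))
        stack.pop()                 # pop "item"
        stack.pop()                 # pop "lines"
        stack.append("lines")       # reduce at once: lines ==> lines, item
    return deepest


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, max_stack_right_recursive(n), max_stack_left_recursive(n))
```

In this simulation the right-recursive version needs n + 1 stack entries for n content lines, while the left-recursive version never needs more than 2.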

The support class contains a reverse list domain (revList) to assist with this situation. The left-recursive solution could look like this:

Code: Select all

grammar vCardgrm
    open vCard, vCardgrmSem

nonterminals
    vcrd : vcard.
rules
    vcrd { vcard(unRevList(CL)) } ==>
        [t_begin],
        [t_colon],
        [t_vcard],
        cntlines { CL },
        [t_end],
        [t_colon],
        [t_vcard].

nonterminals
    cntlines : revList{contentline}.
rules
    cntlines { nil } ==>
        .

    cntlines { consRear(CL, C) } ==>
        cntlines { CL },
        cntl { C }.

nonterminals
    cntl : contentline.
rules
    cntl { contentline(P, V) } ==>
        [t_prop] { P },
        [t_colon],
        [t_prop] { V }.

end grammar vCardgrm
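How the reverse list is actually defined in the support class is not shown here, so the following Python sketch is only a model of the idea, under the assumption that a revList is a list stored back to front: adding at the rear is a single cheap step, and the normal order is produced once at the end. NIL, cons_rear and un_rev_list are Python stand-ins for nil, consRear and unRevList, not the library code.

```python
NIL = None  # stand-in for the empty reverse list (nil)


def cons_rear(rev_list, item):
    """Add item at the rear: one new cell holding the old list and the item."""
    return (rev_list, item)


def un_rev_list(rev_list):
    """Convert a reverse list into an ordinary list in source order."""
    items = []
    while rev_list is not NIL:
        rev_list, item = rev_list   # walk from the last item back to the first
        items.append(item)
    items.reverse()                 # restore the order in which items were added
    return items


if __name__ == "__main__":
    # Mimics the left-recursive rule: start with nil, then add each line at the rear.
    lines = NIL
    for line in [("VERSION", "3.0"), ("N", "John"), ("FN", "John Neumann")]:
        lines = cons_rear(lines, line)
    print(un_rev_list(lines))
```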
Regards Thomas Linder Puls
PDC

Post by B.Hooijenga

Thomas, thanks very much for your clarification.

It is working.

Yes, I gathered a lot of irrelevant stuff. For my first try-out I wanted as much output from the parser as possible.
I also did not know yet how to prevent this.
I now see that this can be arranged in the grammar.

I am not sure about the fact that vcard and contentline can have the same structure.
My example is a stripped-down version of a real-world problem.
If someone is interested, here is a link: https://en.wikipedia.org/wiki/VCard
The formal grammar for the contact-language can be found here: https://tools.ietf.org/html/rfc2426

Thomas, thanks again.

Kind regards

Ben

Post by Peter Muraya

Thanks
I'm also new to the vipLalr parser and this example has been very helpful. I would like a few clarifications.

1 Which classes/interfaces is the open vCard,vCardGrm referring to?

2 If a terminal token is demarcated by the square brackets, why is it necessary to have the t_ prefix in its name? Without it we would have the neater recognition pattern [prop],[colon],[prop]

3 Is the following rule that uses the core::tuple definition valid?

Code: Select all

nonterminals
    cntl : core::tuple{string, string}.
rules
    cntl { core::tuple(P, V) } ==>
        [t_prop] { P },
        [t_colon],
        [t_prop] { V }.
Mutall Data Management Technical Support

Post by Thomas Linder Puls

The t_ is completely optional.

But some of the example grammars use that convention.

In the Prolog part of the code the terminals will be collected into a terminal domain:

Code: Select all

domains     terminal = terminal1; terminal2; ...
And here they don't appear in brackets. But it is completely a matter of taste.

The Visual Prolog grammar is based on a YACC grammar specification, and that specification contained "Tokens" of the form T_XXXX, so we simply translated them to t_xxx.
Regards Thomas Linder Puls
PDC

Post by Peter Muraya

Thomas, thanks, especially for the reference to the YACC grammar specification.
Mutall Data Management Technical Support

Post by Thomas Linder Puls

Also notice this wiki article: LALR Parser Generator.

And Wikipedia: LALR parser.
Regards Thomas Linder Puls
PDC

Post by Peter Muraya

Thank you.
Mutall Data Management Technical Support

Post by B.Hooijenga

Hello Peter,
"1 Which classes/interfaces is the open vCard,vCardGrm referring to?"
It is an instruction to the parser generator.
The generator adds it to vcardGrm.pro, as you can see:

Code: Select all

implement vcardGrm
    supports parserTable{terminal, sem_, state_, nonterminal_}
    inherits parser{terminal, sem_, state_, nonterminal_}
    open pfc\syntax\lrParser\, lrParserDomains
    open vCard, vCardgrmSem
class vCard contains this:

Code: Select all

class vCard
    open core

domains
    vcard = vcard(string Begin, string Colon, string VCard, contentlines Contentlines, string End, string Colon, string VCard).

domains
    contentlines = contentline*.

domains
    contentline = contentline(string Name, string StringValue).
And vCardgrmSem goes as follows:

Code: Select all

class vCardGrmSem
    open core, vCard, pfc\syntax\, pfc\syntax\syntax, pfc\syntax\lrParser\lrParserDomains

predicates
    mkVCard : (string Begin, string Colon, string VC, contentlines CLS, string End, string Colon, string VC) -> vCard.

predicates
    mkctl : (string Name, string NameValue) -> contentline.

All these code fragments already belong to the new grammar I wrote after Thomas helped me out.
This is the grammar:

Code: Select all

grammar vCardgrm
    open vCard, vCardgrmSem

nonterminals
    vcrd : vcard.
rules
    vcrd { mkVCard(Begin, Colon, VC, unrevlist(CL), End, Colon1, VC1) } ==>
        [t_prop] { Begin },
        [t_colon] { Colon },
        [t_prop] { VC },
        cntlines { CL },
        [t_prop] { End },
        [t_colon] { Colon1 },
        [t_prop] { VC1 }.

nonterminals
    cntlines : revList{contentline}.
rules
    cntlines { nil } ==>
        .

    cntlines { consRear(CL, C) } ==>
        cntlines { CL },
        cntl { C }.

nonterminals
    cntl : contentline.
rules
    cntl { contentline(P, V) } ==>
        [t_prop] { P },
        [t_colon],
        [t_prop] { V }.

end grammar vCardgrm
Kind regards

Ben

Post by Peter Muraya

Hello B.Hooijenga,
My impression was that open vCard, vCardGrm is indeed an instruction to the parser generator, but one that opens files that are yet to be produced. That is what I found strange.

I will do a few tests with different grammars in order to understand this generator better; it's worth the effort, as it looks much more powerful than the earlier version I have been using for my project.
Mutall Data Management Technical Support