

Another tokenise

Steve Lympany




Posted: 12 May 2011 14:27    Post subject: Another tokenise

Maybe this is useful to someone. Jump to the bottom to see examples.


class predicates
   mytokenise:(string)->string*.
clauses
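   mytokenise(STR)=L:-
      zz_previous_char_was_a_split:=false, %reset state left over from a previous call
      Rev=mytokprivate(STR,[""]),
%tokens are built in reverse (saves using append), so reverse at the end
      L=list::reverse(Rev).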

class facts
   zz_splitby_items:string*:=[].
   zz_previous_char_was_a_split:boolean:=false.
class predicates
   mytokprivate:(string,string*)->string*.
clauses
   mytokprivate("",L)=L:-!. %end of input
%current char is a split char: start a new (empty) token,
%unless the previous char was also a split char
   mytokprivate(STR,List)=List1:-
      string::front(STR,1,First,Last),
      list::isMember(First,zz_splitby_items),!, %then split
      if zz_previous_char_was_a_split=true then
         List1=mytokprivate(Last,List)
      else
         zz_previous_char_was_a_split:=true,
         List1=mytokprivate(Last,[""|List])
      end if.
%any other char: append it to the token being built
   mytokprivate(STR,[S|List])=List1:-
      zz_previous_char_was_a_split:=false,
      string::front(STR,1,First,Last),!,
      List1=mytokprivate(Last,[string::concat(S,First)|List]).
%any failure returns the list
   mytokprivate(_,L)=L:-!.
class predicates
   test:().
clauses
   test():-
      zz_splitby_items:=[" ",";"],
      TOKS=mytokenise("he was on-call; he wasn't oncall; ;;; three222"),
      stdio::write(TOKS).

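You need to set the fact zz_splitby_items before calling mytokenise. Here it is :=[" ",";"], so the string is split only at a space or a semi-colon. A run of consecutive split characters counts as a single split, so no empty tokens are created between them.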

 "he was on-call; he wasn't oncall; ;;; three222"

results in

["he","was","on-call","he","wasn't","oncall","three222"]
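
To show that the split characters are just whatever is in zz_splitby_items, here is a small sketch that splits on commas instead. testCommas and dropEmpty are names I made up for this illustration, they are not part of the attached class. The input ends with a split character, and as far as I can tell from the clauses above that leaves one empty token at the end of the list, so dropEmpty removes it.

```prolog
class predicates
   dropEmpty:(string*)->string*. %hypothetical helper, not in the attached class
clauses
   dropEmpty([])=[].
   dropEmpty([""|Rest])=dropEmpty(Rest):-!. %skip empty tokens
   dropEmpty([Tok|Rest])=[Tok|dropEmpty(Rest)].

class predicates
   testCommas:(). %hypothetical test, same class as mytokenise
clauses
   testCommas():-
      zz_splitby_items:=[","],
      zz_previous_char_was_a_split:=false, %start from a known state
      TOKS=mytokenise("a,b,,c,"),
      stdio::write(dropEmpty(TOKS)).
```

By my reading of the clauses this should write ["a","b","c"]: the double comma is collapsed by mytokenise itself, and the empty token left by the trailing comma is removed by dropEmpty.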

A more complex version:

This version also handles text enclosed in a pair of quotes or brackets, e.g. <hello there> is kept as one token and is not split.

class predicates
   mytokenise:(string)->string*.
clauses
   mytokenise(STR)=L:-
      zz_previous_char_was_a_split:=false, %reset state left over from a previous call
      Rev=mytokprivate(STR,not_in_pair,[""]),
%it's all backwards (saves using append), so just reverse
      L=list::reverse(Rev).

domains
   pair=pair(string,string);
         pair_same_char(string). %eg within single quotes
   within_pair=within_pair;not_in_pair.
class facts
   zz_splitby_items:string*:=[].
   zz_previous_char_was_a_split:boolean:=false.
   ndb_dont_split:(pair).
class predicates
   mytokprivate:(string,within_pair,string*)->string*.
clauses
   mytokprivate("",_,L)=L:-!.
%the first char is always split off the source string, so do that as a first step.
   mytokprivate(STR,Within_pair,List)=List1:-
      string::front(STR,1,First,Last),
      List1=mytokprivate2(First,Last,Within_pair,List),!.


class predicates
   mytokprivate2:(string First,string Last,within_pair,string*)->string*.
clauses
%current token is not in a pair of quotes or brackets
%create new token
   mytokprivate2(First,Last,not_in_pair,List)=List1:-
      list::isMember(First,zz_splitby_items),!, %then split
      if zz_previous_char_was_a_split=true then %prevent empty tokens being created
         List1=mytokprivate(Last,not_in_pair,List)
      else
         zz_previous_char_was_a_split:=true,
         List1=mytokprivate(Last,not_in_pair,[""|List])
      end if.

%start of a pair of (e.g.) brackets. (Nothing will be split until a close bracket is reached)
%note: a partly built token _S is discarded here, so put a split char before the opening bracket
   mytokprivate2(First,Last,not_in_pair,[_S|List])=List1:-
      zz_previous_char_was_a_split:=false,
      ndb_dont_split(pair(First,_Close)),!,
      List1=mytokprivate(Last,within_pair,[""|List]).
   mytokprivate2(First,Last,not_in_pair,[_S|List])=List1:-
      zz_previous_char_was_a_split:=false,
      ndb_dont_split(pair_same_char(First)),!,
      List1=mytokprivate(Last,within_pair,[""|List]).

%continue building the token, char by char
   mytokprivate2(First,Last,not_in_pair,[S|List])=List1:-
      zz_previous_char_was_a_split:=false,!,
      List1=mytokprivate(Last,not_in_pair,[string::concat(S,First)|List]).

%currently within a pair of brackets, and the close bracket is found
   mytokprivate2(First,Last,within_pair,List)=List1:-
      ndb_dont_split(pair(_,First)),!,
      List1=mytokprivate(Last,not_in_pair,List).
   mytokprivate2(First,Last,within_pair,List)=List1:-
      ndb_dont_split(pair_same_char(First)),!,
      List1=mytokprivate(Last,not_in_pair,List).

%continue building the long token within brackets, char by char
   mytokprivate2(First,Last,within_pair,[S|List])=List1:-!,
      List1=mytokprivate(Last,within_pair,[string::concat(S,First)|List]).

%any failure returns the list
   mytokprivate2(_,_,_,L)=L:-!.
class predicates
   test:().
clauses
   test():-
      zz_splitby_items:=[" ",";"],
      assert(ndb_dont_split(pair("<",">"))),
      assert(ndb_dont_split(pair_same_char("\""))),
      S="he was ;;;on-call;<twice keeping> the \"underscores together\"",
      TOKS=mytokenise(S),
      stdio::write(TOKS).

With

      zz_splitby_items:=[" ",";"],
      assert(ndb_dont_split(pair("<",">"))),
      assert(ndb_dont_split(pair_same_char("\""))),

EXAMPLES

1) "he was on-call;twice" is split to
["he","was","on-call","twice"]

2) "he was ;;;on-call;twice keeping the under_score_s" is split to:
["he","was","on-call","twice","keeping","the","under_score_s"]

3) "he was ;;;on-call;<twice keeping> the \"underscores together\"" is split to:
["he","was","on-call","twice keeping","the","underscores together"]
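
The pairs are not fixed either. Here is a sketch that registers round brackets and single quotes instead of the pairs above. testParens is a name made up for this illustration. Since ndb_dont_split is a class fact, pairs asserted by an earlier call stay in the database, so the sketch clears them first with retractAll.

```prolog
class predicates
   testParens:(). %hypothetical test, same class as mytokenise
clauses
   testParens():-
      zz_splitby_items:=[" "],
      zz_previous_char_was_a_split:=false, %start from a known state
      retractAll(ndb_dont_split(_)), %forget pairs registered by earlier calls
      assert(ndb_dont_split(pair("(",")"))),
      assert(ndb_dont_split(pair_same_char("'"))),
      TOKS=mytokenise("f (a b) 'c d' e"),
      stdio::write(TOKS).
```

By my reading of the clauses this should write ["f","a b","c d","e"]. One thing to watch: the opening bracket needs a split character in front of it. With something like "abc<d e>" the clause that starts a pair throws away the partly built token, so "abc" would be lost.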

But I'm sure the gurus at PDC could write it more nicely/powerfully/flexibly !

I attach the class,

Steve



Attachment: mytokenise.zip (2.28 KB)
