Discussions related to Visual Prolog
Jason_6
Posts: 13
Joined: 4 Jan 2015 13:39

how to remove duplicates of facts ?

Unread post by Jason_6 »

1>
How to remove duplicates of [ facts ] instead of list?

Code: Select all

list::removeDuplicates/1-> removeDuplicates : (Elem* List)     -> Elem* Rest.

Code: Select all

list::sort/2-> sort : (     Elem* List,     sortOrder Order)     -> Elem* Sorted.
which are both not exactly what I want.

2>
Moreover,after removing duplicates of facts,
could you sort the facts ?

Could you recommend some proper function to settle these little problems?

Thank you all for pacient answering ,especially Thomas! :-)
In me the tiger sniffs the rose.
Martin Meyer
VIP Member
Posts: 328
Joined: 14 Nov 2002 0:01

Unread post by Martin Meyer »

Hi Jason,

the best way is probably to retract each fact from the fact database, inserting it in a list, sort resp. deduplicate the list and finally assert each element of the list back into the fact database. So, for example:

Code: Select all

class facts     exampleFact : (string).   class predicates     dedupExampleFactDb : (). clauses     dedupExampleFactDb() :-         List = [Strg || retract(exampleFact(Strg))],         UniqueList = list::removeDuplicates(List),         foreach Strg in UniqueList do             assert(exampleFact(Strg))         end foreach.
Deduplicating the facts without using a list is possible (but inefficient):

Code: Select all

class facts     exampleFact : (string).   class predicates     dedupExampleFactDb : (). clauses     dedupExampleFactDb() :-         retract(exampleFact(Strg)),         !,         dedupExampleFactDb(),         if not(exampleFact(Strg)) then             assert((exampleFact(Strg)))         end if.     dedupExampleFactDb().
Regards
Martin
Vitaly Markov
Active Member
Posts: 40
Joined: 30 Nov 2003 0:01

Unread post by Vitaly Markov »

For list:

Code: Select all

class facts - aaa fff:(string).   clauses fff("aa").  fff("aaa").  fff("b").  fff("aa").  fff("c").   run():- FactList = [fff(A) || retract(fff(A))],           UniqueList = list::removeDuplicates(FactList),           list::forAll(UniqueList, {(F):-assert(F),write(F),nl}).
Last edited by Vitaly Markov on 20 Jan 2015 19:11, edited 1 time in total.
Vitaly Markov
Active Member
Posts: 40
Joined: 30 Nov 2003 0:01

Unread post by Vitaly Markov »

DB only:

Code: Select all

class facts fff:(string). clauses fff("aa").  fff("aaa").  fff("b").  fff("aa").  fff("c").   run():- fff(S), retractAll(fff(S)),asserta(fff(S)), fail;           fff(S),write(S),nl,fail;                  % <- check of DB           _ = readchar().
Martin Meyer
VIP Member
Posts: 328
Joined: 14 Nov 2002 0:01

Unread post by Martin Meyer »

Yes, the deduplication can also be performed using retractAll:

Code: Select all

class facts     exampleFact : (string).   class predicates     dedupExampleFactDb : (). clauses     dedupExampleFactDb() :-         foreach exampleFact(S) do             retractAll(exampleFact(S)),             assertA(exampleFact(S))         end foreach.
The advantage over the no-list-solution, I proposed before, is, that it does not eat up stack in recursion. Disadvantage is however, that it is in the worst case even more inefficient than my no-list-solution:

Let us count each try to match a fact with a value as 1 step (matches with unbound variables shall not count). The worst case is, that the database consists of n facts having n distinct values (i.e. that there are no duplicates).

In my before proposed no-list-solution the line if not(exampleFact(Strg)) then is executed n times, whereby first time doing 0 steps, second time doing 1 step, etc. until n'th time doing n-1 steps. In total that are 0 + 1 + ... + n-1 = 0.5*n^2 - 0.5*n steps.

In the retractAll-solution (in above code box) line retractAll(exampleFact(S)) is executed n times and does each time n steps (Thomas may correct me, if retractAll is implemented differently). So, in total that are n^2 steps.

In contrast the list::removeDuplicates predicate is implemented using a red-black tree (per package setM_redBlack) to keep track of the already seen values. Let's count now comparison of two values as 1 step. Lookup and insert in a red-black tree having m nodes is running in O(log m) steps. Hence the removeDuplicates predicate runs on a list with n elements in O(log 1 + log 2 + ... + log n) = O(n log n) steps. The transfer of the values from database to list and back runs in O(n) and thus does not increase the cost in terms of big O. So the fact-deduplication using list::removeDuplicates runs in only O(n log n) steps, while the both solutions without lists take O(n^2) steps.

Regards
Martin
Jason_6
Posts: 13
Joined: 4 Jan 2015 13:39

Millions of thanks!!

Unread post by Jason_6 »

Millions of thanks!!
Thank Martin and Markov again for your wonderful solutions in difrrent views, which help me a lot! 8-) :wink:

Of course,other ideas are always welcome!
In me the tiger sniffs the rose.
Vitaly Markov
Active Member
Posts: 40
Joined: 30 Nov 2003 0:01

Unread post by Vitaly Markov »

For line if not(exampleFact(Strg)) then O(n) = (n^2-n)/2.

For line retractAll(exampleFact(S)) O(m,r) = n^2/r - n(m+1)/2.
It is better for r>1.
Here: n - total quantity of facts, m - quantity of uniq facts, r - quantity of occurrences of each fact. Therefore n = mr.
Short explanation for retractAll:
n + (n-r) + (n-2r) + ... - m times.
Hence:
nm - r(m^2+m)/2 = n^2/r - n(m+1)/2.

However use removeDublicates certainly is the best.
Martin Meyer
VIP Member
Posts: 328
Joined: 14 Nov 2002 0:01

Unread post by Martin Meyer »

I agree to your conclusion: The higher the quota of duplicates on all facts is, the better performs the retractAll-solution.

When counting the steps for the retractAll-solution in the case you mention (n - total quantity of facts, m - quantity of unique facts, r - quantity of occurrences of each fact) I get a little different result:

Code: Select all

    n  +  n-r  +  n-2*r  +  ...  +  n-(m-1)*r =  n  +  n-r  +  n-2*r  +  ...  +  n-m*r+r =  n  +  n-r  +  n-2*r  +  ...  +  n-n+r =  n  +  n-r  +  n-2*r  +  ...  +  r =  (n + r) * m/2 =  (n*m + r*m) / 2 =  (n*m + n) / 2 =  0.5*n*m  +  0.5*n
In the extreme case, that all n facts are equal (i.e. m=1, r=n), the retractAll-solution needs only n steps. But the retract-solution runs also in n steps in that case. The retractAll-solution should however be the faster one by a factor in praxis in that case, because it avoids recursive calls.

I suppose, in praxis there is a threshold for the quota of duplicates on all facts, so that, iff the threshold is exceeded, the retractAll-solution outperformes the retract-solution.

Many regards
Martin
Randy
Posts: 5
Joined: 17 Nov 2006 13:23

Math at it's Best

Unread post by Randy »

Gentlemen - I am awed by this international Math collaboration. Thanks for sharing.

Best Regards,
Randy
User avatar
Ferenc Nagy
VIP Member
Posts: 215
Joined: 24 Apr 2007 12:26

Check before first assert

Unread post by Ferenc Nagy »

Nice solutions.
But you'd rather check in advance in order to prevent duplicate assertion.

Code: Select all

facts    yourfact(sometype). predicates    assertOnlyIfNew(sometype F) procedure. clauses    assertOnlyIfNew(F) :-       yourfact(F),        !.     assertOnlyIfNew(F) :-         assert(yourfact(F)).      
TIA, Regards,
Frank Nagy
User avatar
Thomas Linder Puls
VIP Member
Posts: 1398
Joined: 28 Feb 2000 0:01

Unread post by Thomas Linder Puls »

Interesting conversation.

However, I would definitely use a suitable collection instead.

A set automatically only contains one of each value; a red-black set will also put the elements in order (which was a side question).

Code: Select all

facts     data : setM{sometype) := serM_redBlack::new().   clauses     insert(Info) :- data:insert(Info).
The running time is O(N*log(R)) when inserting N values of which R are unique.
Regards Thomas Linder Puls
PDC
User avatar
Ferenc Nagy
VIP Member
Posts: 215
Joined: 24 Apr 2007 12:26

What does correspont to consult and savein case of collections?

Unread post by Ferenc Nagy »

The old fashioned database handling has a nice mode to read and write a block of facts: the consult and save statements.
I like the files generated by the save statement.
1) I could handle them from Excel as CSV files. [I wrote macros to delete the unnecessary parentheses and final periods and tabulating them.]
2) The new VIP versions are not so sensitive to whitespaces and final cr+lf+eof characters as the original Turbo Prolog. The new versions allow the addition comment lines, too.

What is the corresponding simplest way to fill in and write out a whole collection?
TIA, Regards,
Frank Nagy
User avatar
Thomas Linder Puls
VIP Member
Posts: 1398
Joined: 28 Feb 2000 0:01

Unread post by Thomas Linder Puls »

In many cases the simplest solution is to keep facts for serialization, i.e.
  • Load: consult a fact database move the data to the collections.
  • Save: move data from collections to a fact database and save it.
As long as out "think in facts" such a strategy is simple, but once you begin to "think in collections" and make more complex collections you may find it more tricky to "invent" a fact structure that corresponds to a certain collection structure.

Facts can contain structured data like lists and functor structures, but in a certain sense they are "flat" like a relational database: The facts them selves are always on top level.

On the other hand collections can be nested deeply, and deeply structured data may be more difficult to "flatten" into a fact database.

It is the same with objects: they can represent very complex data structures, but it may be difficult "untangle" such data structures into something that can be serialized.
Regards Thomas Linder Puls
PDC
User avatar
Ferenc Nagy
VIP Member
Posts: 215
Joined: 24 Apr 2007 12:26

Keep i/o data in facts sections but store internal data in collections

Unread post by Ferenc Nagy »

Hi,
Do I summarize properly your advices with the sentence in the subject line?
Keep i/o data in facts sections but store internal data in collections.
Additionally:
I/O facts should "flat" i.e. they should not contain lists having unequal size [because of easy conversion from/to Excel tables and/or relational databases.)

The coexistence of facts and collections means no problem in case of small amount of data records.
TIA, Regards,
Frank Nagy
User avatar
Thomas Linder Puls
VIP Member
Posts: 1398
Joined: 28 Feb 2000 0:01

Unread post by Thomas Linder Puls »

Well, I don't really think I was giving such general advices.

Fact databases have many good properties; save and consult are definitely two of them.

But the collections also have some very good properties. Especially, they significantly offer better performance (for many vital operations). Therefore a general advice is to learn to use collections.

I don't see any problem with having lists an even more complex data in facts, unless you need easy conversion to Excel or the like.
Regards Thomas Linder Puls
PDC
Post Reply