"Update" bulk operation

Martin Meyer · Post by **Martin Meyer** » 8 Mar 2025 0:01

Hello Thomas,

this code piece serves a usual purpose; it checks whether Key is present in Tree_0, and if so, then it updates the data of Key:

Code: Select all

    if _Data = radixTree::lookup(Tree_0, Key) then
        Tree_1 = radixTree::insert({ (A, B) = A + B }, Tree_0, Key, 1)
    end if,

How to do the same in a single bulk operation? That is, given trees TreeA and TreeB compute a tree TreeAB such that TreeAB is the same as TreeA besides that the data of keys which are also in TreeB is updated by combining the data from TreeA and TreeB.

Class radixTree contains bulk operations merge, intersection, difference, and xor. A merge does not work here because the merge would also insert the keys (incl. data) from TreeB, which are not in TreeA, into TreeAB. The other bulk operations cannot solve the task either.

So it could make sense to introduce another bulk operation. The attached example gives a usecase of that bulk operation. I have simply named the operation update (I am but not completely happy with the name as it does not indicate that it is a set-operation).

The example outputs the results of some calls to predicates in class lstTrieSet. The class is a variant of class trieSet with a few additional predicates. Predicates delete and difference of class lstTrieSet are based on the update bulk operation for red-black trees.

Essentially my example just consists of moddings to your pfc code. I have left your copyright header in all files. Most probably, when you one day incorporate the bulk operations in red–black trees, you will do it better than me. Nevertheless, I hope that a look on the example's code can be helpful in somehow.

Post by **Thomas Linder Puls** » 18 Mar 2025 12:22

Hi, Martin. I have downloaded your example, but so many things was in the way. And I have not looked at i yet.

It seems that the operation you propose is a merge of the intersection:

Code: Select all

class predicates
    intersectionMerge : (radixTree::combine{Elem} Combine, 
        radixTree::dictionary{Elem} DictA, radixTree::dictionary{Elem} DictB)
        -> radixTree::dictionary{Elem} DictAB.
clauses
    intersectionMerge(Combine, DictA, DictB) = DictAB :-
        IntersectAB = radixTree::intersection({ (_, B) = B }, DictA, DictB),
        DictAB = radixTree::merge(Combine, DictA, IntersectAB).

A cannot say if such an operation is sufficiently useful to be a "first class" operation, but if so it should have a good name. And I am for example in doubt of whether intersectionMerge is better or worse than mergeIntersection.

Martin Meyer · Post by **Martin Meyer** » 19 Mar 2025 13:31

Yes, the operation could be implemented by an intersection and a merge.

Criticism of the name “intersectionMerge” (even though I do not know a better name):

The operation is not symmetric, but the name “intersectionMerge” gives no clue about the order of the arguments.
Viewing the operation as merging the result of an intersection seems complicated. If you see it as an analog to an SQL update statement, the idea is simpler.
“intersectionMerge” suggests that the predicate actually performs an intersection and a merge. However, the purpose of the predicate is exactly the opposite, namely to speed things up by doing all in one run.

But do not throw away a useful predicate just because its name is not perfect.

Post by **Thomas Linder Puls** » 20 Mar 2025 10:08

I can't find update in the example (only on your red black trees).

While looking at your code I was reminded that I have long ago decided that bulk operations are very problematic for our main collection types. Their performance is obviously appealing, but there is a (in my mind) huge problem concerning the comparators.

Our trees (including the radix trees) are built with a certain comparator. And the bulk operations only makes sense for trees built with the same comparator(s). But the type system does not capture this: 'set{string}' is a set of strings but we don't know which comparator is based on. So if we have two 'set{string}' we don't know whether or not 'union(A, B)' makes sense or is nonsense.

At this moment I cannot think of a satisfying way to deal with this problem, but I will consider it more. Mean while you will have to keep your bulk operations in your own code. But thank you for the effort anyway, and when/if I come up with a way to deal with this problem I will certainly revisit your code.

Martin Meyer · Post by **Martin Meyer** » 20 Mar 2025 19:29

The example is intended to be a use case for this update predicate in red-black trees: predicates lstTrieSet::delete and lstTrieSet::difference use redBlackTreeAddOn::update.

But you are right, class lstTrieSet should also get an update predicate - I will add one. Maybe "bulkUpdate" is a better name for the predicate, or "massUpdate", or "setUpdate", "updateAll" ...hmm

I have made an extreme amount of mistakes already in coding my programs, but I cannot remember ever doing a bulk operation with collections having different comparators. I assume this mistake is generally rare.

To fix the comparator problem, you would probably need to introduce a check for intensional equality of predicates, including equality of the captured values (if they are anonymous predicates). If this is feasible at all, I suspect this requires a lot of complicated work.

I think you wanted to type “Our trees (EXcluding the radix trees) are ...”, since the radix trees use bit manipulations instead of key comparisons.

Post by **Thomas Linder Puls** » 21 Mar 2025 10:52

You are right about radixTrees (it was trie's that was in my mind wne I wrote radix trees).

Regarding the likelihood of problems coming from different comparators: You will have to consider that you write all your code, but that we have projects with millions of lines of code and several programmers. In such a context you may obtain set's, map's, trie's etc where you know nothing at all about the comparator.

Finally, you cannot test intentional equality of comparators: that problem is undecidable. Basically meaning that any validator will run forever on some tests.

In any case an attempt to test intentional equality is very likely to ruin the complexity of your algorithms

You can test some kind of "identity", but it will give a runtime error. There is nothing you can do on compile time. (That would require a type system with some kind of dependent types, and that is very unlikely to be introduced).

Martin Meyer · Post by **Martin Meyer** » 21 Mar 2025 16:36

Do you mean that testing intentional equality of comparators is not only undecidable at compile time but also at runtime? If yes, please give me a clue why.

I made a test:

Code: Select all

    run() :-
        hasDomain(comparator{string}, ComparatorA),
        hasDomain(comparator{string}, ComparatorB),
        ComparatorA = { (A, B) = compare(A, B) },
        ComparatorB = { (X, Y) = compare(X, Y) },
        if ComparatorA = ComparatorB then
            stdIO::write("yes")
        else
            stdIO::write("no")
        end if.

Output of the test is "no". I suppose, currently VIP only compares addresses of the executable code.

But in principle it should be possible to unify the parse trees of the two comparators at compile time. If comparators capture variables from context, their values can be compared at runtime.

Martin Meyer · Post by **Martin Meyer** » 23 Mar 2025 14:21

The terms “intensional” vs. “extensional” equality can be misleading. I had forgotten which term stands for which concept. When I wrote to you, I looked it up. In section "Equality of functions" of First-class function it says in short: equality of code is intensional equality, producing same outputs on same inputs is extensional equality.

Post by **Thomas Linder Puls** » 24 Mar 2025 10:46

Sorry, I think it was me that mixed it up. Extensional equality is undecidable.

In the code snippet you present we could investigate the parse trees. But in real tree code the parse trees are not available in the place where the comparison should take place (for example here in an updated version of your code):

Code: Select all

clauses
    merge(trie(CompareA, TrieRepA), trie(CompareB, TrieRepB)) = Merge :-
        mustBeSameComparator(CompareA, CompareB),
        Merge = trie(CompareA, merge_rep(CompareA, TrieRepA, TrieRepB)).

Re: "Update" bulk operation

Re: "Update" bulk operation

Re: "Update" bulk operation

Re: "Update" bulk operation

Re: "Update" bulk operation

Re: "Update" bulk operation

Re: "Update" bulk operation

Re: "Update" bulk operation