Discussions related to Visual Prolog

Making collections threadsafe

Post by Martin Meyer »

Hello Thomas,

I think I have understood how to use the compareAndSwap class. It was quite easy to make my version of class setM_redBlack threadsafe. The version is attached (altSetM_redBlack_cas). What might be interesting about it is that it employs algorithms for bulk operations on red-black trees. Integrating the bulk algorithms into the PFC could be beneficial, since they speed up processing of large collections.

Making class setM_redBlack threadsafe is easy because it is based on a persistent algebraic construct, i.e. redBlackSet::set{Type}. However, making arrayM threadsafe seems more complicated because it internally performs destructive updates.
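To illustrate why the persistent construct makes this easy, here is a minimal sketch of the general pattern: an update builds a new immutable version and installs it with compare-and-swap, retrying if another thread won the race. It is written in Java rather than Visual Prolog, does not use the PFC compareAndSwap class, and uses copy-on-write where a real persistent red-black tree would share structure; the class and method names are invented for illustration.

Code:

    import java.util.Collections;
    import java.util.NavigableSet;
    import java.util.TreeSet;
    import java.util.concurrent.atomic.AtomicReference;

    /** Thread-safe updates built from immutable versions plus compare-and-swap.
     *  A real persistent red-black tree would share structure instead of copying,
     *  but the CAS retry loop is the same. */
    final class CasSortedSet {
        private final AtomicReference<NavigableSet<Integer>> current =
                new AtomicReference<>(Collections.unmodifiableNavigableSet(new TreeSet<Integer>()));

        void insert(int value) {
            while (true) {
                NavigableSet<Integer> oldVersion = current.get();
                TreeSet<Integer> copy = new TreeSet<>(oldVersion);   // build a new version
                copy.add(value);
                NavigableSet<Integer> newVersion = Collections.unmodifiableNavigableSet(copy);
                // Install the new version only if nobody else changed the set meanwhile.
                if (current.compareAndSet(oldVersion, newVersion)) {
                    return;
                }
                // Lost the race: retry against the newer version.
            }
        }

        /** A consistent point-in-time view; later updates do not affect it. */
        NavigableSet<Integer> snapshot() {
            return current.get();
        }
    }

A nice side effect is that a snapshot taken this way stays internally consistent no matter what other threads do afterwards.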

Are the get and set predicates of arrayM threadsafe already? I assume it is only necessary to block parallel access to the array while it is being resized. Do you think it could work as I did it in the attached project (arrayM_block [1])?

Predicates getAll_nd and the like will not give point-in-time consistent results, because the array data can change while the nondeterministic enumeration is running. But this can be considered a feature, not a bug.

In predicate set_site, a lock must be taken before the value of fact size is read, in order to prevent another thread from resizing the array and changing the value of the fact in parallel. A shared lock suffices for this if the array does not need to be resized; if it does need to be resized, an exclusive lock is required. The crux is that it is not known whether resizing is necessary before the value of fact size has been read. Thus, to cover all cases, an exclusive lock must be taken at the beginning of predicate set_site.

I have tried to take a shared lock first and afterwards, if necessary, additionally apply an exclusive lock. This makes sense to me, since both locks are taken by the same thread. But, as I have discovered, this is actually not possible; trying to take the second lock leads to waiting ad infinitum. Is there a reason for this behavior that I have not figured out yet, or could the behavior be changed, so that applying an exclusive lock after a shared lock becomes possible?
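In case a comparison helps: Java's ReentrantReadWriteLock has the same restriction (a thread holding the read lock that requests the write lock waits forever), and the usual idiom there is to release the shared lock, take the exclusive lock, and then re-check the condition, because another thread may have resized in between. Below is a minimal Java sketch of that idiom; the class is invented for illustration and is not the PFC lock class or arrayM.

Code:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    /** Growable int array protected by a read-write lock: element writes take
     *  the shared (read) lock, resizing takes the exclusive (write) lock.
     *  Upgrading shared -> exclusive in place is not supported (the thread
     *  would wait on itself), so: release, re-acquire exclusively, re-check. */
    final class GrowableIntArray {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private int[] data = new int[8];

        void set(int index, int value) {
            lock.readLock().lock();
            try {
                if (index < data.length) {       // no resize needed: shared lock suffices
                    data[index] = value;
                    return;
                }
            } finally {
                lock.readLock().unlock();
            }
            // Resize needed: take the exclusive lock and re-check, because
            // another thread may have grown the array in the meantime.
            lock.writeLock().lock();
            try {
                if (index >= data.length) {
                    int[] bigger = new int[Math.max(index + 1, data.length * 2)];
                    System.arraycopy(data, 0, bigger, 0, data.length);
                    data = bigger;               // install the new memory array
                }
                data[index] = value;
            } finally {
                lock.writeLock().unlock();
            }
        }

        int get(int index) {
            lock.readLock().lock();
            try {
                return data[index];
            } finally {
                lock.readLock().unlock();
            }
        }
    }

The re-check under the exclusive lock is what makes the release-and-reacquire safe: whatever happened between the two lock acquisitions is simply observed again.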

I have noticed that the debugger has a problem stepping through the code after locks have been taken. Stepping through my example, the debugger's reactions become slow in arrayM_block.pro after line 201 (lock:enter()). Before reaching line 228 (resize_noLock(size + Delta)), the disassembly window pops up and the debugger hangs.

A happy new year 2025 to you and my fellow VIP users!
Regards Martin

Re: Making collections threadsafe

Post by Thomas Linder Puls »

I agree that it would be a good idea to incorporate the bulk operations on sets and maps, so one day I think we will do that.

You should notice that everything here only makes certain operations on the data structures threadsafe. But in a specific concurrent/multithreaded program, the operations you really need to be threadsafe are very often higher-level operations.

Just assume that you have a map of counters (e.g. map{string Key, unsigned Counter}). Our threadsafe red-black tree maps provide threadsafe get and set, but to deal with a counter you actually need a threadsafe increment operation. Getting the value and then setting it as individual operations will not be sufficient; you will need a threadsafe combination of get, add and set.
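To make this concrete, here is a minimal Java sketch (not PFC code; the class name is made up) contrasting a non-atomic get-then-set, which can lose increments, with an atomic read-modify-write on a concurrent map:

Code:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class Counters {
        private final Map<String, Long> counts = new ConcurrentHashMap<>();

        /** Broken under concurrency: two threads can read the same old value
         *  and both write back old + 1, so one increment is lost. */
        void incrementLostUpdate(String key) {
            Long old = counts.getOrDefault(key, 0L);   // get
            counts.put(key, old + 1);                  // set -- not atomic with the get
        }

        /** Correct: get, add and set performed as one atomic operation. */
        void increment(String key) {
            counts.merge(key, 1L, Long::sum);
        }

        long get(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }

Even though every individual call on the map is threadsafe in both methods, only the second method makes the higher-level "increment" operation threadsafe.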

So in general it does not really make sense to say that a data structure is threadsafe; it is only certain operations that are threadsafe.

However, it does make sense to ensure that the data structure always stays sound no matter which operations are performed concurrently. Moreover, when several threads are making updates, all the updates should be applied to the data structure (in some order); it must not happen that some operations are lost (or duplicated). This provides some level of thread safety for the data structure. Whether or not the provided thread safety is sufficient/suitable for your problem is another matter.

Due to the underlying "persistent" nature of the red-black trees, such a level of thread safety is (as you noticed) easily obtained with relatively little overhead. getAll_nd, upFrom_nd, etc. are sound in the respect that they return the elements in the set/map at a certain point in time (even though you may "stretch" the retrieval over additional changes of the original set/map).

Regarding arrays: the set and get operations on a pure memory array containing numbers or pointers are threadsafe by nature (and our arrays only store such values). getAll_nd, upFrom_nd, etc. will work as a sequence of individual gets that are interleaved with updates. So with respect to those operations arrays have a natural thread safety, which again may or may not suit your needs.
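As a small illustration of that natural level of thread safety (Java, with AtomicIntegerArray standing in for a raw memory array of numbers; not the PFC arrayM):

Code:

    import java.util.concurrent.atomic.AtomicIntegerArray;
    import java.util.stream.IntStream;

    /** Fixed-size array of numbers: individual get/set are safe on their own,
     *  and an enumeration is just a sequence of such gets, so it may observe
     *  updates made while it is running (no point-in-time snapshot). */
    final class PlainIntArray {
        private final AtomicIntegerArray data;

        PlainIntArray(int size) {
            data = new AtomicIntegerArray(size);
        }

        int get(int index)             { return data.get(index); }
        void set(int index, int value) { data.set(index, value); }

        /** Like getAll_nd: each element is read individually, interleaved
         *  with whatever concurrent sets happen to occur. */
        IntStream getAll() {
            return IntStream.range(0, data.length()).map(data::get);
        }
    }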

But the grow/shrink operations on arrays are more problematic. These operations copy the contents of one memory array into another and then install the new memory array as the arrayM; updates made during this copying may be lost (i.e. if the update is made to the part that has already been copied). So to protect against losing operations, the set and resize operations must be synchronized.

Your implementation does that. But I think the behavior of downFrom_nd in combination with a shrink operation is rather dubious. I doubt that this "feature" is ever the one you wished for.

If I were to use an arrayM in a multithreaded context (which would be strange, because I never use arrays at all), I believe I would tailor the synchronization to the problem rather than to the data structure. That is, I would probably invent a "data storage/state" with synchronized higher-level operations fitted to the problem to be solved, and then have unprotected arrayMs and other unprotected stuff inside that data storage/state module.
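A minimal Java sketch of that idea, with invented names and operations: the higher-level operations are synchronized as a whole, while the data inside is ordinary, unprotected stuff.

Code:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.OptionalDouble;

    /** Problem-oriented "data storage/state": synchronization lives on the
     *  domain operations, not on the collections inside. */
    final class MeasurementStore {
        private final List<Double> samples = new ArrayList<>();   // unprotected inside
        private double runningSum = 0.0;

        /** One domain-level operation, atomic as a whole. */
        public synchronized void record(double value) {
            samples.add(value);
            runningSum += value;
        }

        /** Another domain-level operation; callers never see a half-updated state. */
        public synchronized OptionalDouble average() {
            return samples.isEmpty()
                    ? OptionalDouble.empty()
                    : OptionalDouble.of(runningSum / samples.size());
        }
    }

The point is that "record a measurement" and "report the average" are the operations the problem needs to be threadsafe, not "add to a list" or "read a field" in isolation.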

Finally (to trigger you some more :-)), I will mention Communicating Sequential Processes (CSP) as a multithreading principle. Here the idea is that a certain thread (Process) "owns" some unsynchronized data, and any access to and update of that data is made by the owner thread. Hence the data is only accessed and updated in a single-threaded (Sequential) context. Other threads have to Communicate with the owner thread to utilize the data it owns. So, all in all, you implement a set of Communicating Sequential Processes. There exist several variants of this: in "true" CSP, communication is made through Channels; Erlang uses ports; other systems use message/"mail" boxes/queues; and Ada has a so-called rendezvous mechanism. But the overall principle of data ownership and communication between dedicated threads is the same. Out there in the cloud, these principles are used in services and micro-services.
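A minimal Java sketch of the ownership idea, using a plain blocking queue as the communication channel (all names invented; a real design would add shutdown handling, back-pressure, etc.):

Code:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.LinkedBlockingQueue;

    /** CSP-flavoured sketch: one owner thread holds an ordinary, unsynchronized
     *  map; all other threads only Communicate with it through a message queue. */
    final class CounterOwner {
        private interface Msg {}
        private record Increment(String key) implements Msg {}
        private record Get(String key, CompletableFuture<Long> reply) implements Msg {}

        private final BlockingQueue<Msg> inbox = new LinkedBlockingQueue<>();
        private final Map<String, Long> counts = new HashMap<>();  // owned, never shared

        CounterOwner() {
            Thread owner = new Thread(this::run, "counter-owner");
            owner.setDaemon(true);
            owner.start();
        }

        private void run() {
            try {
                while (true) {
                    Msg msg = inbox.take();          // counts is touched by this thread only
                    if (msg instanceof Increment inc) {
                        counts.merge(inc.key(), 1L, Long::sum);
                    } else if (msg instanceof Get get) {
                        get.reply().complete(counts.getOrDefault(get.key(), 0L));
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // stop the owner thread
            }
        }

        // Client side: send a message instead of touching the data.
        void increment(String key) throws InterruptedException {
            inbox.put(new Increment(key));
        }

        long get(String key) throws InterruptedException {
            CompletableFuture<Long> reply = new CompletableFuture<>();
            inbox.put(new Get(key, reply));
            return reply.join();                     // rendezvous-style reply
        }
    }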
Regards Thomas Linder Puls
PDC