Algorithm of string:: hash?

Discussions related to Visual Prolog
Post Reply
Vitaly Markov
Active Member
Posts: 67
Joined: 30 Nov 2003 0:01

Algorithm of string:: hash?

Post by Vitaly Markov » 29 Jun 2018 17:08

What algorithm is used by function string::hash?

User avatar
CalmoSoft
Active Member
Posts: 39
Joined: 17 Oct 2006 5:49

Re: Algorithm of string:: hash?

Post by CalmoSoft » 1 Jul 2018 7:30

Hello Vitaly,

May the next link will useful:
Hash Tables in Logic Programming

Greetings,
Gal Zsolt
(~ CalmoSoft ~)
There are two types of programmers :
the one who already uses Ring
or Visual Prolog and the others
who are going to use them some time.
(~ CalmoSoft ~)
calmosoft@gmail.com

Vitaly Markov
Active Member
Posts: 67
Joined: 30 Nov 2003 0:01

Re: Algorithm of string:: hash?

Post by Vitaly Markov » 1 Jul 2018 13:55

Thanks.
But I want to compare computing complexity of my hash-function and library function string::hash.

Victor Yukhtenko
Posts: 19
Joined: 3 May 2001 23:01

Re: Algorithm of string:: hash?

Post by Victor Yukhtenko » 1 Jul 2018 19:01

Hi Vitaly,

Please look at string::hash(...) implementation.
There is

implement string
open core, string_native
...
clauses
hash(String) = hash_native::hashValueString(String).

and in hash_native you can see
class hash_native
open core

predicates
hashValueString : (string String) -> unsigned HashValue language apicall as runtimeLinkNamesRun::hashValueString.
% @short Obtain a #HashValue from a #String.
% @end

So it is Windows function used. To know the real algorithm you should look at MSDN for hashValueString description.

Best,
Victor.
PDC-SPb
Victor Yukhtenko.

User avatar
Thomas Linder Puls
VIP Member
Posts: 2333
Joined: 28 Feb 2000 0:01

Re: Algorithm of string:: hash?

Post by Thomas Linder Puls » 1 Jul 2018 19:16

string::hash (~ hash_native::hashValueString) uses an incremental version of MurmurHash2.

It is not a Windows function (so MSDN will not help), it is implemented directly in C++ (by somebody on the net):

Code: Select all

//----------------------------------------------------------------------------- // CMurmurHash2A, by Austin Appleby   // This is a sample implementation of MurmurHash2A designed to work // incrementally.   // Usage -   // CMurmurHash2A hasher // hasher.Begin(seed); // hasher.Add(data1,size1); // hasher.Add(data2,size2); // ... // hasher.Add(dataN,sizeN); // hashValue_t hash = hasher.End()   #define mmix(h,k) { k *= m; k ^= k >> r; k *= m; h *= m; h ^= k; } typedef unsigned int uint32_t;   class CMurmurHash2A { public:     static void * operator new(size_t Size){         return Kernel::AllocAtomicHeap(Size);     }   public:       CMurmurHash2A(hashValue_t seed = 0)     {         m_hash = seed;         m_tail = 0;         m_count = 0;         m_size = 0;     }       void Add(const unsigned char * data, int len)     {         m_size += len;           MixTail(data, len);           while (len >= 4) {             uint32_t k = *(uint32_t*)data;               mmix(m_hash, k);               data += 4;             len -= 4;         }           MixTail(data, len);     }       uint32_t End(void)     {         mmix(m_hash, m_tail);         mmix(m_hash, m_size);           m_hash ^= m_hash >> 13;         m_hash *= m;         m_hash ^= m_hash >> 15;           return m_hash;     }   private:       static const uint32_t m = 0x5bd1e995;     static const int r = 24;       void MixTail(const unsigned char * & data, int & len)     {         while (len && ((len<4) || m_count)) {             m_tail |= (*data++) << (m_count * 8);               m_count++;             len--;               if (m_count == 4) {                 mmix(m_hash, m_tail);                 m_tail = 0;                 m_count = 0;             }         }     }       uint32_t m_hash;     uint32_t m_tail;     uint32_t m_count;     uint32_t m_size; };
Regards Thomas Linder Puls
PDC

Victor Yukhtenko
Posts: 19
Joined: 3 May 2001 23:01

Re: Algorithm of string:: hash?

Post by Victor Yukhtenko » 1 Jul 2018 21:07

:-) good ending!
PDC-SPb
Victor Yukhtenko.

Vitaly Markov
Active Member
Posts: 67
Joined: 30 Nov 2003 0:01

Re: Algorithm of string:: hash?

Post by Vitaly Markov » 2 Jul 2018 21:26

Thomas, thanks for the valuable information!
My idea - to write the fast and simple hash-function focused on national texts (separately or together).

The cycle of the algorithm murmur contains 8 commands (for 32/64-bit):
3 mul + 1 add + 2 rol + 2 xor (according wiki)

The cycle of the my function contains 4 commands (for 16/32/64-bit) :
2 rol + 2 xor
However the function may has n-bit (n = 8,9..63,64,..) for a choice of the desirable size of the hash-table.
Each the n-bit command rol is emulated by a set of commands:
1 shr + 1 shl + 1 or + 1 and
Therefore the function contains 10 fast and simple commands (for n-bit):
2 shr + 2 shl + 2 or + 2 and + 2 xor

Efficiency of these functions is comparable.
Function murmur gives 2 collisions for the Russian lexicon (more than 125 thousand words)
My function gives 3 collisions, but it can be n-bit function.

I will improve function and investigate it for the English dictionary and any other.
I will inform later the code of function and results. If it is interesting :)

User avatar
Thomas Linder Puls
VIP Member
Posts: 2333
Joined: 28 Feb 2000 0:01

Re: Algorithm of string:: hash?

Post by Thomas Linder Puls » 2 Jul 2018 22:09

Yes, it sounds very interesting, so please keep us up-to-date on your results.

You should notice that we wanted an incremental function, where data can be hashed in chunks (i.e. so you can hash a large structure, which consists of several pieces of data which is not in consecutive memory (e.g. a list)). See the hash class/interface.
Regards Thomas Linder Puls
PDC

Post Reply