Page 1 of 1

Algorithm of string:: hash?

Posted: 29 Jun 2018 17:08
by Vitaly Markov
What algorithm is used by function string::hash?

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 7:30
by CalmoSoft
Hello Vitaly,

May the next link will useful:
Hash Tables in Logic Programming

Greetings,
Gal Zsolt
(~ CalmoSoft ~)

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 13:55
by Vitaly Markov
Thanks.
But I want to compare computing complexity of my hash-function and library function string::hash.

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 19:01
by Victor Yukhtenko
Hi Vitaly,

Please look at string::hash(...) implementation.
There is

Code: Select all

implement string     open core, string_native ... clauses     hash(String) = hash_native::hashValueString(String).   and in hash_native you can see class hash_native     open core   predicates     hashValueString : (string String) -> unsigned HashValue         language apicall as runtimeLinkNamesRun::hashValueString.     % @short Obtain a #HashValue from a #String.     % @end


So it is Windows function used. To know the real algorithm you should look at MSDN for hashValueString description.

Best,
Victor.

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 19:16
by Thomas Linder Puls
string::hash (~ hash_native::hashValueString) uses an incremental version of MurmurHash2.

It is not a Windows function (so MSDN will not help), it is implemented directly in C++ (by somebody on the net):

Code: Select all

//----------------------------------------------------------------------------- // CMurmurHash2A, by Austin Appleby   // This is a sample implementation of MurmurHash2A designed to work // incrementally.   // Usage -   // CMurmurHash2A hasher // hasher.Begin(seed); // hasher.Add(data1,size1); // hasher.Add(data2,size2); // ... // hasher.Add(dataN,sizeN); // hashValue_t hash = hasher.End()   #define mmix(h,k) { k *= m; k ^= k >> r; k *= m; h *= m; h ^= k; } typedef unsigned int uint32_t;   class CMurmurHash2A { public:     static void * operator new(size_t Size){         return Kernel::AllocAtomicHeap(Size);     }   public:       CMurmurHash2A(hashValue_t seed = 0)     {         m_hash = seed;         m_tail = 0;         m_count = 0;         m_size = 0;     }       void Add(const unsigned char * data, int len)     {         m_size += len;           MixTail(data, len);           while (len >= 4) {             uint32_t k = *(uint32_t*)data;               mmix(m_hash, k);               data += 4;             len -= 4;         }           MixTail(data, len);     }       uint32_t End(void)     {         mmix(m_hash, m_tail);         mmix(m_hash, m_size);           m_hash ^= m_hash >> 13;         m_hash *= m;         m_hash ^= m_hash >> 15;           return m_hash;     }   private:       static const uint32_t m = 0x5bd1e995;     static const int r = 24;       void MixTail(const unsigned char * & data, int & len)     {         while (len && ((len<4) || m_count)) {             m_tail |= (*data++) << (m_count * 8);               m_count++;             len--;               if (m_count == 4) {                 mmix(m_hash, m_tail);                 m_tail = 0;                 m_count = 0;             }         }     }       uint32_t m_hash;     uint32_t m_tail;     uint32_t m_count;     uint32_t m_size; };

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 21:07
by Victor Yukhtenko
:-) good ending!

Re: Algorithm of string:: hash?

Posted: 2 Jul 2018 21:26
by Vitaly Markov
Thomas, thanks for the valuable information!
My idea - to write the fast and simple hash-function focused on national texts (separately or together).

The cycle of the algorithm murmur contains 8 commands (for 32/64-bit):
3 mul + 1 add + 2 rol + 2 xor (according wiki)

The cycle of the my function contains 4 commands (for 16/32/64-bit) :
2 rol + 2 xor
However the function may has n-bit (n = 8,9..63,64,..) for a choice of the desirable size of the hash-table.
Each the n-bit command rol is emulated by a set of commands:
1 shr + 1 shl + 1 or + 1 and
Therefore the function contains 10 fast and simple commands (for n-bit):
2 shr + 2 shl + 2 or + 2 and + 2 xor

Efficiency of these functions is comparable.
Function murmur gives 2 collisions for the Russian lexicon (more than 125 thousand words)
My function gives 3 collisions, but it can be n-bit function.

I will improve function and investigate it for the English dictionary and any other.
I will inform later the code of function and results. If it is interesting :)

Re: Algorithm of string:: hash?

Posted: 2 Jul 2018 22:09
by Thomas Linder Puls
Yes, it sounds very interesting, so please keep us up-to-date on your results.

You should notice that we wanted an incremental function, where data can be hashed in chunks (i.e. so you can hash a large structure, which consists of several pieces of data which is not in consecutive memory (e.g. a list)). See the hash class/interface.