Page 1 of 1

Algorithm of string:: hash?

Posted: 29 Jun 2018 17:08
What algorithm is used by function string::hash?

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 7:30
Hello Vitaly,

May the next link will useful:
Hash Tables in Logic Programming

Greetings,
Gal Zsolt
(~ CalmoSoft ~)

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 13:55
Thanks.
But I want to compare computing complexity of my hash-function and library function string::hash.

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 19:01
Hi Vitaly,

There is

Code: Select all

``````implement string
open core, string_native
...
clauses
hash(String) = hash_native::hashValueString(String).

and in hash_native you can see
class hash_native
open core

predicates
hashValueString : (string String) -> unsigned HashValue
% @short Obtain a #HashValue from a #String.
% @end``````

So it is Windows function used. To know the real algorithm you should look at MSDN for hashValueString description.

Best,
Victor.

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 19:16
string::hash (~ hash_native::hashValueString) uses an incremental version of MurmurHash2.

It is not a Windows function (so MSDN will not help), it is implemented directly in C++ (by somebody on the net):

Code: Select all

``````//-----------------------------------------------------------------------------
// CMurmurHash2A, by Austin Appleby

// This is a sample implementation of MurmurHash2A designed to work
// incrementally.

// Usage -

// CMurmurHash2A hasher
// hasher.Begin(seed);
// ...
// hashValue_t hash = hasher.End()

#define mmix(h,k) { k *= m; k ^= k >> r; k *= m; h *= m; h ^= k; }
typedef unsigned int uint32_t;

class CMurmurHash2A
{
public:
static void * operator new(size_t Size){
return Kernel::AllocAtomicHeap(Size);
}

public:

CMurmurHash2A(hashValue_t seed = 0)
{
m_hash = seed;
m_tail = 0;
m_count = 0;
m_size = 0;
}

void Add(const unsigned char * data, int len)
{
m_size += len;

MixTail(data, len);

while (len >= 4) {
uint32_t k = *(uint32_t*)data;

mmix(m_hash, k);

data += 4;
len -= 4;
}

MixTail(data, len);
}

uint32_t End(void)
{
mmix(m_hash, m_tail);
mmix(m_hash, m_size);

m_hash ^= m_hash >> 13;
m_hash *= m;
m_hash ^= m_hash >> 15;

return m_hash;
}

private:

static const uint32_t m = 0x5bd1e995;
static const int r = 24;

void MixTail(const unsigned char * & data, int & len)
{
while (len && ((len<4) || m_count)) {
m_tail |= (*data++) << (m_count * 8);

m_count++;
len--;

if (m_count == 4) {
mmix(m_hash, m_tail);
m_tail = 0;
m_count = 0;
}
}
}

uint32_t m_hash;
uint32_t m_tail;
uint32_t m_count;
uint32_t m_size;
};``````

Re: Algorithm of string:: hash?

Posted: 1 Jul 2018 21:07
good ending!

Re: Algorithm of string:: hash?

Posted: 2 Jul 2018 21:26
Thomas, thanks for the valuable information!
My idea - to write the fast and simple hash-function focused on national texts (separately or together).

The cycle of the algorithm murmur contains 8 commands (for 32/64-bit):
3 mul + 1 add + 2 rol + 2 xor (according wiki)

The cycle of the my function contains 4 commands (for 16/32/64-bit) :
2 rol + 2 xor
However the function may has n-bit (n = 8,9..63,64,..) for a choice of the desirable size of the hash-table.
Each the n-bit command rol is emulated by a set of commands:
1 shr + 1 shl + 1 or + 1 and
Therefore the function contains 10 fast and simple commands (for n-bit):
2 shr + 2 shl + 2 or + 2 and + 2 xor

Efficiency of these functions is comparable.
Function murmur gives 2 collisions for the Russian lexicon (more than 125 thousand words)
My function gives 3 collisions, but it can be n-bit function.

I will improve function and investigate it for the English dictionary and any other.
I will inform later the code of function and results. If it is interesting :)

Re: Algorithm of string:: hash?

Posted: 2 Jul 2018 22:09
Yes, it sounds very interesting, so please keep us up-to-date on your results.

You should notice that we wanted an incremental function, where data can be hashed in chunks (i.e. so you can hash a large structure, which consists of several pieces of data which is not in consecutive memory (e.g. a list)). See the hash class/interface.