Discussions related to Visual Prolog
Vitaly Markov
Active Member
Posts: 40
Joined: 30 Nov 2003 0:01

Algorithm of string:: hash?

Unread post by Vitaly Markov »

What algorithm is used by function string::hash?
User avatar
CalmoSoft
VIP Member
Posts: 73
Joined: 17 Oct 2006 5:49

Re: Algorithm of string:: hash?

Unread post by CalmoSoft »

Hello Vitaly,

May the next link will useful:
Hash Tables in Logic Programming

Greetings,
Gal Zsolt
(~ CalmoSoft ~)
Vitaly Markov
Active Member
Posts: 40
Joined: 30 Nov 2003 0:01

Re: Algorithm of string:: hash?

Unread post by Vitaly Markov »

Thanks.
But I want to compare computing complexity of my hash-function and library function string::hash.
Victor Yukhtenko
Posts: 10
Joined: 3 May 2001 23:01

Re: Algorithm of string:: hash?

Unread post by Victor Yukhtenko »

Hi Vitaly,

Please look at string::hash(...) implementation.
There is

Code: Select all

implement string     open core, string_native ... clauses     hash(String) = hash_native::hashValueString(String).   and in hash_native you can see class hash_native     open core   predicates     hashValueString : (string String) -> unsigned HashValue         language apicall as runtimeLinkNamesRun::hashValueString.     % @short Obtain a #HashValue from a #String.     % @end


So it is Windows function used. To know the real algorithm you should look at MSDN for hashValueString description.

Best,
Victor.
Best Prolog
Victor Yukhtenko.
User avatar
Thomas Linder Puls
VIP Member
Posts: 1398
Joined: 28 Feb 2000 0:01

Re: Algorithm of string:: hash?

Unread post by Thomas Linder Puls »

string::hash (~ hash_native::hashValueString) uses an incremental version of MurmurHash2.

It is not a Windows function (so MSDN will not help), it is implemented directly in C++ (by somebody on the net):

Code: Select all

//----------------------------------------------------------------------------- // CMurmurHash2A, by Austin Appleby   // This is a sample implementation of MurmurHash2A designed to work // incrementally.   // Usage -   // CMurmurHash2A hasher // hasher.Begin(seed); // hasher.Add(data1,size1); // hasher.Add(data2,size2); // ... // hasher.Add(dataN,sizeN); // hashValue_t hash = hasher.End()   #define mmix(h,k) { k *= m; k ^= k >> r; k *= m; h *= m; h ^= k; } typedef unsigned int uint32_t;   class CMurmurHash2A { public:     static void * operator new(size_t Size){         return Kernel::AllocAtomicHeap(Size);     }   public:       CMurmurHash2A(hashValue_t seed = 0)     {         m_hash = seed;         m_tail = 0;         m_count = 0;         m_size = 0;     }       void Add(const unsigned char * data, int len)     {         m_size += len;           MixTail(data, len);           while (len >= 4) {             uint32_t k = *(uint32_t*)data;               mmix(m_hash, k);               data += 4;             len -= 4;         }           MixTail(data, len);     }       uint32_t End(void)     {         mmix(m_hash, m_tail);         mmix(m_hash, m_size);           m_hash ^= m_hash >> 13;         m_hash *= m;         m_hash ^= m_hash >> 15;           return m_hash;     }   private:       static const uint32_t m = 0x5bd1e995;     static const int r = 24;       void MixTail(const unsigned char * & data, int & len)     {         while (len && ((len<4) || m_count)) {             m_tail |= (*data++) << (m_count * 8);               m_count++;             len--;               if (m_count == 4) {                 mmix(m_hash, m_tail);                 m_tail = 0;                 m_count = 0;             }         }     }       uint32_t m_hash;     uint32_t m_tail;     uint32_t m_count;     uint32_t m_size; };
Regards Thomas Linder Puls
PDC
Victor Yukhtenko
Posts: 10
Joined: 3 May 2001 23:01

Re: Algorithm of string:: hash?

Unread post by Victor Yukhtenko »

:-) good ending!
Best Prolog
Victor Yukhtenko.
Vitaly Markov
Active Member
Posts: 40
Joined: 30 Nov 2003 0:01

Re: Algorithm of string:: hash?

Unread post by Vitaly Markov »

Thomas, thanks for the valuable information!
My idea - to write the fast and simple hash-function focused on national texts (separately or together).

The cycle of the algorithm murmur contains 8 commands (for 32/64-bit):
3 mul + 1 add + 2 rol + 2 xor (according wiki)

The cycle of the my function contains 4 commands (for 16/32/64-bit) :
2 rol + 2 xor
However the function may has n-bit (n = 8,9..63,64,..) for a choice of the desirable size of the hash-table.
Each the n-bit command rol is emulated by a set of commands:
1 shr + 1 shl + 1 or + 1 and
Therefore the function contains 10 fast and simple commands (for n-bit):
2 shr + 2 shl + 2 or + 2 and + 2 xor

Efficiency of these functions is comparable.
Function murmur gives 2 collisions for the Russian lexicon (more than 125 thousand words)
My function gives 3 collisions, but it can be n-bit function.

I will improve function and investigate it for the English dictionary and any other.
I will inform later the code of function and results. If it is interesting :)
User avatar
Thomas Linder Puls
VIP Member
Posts: 1398
Joined: 28 Feb 2000 0:01

Re: Algorithm of string:: hash?

Unread post by Thomas Linder Puls »

Yes, it sounds very interesting, so please keep us up-to-date on your results.

You should notice that we wanted an incremental function, where data can be hashed in chunks (i.e. so you can hash a large structure, which consists of several pieces of data which is not in consecutive memory (e.g. a list)). See the hash class/interface.
Regards Thomas Linder Puls
PDC
Post Reply