discuss.visual-prolog.com

Posted: **29 Jun 2018 17:08**

What algorithm is used by function string::hash?

Posted: **1 Jul 2018 7:30**

Hello Vitaly,

May the next link will useful:
Hash Tables in Logic Programming

Greetings,
Gal Zsolt
(~ CalmoSoft ~)

Posted: **1 Jul 2018 13:55**

Thanks.
But I want to compare computing complexity of my hash-function and library function string::hash.

Posted: **1 Jul 2018 19:01**

Hi Vitaly,

Please look at string::hash(...) implementation.
There is

Code: Select all

implement string
    open core, string_native
...
clauses
    hash(String) = hash_native::hashValueString(String).
 
and in hash_native you can see
class hash_native
    open core
 
predicates
    hashValueString : (string String) -> unsigned HashValue
        language apicall as runtimeLinkNamesRun::hashValueString.
    % @short Obtain a #HashValue from a #String.
    % @end

So it is Windows function used. To know the real algorithm you should look at MSDN for hashValueString description.

Best,
Victor.

Posted: **1 Jul 2018 19:16**

string::hash (~ hash_native::hashValueString) uses an incremental version of MurmurHash2.

It is not a Windows function (so MSDN will not help), it is implemented directly in C++ (by somebody on the net):

Code: Select all

//-----------------------------------------------------------------------------
// CMurmurHash2A, by Austin Appleby
 
// This is a sample implementation of MurmurHash2A designed to work 
// incrementally.
 
// Usage - 
 
// CMurmurHash2A hasher
// hasher.Begin(seed);
// hasher.Add(data1,size1);
// hasher.Add(data2,size2);
// ...
// hasher.Add(dataN,sizeN);
// hashValue_t hash = hasher.End()
 
#define mmix(h,k) { k *= m; k ^= k >> r; k *= m; h *= m; h ^= k; }
typedef unsigned int uint32_t;
 
class CMurmurHash2A
{
public:
    static void * operator new(size_t Size){
        return Kernel::AllocAtomicHeap(Size);
    }
 
public:
 
    CMurmurHash2A(hashValue_t seed = 0)
    {
        m_hash = seed;
        m_tail = 0;
        m_count = 0;
        m_size = 0;
    }
 
    void Add(const unsigned char * data, int len)
    {
        m_size += len;
 
        MixTail(data, len);
 
        while (len >= 4) {
            uint32_t k = *(uint32_t*)data;
 
            mmix(m_hash, k);
 
            data += 4;
            len -= 4;
        }
 
        MixTail(data, len);
    }
 
    uint32_t End(void)
    {
        mmix(m_hash, m_tail);
        mmix(m_hash, m_size);
 
        m_hash ^= m_hash >> 13;
        m_hash *= m;
        m_hash ^= m_hash >> 15;
 
        return m_hash;
    }
 
private:
 
    static const uint32_t m = 0x5bd1e995;
    static const int r = 24;
 
    void MixTail(const unsigned char * & data, int & len)
    {
        while (len && ((len<4) || m_count)) {
            m_tail |= (*data++) << (m_count * 8);
 
            m_count++;
            len--;
 
            if (m_count == 4) {
                mmix(m_hash, m_tail);
                m_tail = 0;
                m_count = 0;
            }
        }
    }
 
    uint32_t m_hash;
    uint32_t m_tail;
    uint32_t m_count;
    uint32_t m_size;
};

Posted: **1 Jul 2018 21:07**

good ending!

Posted: **2 Jul 2018 21:26**

Thomas, thanks for the valuable information!
My idea - to write the fast and simple hash-function focused on national texts (separately or together).

The cycle of the algorithm murmur contains 8 commands (for 32/64-bit):
3 mul + 1 add + 2 rol + 2 xor (according wiki)

The cycle of the my function contains 4 commands (for 16/32/64-bit) :
2 rol + 2 xor
However the function may has n-bit (n = 8,9..63,64,..) for a choice of the desirable size of the hash-table.
Each the n-bit command rol is emulated by a set of commands:
1 shr + 1 shl + 1 or + 1 and
Therefore the function contains 10 fast and simple commands (for n-bit):
2 shr + 2 shl + 2 or + 2 and + 2 xor

Efficiency of these functions is comparable.
Function murmur gives 2 collisions for the Russian lexicon (more than 125 thousand words)
My function gives 3 collisions, but it can be n-bit function.

I will improve function and investigate it for the English dictionary and any other.
I will inform later the code of function and results. If it is interesting :)

Posted: **2 Jul 2018 22:09**

Yes, it sounds very interesting, so please keep us up-to-date on your results.

You should notice that we wanted an incremental function, where data can be hashed in chunks (i.e. so you can hash a large structure, which consists of several pieces of data which is not in consecutive memory (e.g. a list)). See the hash class/interface.

discuss.visual-prolog.com

Algorithm of string:: hash?

Algorithm of string:: hash?

Re: Algorithm of string:: hash?

Re: Algorithm of string:: hash?

Re: Algorithm of string:: hash?

Re: Algorithm of string:: hash?

Re: Algorithm of string:: hash?

Re: Algorithm of string:: hash?

Re: Algorithm of string:: hash?