-
- Active Member
- Posts: 40
- Joined: 30 Nov 2003 0:01
Algorithm of string:: hash?
What algorithm is used by function string::hash?
Re: Algorithm of string:: hash?
Hello Vitaly,
May the next link will useful:
Hash Tables in Logic Programming
Greetings,
Gal Zsolt
(~ CalmoSoft ~)
May the next link will useful:
Hash Tables in Logic Programming
Greetings,
Gal Zsolt
(~ CalmoSoft ~)
-
- Active Member
- Posts: 40
- Joined: 30 Nov 2003 0:01
Re: Algorithm of string:: hash?
Thanks.
But I want to compare computing complexity of my hash-function and library function string::hash.
But I want to compare computing complexity of my hash-function and library function string::hash.
-
- Posts: 10
- Joined: 3 May 2001 23:01
Re: Algorithm of string:: hash?
Hi Vitaly,
Please look at string::hash(...) implementation.
There is
So it is Windows function used. To know the real algorithm you should look at MSDN for hashValueString description.
Best,
Victor.
Please look at string::hash(...) implementation.
There is
Code: Select all
implement string
open core, string_native
...
clauses
hash(String) = hash_native::hashValueString(String).
and in hash_native you can see
class hash_native
open core
predicates
hashValueString : (string String) -> unsigned HashValue
language apicall as runtimeLinkNamesRun::hashValueString.
% @short Obtain a #HashValue from a #String.
% @end
So it is Windows function used. To know the real algorithm you should look at MSDN for hashValueString description.
Best,
Victor.
Best Prolog
Victor Yukhtenko.
Victor Yukhtenko.
- Thomas Linder Puls
- VIP Member
- Posts: 1424
- Joined: 28 Feb 2000 0:01
Re: Algorithm of string:: hash?
string::hash (~ hash_native::hashValueString) uses an incremental version of MurmurHash2.
It is not a Windows function (so MSDN will not help), it is implemented directly in C++ (by somebody on the net):
It is not a Windows function (so MSDN will not help), it is implemented directly in C++ (by somebody on the net):
Code: Select all
//-----------------------------------------------------------------------------
// CMurmurHash2A, by Austin Appleby
// This is a sample implementation of MurmurHash2A designed to work
// incrementally.
// Usage -
// CMurmurHash2A hasher
// hasher.Begin(seed);
// hasher.Add(data1,size1);
// hasher.Add(data2,size2);
// ...
// hasher.Add(dataN,sizeN);
// hashValue_t hash = hasher.End()
#define mmix(h,k) { k *= m; k ^= k >> r; k *= m; h *= m; h ^= k; }
typedef unsigned int uint32_t;
class CMurmurHash2A
{
public:
static void * operator new(size_t Size){
return Kernel::AllocAtomicHeap(Size);
}
public:
CMurmurHash2A(hashValue_t seed = 0)
{
m_hash = seed;
m_tail = 0;
m_count = 0;
m_size = 0;
}
void Add(const unsigned char * data, int len)
{
m_size += len;
MixTail(data, len);
while (len >= 4) {
uint32_t k = *(uint32_t*)data;
mmix(m_hash, k);
data += 4;
len -= 4;
}
MixTail(data, len);
}
uint32_t End(void)
{
mmix(m_hash, m_tail);
mmix(m_hash, m_size);
m_hash ^= m_hash >> 13;
m_hash *= m;
m_hash ^= m_hash >> 15;
return m_hash;
}
private:
static const uint32_t m = 0x5bd1e995;
static const int r = 24;
void MixTail(const unsigned char * & data, int & len)
{
while (len && ((len<4) || m_count)) {
m_tail |= (*data++) << (m_count * 8);
m_count++;
len--;
if (m_count == 4) {
mmix(m_hash, m_tail);
m_tail = 0;
m_count = 0;
}
}
}
uint32_t m_hash;
uint32_t m_tail;
uint32_t m_count;
uint32_t m_size;
};
Regards Thomas Linder Puls
PDC
PDC
-
- Posts: 10
- Joined: 3 May 2001 23:01
-
- Active Member
- Posts: 40
- Joined: 30 Nov 2003 0:01
Re: Algorithm of string:: hash?
Thomas, thanks for the valuable information!
My idea - to write the fast and simple hash-function focused on national texts (separately or together).
The cycle of the algorithm murmur contains 8 commands (for 32/64-bit):
3 mul + 1 add + 2 rol + 2 xor (according wiki)
The cycle of the my function contains 4 commands (for 16/32/64-bit) :
2 rol + 2 xor
However the function may has n-bit (n = 8,9..63,64,..) for a choice of the desirable size of the hash-table.
Each the n-bit command rol is emulated by a set of commands:
1 shr + 1 shl + 1 or + 1 and
Therefore the function contains 10 fast and simple commands (for n-bit):
2 shr + 2 shl + 2 or + 2 and + 2 xor
Efficiency of these functions is comparable.
Function murmur gives 2 collisions for the Russian lexicon (more than 125 thousand words)
My function gives 3 collisions, but it can be n-bit function.
I will improve function and investigate it for the English dictionary and any other.
I will inform later the code of function and results. If it is interesting :)
My idea - to write the fast and simple hash-function focused on national texts (separately or together).
The cycle of the algorithm murmur contains 8 commands (for 32/64-bit):
3 mul + 1 add + 2 rol + 2 xor (according wiki)
The cycle of the my function contains 4 commands (for 16/32/64-bit) :
2 rol + 2 xor
However the function may has n-bit (n = 8,9..63,64,..) for a choice of the desirable size of the hash-table.
Each the n-bit command rol is emulated by a set of commands:
1 shr + 1 shl + 1 or + 1 and
Therefore the function contains 10 fast and simple commands (for n-bit):
2 shr + 2 shl + 2 or + 2 and + 2 xor
Efficiency of these functions is comparable.
Function murmur gives 2 collisions for the Russian lexicon (more than 125 thousand words)
My function gives 3 collisions, but it can be n-bit function.
I will improve function and investigate it for the English dictionary and any other.
I will inform later the code of function and results. If it is interesting :)
- Thomas Linder Puls
- VIP Member
- Posts: 1424
- Joined: 28 Feb 2000 0:01
Re: Algorithm of string:: hash?
Yes, it sounds very interesting, so please keep us up-to-date on your results.
You should notice that we wanted an incremental function, where data can be hashed in chunks (i.e. so you can hash a large structure, which consists of several pieces of data which is not in consecutive memory (e.g. a list)). See the hash class/interface.
You should notice that we wanted an incremental function, where data can be hashed in chunks (i.e. so you can hash a large structure, which consists of several pieces of data which is not in consecutive memory (e.g. a list)). See the hash class/interface.
Regards Thomas Linder Puls
PDC
PDC