Hashing
hash table \(ht\) contains * \(b\) buckets ht[0],h1[1],...,ht[b-1] * A bucket has \(s\) slots
loading density
\[
a = n/sb
\]
n = hash 中已經放入的資料 s = 每個bucket 有幾個slots b = 幾個bucket
Note:若 load density 越大, 代表 Hash Table 利用度高,但相對 Collision 機會也大
Collision and Overflow
Collision
所在的bucket已經有了資料
Overflow
所在的bucket已經滿了
Hash function
Division
\[
h(k) = k\%D
\]
D = buckets
Folding
將key 切成幾個小片段\(P_0,P_1,...,P_i\) 接下來將他們加在一起
Folding at the boundaries : 偶數做reverse
Overflow Handing
Chaining
每個bucket 皆使用linked list 實作
Open Addressing
尋找還沒有滿的bucket
- Linear probing 當 \(H (x)\) 發生 overflow 的時候,則探測 \((H (x)+i)% B\),\(B\) 為 Bucket 數,\(i = 1,2,3,…,B-1\),直到有 Bucket 可存 or Table 全滿為止
sample
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int h[10005][105];
int main(){
int bucket,slot;
scanf("bucket %d\n",&bucket);
scanf("slot %d\n",&slot);
char op[20];
memset(h,-1,sizeof(h));
while(scanf("%s",op)){
if(strcmp(op,"exit") == 0)break;
int num;
scanf("%d",&num);
if(strcmp(op,"insert") == 0){
int key = num % bucket;
int i = 0,flag = 0;
while( i < bucket){
for(int j=0;j<slot;j++){
if(h[key][j] == -1){
h[key][j] = num;
flag = 1;
break;
}
}
if(flag) break;
i++;
key = (key+1)%bucket;
}
}
if(strcmp(op,"search") == 0){
int key = num % bucket;
int i = 0,flag = 0;
while( i < bucket){
for(int j=0;j<slot;j++){
if(h[key][j] == num){
printf("%d %d\n",key,j);
flag = 1;
break;
}
}
if(flag) break;
i++;
key = (key+1)%bucket;
}
}
if(strcmp(op,"delete") == 0){
int key = num % bucket;
int i = 0,flag = 0;
while( i < bucket){
for(int j=0;j<slot;j++){
if(h[key][j] == num){
h[key][j] = -1;
flag = 1;
break;
}
}
if(flag) break;
i++;
key = (key+1)%bucket;
}
}
}
}
- Quadratic probing (二次方探測)
會以下面的順序探測hash function $$ ht[h(k) \% b]\ ht[(h(k)+1) \% b],ht[(h(k) - 1) \% b]\ ht[(h(k)+2^2) \% b],ht[(h(k) - 2^2) \% b]\ ht[(h(k)+3^2) \% b],ht[(h(k) - 3^2) \% b]\ $$
b 需要是質數
- Rehashing 有很多的hash function \(h_1,h_2,...,h_m\) 滿了改用下一個直到找到可以存的地方或hash function用完為止
Performance
Worst-case find/insert/delete time is O(n) 在沒有Collision的情況下預期search是O(1)