Skip to content

Hashing

hash table \(ht\) contains * \(b\) buckets ht[0],h1[1],...,ht[b-1] * A bucket has \(s\) slots

loading density

\[ a = n/sb \]

n = hash 中已經放入的資料 s = 每個bucket 有幾個slots b = 幾個bucket

Note:若 load density 越大, 代表 Hash Table 利用度高,但相對 Collision 機會也大

Collision and Overflow

Collision

所在的bucket已經有了資料

Overflow

所在的bucket已經滿了

Hash function

Division

\[ h(k) = k\%D \]

D = buckets

Folding

將key 切成幾個小片段\(P_0,P_1,...,P_i\) 接下來將他們加在一起

Folding at the boundaries : 偶數做reverse

Overflow Handing

Chaining

每個bucket 皆使用linked list 實作

Open Addressing

尋找還沒有滿的bucket

  • Linear probing 當 \(H (x)\) 發生 overflow 的時候,則探測 \((H (x)+i)% B\)\(B\) 為 Bucket 數,\(i = 1,2,3,…,B-1\),直到有 Bucket 可存 or Table 全滿為止

sample

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int h[10005][105];
int main(){
    int bucket,slot;
    scanf("bucket %d\n",&bucket);
    scanf("slot %d\n",&slot);
    char op[20];
    memset(h,-1,sizeof(h));
    while(scanf("%s",op)){
        if(strcmp(op,"exit") == 0)break;
        int num;
        scanf("%d",&num);
        if(strcmp(op,"insert") == 0){
            int key = num % bucket;
            int i = 0,flag = 0;
            while( i < bucket){
                for(int j=0;j<slot;j++){
                    if(h[key][j] == -1){    
                        h[key][j] = num;
                        flag = 1;
                        break;
                    }
                }
                if(flag) break;
                i++;
                key = (key+1)%bucket;
            }
        }
        if(strcmp(op,"search") == 0){
            int key = num % bucket;
            int i = 0,flag = 0;
            while( i < bucket){
                for(int j=0;j<slot;j++){
                    if(h[key][j] == num){   
                        printf("%d %d\n",key,j);
                        flag = 1;
                        break;
                    }
                }
                if(flag) break;
                i++;
                key = (key+1)%bucket;
            }
        }
        if(strcmp(op,"delete") == 0){
            int key = num % bucket;
            int i = 0,flag = 0;
            while( i < bucket){
                for(int j=0;j<slot;j++){
                    if(h[key][j] == num){   
                        h[key][j] = -1;
                        flag = 1;
                        break;
                    }
                }
                if(flag) break;
                i++;
                key = (key+1)%bucket;
            }
        }

    }
}

  • Quadratic probing (二次方探測)

會以下面的順序探測hash function $$ ht[h(k) \% b]\ ht[(h(k)+1) \% b],ht[(h(k) - 1) \% b]\ ht[(h(k)+2^2) \% b],ht[(h(k) - 2^2) \% b]\ ht[(h(k)+3^2) \% b],ht[(h(k) - 3^2) \% b]\ $$

b 需要是質數

  • Rehashing 有很多的hash function \(h_1,h_2,...,h_m\) 滿了改用下一個直到找到可以存的地方或hash function用完為止

Performance

Worst-case find/insert/delete time is O(n) 在沒有Collision的情況下預期search是O(1)