14  Hashing

15 Key concepts

15.1 The problem of searching

In the context of hashing, the problem of searching refers to the task of finding a specific element in a set of elements that have been stored in a hash table. The goal of searching in a hash table is to quickly find the location of the element in the table, so that its value can be accessed or modified.

The problem of searching in a hash table can be divided into two main sub-problems: first, determining the hash value of the element to be searched, and second, using this hash value to locate the element in the hash table. The hash function used to determine the hash value should be designed to minimize collisions, so that the search can be performed quickly and efficiently.

In order to search for an element in a hash table, the same hash function that was used to store the element in the table must be used to compute its hash value. Once the hash value is computed, the element can be found by using a search algorithm that takes into account the collision resolution method used by the hash table.

There are several search algorithms that can be used to search for an element in a hash table, including linear probing, quadratic probing, and chaining. The choice of search algorithm depends on the specific implementation of the hash table and the characteristics of the input data. The goal of these algorithms is to find the location of the element in the hash table in as few operations as possible, so that the search can be performed quickly and efficiently.

15.2 Hash tables, hash functions

In the context of hashing, a hash table is a data structure that is used to store a set of elements, with the goal of providing efficient insertion, deletion, and search operations. A hash table uses a hash function to map each element to a unique location in the table, based on its key.

A hash function is a function that takes an element as input and computes a hash code, which is an integer that represents the location of the element in the hash table. A good hash function should be designed to minimize collisions, which occur when two or more elements are mapped to the same location in the table. Collisions can lead to slower insertion, deletion, and search operations in the hash table, so a good hash function is crucial to the performance of a hash table.

Once the hash code is computed, the element is stored in the corresponding location in the hash table. If another element is already stored in that location, a collision occurs, and a collision resolution method is used to resolve the conflict. There are several collision resolution methods that can be used, including linear probing, quadratic probing, and chaining.

Hash tables are widely used in computer science and software engineering, due to their efficiency in storing and retrieving data. Hash tables are used in applications such as database indexing, symbol tables in compilers and interpreters, and network routing tables.

15.3 Collision resolution techniques

In the context of hashing, collision resolution techniques are methods used to handle the situation when two or more elements in a hash table are mapped to the same location, also known as a collision. There are several techniques for resolving collisions, including the following:

  1. Open addressing: In this technique, when a collision occurs, the hash function is used to compute a new location for the element in the table, based on a predefined rule. There are several rules that can be used for open addressing, including linear probing, quadratic probing, and double hashing.

  2. Chaining: In this technique, each location in the hash table contains a linked list of elements that have the same hash code. When a collision occurs, the new element is added to the linked list at the corresponding location.

  3. Cuckoo hashing: In this technique, each element is assigned two hash functions and can be stored in two different locations in the hash table. When a collision occurs, one of the elements is evicted from its location and moved to the alternate location, recursively updating the hash codes until a free location is found.

The choice of collision resolution technique depends on the specific requirements of the application and the characteristics of the input data. The goal of these techniques is to minimize the number of collisions and the time required to resolve them, in order to maintain the performance of the hash table.

16 Learning outcomes

16.1 Describe the different methods used to search for data

There are different methods used to search for data in a hash table, depending on the specific implementation and the characteristics of the data being searched. Here are some common methods:

  1. Direct Addressing: In this method, the hash function is used to compute the index of the location in the hash table where the element is stored. When searching for an element, the hash function is used to compute the index, and the element is retrieved from that location.

  2. Linear Probing: In this method, if a collision occurs, the search algorithm checks the next location in the table, and so on, until an empty location is found. When searching for an element, the search algorithm uses the hash function to compute the index of the first location where the element might be stored, and then checks each subsequent location until the element is found.

  3. Quadratic Probing: This method is similar to linear probing, but instead of checking the next location in the table, the search algorithm checks a location that is a quadratic offset from the original location. For example, if the original location is i, the next location checked is (i + 1^2) % table_size, followed by (i + 2^2) % table_size, and so on.

  4. Chaining: In this method, each location in the hash table contains a linked list of elements that have the same hash code. When searching for an element, the search algorithm uses the hash function to compute the index of the location where the element might be stored, and then searches the linked list at that location to find the element.

The choice of search method depends on the specific implementation of the hash table and the characteristics of the input data. The goal of these methods is to find the location of the element in the hash table as quickly and efficiently as possible, in order to maintain the performance of the hash table.

16.2 Describe different collision resolution methods

Collision resolution methods are used to handle the situation when two or more elements in a hash table are mapped to the same location, also known as a collision. There are several collision resolution methods, including the following:

  1. Open Addressing: In this method, when a collision occurs, the hash function is used to compute a new location for the element in the table, based on a predefined rule. There are several rules that can be used for open addressing, including linear probing, quadratic probing, and double hashing.

  2. Chaining: In this method, each location in the hash table contains a linked list of elements that have the same hash code. When a collision occurs, the new element is added to the linked list at the corresponding location.

  3. Robin Hood Hashing: In this method, the hash table is filled sequentially with elements, and when a collision occurs, the element with the smaller distance from its original location is moved closer to the location where it would have been inserted if there were no collision.

  4. Cuckoo Hashing: In this method, each element is assigned two hash functions and can be stored in two different locations in the hash table. When a collision occurs, one of the elements is evicted from its location and moved to the alternate location, recursively updating the hash codes until a free location is found.

The choice of collision resolution method depends on the specific requirements of the application and the characteristics of the input data. The goal of these methods is to minimize the number of collisions and the time required to resolve them, in order to maintain the performance of the hash table.

16.3 Implement a hash table with linear probing collision resolution

Here’s an implementation of a hash table with linear probing collision resolution in Python:

class HashTable:
    def __init__(self, size):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size

    def hash_function(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        index = self.hash_function(key)
        while self.keys[index] is not None and self.keys[index] != key:
            index = (index + 1) % self.size
        self.keys[index] = key
        self.values[index] = value

    def get(self, key):
        index = self.hash_function(key)
        while self.keys[index] is not None:
            if self.keys[index] == key:
                return self.values[index]
            index = (index + 1) % self.size
        return None

    def remove(self, key):
        index = self.hash_function(key)
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.keys[index] = None
                self.values[index] = None
                return
            index = (index + 1) % self.size

    def print_table(self):
        for i in range(self.size):
            if self.keys[i] is not None:
                print(self.keys[i], self.values[i])

This implementation uses a hash function to compute the index of the location in the hash table where the key-value pair should be stored. If there is a collision, it uses linear probing to find the next available location in the table.

The put() method inserts a key-value pair into the hash table. It first computes the index of the location in the table where the key-value pair should be stored using the hash function. If that location is already occupied, it uses linear probing to find the next available location. Once it finds an empty location, it stores the key-value pair there.

The get() method retrieves the value associated with a given key from the hash table. It first computes the index of the location in the table where the key-value pair should be stored using the hash function. If that location is already occupied and the key at that location is not the desired key, it uses linear probing to find the next occupied location. If it finds the key, it returns the corresponding value. If it reaches an empty location or the end of the table without finding the key, it returns None.

The remove() method removes a key-value pair from the hash table. It first computes the index of the location in the table where the key-value pair should be stored using the hash function. If that location is already occupied and the key at that location is not the desired key, it uses linear probing to find the next occupied location. If it finds the key, it sets the key and value at that location to None.

The print_table() method is a helper method that prints the contents of the hash table.