C Program to Implement a Dictionary Using Hashing Algorithms
To implement a robust dictionary in C using hashing, you should focus on three core components: a reliable hash function, a collision resolution strategy, and dynamic resizing to maintain performance.
1. Robust Hash Function (FNV-1a)
For strings, the FNV-1a algorithm is a common recommendation because it is fast and distributes keys well. It works by performing an XOR and a multiply operation on each byte of the key.
```c
#include <stdint.h>

#define FNV_OFFSET 14695981039346656037ULL  // 64-bit FNV offset basis
#define FNV_PRIME  1099511628211ULL         // 64-bit FNV prime

// FNV-1a: for each byte, XOR it into the hash, then multiply by the FNV prime.
uint64_t hash_key(const char *key) {
    uint64_t hash = FNV_OFFSET;
    for (const char *p = key; *p; p++) {
        hash ^= (uint64_t)(unsigned char)(*p);
        hash *= FNV_PRIME;
    }
    return hash;
}
```
2. Collision Resolution: Separate Chaining vs. Open Addressing
Since multiple keys can produce the same hash index, you must choose a resolution method:
Separate Chaining: Each slot in the hash table points to a linked list of entries. This is simple to implement, and the table never truly "fills up," though lookups degrade to O(n) if the lists become long.
Open Addressing (Linear Probing): If a collision occurs, the program searches for the next available slot in the array. This is more cache-friendly but suffers from "clustering," where occupied slots group together and slow down operations. (GeeksforGeeks)
3. Dynamic Resizing (Load Factor Management)
A hash table's performance is tied to its load factor (number of items / total slots). As the table fills, collisions increase and speed drops.
When to resize: Generally, you should resize when the table is about 75% full (load factor > 0.75).
How to resize: Allocate a new array (typically double the size), rehash every existing item into the new array, and free the old memory.
Feature Summary Table

| Feature | Recommendation | Reason |
|---|---|---|
| Hash Function | FNV-1a or djb2 | Fast, with good distribution for string keys. |
| Collision Handling | Separate Chaining | Easier to manage deletions and avoids clustering. |
| Resizing Rule | Load Factor > 0.75 | Balances memory usage against average O(1) operations. |
| Growth Strategy | Double Table Size | Amortizes the cost of rehashing. |

For a complete reference, you can look at the K&R C implementation, which uses a fixed-size table with chaining for simplicity. (Stack Overflow)
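The resizing procedure above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes separate chaining, string keys, `int` values, the FNV-1a hash from section 1, and the 0.75 threshold and doubling rule from the table. The names `Table`, `table_init`, `table_set`, `table_get`, and `table_grow` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical chained table that doubles its bucket array when the load factor exceeds 0.75.
typedef struct Item {
    char *key;
    int value;
    struct Item *next;
} Item;

typedef struct {
    Item **buckets;
    size_t capacity;  // number of buckets (must be > 0)
    size_t count;     // number of stored items
} Table;

static uint64_t fnv1a(const char *key) {
    uint64_t hash = 14695981039346656037ULL;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        hash ^= *p;
        hash *= 1099511628211ULL;
    }
    return hash;
}

int table_init(Table *t, size_t capacity) {
    t->buckets = calloc(capacity, sizeof(Item *));
    t->capacity = capacity;
    t->count = 0;
    return t->buckets != NULL;
}

// Allocate an array twice as large, move every item into its new bucket, free the old array.
static int table_grow(Table *t) {
    size_t new_cap = t->capacity * 2;
    Item **fresh = calloc(new_cap, sizeof(Item *));
    if (fresh == NULL) return 0;
    for (size_t i = 0; i < t->capacity; i++) {
        Item *it = t->buckets[i];
        while (it != NULL) {
            Item *next = it->next;
            size_t idx = fnv1a(it->key) % new_cap;  // rehash against the new size
            it->next = fresh[idx];
            fresh[idx] = it;
            it = next;
        }
    }
    free(t->buckets);
    t->buckets = fresh;
    t->capacity = new_cap;
    return 1;
}

// Insert or update; grows first if adding one more item would push the load factor past 0.75.
int table_set(Table *t, const char *key, int value) {
    size_t idx = fnv1a(key) % t->capacity;
    for (Item *it = t->buckets[idx]; it != NULL; it = it->next) {
        if (strcmp(it->key, key) == 0) { it->value = value; return 1; }
    }
    if ((double)(t->count + 1) / t->capacity > 0.75) {
        if (!table_grow(t)) return 0;
        idx = fnv1a(key) % t->capacity;
    }
    Item *it = malloc(sizeof *it);
    size_t len = strlen(key) + 1;
    char *copy = malloc(len);
    if (it == NULL || copy == NULL) { free(it); free(copy); return 0; }
    memcpy(copy, key, len);
    it->key = copy;
    it->value = value;
    it->next = t->buckets[idx];
    t->buckets[idx] = it;
    t->count++;
    return 1;
}

int *table_get(const Table *t, const char *key) {
    size_t idx = fnv1a(key) % t->capacity;
    for (Item *it = t->buckets[idx]; it != NULL; it = it->next)
        if (strcmp(it->key, key) == 0) return &it->value;
    return NULL;
}
```

Because each item's bucket depends on the current capacity, every key has to be rehashed when the array grows; copying the old bucket array unchanged would leave items in the wrong chains.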
Implementing a Dictionary in C Using Hashing
In computer science, a Dictionary (also known as an Associative Array or Map) is a data structure that stores data in key-value pairs. While you could use a linked list or an array to build one, search times would be slow: O(n) in the worst case.
To achieve average constant-time lookups, we use Hashing. This article will guide you through the logic, the algorithms, and a complete C implementation of a dictionary using a Hash Table.
How Hashing Works
Hashing transforms a "key" (like a word) into an integer index. This index tells us exactly where to store the corresponding "value" (the definition) in an array.
Hash Function: Takes a string and returns an integer.
Compression: Maps that large integer into the range of our array size (using the modulo operator %).
Collision Handling: Since different keys can produce the same index, we must handle "collisions." In this guide, we will use Chaining (linked lists at each index).
The Components
1. The Node Structure
Each entry in our dictionary will be a node containing the key, the value, and a pointer to the next node (for collisions).
```c
typedef struct Node {
    char *key;
    char *value;
    struct Node *next;   // next node in the same bucket (collision chain)
} Node;
```
2. The Hash Table
The table itself is an array of pointers to these nodes.
```c
#define TABLE_SIZE 100

typedef struct {
    Node *buckets[TABLE_SIZE];   // each bucket heads a linked list; NULL when empty
} HashTable;
```
The Implementation
The complete program combines these structures with a simple but effective hashing algorithm called djb2, which spreads string keys across the buckets and keeps collisions low.
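A minimal sketch of such a program follows, assuming the `Node` and `HashTable` definitions above and values stored as copied strings. The function names `ht_init`, `ht_insert`, `ht_search`, and `ht_free` are hypothetical, and `ht_insert` replaces the value when a key is already present, which is one reasonable design choice rather than the only one.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 100

typedef struct Node {
    char *key;
    char *value;
    struct Node *next;
} Node;

typedef struct {
    Node *buckets[TABLE_SIZE];
} HashTable;

// djb2: start at 5381, then hash = hash * 33 + c for every character.
static unsigned long djb2(const char *str) {
    unsigned long hash = 5381;
    int c;
    while ((c = (unsigned char)*str++))
        hash = ((hash << 5) + hash) + c;
    return hash;
}

// Portable string copy (strdup is not part of C before C23).
static char *dup_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy) memcpy(copy, s, len);
    return copy;
}

void ht_init(HashTable *ht) {
    for (int i = 0; i < TABLE_SIZE; i++) ht->buckets[i] = NULL;
}

// Insert a new pair, or replace the value if the key exists. Returns 0 on allocation failure.
int ht_insert(HashTable *ht, const char *key, const char *value) {
    unsigned long index = djb2(key) % TABLE_SIZE;
    for (Node *n = ht->buckets[index]; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) {
            char *v = dup_string(value);
            if (v == NULL) return 0;
            free(n->value);
            n->value = v;
            return 1;
        }
    }
    Node *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->key = dup_string(key);
    n->value = dup_string(value);
    if (n->key == NULL || n->value == NULL) {
        free(n->key); free(n->value); free(n);
        return 0;
    }
    n->next = ht->buckets[index];   // prepend to the chain
    ht->buckets[index] = n;
    return 1;
}

// Return the stored value, or NULL if the key is absent.
const char *ht_search(const HashTable *ht, const char *key) {
    unsigned long index = djb2(key) % TABLE_SIZE;
    for (const Node *n = ht->buckets[index]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0) return n->value;
    return NULL;
}

// Free every node and both strings it owns, leaving an empty table.
void ht_free(HashTable *ht) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        Node *n = ht->buckets[i];
        while (n != NULL) {
            Node *next = n->next;
            free(n->key); free(n->value); free(n);
            n = next;
        }
        ht->buckets[i] = NULL;
    }
}
```

`ht_free` releases both strings before the node itself, which is the kind of cleanup the memory-management advice in Best Practices calls for.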
Why Use Hashing?
Efficiency: In a well-designed hash table, search, insertion, and deletion take O(1) time on average.
Scalability: Dictionaries built with hashing can handle millions of entries while maintaining high performance.
Flexibility: You can associate almost any kind of data (strings, objects, files) with a key.
Best Practices
Load Factor: Keep the table size larger than the number of items to prevent long chains.
Memory Management: Always use free() on your nodes and strings to prevent memory leaks in long-running programs.
Choose a Good Hash: Simple "sum of ASCII" functions lead to many collisions. Algorithms like djb2 or MurmurHash are much better for real-world data.
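The weakness of a "sum of ASCII" hash is easy to demonstrate. In this small sketch (the function names `sum_hash` and `djb2` are chosen for illustration), any two anagrams collide under the sum, while djb2 multiplies the running value by 33 before adding each character, so letter order changes the result.

```c
// Naive hash: add up the character codes. Anagrams such as "listen" and "silent" always collide.
unsigned long sum_hash(const char *s) {
    unsigned long h = 0;
    while (*s) h += (unsigned char)*s++;
    return h;
}

// djb2: position matters because the running hash is multiplied by 33 before each character is added.
unsigned long djb2(const char *s) {
    unsigned long h = 5381;
    int c;
    while ((c = (unsigned char)*s++))
        h = ((h << 5) + h) + c;
    return h;
}
```

Both functions return the full hash; a table would still reduce it with `% table_size` to get a bucket index.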
The Hash Function: Converts a string (the key) into an integer index. A common choice is the djb2 algorithm because it distributes strings evenly across the table.
The Hash Table: An array of structs, where each struct holds the word (key) and its definition (value).
Collision Handling: Since different words can hash to the same index, we use Linear Probing: if a slot is taken, the program looks at the next slot (index + 1), wrapping around at the end of the array, until it finds a free one.
The Implementation
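The linear-probing dictionary described above might look like the following minimal sketch. The capacity `SIZE` (101 here), the fixed buffer lengths, and the names `Slot`, `probe_insert`, and `probe_lookup` are assumptions. Deletion is left out because removing an entry from a probed table needs extra care to avoid breaking later lookups.

```c
#include <stdbool.h>
#include <string.h>

#define SIZE 101        // fixed capacity (assumed value)
#define WORD_LEN 32
#define DEF_LEN 128

typedef struct {
    char word[WORD_LEN];
    char definition[DEF_LEN];
    bool occupied;
} Slot;

static Slot table[SIZE];  // static storage starts zeroed, so every slot is unoccupied

static unsigned long djb2(const char *str) {
    unsigned long hash = 5381;
    int c;
    while ((c = (unsigned char)*str++))
        hash = ((hash << 5) + hash) + c;
    return hash;
}

// Probe from the home slot, stepping to index + 1 (wrapping), until a free slot or the same word.
bool probe_insert(const char *word, const char *definition) {
    if (strlen(word) >= WORD_LEN || strlen(definition) >= DEF_LEN)
        return false;                              // reject strings that would not fit
    size_t start = djb2(word) % SIZE;
    for (size_t i = 0; i < SIZE; i++) {
        Slot *s = &table[(start + i) % SIZE];
        if (!s->occupied || strcmp(s->word, word) == 0) {
            strcpy(s->word, word);
            strcpy(s->definition, definition);     // overwrites the definition on update
            s->occupied = true;
            return true;
        }
    }
    return false;                                  // every slot is taken: the table is full
}

// Follow the same probe sequence; an empty slot means the word was never inserted.
const char *probe_lookup(const char *word) {
    size_t start = djb2(word) % SIZE;
    for (size_t i = 0; i < SIZE; i++) {
        const Slot *s = &table[(start + i) % SIZE];
        if (!s->occupied) return NULL;
        if (strcmp(s->word, word) == 0) return s->definition;
    }
    return NULL;
}
```

Lookup can stop at the first empty slot because, without deletions, a word is always stored before any empty slot on its probe path.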
Why this works
Speed: In a perfect scenario, finding a word is O(1), meaning it takes the same amount of time regardless of how many words are in the dictionary.
Memory: By using a fixed SIZE, we manage memory predictably, though in a real-world app, you would implement Dynamic Resizing to expand the table as it fills up.
To implement a dictionary in C using hashing, you essentially build a hash table that maps string keys to specific values. Since C lacks a built-in dictionary type, you must manage memory and collisions manually.
1. Core Components
A robust hashing-based dictionary requires four main pieces:
Data Structures: A way to store key-value pairs (buckets) and the table itself.
Hash Function: An algorithm to convert a string key into a numerical array index.
Collision Handling: A strategy for when two different keys generate the same index (e.g., Separate Chaining).
API Functions: Standard operations like insert, search, and delete.
2. Implementation Guide
Define Structures
Use a linked list for "Separate Chaining" to handle collisions. This allows multiple entries to exist at the same index.
```c
typedef struct Entry {
    char *key;
    char *value;
    struct Entry *next;   // pointer for collision chaining
} Entry;

typedef struct Dictionary {
    int size;             // number of buckets
    Entry **buckets;      // array of pointers to Entry
} Dictionary;
```
Choose a Hashing Algorithm
A common, efficient algorithm for strings is djb2 or a simple polynomial rolling hash.
Simple Polynomial Hash: Iterates through characters and multiplies by a prime (like 31) to reduce clustering.
Modulo: Always apply % table_size to the result to keep the index within bounds.
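As a rough sketch of the polynomial approach (the name `poly_hash` is hypothetical), each character is folded into a running value that is first multiplied by 31, and the result is reduced with `% table_size` as the list suggests:

```c
// Polynomial rolling hash: h = h * 31 + c for each character, then reduce to a bucket index.
unsigned int poly_hash(const char *key, unsigned int table_size) {
    unsigned int h = 0;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++)
        h = h * 31u + *p;   // unsigned overflow wraps, which is well defined in C
    return h % table_size;
}
```

For example, "ab" gives 97 * 31 + 98 = 3105, while "ba" gives 98 * 31 + 97 = 3135, so reordering letters changes the hash.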
```c
unsigned int hash(const char *key, int size) {
    unsigned int hash_val = 5381;
    int c;
    while ((c = (unsigned char)*key++))
        hash_val = ((hash_val << 5) + hash_val) + c;   // djb2: hash * 33 + c
    return hash_val % size;                            // keep the index within bounds
}
```
Essential Operations
Insert: Hash the key, then add a new node to the front of the linked list at that index.
Lookup: Hash the key and traverse the linked list at that index using strcmp to find the exact match.
Delete: Find the node in the list and carefully rewire pointers to remove it, then free the memory.
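The three operations can be sketched against the `Entry` and `Dictionary` structures and the `hash(key, size)` function defined earlier. This is a minimal version under stated assumptions: `dict_create`, `dict_insert`, `dict_lookup`, `dict_delete`, and `dict_destroy` are hypothetical names, keys and values are copied strings, and insert follows the list literally by always adding to the front, so a repeated key shadows its older entry rather than replacing it.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    char *key;
    char *value;
    struct Entry *next;
} Entry;

typedef struct Dictionary {
    int size;
    Entry **buckets;
} Dictionary;

unsigned int hash(const char *key, int size) {
    unsigned int hash_val = 5381;
    int c;
    while ((c = (unsigned char)*key++))
        hash_val = ((hash_val << 5) + hash_val) + c;
    return hash_val % size;
}

static char *copy_str(const char *s) {
    size_t len = strlen(s) + 1;
    char *c = malloc(len);
    if (c) memcpy(c, s, len);
    return c;
}

Dictionary *dict_create(int size) {
    Dictionary *d = malloc(sizeof *d);
    if (d == NULL) return NULL;
    d->buckets = calloc((size_t)size, sizeof(Entry *));
    if (d->buckets == NULL) { free(d); return NULL; }
    d->size = size;
    return d;
}

// Insert: hash the key and push a new node onto the front of that bucket's list.
int dict_insert(Dictionary *d, const char *key, const char *value) {
    Entry *e = malloc(sizeof *e);
    if (e == NULL) return 0;
    e->key = copy_str(key);
    e->value = copy_str(value);
    if (!e->key || !e->value) { free(e->key); free(e->value); free(e); return 0; }
    unsigned int i = hash(key, d->size);
    e->next = d->buckets[i];
    d->buckets[i] = e;
    return 1;
}

// Lookup: walk the chain and compare keys with strcmp.
const char *dict_lookup(const Dictionary *d, const char *key) {
    for (Entry *e = d->buckets[hash(key, d->size)]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e->value;
    return NULL;
}

// Delete: unlink the first matching node, then free it. Returns 1 if something was removed.
int dict_delete(Dictionary *d, const char *key) {
    Entry **link = &d->buckets[hash(key, d->size)];
    while (*link) {
        Entry *e = *link;
        if (strcmp(e->key, key) == 0) {
            *link = e->next;   // rewire the previous pointer around the node
            free(e->key); free(e->value); free(e);
            return 1;
        }
        link = &e->next;
    }
    return 0;
}

void dict_destroy(Dictionary *d) {
    for (int i = 0; i < d->size; i++) {
        Entry *e = d->buckets[i];
        while (e) {
            Entry *next = e->next;
            free(e->key); free(e->value); free(e);
            e = next;
        }
    }
    free(d->buckets);
    free(d);
}
```

`dict_delete` keeps a pointer to the previous `next` field (`Entry **link`), so removing the first node of a chain needs no special case.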
Once upon a time in the digital Kingdom of Memory, there was a chaotic library known as the Flat-File Archives. Whenever the King (the CPU) wanted to know the meaning of a word, the Royal Librarian had to start at the very first shelf and read every single book until he found the right one. This was a "Linear Search," and it was painfully slow. One day, a clever architect named Hash proposed a revolutionary system. "Why search every shelf," he asked, "when the word itself can tell us exactly where it lives?"
The Great Transformation
Hash built a magical machine called the Hash Function. Its job was simple: take any string of characters and turn it into a specific number, an index.
The Formula: He decided on a method called Polynomial Rolling. It took the ASCII value of each letter and folded it into a running total, multiplying the total by a prime number at each step.
The Constraints: Since the library only had 100 shelves (the Table Size), he used the modulo operator (%) to make sure every number fit within the library's walls.
The Conflict: The Collision
The system worked perfectly until two different words, "Apple" and "Sleep," both produced the same index: 42. This was the dreaded collision. The architect had two choices:
Open Addressing: If Shelf 42 is full, go to 43, then 44, until you find a spot.
Separate Chaining: Hang a "Linked List" off Shelf 42. If multiple words land there, just line them up one after another.
Hash chose Separate Chaining. It was cleaner and allowed the library to grow even if the shelves got crowded.
The Blueprint (The Code)
To immortalize this system, the architect wrote it in the ancient language of C.
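```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 100   // the library's 100 shelves
#define MAX_LEN 100

typedef struct Node {
    char key[MAX_LEN];
    char value[MAX_LEN];
    struct Node* next;
} Node;

Node* hash_table[TABLE_SIZE];

// Polynomial rolling hash: multiply the running total by a prime (31), add each letter, then % TABLE_SIZE
unsigned int hash(const char* key) {
    unsigned int total = 0;
    for (; *key; key++)
        total = total * 31 + (unsigned char)*key;
    return total % TABLE_SIZE;
}

// Storing a word
void insert(const char* key, const char* value) {
    unsigned int index = hash(key);
    Node* newNode = malloc(sizeof(Node));
    if (newNode == NULL) return;
    snprintf(newNode->key, MAX_LEN, "%s", key);      // bounded copy instead of strcpy
    snprintf(newNode->value, MAX_LEN, "%s", value);
    // Chaining: Insert at the beginning of the list
    newNode->next = hash_table[index];
    hash_table[index] = newNode;
    printf("Stored '%s' at Shelf %u\n", key, index);
}

// Finding a word
void lookup(const char* key) {
    unsigned int index = hash(key);
    Node* temp = hash_table[index];
    while (temp != NULL) {
        if (strcmp(temp->key, key) == 0) {
            printf("Found: %s -> %s\n", key, temp->value);
            return;
        }
        temp = temp->next;
    }
    printf("Word not found.\n");
}

int main(void) {
    // Initialize the library
    for (int i = 0; i < TABLE_SIZE; i++) hash_table[i] = NULL;
    insert("Algorithm", "A step-by-step procedure.");
    insert("Pointer", "A variable that stores a memory address.");
    lookup("Algorithm");
    return 0;
}
```
The Moral of the Story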
The library became the fastest in the land. Instead of searching thousands of books, the King could find a definition in "Constant Time," or O(1), on average. The architect proved that with a little bit of math and a well-placed pointer, even the largest mountains of data could be tamed.
And so, the Kingdom of Memory flourished, forever organized by the power of the Hash.
A dictionary is an abstract data type that maps unique keys to values. Since C lacks a built-in dictionary like those in Python or C#, you can implement one efficiently using a hash table. This approach provides average constant-time complexity, O(1), for insertion, search, and deletion.
1. Define the Data Structures
A dictionary requires a structure for individual entries (key-value pairs) and a structure for the table itself. To handle collisions (when two different keys produce the same hash), we use Separate Chaining, where each table index points to a linked list of entries. (GeeksforGeeks)
```c
#define TABLE_SIZE 100   // bucket count; value chosen for illustration

// 1. Structure for a single dictionary entry (node in linked list)
typedef struct Entry {
    char *key;
    char *value;
    struct Entry *next;
} Entry;

// 2. Structure for the Dictionary (Hash Table)
typedef struct {
    Entry *buckets[TABLE_SIZE];
} Dictionary;
```
2. Implement the Hash Function
The hash function converts a string key into an integer index within the table's bounds. A common and effective algorithm for strings is djb2, which multiplies the running hash by 33 (using a shift and an add) and then adds each character. (Stack Overflow)
```c
unsigned int hash(const char *key) {
    unsigned int hash_val = 5381;                      // djb2 starting value
    int c;
    while ((c = (unsigned char)*key++))
        hash_val = ((hash_val << 5) + hash_val) + c;   // hash * 33 + c
    return hash_val % TABLE_SIZE;                      // keep the index in bounds
}
```
3. Core Dictionary Operations
These functions manage the lifecycle of entries in the dictionary. (Stack Overflow)
3.4 Key-Value Types
Keys are null-terminated strings (char*). Values are integers (int) for demonstration; this can be made generic using void*.
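A minimal sketch of the `void*` variant, assuming the dictionary stores only the pointer and the caller keeps the pointed-to data alive. The names `GEntry`, `GDict`, `gdict_put`, and `gdict_get`, and the fixed bucket count of 64, are illustrative assumptions:

```c
#include <stdlib.h>
#include <string.h>

#define GBUCKETS 64   // bucket count chosen for illustration

typedef struct GEntry {
    char *key;
    void *value;              // untyped pointer; the caller owns the data it points to
    struct GEntry *next;
} GEntry;

typedef struct {
    GEntry *buckets[GBUCKETS];
} GDict;

static unsigned long gd_hash(const char *s) {
    unsigned long h = 5381;
    int c;
    while ((c = (unsigned char)*s++)) h = ((h << 5) + h) + c;   // djb2
    return h % GBUCKETS;
}

// Store or replace the pointer for key. The key is copied; the value is not.
int gdict_put(GDict *d, const char *key, void *value) {
    unsigned long i = gd_hash(key);
    for (GEntry *e = d->buckets[i]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { e->value = value; return 1; }
    GEntry *e = malloc(sizeof *e);
    size_t len = strlen(key) + 1;
    char *k = malloc(len);
    if (!e || !k) { free(e); free(k); return 0; }
    memcpy(k, key, len);
    e->key = k;
    e->value = value;
    e->next = d->buckets[i];
    d->buckets[i] = e;
    return 1;
}

// Return the stored pointer, or NULL if the key is absent.
void *gdict_get(const GDict *d, const char *key) {
    for (GEntry *e = d->buckets[gd_hash(key)]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e->value;
    return NULL;
}
```

The caller must cast the result back to the type it stored; the dictionary itself cannot check that the cast is correct.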
Get value by key
```c
int get(Dictionary* dict, const char* key, int* found) {
    int index = hash(key, dict->size);   // bucket for this key
    Entry* curr = dict->buckets[index];
    while (curr != NULL) {               // walk the collision chain
        if (strcmp(curr->key, key) == 0) {
            *found = 1;
            return curr->value;
        }
        curr = curr->next;
    }
    *found = 0;                          // key absent; -1 is only a placeholder
    return -1;
}
```
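`get` needs a companion that stores values. The sketch below assumes an `Entry` with an `int value` field and a `Dictionary` holding `size` and `buckets`, matching the fields `get` uses; `put` is a hypothetical name, and it updates the value in place when the key already exists.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    char *key;
    int value;
    struct Entry *next;
} Entry;

typedef struct {
    int size;          // number of buckets
    Entry **buckets;
} Dictionary;

int hash(const char *key, int size) {
    unsigned int h = 5381;
    int c;
    while ((c = (unsigned char)*key++)) h = ((h << 5) + h) + c;   // djb2
    return (int)(h % (unsigned int)size);
}

// Update the value if the key exists, otherwise prepend a new entry. Returns 0 on allocation failure.
int put(Dictionary *dict, const char *key, int value) {
    int index = hash(key, dict->size);
    for (Entry *curr = dict->buckets[index]; curr != NULL; curr = curr->next) {
        if (strcmp(curr->key, key) == 0) { curr->value = value; return 1; }
    }
    Entry *e = malloc(sizeof *e);
    size_t len = strlen(key) + 1;
    char *k = malloc(len);
    if (e == NULL || k == NULL) { free(e); free(k); return 0; }
    memcpy(k, key, len);
    e->key = k;
    e->value = value;
    e->next = dict->buckets[index];
    dict->buckets[index] = e;
    return 1;
}

// get() as described above: *found distinguishes a stored -1 from a missing key.
int get(Dictionary *dict, const char *key, int *found) {
    int index = hash(key, dict->size);
    for (Entry *curr = dict->buckets[index]; curr != NULL; curr = curr->next) {
        if (strcmp(curr->key, key) == 0) {
            *found = 1;
            return curr->value;
        }
    }
    *found = 0;
    return -1;
}
```

The `found` flag matters here: a stored value of -1 and a missing key both return -1, and only `*found` tells them apart.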
Approach: Separate Chaining
In this method, the hash table is an array of pointers. Each pointer points to a linked list. When two keys result in the same hash index (a collision), they are stored in the same linked list at that index.
3.1 Data Structure: Separate Chaining
We use an array of linked lists (buckets). Each bucket contains all key-value pairs that hash to the same index. This method is simple, handles an arbitrary number of collisions gracefully, and does not require the table to be resized as aggressively as open addressing.
Why separate chaining over open addressing?
- Easier to implement deletion.
- Better for high load factors.
- No clustering issues.