Mastering Java HashMap: A Comprehensive Guide

BY Neelabh
May 16, 2024
0 Comments
184 Views

This is most frequenly asked by the interviewer, when an interviewer asks about the internal workings of a HashMap, they are typically looking for an understanding of the following concepts:

Hashing: The hashCode() method is used to get a unique integer for a given key. This integer is then used to determine the bucket location where the key-value pair should be stored.

int hash = key.hashCode();

Buckets: A HashMap consists of an array of nodes, which are referred to as buckets. The hash code generated is used to find the appropriate bucket location to store our data.
Collision Handling: If two different keys have the same hash (this is called a collision), they’ll end up in the same bucket. The HashMap handles this by turning that bucket into a linked list (or a tree in case of high hash collisions) and each node in the list contains a reference to the next node.
equals() method: The equals() method is used to ensure the correct key is identified during a collision. When you try to retrieve a value using a key, it will go to the bucket and iterate through the list to find the node with our key. It uses the equals() method to identify the correct node.
Load Factor and Rehashing: The load factor is a measure of how full the HashMap is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt).
Null keys and values: HashMap allows one null key and multiple null values. They are stored in a special null bucket.

Here’s a simple example of how a HashMap works:

HashMap<Integer, String> map = new HashMap<>();
map.put(1, "One"); // The key "1" is hashed and the key-value pair is stored in the corresponding bucket.
map.put(2, "Two");
map.put(3, "Three");

String value = map.get(2); // The key "2" is hashed and used to look up the corresponding bucket. The bucket is then searched for the key-value pair using the equals() method.

Remember, the exact details of how a HashMap works can vary between different versions of Java and different data structures (like LinkedHashMap and TreeMap), but the basic principles remain the same. Understanding these will help you use the HashMap class effectively and efficiently.

HashMap is a part of the Java Collections Framework and Hash table-based implementation of the Map interface. It allows storing key-value pairs and unlike, it permits null keys and values. It doesn’t guarantee any specific order of the elements over time.

Internal Workings of HashMap in Java

HashMap is a widely used data structure in Java that implements the Map interface by using a hashtable. It provides constant-time performance for basic operations like get and put, given the hash function disperses the elements properly among the buckets. Let’s delve into its internal workings.

Java

/* ---------------- Fields -------------- */

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 */
transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 */
transient int modCount;

The transient keyword in Java is used to indicate that a field should not be serialized. In the context of HashMap, the table field, which is an array of HashMapEntry, is marked as transient.

The reason for this is that the table field is an internal data structure used by HashMap for fast lookup of entries. When a HashMap is serialized, it’s not necessary to serialize this entire table because it can be reconstructed from the other data in the HashMap.

Instead of serializing the table directly, HashMap uses custom serialization methods (writeObject and readObject) to write out the size, the load factor, and the individual key-value pairs. When the HashMap is deserialized, these values are read from the stream and the table is rebuilt.

This approach has several benefits:

It reduces the amount of data that needs to be written during serialization¹.
It allows the HashMap to be correctly deserialized even if the hash codes of keys change between serialization and deserialization.
It avoids serializing empty or default-valued buckets in the table, saving space.

So, the transient keyword is used in HashMap to optimize the serialization process.

The table is an array of Node objects, where each Node is a key-value pair. The entrySet holds the cached entrySet(). The size represents the number of key-value mappings, and modCount keeps track of the number of times the HashMap has been structurally modified.

Node Class

Java

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

Each Node in the HashMap is a linked list node. The hash is the hashCode of the key, key and value are the key-value pair, and next is the reference to the next Node in the linked list.

Creating a HashMap

// Placeholder for image: Creating a HashMap

When creating a HashMap, we first calculate the hashCode of the key. This hashCode is then used to find the index of the table where the Node will be stored. If there is no Node at the calculated index, a new Node is created and placed there.

Handling Collisions

// Placeholder for image: Handling Collisions

If a Node already exists at the calculated index, this is a collision. The HashMap traverses the linked list at that index and uses the equals() method to check if a Node with the same key exists. If it does, the value of that Node is updated. If it doesn’t, a new Node is added at the end of the linked list.

Handling Null Keys and Values

// Placeholder for image: Handling Null Keys and Values

HashMap allows null keys and values. If a null key is passed, the Node is stored at the 0th index of the table. If a null value is passed, the Node is created with a null value.

Rehashing and Treeify/Bucketify

// Placeholder for image: Rehashing and Treeify/Bucketify

When the HashMap becomes too full, it increases its capacity and rehashes all its Nodes. If a bucket becomes too large (more than 8 nodes), it is transformed into a balanced tree to improve performance. If it becomes too small (fewer than 6 nodes), it is converted back into a linked list.

Effect of Rehashing on LinkedList and Tree

During rehashing, the order of Nodes in the linked list can get reversed. As for the balanced tree, it gets converted back to a linked list during rehashing and then gets treeified again if the number of Nodes is still high.

This is a basic overview of how HashMap works internally in J

The performance of HashMap

HashMap provides constant-time performance for basic operations like get and put, assuming the hash function distributes the elements evenly among the buckets. The time for iteration over the map depends on its ‘capacity’ (number of buckets) plus its ‘size’ (number of key-value pairs).

let’s break this down:

Iteration Time: The concept of iteration time here is related to the data structure of the HashMap itself. In a HashMap, data is stored in an array of buckets. Each bucket contains a linked list of entries (key-value pairs). When the load factor is low, the HashMap maintains a larger number of buckets relative to the number of entries. This means there are more buckets, many of which may be empty or contain fewer entries.

When you iterate over a HashMap, you’re essentially going through each of its buckets to find the key-value pairs. The time it takes to do this is proportional to the number of buckets (the capacity) plus the number of key-value mappings (the size).

This is because you have to visit each bucket once, and for each bucket, you also have to go through each of its entries.

The iteration time in a HashMap refers to the time it takes to traverse through all the entries in the map. This is directly related to the number of buckets in the HashMap.
When the load factor is low, it means that the HashMap will maintain a larger number of buckets relative to the number of entries (key-value pairs).

This is done to reduce the probability of collisions (i.e., multiple keys being hashed to the same bucket), which can improve the time complexity of operations like get() and put().

However, a side effect of having a larger number of buckets (especially when many of them are empty or sparsely populated) is that it increases the time it takes to iterate over the entire HashMap.

This is because the iterator has to traverse through all the buckets, including the empty ones.
So, while a lower load factor can improve the performance of get() and put() operations by reducing collisions, it can also increase the iteration time. This is an example of the trade-offs involved in choosing an appropriate load factor for a HashMap. Let me know if you need more help!
Capacity and Size: The capacity is the number of buckets in the HashMap, and the size is the number of key-value pairs it contains. Even if a bucket contains no entries, it still takes time to visit that bucket during iteration. That’s why both the capacity and size affect the iteration time.
Initial Capacity: This is the initial number of buckets in the HashMap when it is first created. A bucket is used to store key-value pairs. Each bucket can store multiple key-value pairs in a linked list. If the initial capacity is set too high relative to the number of entries, you’ll end up with many empty buckets, which wastes memory.
Load Factor: The load factor is a measure of how full the HashMap is allowed to get before its capacity is automatically increased. It’s a value between 0 and 1. When the number of entries in the HashMap exceeds the product of the load factor and the current capacity, the HashMap is rehashed (i.e., internal data structures are rebuilt) so that the HashMap has approximately twice the number of buckets.

As a trade-off:
- A lower load factor increases the iteration time (i.e., the time to traverse the HashMap) but decreases the probability of collisions (i.e., multiple keys being hashed to the same bucket).
- A higher load factor decreases the iteration time but increases the probability of collisions.
Initial Capacity and Load Factor: The initial capacity is the number of buckets when the HashMap is created, and the load factor is a threshold for resizing the HashMap. If you set the initial capacity too high, you’ll have many empty buckets if the size (number of entries) is small, which wastes space and increases iteration time.

On the other hand, if you set the load factor too low, the HashMap will resize more frequently, which can slow down the operations that add new entries.

The load factor is a measure of how full the HashMap is allowed to get before its capacity is automatically increased. The default load factor of 0.75 in Java’s HashMap implementation offers a good balance between time and space costs.
Time Costs: This refers to the time it takes to perform operations like get and put. If the load factor is low, that means we allow the HashMap to get less full before resizing it.

This could lead to faster lookups because there are fewer entries in each bucket to iterate over, but it also means we’re more frequently resizing the HashMap, which has a cost in time.
Space Costs: This refers to memory usage. A higher load factor means we allow the HashMap to get more full before resizing it. This could lead to more efficient memory usage because we’re using more of the array before we resize it, but it could also lead to slower lookups if the number of entries in each bucket gets high.

So, if you set a higher load factor, the HashMap will use less memory space, but the cost of looking up an entry (the time cost) could increase because more entries end up in each bucket. Conversely, a lower load factor can result in faster lookups but uses more memory because the HashMap gets resized more often.

Concurrency and Synchronization

HashMap is not synchronized, meaning it’s not safe for use by multiple threads without external synchronization. If multiple threads access a HashMap concurrently, and at least one of the threads modifies the map structurally, it can lead to data inconsistency issues.

So, both structural modifications and changing the value associated with a key are not thread-safe operations in HashMap. So, if multiple threads are expected to modify the map, external synchronization is necessary.

A structural modification refers to an operation that changes the structure of the map, i.e., it adds or deletes one or more key-value pairs. For example, operations like put(key, value) (if the key is not already in the map) or remove(key) are considered structural modifications because they change the number of key-value pairs in the map.

If a key is already present in the map, and you just change its associated value using put(key, newValue), it’s not considered a structural modification. This is because the number of key-value pairs in the map remains the same; only the value associated with the key changes.

Fail-Fast Iterators

The iterators returned by HashMap are fail-fast. If the map is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove method, the iterator will throw a ConcurrentModificationException. This fail-fast behaviour is designed to prevent non-deterministic behaviour at an undetermined time in the future.

However, it’s important to note that the fail-fast behaviour of an iterator can’t be guaranteed in the presence of unsynchronized concurrent modification.

Fail-fast iterators throw ConcurrentModificationException on a best-effort basis, meaning they do their best to detect structural modifications but there’s no guarantee they will always succeed.

Therefore, it would be wrong to write a program that depends on this exception for its correctness. The fail-fast behaviour of iterators should be used only to detect bugs, such as concurrent modification during iteration, which is typically not a valid operation for most collections.

Java

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ConcurrentHashMapModificationDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("key1", "value1");
        map.put("key2", "value2");
        map.put("key3", "value3");
        new Thread(() -> {
            Iterator<String> iterator = map.keySet().iterator();
            while (iterator.hasNext()){
                System.out.println(iterator.next());
                try {
                    // Pauses the current thread for 100 milliseconds.
                    // This can be used to simulate time-consuming tasks,
                    // or to pause execution to allow other threads to run.
                    // In this case, it allows another thread to potentially
                    // modify the structure of the map during iteration,
                    // which can trigger a ConcurrentModificationException.
                    Thread.sleep(100);
                }catch (InterruptedException ex){
                    ex.printStackTrace();
                }
            }
        }).start();

        new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            map.put("key4", "value4");
        }).start();
    }
}

OUTPUT

Java

//OUTPUT
"C:\Program Files\Java\jdk-11.0.16.1\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2022.1.1\lib\idea_rt.jar=31867:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2022.1.1\bin" -Dfile.encoding=UTF-8 -classpath "C:\Users\NEELABH SINGH\Documents\Projects\practice_algods\out\production\practice_algods" concurrent.ConcurrentHashMapModificationDemo
key1
Exception in thread "Thread-0" java.util.ConcurrentModificationException
	at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1510)
	at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1533)
	at concurrent.ConcurrentHashMapModificationDemo.lambda$main$0(ConcurrentHashMapModificationDemo.java:16)
	at java.base/java.lang.Thread.run(Thread.java:834)

In the provided code example, we have two threads: one that is iterating over the HashMap and another that is trying to modify it. The modification happens after a short delay (Thread.sleep(100)), so it’s possible that the iterator has already been called hasNext() before the modification occurs.

The hasNext() method of an iterator checks if the map structure has been modified since the iterator was created and if it has, it throws a ConcurrentModificationException.

However, if the map is modified after hasNext() is called, the iterator won’t know about this modification until the next call to hasNext().

Instead, it will throw the exception at the next call to hasNext(), or it might not throw the exception at all if there are no more elements to iterate over.

This is why the fail-fast behaviour of iterators in Java can’t be guaranteed in the presence of unsynchronized concurrent modification. It’s generally a good practice to avoid modifying a collection while iterating over it, unless you’re using an explicit iterator and its remove method.

If multiple threads need to access a shared collection, consider using a concurrent collection like ConcurrentHashMap, or synchronize access to the collection externally.

In conclusion, HashMap is a powerful data structure in Java that provides efficient operations for storing and retrieving data. However, it’s important to understand its characteristics and behaviours, especially when dealing with multithreading, to use it effectively and avoid potential issues.

ava. For more detailed information, refer to the official Java documentation.

More On HashMap

Java

public class HashMapExample{
    public static void main(String [] args){
        Map<String, Long> phoneBook = new HashMap<>();
        phoneBook.put("Neelabh", 12132123L);
        phoneBook.put("Sushama", 232323L);
        phoneBook.put("Aadvik", 232121L);
        Iterator<String> itr =  phoneBook.keySet().iterator();
        while(itr.hasNext()) {
            String personName = itr.next();
            System.out.println(personName);
            if (personName.equals("Neelabh")) {
                phoneBook.put("Naveen", 8128182L);
            }
        }
    }
}

Java

public class HashMapExample{
    public static void main(String [] args){
        Map<String, Long> phoneBook = new HashMap<>();
        phoneBook.put("Neelabh", 12132123L);
        phoneBook.put("Sushama", 232323L);
        phoneBook.put("Aadvik", 232121L);
        Iterator<String> itr =  phoneBook.keySet().iterator();
        while(itr.hasNext()) {
            String personName = itr.next();
            System.out.println(personName);
            if (personName.equals("Neelabh")) {
                phoneBook.put("Neelabh", 8128182L);
            }
        }
    }
}

The reason why the second program does not throw a ConcurrentModificationException is because it does not violate the fail-fast behavior of the iterator.

In the first program:

phoneBook.put("Naveen", 8128182L);

This line adds a new entry to the HashMap while the iterator is still in use. This violates the fail-fast behavior, which means that if the collection is modified after the iterator has been created, the iterator will throw a ConcurrentModificationException.

In the second program:

phoneBook.put("Neelabh", 8128182L);

This line is not adding a new entry to the HashMap, but rather updating an existing key-value pair. Since the size of the HashMap does not change, the fail-fast behavior is not violated, and no exception is thrown.

The fail-fast behavior is a safety mechanism in Java collections to prevent unpredictable behavior that could occur when a collection is modified while iterating over it. It ensures that if the collection is modified during iteration, an exception is thrown to avoid unexpected results.

The ConcurrentModificationException is specifically designed to detect structural modifications to the collection, such as adding or removing elements. In the case of the second program, the modification is not structural, as it only updates an existing value, so the fail-fast behavior is not triggered.

In summary, the first program violates the fail-fast behavior because it modifies the structure of the HashMap by adding a new entry, while the second program only updates an existing value, which does not violate the fail-fast behavior.

The HashMap keeps track of structural modifications through the use of a modification count variable, which is incremented every time the structure of the HashMap is changed.

Here’s how it works:

When an iterator is created for a HashMap, the current value of the modification count variable is recorded.
During the iteration process, if any structural modification occurs to the HashMap, such as adding or removing an entry, the modification count variable is incremented.
Before returning the next element in the iteration, the iterator checks if the current modification count value is different from the recorded value at the time of its creation.
If the modification count has changed, it means that a structural modification has occurred, and the iterator throws a ConcurrentModificationException.

The modification count variable is part of the internal implementation of the HashMap, and it is updated automatically whenever the structure of the HashMap is modified, such as when new entries are added, existing entries are removed, or the entire HashMap is cleared.

However, if an existing entry is simply updated with a new value (as in the second example you provided), the modification count variable is not incremented because the structure of the HashMap remains the same. Only the value associated with an existing key is changed, which does not trigger the fail-fast behavior.

This mechanism allows the HashMap to detect structural modifications during iteration and throw an exception to prevent unpredictable behavior and potential data corruption.

It’s important to note that this fail-fast behavior is a safety mechanism, but it does not guarantee thread-safety. If multiple threads are modifying the HashMap concurrently without proper synchronization, it may still lead to race conditions and data inconsistencies.

Sure, the modification count variable in the HashMap implementation is named modCount.

In the next() method of the HashMap.HashIterator class (which is the iterator implementation for HashMap), the modCount variable is checked against the expectedModCount variable, which is initialized at the time of iterator creation.

Here’s the relevant code from the OpenJDK 8 source code (HashMap.HashIterator.next() method):

final Entry<K,V> nextEntry() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    // Rest of the method implementation
}

The modCount and expectedModCount variables are defined in the HashMap class itself:

transient int modCount = 0; // Used for fail-fast behavior

final Entry<K,V> nextEntry() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();

    Entry<K,V> e = nextEntry;
    if (e == null)
        throw new NoSuchElementException();

    entryNext = e.next;
    currentValue = e;
    updateEntry(currentValue);
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
    return e;
}

The modCount variable is incremented whenever a structural modification occurs in the HashMap, such as in the put, remove, clear, and putAll methods.

For example, in the put method:

public V put(K key, V value) {
    // ...
    modCount++;
    // Rest of the method implementation
}

By comparing the modCount and expectedModCount variables in the next() method, the iterator can detect if a structural modification has occurred during iteration and throw a ConcurrentModificationException accordingly.

This implementation detail allows the HashMap to maintain its fail-fast behavior and ensure that iterators do not return inconsistent or invalid elements when the underlying collection is modified during iteration.

In the case you provided:

public class HashMapExample {
    public static void main(String[] args) {
        Map<String, Long> phoneBook = new HashMap<>();
        phoneBook.put("Neelabh", 12132123L);
        phoneBook.put("Sushama", 232323L);
        phoneBook.put("Aadvik", 232121L);
        Iterator<String> itr = phoneBook.keySet().iterator();
        while (itr.hasNext()) {
            String personName = itr.next();
            System.out.println(personName);
            if (personName.equals("Sushama")) {
                phoneBook.put("Naveen", 8128182L);
            }
        }
    }
}

The reason why no ConcurrentModificationException is thrown is that the structural modification (adding the new entry “Naveen”) happens after the iteration over the keys has already completed.

Here’s what happens step-by-step:

The iterator itr is created from the keySet of the phoneBook HashMap.
The while loop starts, and the first key “Aadvik” is retrieved and printed using itr.next().
The condition personName.equals("Sushama") is not met, so no modification occurs.
The second key “Neelabh” is retrieved and printed using itr.next().
The condition personName.equals("Sushama") is not met, so no modification occurs.
The third key “Sushama” is retrieved and printed using itr.next().
The condition personName.equals("Sushama") is met, and the phoneBook.put("Naveen", 8128182L) line is executed, adding a new entry to the HashMap.
At this point, the iteration over the keys has already completed, and the iterator itr is no longer in use.

The key point here is that the structural modification (adding the new entry “Naveen”) occurs after the iteration has finished. The fail-fast behavior and the ConcurrentModificationException are meant to detect modifications while the iteration is still in progress, not after it has completed.

If you move the phoneBook.put("Naveen", 8128182L) line before the while loop, or inside the loop but before the itr.next() call, you will see the ConcurrentModificationException being thrown because the structural modification happens while the iteration is still in progress.

In summary, the reason why no exception is thrown in your example is that the structural modification occurs after the iteration has completed, which does not violate the fail-fast behavior and does not trigger the ConcurrentModificationExcept

Reference:

Oracle. (2014). Class HashMap. Retrieved from https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html

Post Views: 153

Follow Us

Mastering Java HashMap: A Comprehensive Guide

Internal Workings of HashMap in Java

Creating a HashMap

Handling Collisions

Handling Null Keys and Values

Rehashing and Treeify/Bucketify

Effect of Rehashing on LinkedList and Tree

The performance of HashMap

Concurrency and Synchronization

Fail-Fast Iterators

ava. For more detailed information, refer to the official Java documentation.

Sapient Interview Question

3 Questions from My Interview Experience at Infogain

Neelabh

About Author

You may also like

Understanding the Builder Design Pattern in Java | Creational Design Patterns | CodeTechSummit

Base64 Decode