Friday, November 10, 2017

HashMap: Internal Implementation

What is HashMap? HasMap is an implementation of Map<k,v> interface.

map_interface


Map keeps data in Key-Value form. It allowed one null key and also allowed null values for any key. There is no guarantee for insertion order of elements in HashMap. The beauty of HashMap is it’s performance. Get and put operations performed in constant time that is O(1) complexity (Assuming your hash function is implemented in such a way that elements dispersed properly among the buckets available). </k,v>
Initial Capacity: HashMap bucket is backed by Array. 16 is default initial capacity of HashMap, yes off course  you can also provide your own initial capacity by passing to capacity to its constructor, like below.
HashMap<HashMapKey, String> hashMap = new HashMap<>(); //Initial Capacity 16
HashMap<HashMapKey, String> hashMap = new HashMap<>(20); //Initial Capacity 20
Load Factor: Initial capacity and Load factor are two important factor that are crucial for HashMap’s performance. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed so that the hash table has approximately twice the number of buckets.Default value of load factor is 0.75. Load factor can have any value between 0 and 1.
Higher value of load factor decrease the space overhead but get() and put() operations cost increased. While choosing initial capacity expected elements to be added and load factor should be taken into account. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

HashMap Internal Implementation:

How is HashMap internally works?  This is a question which is being asked in almost every java interview question. Even if you are using HashMap for you programming requirement, it it equally important to understand internal implementation of the said data structure. Because if you know how it works, you will be able to avail its features in efficient way.
HashMap is an implementation for Map interface. It keeps the data in key-value pair. An array implementation is used to as bucket with HashMap. Each element in this array is a bucket. Bucket itself is a different data structure. Prior to Java 8, it used to be a LinkedList but in Java 8 its Tree structure. The important thing here to remember is, these LinkedList and Tree are not actual collection provided by Java. These are created inline whenever needed inside HashMap itself.  Here the important thing to understand is it key. Because Map operations are performed on the key. Only one null key is allowed in HashMap  There two main operations performed on HashMap.
  1. put(key, value)
  2. get(key)
These two operations are sufficient to understand HashMap internal working.
While putting key-value into HashMap following important steps can be identified.
  1. Check if the key is null. HashMap allow a single key as null. And null key is treated with special care. Whenever any null key is added to HashMap, it is always placed at zero index of bucket. Its simple because haschCode of null always will be 0.
  2. If key is not null,
    • First of all haschCode() method is called on it to get hashCode value.
    • Then hash() function is applied to get actual index of bucket.
    • After calculating bucket location, two cases are possible
      • There is nothing at that location. In this case simple an Object of Map.Entry will be created and added to LinkedList or Tree. Map.Entry is an inner Interface of Map Class which is used to store key, value, hash value, and next Map.Entry object.
      • There is already any object present in bucket at calculated index. This situation is called Hash Collision. This the case when two objects come up with same hash code. Here equals() method of key object comes to rescue. equals() method check whether both keys are same or not. If both kesy are same value corresponding to the key will be updated with new value (This is how, HashMap avoid duplicate keys.). Else a new node will be created with new key-value and added to the bucket.
If you look into HashMap class in java library code, you will see like this:
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
In above code snippet you can see, before putting value hash() is applied. Don’t focus on other parameters for now.
Similarly while getting any element from HashMap, similar step are followed for finding bucket and then element in side bucked.
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
Example:
package com.ktechlabs.java.collections;
import java.util.HashMap;
public class HashMapExample{
public static void main(String[] args){
//Creating instance of HashMap
Map<String, String> hashMap = new HashMap<String, String>();
//Adding key value to hashMap instance
hashMap.put(“topic”, “HashMap Example”);
hashMap.put(“writtenBy”,”KTechLabs”);
//Getting value from hashMap for key “topic”
String topic = hashMap.get(“topic”);
System.out.println(topic);
}
}

Output:
HashMap Example

2 comments:

  • Tarun Kumar says:
    November 27, 2017 at 11:49 PM

    Good post. You can also add how the location of bucket is identified. Hashcode is a big number say 1234567 and buckets are say 16, then to find bucket location = (n - 1) XOR hashcode then leftshift by 1 position.
    Second, you can add detail like tree structure BST is used in place of LinkedList after bucket size reached 8.

  • Tech Java says:
    December 5, 2017 at 11:41 PM

    @Tarun thanks for adding missing information, it will help other readers.

Post a Comment