Create Nested Dictionary Efficiently [Benchmark Between 3 Approaches]

Sometimes, You may find that it is helpful to access and process your data with nested dictionary. In this article, we see the best and the most efficient method to create a nested dictionary – a dictionary whose all values are also dictionaries.

Make Your Code Clearer With Nested Dictionary

When you use a nested dictionary to process your data, Your code is much more readable. In a few lines of code, you can easily access the desired data without unnecessary if-statements, making your intent is more apparent to the reader of the code.

Let’s see it in the following example, Suppose you have tabular data that was retrieved from rational database or CSV file:

Tabular Data – A List From Rational Database Or CSV
 data = [
     ['key1-A','key2-A', 'key1', 8000],
     ['key1-A','key2-A', 'key2', 8001],
     ['key1-A','key2-B', 'key1', 8010],
     ['key1-A','key2-B', 'key2', 8011],
     ['key1-B','key2-A', 'key1', 8100],
     ['key1-B','key2-A', 'key2', 8101],
     ['key1-B','key2-B', 'key1', 8110],
     ['key1-B','key2-B', 'key2', 8111],
 ]

Now, suppose we want to access two specific rows and print their fourth value:

  • The row that its first value is 'key1-A', its second value is 'key2-B' and its third value is 'key2' – that is; the value 1011
  • The row that its first value is 'key1-B', its second value is 'key2-A' and its third value is 'key1' – that is; the value 1100

You probably end with something similar to the following code:

Tabular Data – A List From Rational Database Or CSV
for row in data:    
    if row[0] == 'key1-A':
        if row[1] == 'key2-B':
            if row[2] == 'key2':
                print(row[3]) # print 8011
    if row[0] == 'key1-B':
        if row[1] == 'key2-A':
            if row[2] == 'key1':
                print(row[2]) # print 8100

However, if you first convert the data to a nested dictionary, you can access the data easily as the following code shows:

Tabular Data – A List From Rational Database Or CSV
>>> nested = toNested(data)
{ 'key1-A' : { 'key2-A' : { 'key1': 8000,
                              'key2': 8001 } } ,
  'key1-A' : { 'key2-B' : { 'key1': 8010,
                              'key2': 8011 } } ,
  'key1-B' : { 'key2-A' : { 'key1': 8100,
                              'key2': 8101 } } ,
  'key1-B' : { 'key2-B' : { 'key1': 8110,
                              'key2': 8111 } } }

>>> print(nested['key1-A']['key2-B']['key2'])
8011
>>> print(nested['key1-B']['key2-A']['key1'])
8100

As we can see in the above example, when you use a nest dictionaries, You do not need to access the data with if-statements anymore, and your code is shorter and more transparent.

Now, Let’s see how to create the nest dictionaries,

Approach 1: Nested Dictionary With dict.setdefault

The dict has a handy method to creating nested dictionaries on the fly. dict.setdefault(key,default) will return the key value if it exists in the dictionary; otherwise, it will insert the key into the dictionary with the value default. Thus, we can write the following straightforward function:

With dict.setdefault
 def toNested1(data):
     nested = {}
     for a0,a1,a2,a3 in data:
         nested.setdefault(a0, {}).setdefault(a1, {})[a2] = a3
     return nested

Approach 2: Nested Dictionary With Optimized dict.setdefault

There is one limitation to approach 1. Every time we process a row in the data, we create a new dictionary even if the key is already in the dictionary. Let’s see how we can fix it by implementing our version of dict.setdefault that will create a new dictionary only when the key is not the dictionary.

With Optimized dict.setdefault
def toNested2(data):
     def addKeyDict(map,key):
         if key not in map:
             item = map[key] = {}
             return item
         return map[key]
     nested = {}
     for a0,a1,a2,a3 in data :
         addKeyDict( addKeyDict( nested, a0) , a1 )[a2] = a3
         return nested

When benchmarking those functions, we found that the second one is faster by ~10%. dict.setdefault benchmark. Here is benchmarking code to compare the 2 functions:

Benchmark Two Functions
 if __name__ == '__main__':
     def faster(z1,z2):
         return round(100 * (z1-z2)/z1, 1);
 import pprint
 import timeit
 pp = pprint.PrettyPrinter(indent=2)
 pp.pprint(toNested1(data))
 pp.pprint(toNested2(data))
 z1 = timeit.timeit("toNested1(data*10);", setup="from __main__ import toNested1,data")
 z2 = timeit.timeit("toNested2(data*10);", setup="from __main__ import toNested2,data")
 print(faster(z1,z2))

Approach 3: Nested Dictionary With collections.defaultdict

The collections module provides specialized container datatypes providing alternatives to Python’s general purpose built-in dict. One of them is collections.defaultdict.

The behavior of this class is similar to above approaches. When you evaluate the expression dictionary[key] the result will be the value of key if it exist in the dictionary, otherwise the function will call the function stored in default_factory attribute, store its return value, and insert the key into the dictionary with its value. The default_factory attribute can be initialized in the defaultdict constructor.

Using this method we can implement the following method: collections.defaultdict

With collections.defaultdict
 def nested_dict():
     return defaultdict(nested_dict)
 def toNested3(data):
     nested = nested_dict()
     for a0,a1,a2,a3 in data:
         nested[a0][a1][a2] = a3
     return nested

When we benchmark toNested2 and this approach, we found that it faster by 55%. (and ~60% faster than the toNested1)

The Best Way To Create Nested Dictionary

We see 3 different methods to create it . After benchmarking, we found that the efficient way to create a nested dictionary is to collections.defaultdict – It is faster by more than 50% than the other methods.

Share with us if you find a better method.

Nested Dictionary FAQ

What Is A Nested Dictionary?

Nested dictionary is a dictionary that its values are also dictionaries.

We say that a dictionary is a nested dictionary. If all the values of the dictionary are dictionaries. This type of dictionary allows you can easily access the desired data without unnecessary if-statements and make your intents more apparent to the reader of the code.

How To Create a Nested Dictionary?

What is the Best Way To Create a Nested Dictionary?

We find that collections.defaultdict is the best and the most efficient way to create a nested dictionary. It is faster by ~55% than any other method.

Leave a Reply

Your email address will not be published. Required fields are marked *