Create Nested Dictionary Efficiently [Benchmark Between 3 Approaches]

Sometimes, You may find that it is helpful to access and process your data with nested dictionary. In this article, we see the best and the most efficient method to create a nested dictionary – a dictionary whose all values are also dictionaries.

Make Your Code Clearer With Nested Dictionary

When you use a nested dictionary to process your data, Your code is much more readable. In a few lines of code, you can easily access the desired data without unnecessary if-statements, making your intent is more apparent to the reader of the code.

Let’s see it in the following example, Suppose you have tabular data that was retrieved from rational database or CSV file:

Tabular Data – A List From Rational Database Or CSV
`````` data = [
['key1-A','key2-A', 'key1', 8000],
['key1-A','key2-A', 'key2', 8001],
['key1-A','key2-B', 'key1', 8010],
['key1-A','key2-B', 'key2', 8011],
['key1-B','key2-A', 'key1', 8100],
['key1-B','key2-A', 'key2', 8101],
['key1-B','key2-B', 'key1', 8110],
['key1-B','key2-B', 'key2', 8111],
]
``````

Now, suppose we want to access two specific rows and print their fourth value:

• The row that its first value is `'key1-A'`, its second value is `'key2-B'` and its third value is `'key2'` – that is; the value 1011
• The row that its first value is `'key1-B'`, its second value is `'key2-A'` and its third value is `'key1'` – that is; the value 1100

You probably end with something similar to the following code:

Tabular Data – A List From Rational Database Or CSV
``````for row in data:
if row[0] == 'key1-A':
if row[1] == 'key2-B':
if row[2] == 'key2':
print(row[3]) # print 8011
if row[0] == 'key1-B':
if row[1] == 'key2-A':
if row[2] == 'key1':
print(row[2]) # print 8100``````

However, if you first convert the data to a nested dictionary, you can access the data easily as the following code shows:

Tabular Data – A List From Rational Database Or CSV
``````>>> nested = toNested(data)
{ 'key1-A' : { 'key2-A' : { 'key1': 8000,
'key2': 8001 } } ,
'key1-A' : { 'key2-B' : { 'key1': 8010,
'key2': 8011 } } ,
'key1-B' : { 'key2-A' : { 'key1': 8100,
'key2': 8101 } } ,
'key1-B' : { 'key2-B' : { 'key1': 8110,
'key2': 8111 } } }

>>> print(nested['key1-A']['key2-B']['key2'])
8011
>>> print(nested['key1-B']['key2-A']['key1'])
8100``````

As we can see in the above example, when you use a nest dictionaries, You do not need to access the data with if-statements anymore, and your code is shorter and more transparent.

Now, Let’s see how to create the nest dictionaries,

Approach 1: Nested Dictionary With dict.setdefault

The `dict` has a handy method to creating nested dictionaries on the fly. `dict.setdefault(key,default)` will return the `key` value if it exists in the dictionary; otherwise, it will insert the `key` into the dictionary with the value `default`. Thus, we can write the following straightforward function:

With dict.setdefault
`````` def toNested1(data):
nested = {}
for a0,a1,a2,a3 in data:
nested.setdefault(a0, {}).setdefault(a1, {})[a2] = a3
return nested``````

Approach 2: Nested Dictionary With Optimized dict.setdefault

There is one limitation to approach 1. Every time we process a row in the data, we create a new dictionary even if the key is already in the dictionary. Let’s see how we can fix it by implementing our version of `dict.setdefault` that will create a new dictionary only when the key is not the dictionary.

With Optimized dict.setdefault
``````def toNested2(data):
if key not in map:
item = map[key] = {}
return item
return map[key]
nested = {}
for a0,a1,a2,a3 in data :
return nested``````

When benchmarking those functions, we found that the second one is faster by ~10%. dict.setdefault benchmark. Here is benchmarking code to compare the 2 functions:

Benchmark Two Functions
`````` if __name__ == '__main__':
def faster(z1,z2):
return round(100 * (z1-z2)/z1, 1);
import pprint
import timeit
pp = pprint.PrettyPrinter(indent=2)
pp.pprint(toNested1(data))
pp.pprint(toNested2(data))
z1 = timeit.timeit("toNested1(data*10);", setup="from __main__ import toNested1,data")
z2 = timeit.timeit("toNested2(data*10);", setup="from __main__ import toNested2,data")
print(faster(z1,z2))``````

Approach 3: Nested Dictionary With collections.defaultdict

The `collections` module provides specialized container datatypes providing alternatives to Python’s general purpose built-in dict. One of them is `collections.defaultdict`.

The behavior of this class is similar to above approaches. When you evaluate the expression `dictionary[key]` the result will be the value of `key` if it exist in the dictionary, otherwise the function will call the function stored in `default_factory` attribute, store its return value, and insert the `key` into the dictionary with its value. The `default_factory` attribute can be initialized in the `defaultdict` constructor.

Using this method we can implement the following method: collections.defaultdict

With collections.defaultdict
`````` def nested_dict():
return defaultdict(nested_dict)
def toNested3(data):
nested = nested_dict()
for a0,a1,a2,a3 in data:
nested[a0][a1][a2] = a3
return nested``````

When we benchmark toNested2 and this approach, we found that it faster by 55%. (and ~60% faster than the toNested1)

The Best Way To Create Nested Dictionary

We see 3 different methods to create it . After benchmarking, we found that the efficient way to create a nested dictionary is to collections.defaultdict – It is faster by more than 50% than the other methods.

Share with us if you find a better method.

Nested Dictionary FAQ

What Is A Nested Dictionary?

We say that a dictionary is a nested dictionary. If all the values of the dictionary are dictionaries. This type of dictionary allows you can easily access the desired data without unnecessary if-statements and make your intents more apparent to the reader of the code.

What is the Best Way To Create a Nested Dictionary?

We find that collections.defaultdict is the best and the most efficient way to create a nested dictionary. It is faster by ~55% than any other method.