Sometimes, You may find that it is helpful to access and process your data with nested dictionary. In this article, we see the best and the most efficient method to create a nested dictionary – a dictionary whose all values are also dictionaries.
Make Your Code Clearer With Nested Dictionary
When you use a nested dictionary to process your data, Your code is much more readable. In a few lines of code, you can easily access the desired data without unnecessary if-statements, making your intent is more apparent to the reader of the code.
Let’s see it in the following example, Suppose you have tabular data that was retrieved from rational database or CSV file:
Tabular Data – A List From Rational Database Or CSV
data = [
['key1-A','key2-A', 'key1', 8000],
['key1-A','key2-A', 'key2', 8001],
['key1-A','key2-B', 'key1', 8010],
['key1-A','key2-B', 'key2', 8011],
['key1-B','key2-A', 'key1', 8100],
['key1-B','key2-A', 'key2', 8101],
['key1-B','key2-B', 'key1', 8110],
['key1-B','key2-B', 'key2', 8111],
]
Now, suppose we want to access two specific rows and print their fourth value:
- The row that its first value is
'key1-A'
, its second value is'key2-B'
and its third value is'key2'
– that is; the value 1011 - The row that its first value is
'key1-B'
, its second value is'key2-A'
and its third value is'key1'
– that is; the value 1100
You probably end with something similar to the following code:
Tabular Data – A List From Rational Database Or CSV
for row in data:
if row[0] == 'key1-A':
if row[1] == 'key2-B':
if row[2] == 'key2':
print(row[3]) # print 8011
if row[0] == 'key1-B':
if row[1] == 'key2-A':
if row[2] == 'key1':
print(row[2]) # print 8100
However, if you first convert the data to a nested dictionary, you can access the data easily as the following code shows:
Tabular Data – A List From Rational Database Or CSV
>>> nested = toNested(data)
{ 'key1-A' : { 'key2-A' : { 'key1': 8000,
'key2': 8001 } } ,
'key1-A' : { 'key2-B' : { 'key1': 8010,
'key2': 8011 } } ,
'key1-B' : { 'key2-A' : { 'key1': 8100,
'key2': 8101 } } ,
'key1-B' : { 'key2-B' : { 'key1': 8110,
'key2': 8111 } } }
>>> print(nested['key1-A']['key2-B']['key2'])
8011
>>> print(nested['key1-B']['key2-A']['key1'])
8100
As we can see in the above example, when you use a nest dictionaries, You do not need to access the data with if-statements anymore, and your code is shorter and more transparent.
Now, Let’s see how to create the nest dictionaries,
Approach 1: Nested Dictionary With dict.setdefault
The dict
has a handy method to creating nested dictionaries on the fly. dict.setdefault(key,default)
will return the key
value if it exists in the dictionary; otherwise, it will insert the key
into the dictionary with the value default
. Thus, we can write the following straightforward function:
With dict.setdefault
def toNested1(data):
nested = {}
for a0,a1,a2,a3 in data:
nested.setdefault(a0, {}).setdefault(a1, {})[a2] = a3
return nested
Approach 2: Nested Dictionary With Optimized dict.setdefault
There is one limitation to approach 1. Every time we process a row in the data, we create a new dictionary even if the key is already in the dictionary. Let’s see how we can fix it by implementing our version of dict.setdefault
that will create a new dictionary only when the key is not the dictionary.
With Optimized dict.setdefault
def toNested2(data):
def addKeyDict(map,key):
if key not in map:
item = map[key] = {}
return item
return map[key]
nested = {}
for a0,a1,a2,a3 in data :
addKeyDict( addKeyDict( nested, a0) , a1 )[a2] = a3
return nested
When benchmarking those functions, we found that the second one is faster by ~10%. dict.setdefault benchmark. Here is benchmarking code to compare the 2 functions:
Benchmark Two Functions
if __name__ == '__main__':
def faster(z1,z2):
return round(100 * (z1-z2)/z1, 1);
import pprint
import timeit
pp = pprint.PrettyPrinter(indent=2)
pp.pprint(toNested1(data))
pp.pprint(toNested2(data))
z1 = timeit.timeit("toNested1(data*10);", setup="from __main__ import toNested1,data")
z2 = timeit.timeit("toNested2(data*10);", setup="from __main__ import toNested2,data")
print(faster(z1,z2))
Approach 3: Nested Dictionary With collections.defaultdict
The collections
module provides specialized container datatypes providing alternatives to Python’s general purpose built-in dict. One of them is collections.defaultdict
.
The behavior of this class is similar to above approaches. When you evaluate the expression dictionary[key]
the result will be the value of key
if it exist in the dictionary, otherwise the function will call the function stored in default_factory
attribute, store its return value, and insert the key
into the dictionary with its value. The default_factory
attribute can be initialized in the defaultdict
constructor.
Using this method we can implement the following method: collections.defaultdict
With collections.defaultdict
def nested_dict():
return defaultdict(nested_dict)
def toNested3(data):
nested = nested_dict()
for a0,a1,a2,a3 in data:
nested[a0][a1][a2] = a3
return nested
When we benchmark toNested2 and this approach, we found that it faster by 55%. (and ~60% faster than the toNested1)
The Best Way To Create Nested Dictionary
We see 3 different methods to create it . After benchmarking, we found that the efficient way to create a nested dictionary is to collections.defaultdict – It is faster by more than 50% than the other methods.
Share with us if you find a better method.
Nested Dictionary FAQ
What Is A Nested Dictionary?

We say that a dictionary is a nested dictionary. If all the values of the dictionary are dictionaries. This type of dictionary allows you can easily access the desired data without unnecessary if-statements and make your intents more apparent to the reader of the code.
How To Create a Nested Dictionary?
What is the Best Way To Create a Nested Dictionary?
We find that collections.defaultdict is the best and the most efficient way to create a nested dictionary. It is faster by ~55% than any other method.