Create Nested Dictionaries

The problem

Suppose you have tabular data can be retrieve from rational database or CSV file:

The tabular data
`````` list = [
['fk1-A','fk2-A', 'key1', 10],
['fk1-A','fk2-A', 'key2', 20],
['fk1-B','fk2-B', 'key3', 30],
['fk1-B','fk2-B', 'key4', 40],
['fk1-B','fk2-A', 'key1', 20],
['fk1-B','fk2-A', 'key2', 40],
['fk1-B','fk2-B', 'key3', 40],
['fk1-B','fk2-B', 'key4', 40],
]
``````

You want to convert this data to a nested dictionary so you can access the data in an efficient way.

The nested dictionary
`````` >>> nd = {
'fk1-A': { 'fk2-A': {'key1': 10, 'key2': 20},
'fk2-B': {'key3': 30, 'key4': 40}},
'fk1-B': { 'fk2-A': {'key1': 20, 'key2': 40},
'fk2-B': {'key3': 40, 'key4': 40}}
}
>>> print(nd['fk1-A']['fk2-B']['key3'])
30
>>> print(nd['fk1-B']['fk2-A']['key1'])
20
``````

using dict.setdefault

The dict has very useful method to creating nested dictionaries on the fly. dict.setdefault(key,default) will return the value of `key` if it exist in the dictionary, otherwise it will insert the `key` into the dictionary with the value `default`. Thus using this method we can write a straightforward function to convert the data to nested dictionary.

dict.setdefault
`````` def toNested1(data):
nd = {}
for a0,a1,a2,a3 in data:
nd.setdefault(a0, {}).setdefault(a1, {})[a2] = a3
return nd
``````

However there is one limitation to this approach. Every time we process a row in the data, we create a new 2 dictionary even if the is already in the dictionary. Let’s see how can we fix it:

dict.setdefault improvement
`````` def toNested2(data):
if key not in map:
item = map[key] = {}
return item
return map[key]
nd = {}
for a0,a1,a2,a3 in data :
return nd
``````

When benchmarking those function we found that the second one is faster by ~10%.

dict.setdefault benchmark
`````` if __name__ == '__main__':
def faster(z1,z2):
return round(100 * (z1-z2)/z1, 1);
import pprint
import timeit
pp = pprint.PrettyPrinter(indent=2)
pp.pprint(toNested1(data))
pp.pprint(toNested2(data))
z1 = timeit.timeit("toNested1(data*10);", setup="from __main__ import toNested1,data")
z2 = timeit.timeit("toNested2(data*10);", setup="from __main__ import toNested2,data")
print(faster(z1,z2))
``````

using collections.defaultdict

The collections module provides specialized container datatypes providing alternatives to Python’s general purpose built-in dict. One of them is collections.defaultdict.

The behavior of this class is similar to above approaches. When you evaluate the expression `dictionary[key]` the result will be the value of `key` if it exist in the dictionary, otherwise the function will call the function stored in `default_factory` attribute, store its return value, and insert the `key` into the dictionary with its value. The `default_factory` attribute can be initialized in the defaultdict constructor.

Using this method we can implement the following method:

collections.defaultdict
`````` def nested_dict():
return defaultdict(nested_dict)
def toNested3(data):
nd = nested_dict()
for a0,a1,a2,a3 in data:
nd[a0][a1][a2] = a3
return nd
``````

When we benchmark toNested2 and this approach, we found that it faster by 55%.

This is more neat, readable and straightforward to create the nested dictionaries structure.

Conclusions

Using collections.defaultdict is neat, readable and straightforward way to nested dictionaries structure. Furthermore, it is faster by 55% of other methods