## The problem

Suppose you have tabular data which was retrieved from rational database or CSV file:

The tabular data
``````list = [
['fk1-A','fk2-A', 'key1', 10],
['fk1-A','fk2-A', 'key2', 20],
['fk1-B','fk2-B', 'key3', 30],
['fk1-B','fk2-B', 'key4', 40],
['fk1-B','fk2-A', 'key1', 20],
['fk1-B','fk2-A', 'key2', 40],
['fk1-B','fk2-B', 'key3', 40],
['fk1-B','fk2-B', 'key4', 40],
]
``````

You want convert this data to a nested dictionary so you can access the data in efficient way.

The nested dictionary
``````>>> nd = {
'fk1-A': { 'fk2-A': {'key1': 10, 'key2': 20},
'fk2-B': {'key3': 30, 'key4': 40}},
'fk1-B': { 'fk2-A': {'key1': 20, 'key2': 40},
'fk2-B': {'key3': 40, 'key4': 40}}
}
>>> print(nd['fk1-A']['fk2-B']['key3'])
30
>>> print(nd['fk1-B']['fk2-A']['key1'])
20
``````

## using dict.setdefault

The dict has very useful method for creating nested dictionaries on the fly. dict.setdefault(key,default) will return the value of `key` if it exists in the dictionary, otherwise it will insert the `key` into the dictionary with the value `default`. Thus using this method we can write a straightforward function to convert the data to nested dictionary.

dict.setdefault
``````def toNested1(data):
nd = {}
for a0,a1,a2,a3 in data:
nd.setdefault(a0, {}).setdefault(a1, {})[a2] = a3
return nd
``````

However there is one limitation to this approach. Every time we process a row in the data, we create a new 2 dictionary even if the key is already in dictionary. Let’s see how can we fix it:

dict.setdefault improvement
``````def toNested2(data):
if key not in map:
item = map[key] = {}
return item
return map[key]
nd = {}
for a0,a1,a2,a3 in data :
return nd
``````

When benchmark those functions we found that the second one is faster by ~10%.

dict.setdefault
``````if __name__ == '__main__':
def faster(z1,z2):
return round(100 * (z1-z2)/z1, 1);
import pprint
import timeit
pp = pprint.PrettyPrinter(indent=2)
pp.pprint(toNested1(data))
pp.pprint(toNested2(data))
z1 = timeit.timeit("toNested1(data*10);", setup="from __main__ import toNested1,data")
z2 = timeit.timeit("toNested2(data*10);", setup="from __main__ import toNested2,data")
print(faster(z1,z2))
``````

## using collections.defaultdict

The collections module provides specialized container datatypes providing alternatives to Python’s general purpose built-in like dict. One of them is collections.defaultdict.

The behavior of this class is similar to above approaches. When you evaluate the expression `dictionary[key]` the result will be the value of `key` if it exists in the dictionary, otherwise the function will call the function stored in `default_factory` attribute and store the value, insert the `key` into the dictionary with this value and return it. The default_factory attribute can be initialized in the defaultdict constructor.

Using this method we can implement the following method:

collections.defaultdict
``````def nested_dict():
return defaultdict(nested_dict)
def toNested3(data):
nd = nested_dict()
for a0,a1,a2,a3 in data:
nd[a0][a1][a2] = a3
return nd
``````

When we benchmark toNested2 and this approach, we found that it faster by 55%.

## conclusions

Using `collections.defaultdict` is neat, readable and straightforward way to create nested dictionaries structure. Furthermore, it is faster by 55% of other methods