Dictionaries

Lists let you look things up by position. But often you want to look something up by name. Not "give me item 3", but "give me the score for Alice". A dictionary stores data as key-value pairs: you look up a value by its key, not its position.

When a list's positional index is not meaningful, a dictionary is the right structure. Dicts map hashable keys to values, giving you named lookup in O(1) average time. A leaderboard, a JSON response, a config file: all are naturally expressed as key-value mappings.

dict is a hash-table-backed key-value store with O(1) average lookup, insertion, and deletion. Keys must be hashable; values can be any object. Since Python 3.7, dicts preserve insertion order. dict is the foundation for Python namespaces, object __dict__ attributes, and keyword arguments.

Creating a dictionary

Write a dictionary with curly braces, a colon between each key and its value, and commas between pairs. Keys are most often strings. Values can be anything: numbers, strings, lists, even other dictionaries.

Dict literals use curly braces with key: value syntax. Keys can be any hashable type: strings, integers, and tuples whose elements are themselves hashable. Values can be any Python object. Dicts preserve insertion order, so when you iterate, you get items in the order they were added.

Dict literals are evaluated left to right. Keys must be hashable: str, int, and tuples of hashable elements work; list, dict, and set do not. Values are unrestricted. Insertion order is a language guarantee as of Python 3.7 (it was already an implementation detail of CPython 3.6, which introduced the compact hash table layout). Duplicate keys in a literal silently use the last value.

python
player = {
    "name":  "Alice",
    "score": 87,
    "level": 5,
    "alive": True,
}
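The rules above are easy to check for yourself. The following is a minimal sketch; the names grid and settings are made up for illustration. It shows a tuple used as a key, a list rejected as a key, and the last value winning when a literal repeats a key.

```python
# Keys must be hashable: strings, ints, and tuples of hashable items all work.
grid = {
    (0, 0): "start",
    (2, 3): "treasure",
}
grid[(5, 5)] = "exit"

# A list is mutable and therefore unhashable, so using one as a key fails.
try:
    grid[[1, 2]] = "oops"
except TypeError as exc:
    error_message = str(exc)   # mentions "unhashable type: 'list'"

# Duplicate keys in a literal: the last value wins and Python raises no error.
settings = {"volume": 3, "volume": 7}

# Iteration follows insertion order (guaranteed since Python 3.7).
key_order = list(grid)

print(settings)    # {'volume': 7}
print(key_order)   # [(0, 0), (2, 3), (5, 5)]
```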

Accessing values

Use square brackets with the key to get the value. If the key does not exist, Python raises a KeyError. Use .get() when you are not sure a key is there: it returns None instead of crashing, or a default value you specify.

Square bracket access raises KeyError on a missing key. .get(key) returns None on a miss. .get(key, default) returns the default instead. Use .get() whenever the key's presence is uncertain; it is safer and more readable than wrapping access in a try/except.

d[key] calls __getitem__, which hashes the key and probes the table: O(1) average. On a miss it raises KeyError. .get(key, default=None) performs the same probe but returns the default on a miss instead of raising. The key in d check (which calls __contains__) is O(1) and is the idiomatic way to guard before access.

python
player = {"name": "Alice", "score": 87}

player["name"]    # "Alice"
player["score"]   # 87
player["lives"]   # KeyError (key doesn't exist)

python
player.get("score")          # 87
player.get("lives")          # None (no error, returns None by default)
player.get("lives", 3)       # 3   (use this default if key is absent)

.get() is safer whenever a key might be missing:

python
inventory = {"sword": 1}             # example data
count = inventory.get("arrows", 0)   # 0 if "arrows" isn't in the dict
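One case where .get() alone is not enough: it returns None both when the key is absent and when the key is present with None stored as its value. The sketch below uses a made-up inventory to show how in and try/except cover the cases .get() cannot tell apart.

```python
inventory = {"sword": 1, "shield": None}   # hypothetical data

# .get() cannot distinguish "missing" from "stored None".
shield = inventory.get("shield")   # None (key exists, value is None)
arrows = inventory.get("arrows")   # None (key is absent)

# `in` answers the presence question directly.
has_shield = "shield" in inventory   # True
has_arrows = "arrows" in inventory   # False

# try/except suits cases where a missing key needs its own handling.
try:
    arrow_count = inventory["arrows"]
except KeyError:
    arrow_count = 0
```

Which of the three reads best depends on whether a missing key is expected (use .get() or in) or exceptional (use try/except).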

Adding and updating

Assign to a key with square brackets. If the key already exists, the value is replaced. If it does not exist yet, a new entry is created. Use .update() to merge an entire other dictionary in at once.

Assignment to a key calls __setitem__: O(1) average, creates or replaces. .update() accepts another dict or an iterable of key-value pairs and calls __setitem__ for each entry, overwriting existing keys.

d[key] = value calls __setitem__, which hashes the key and inserts or overwrites in the table: O(1) average. .update(other) is equivalent to repeated __setitem__ calls. The | operator (Python 3.9+) merges dicts without mutation and returns a new dict; |= mutates in place.

python
player = {"name": "Alice", "score": 87}

player["score"] = 92        # update existing
player["level"] = 5         # add new key

python
extras = {"level": 5, "alive": True}
player.update(extras)   # adds/overwrites with keys from extras
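The difference between merging into a new dict and merging in place is easiest to see side by side. This is a small sketch that assumes Python 3.9 or later for the | and |= operators; the data is illustrative.

```python
player = {"name": "Alice", "score": 87}
extras = {"score": 92, "level": 5}

# | builds a new dict; on a key clash the right-hand side wins.
merged = player | extras
print(merged)   # {'name': 'Alice', 'score': 92, 'level': 5}
print(player)   # {'name': 'Alice', 'score': 87}  (unchanged)

# |= updates the left-hand dict in place.
player |= {"alive": True}

# .update() also accepts an iterable of pairs and keyword arguments.
player.update([("lives", 3)], level=1)
print(player)   # {'name': 'Alice', 'score': 87, 'alive': True, 'lives': 3, 'level': 1}
```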

Removing items

Four ways to remove entries. .pop() removes a key and gives you the value back. .pop() with a default is safe when the key might not be there. del removes a key with no return value. .clear() empties the whole dictionary.

.pop(key) raises KeyError on a miss. .pop(key, default) returns the default instead, making it the safe removal method. del d[key] calls __delitem__ and raises KeyError on a miss. .clear() removes all entries but keeps the dict object itself.

.pop(key, default) is a single hash probe: O(1) average. del d[key] calls __delitem__, same probe, raises on miss. In CPython, removing entries does not shrink the hash table; the table is resized only when later insertions require it. .clear() releases the table's storage. Adding or removing keys while iterating over a dict raises RuntimeError (replacing the value of an existing key is fine); build a list of keys to remove first.

python
player = {"name": "Alice", "score": 87, "level": 5}

player.pop("level")            # removes "level" and returns 5
player.pop("lives", None)      # safe pop, returns None if key absent
del player["score"]            # removes "score", no return value
player.clear()                 # removes everything

.pop() with a default is the safest way to remove a key that might not exist.
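Removing entries while looping over the same dict is a common trap. The sketch below, using made-up scores, first triggers the error on a throwaway copy and then shows one way around it: collect the keys to remove, then remove them in a second pass.

```python
scores = {"Alice": 87, "Bob": 42, "Carol": 92, "Dave": 55}

# Deleting inside the loop changes the dict's size mid-iteration.
demo = dict(scores)
try:
    for name in demo:
        if demo[name] < 60:
            del demo[name]
except RuntimeError as exc:
    message = str(exc)   # "dictionary changed size during iteration"

# Safe version: decide what to remove first, then remove it.
to_remove = [name for name, score in scores.items() if score < 60]
for name in to_remove:
    scores.pop(name)

print(scores)   # {'Alice': 87, 'Carol': 92}
```

Building a new dict with a comprehension that keeps only the wanted entries is an equally valid alternative when you do not need to mutate the original.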

Iterating

Three views let you loop through different parts of a dictionary. Iterating just the dict gives you keys. .values() gives values. .items() gives both at once and is what you will use most: unpack each pair into two names for clean, readable loops.

.keys(), .values(), and .items() return view objects, not lists. Views reflect the dict's current state dynamically: if you modify the dict, the view updates immediately. .items() is the most useful for most loops because tuple unpacking for k, v in d.items() reads clearly.

.keys(), .values(), and .items() return dict_keys, dict_values, and dict_items view objects. Views are lazy: they do not copy data and update when the underlying dict changes. dict_keys supports set operations (&, |, -, ^) since keys are unique and hashable; dict_items does too when every value is hashable. Adding or removing keys during iteration raises RuntimeError; use list(d.items()) to snapshot if needed.

python
player = {"name": "Alice", "score": 87, "level": 5}

for key in player:               # iterate keys (most common)
    print(key)

for key in player.keys():        # same, explicit keys view
    print(key)

for value in player.values():    # just the values
    print(value)

for key, value in player.items():   # both, most useful
    print(f"{key}: {value}")

.items() is what you will use most. Unpacking each pair into two names makes the loop readable.
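Views are live windows onto the dict rather than copies, and key views behave like sets. A short sketch with illustrative data (the old and new dicts are made up) shows both properties:

```python
player = {"name": "Alice", "score": 87}

keys = player.keys()        # a view, not a list
player["level"] = 5         # modify the dict afterwards
key_count = len(keys)       # 3: the view sees the new key

# Key views support set operations, which makes comparing dicts short.
old = {"name": "Alice", "score": 87, "level": 5}
new = {"name": "Alice", "score": 92, "alive": True}

shared  = old.keys() & new.keys()    # {'name', 'score'}
removed = old.keys() - new.keys()    # {'level'}
added   = new.keys() - old.keys()    # {'alive'}
changed = {k for k in shared if old[k] != new[k]}   # {'score'}
```

The results of these operations are ordinary sets, so their display order is not guaranteed.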

Checking membership

in checks whether a key exists in the dictionary. It does not check values, only keys. To check whether something is not present, use not in.

in and not in call __contains__, which is O(1) for dicts. It checks keys only. To check values, you would use in d.values(), but that is O(n) since values are not indexed.

key in d calls dict.__contains__, which hashes the key and probes the table: O(1) average. value in d.values() iterates the values view: O(n). This asymmetry is a core reason to prefer dict keys for lookup over scanning values.

python
player = {"name": "Alice", "score": 87}

"name"  in player      # True
"lives" in player      # False
"lives" not in player  # True

in only checks keys. To check values, use in player.values(), though that is rarely needed.
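If you find yourself searching values often, one option is to build a second dict once with keys and values swapped. This is a sketch that assumes the values are unique and hashable; by_score is a name chosen for illustration.

```python
scores = {"Alice": 87, "Bob": 74, "Carol": 92}

found_as_key   = 87 in scores            # False: 87 is a value, not a key
found_as_value = 87 in scores.values()   # True, but scans every value

# Invert once, then every reverse lookup is a normal key lookup.
# Assumes no two players share a score; duplicates would overwrite each other.
by_score = {score: name for name, score in scores.items()}

top_name = by_score[92]   # "Carol"
```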

Nested dictionaries

Values can be dictionaries themselves. This is how you represent structured data with multiple levels: a player that has a stats section, a config file with sub-sections. Two sets of square brackets access a nested value: the first picks the outer key, the second picks the inner key.

Nested dicts are dicts where the values are themselves dicts. Access with chained subscripts. Mutating an inner dict affects the outer dict because the outer dict holds a reference to the same object. Keep nesting shallow where possible: deep nesting quickly becomes hard to read and navigate.

Nested dicts store object references, not copies. Shallow copy of the outer dict (d.copy()) does not copy inner dicts; mutations to inner dicts are visible through both the original and the copy. For deeply nested structures, copy.deepcopy() creates fully independent copies. Chained __getitem__ calls are each O(1), so access depth has no asymptotic cost.

python
users = {
    "alice": {"score": 87, "level": 5},
    "bob":   {"score": 74, "level": 3},
}

users["alice"]["score"]   # 87
users["bob"]["level"]     # 3

Access with chained square brackets. For deeply nested structures, this can get unwieldy, so keep nesting shallow where you can.
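Because the outer dict only holds references to the inner dicts, copying needs some care. The following sketch contrasts .copy() with copy.deepcopy(), and shows chained .get() calls as one way to read a nested value without risking a KeyError.

```python
import copy

users = {"alice": {"score": 87, "level": 5}}

shallow = users.copy()          # new outer dict, same inner dict objects
deep    = copy.deepcopy(users)  # fully independent copy

users["alice"]["score"] = 100

print(shallow["alice"]["score"])   # 100 (shares the inner dict)
print(deep["alice"]["score"])      # 87  (unaffected)

# Chained .get() with an empty-dict default avoids KeyError on a missing user.
carol_score = users.get("carol", {}).get("score")   # None
```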

setdefault

.setdefault() reads a key if it exists, or sets it to a default value if it does not, then returns the value. It is useful when you need a key to exist but do not want to overwrite it if it is already there.

.setdefault(key, default) is a read-or-create operation: if the key exists, return its current value without changing anything; if it does not, insert the default and return it. The common use case is building up grouped structures without a separate existence check.

.setdefault(key, default) is a single hash probe: O(1) average. If the key is absent, default is inserted and returned. If present, the existing value is returned and default is discarded. Note that the default expression is always evaluated before the call, because Python evaluates arguments eagerly, so d.setdefault(k, []) builds a new empty list on every call even when the key exists. For the common "group items into lists" pattern, this is the standard alternative to checking key in d before appending.

python
inventory = {}

inventory.setdefault("arrows", 0)    # sets "arrows": 0, returns 0
inventory.setdefault("arrows", 10)   # "arrows" already exists, no change, returns 0

It is useful for building up grouped structures without checking for key existence first:

python
players = [("Alice", "red"), ("Bob", "blue"), ("Carol", "red")]   # example (name, team) pairs
groups  = {}

for name, team in players:
    groups.setdefault(team, []).append(name)
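One detail worth checking for yourself: Python evaluates a call's arguments before making the call, so the default passed to .setdefault() is built every time, whether or not it ends up being used. The sketch below counts those evaluations with a hypothetical helper, make_default.

```python
calls = []

def make_default():
    """Hypothetical factory that records each time it runs."""
    calls.append(1)
    return []

groups = {}
groups.setdefault("red", make_default()).append("Alice")   # key absent: default stored
groups.setdefault("red", make_default()).append("Carol")   # key present: default discarded

print(groups)       # {'red': ['Alice', 'Carol']}
print(len(calls))   # 2: the factory ran both times
```

For a cheap default such as an empty list this cost is negligible. When the default is expensive to build, an explicit key in d check or a defaultdict avoids the wasted work.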

collections.defaultdict and Counter

The standard library has two dict subclasses that handle common patterns automatically. defaultdict creates a default value for missing keys so you never get a KeyError. Counter counts how often each item appears in a sequence and gives you the results as a dict.

defaultdict takes a callable that produces the default value for new keys, eliminating the need for .setdefault(). Counter is a specialised dict for frequency counting with a .most_common() method. Both are dict subclasses, so all standard dict operations work on them.

defaultdict.__missing__ is invoked by __getitem__ on a miss: it calls the factory, stores the result under the key, and returns it. .get() and in do not trigger it. Counter subclasses dict and adds .most_common(n) (heapq.nlargest when n is given, a full O(n log n) sort otherwise), .subtract(), and arithmetic operators for combining counts. Both are in collections; imports are covered in the Modules chapter.

collections import

defaultdict and Counter need importing from the standard library. Imports are covered in the Modules chapter.

python
from collections import defaultdict

players = [("Alice", "red"), ("Bob", "blue"), ("Carol", "red")]   # example (name, team) pairs

groups = defaultdict(list)
for name, team in players:
    groups[team].append(name)   # no KeyError if team is new

python
from collections import Counter

words  = ["cat", "dog", "cat", "bird", "cat", "dog"]
counts = Counter(words)
# Counter({'cat': 3, 'dog': 2, 'bird': 1})

counts.most_common(2)   # [('cat', 3), ('dog', 2)]

Counter saves a lot of "count things in a loop" boilerplate.
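Two behaviours are worth seeing in action: a defaultdict only creates entries on square-bracket access, and Counter objects can be added and subtracted. This is a minimal sketch with made-up data.

```python
from collections import Counter, defaultdict

visits = defaultdict(int)
visits["home"] += 1                  # missing key starts at int() == 0

about_present = "about" in visits    # False: `in` does not create the key
about_get     = visits.get("about")  # None: .get() does not create it either
about_value   = visits["about"]      # 0: square brackets insert the default
size_after    = len(visits)          # 2: "home" and "about"

a = Counter(["cat", "dog", "cat"])
b = Counter(["dog", "bird"])

combined   = a + b     # Counter({'cat': 2, 'dog': 2, 'bird': 1})
difference = a - b     # Counter({'cat': 2}); zero and negative counts are dropped
fish_count = a["fish"] # 0: a missing key in a Counter reads as 0 without being added
```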

In practice

Building a score tracker and printing a summary with all entries:

python
scores = {"Alice": 87, "Bob": 74, "Carol": 92, "Dave": 55}

total   = sum(scores.values())
average = total / len(scores)

print(f"Players:  {len(scores)}")
print(f"Average:  {average:.1f}")
print(f"Highest:  {max(scores.values())}")
print(f"Lowest:   {min(scores.values())}")
print()

for name, score in scores.items():
    print(f"  {name}: {score}")
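The summary above reports the highest and lowest scores but not who earned them. One way to get the names, sketched here with the same example data, is to pass scores.get as the key function to max(), min(), or sorted().

```python
scores = {"Alice": 87, "Bob": 74, "Carol": 92, "Dave": 55}

# max/min iterate over the keys; key=scores.get compares them by their value.
best  = max(scores, key=scores.get)   # "Carol"
worst = min(scores, key=scores.get)   # "Dave"

# Sort (name, score) pairs by score, highest first.
ranking = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

for place, (name, score) in enumerate(ranking, start=1):
    print(f"{place}. {name}: {score}")
```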

Building a dict of per-file results in a loop, then summarising across all entries:

python
job_results = {}
files       = ["report_jan.csv", "report_feb.csv", "report_mar.csv"]

for filename in files:
    size = len(filename) * 100   # placeholder for real file size
    if size < 2000:
        status = "ok"
    else:
        status = "large"
    job_results[filename] = {"size": size, "status": status}

ok_count    = 0
large_count = 0

for result in job_results.values():
    if result["status"] == "ok":
        ok_count += 1
    else:
        large_count += 1

print(f"Processed {len(job_results)} file(s): {ok_count} ok, {large_count} large")

Validating a nested request dict by iterating required fields, then normalising a feature importance dict in place:

python
request = {
    "method":  "POST",
    "path":    "/users",
    "headers": {"Content-Type": "application/json"},
    "body":    {"username": "alice", "email": "alice@example.com"},   # placeholder address
}

body   = request["body"]
errors = []

for field in ["username", "email"]:
    if not body.get(field):
        errors.append(f"Missing required field: {field}")

if "email" in body and "@" not in body["email"]:
    errors.append("Invalid email format")

print(f"Method: {request['method']} {request['path']}")
if errors:
    print(f"Errors: {errors}")
else:
    print("Validation passed")

# Normalise feature importance values to sum to 1
feature_importance = {"age": 0.34, "income": 0.28, "region": 0.15, "purchases": 0.23}
total = sum(feature_importance.values())

for key in feature_importance:
    feature_importance[key] = round(feature_importance[key] / total, 3)

print(f"Normalised: {feature_importance}")