Python Tips: Generator unrolling

Generators are computed iterables, which only require a fraction of the space that would normally be necessary to store a fully populated collection in memory.

While this property is generally beneficial, they also bring along some ergonomic problems:

  • No index operator support
  • Cloning is problematic (see itertools.tee)
  • Results have to be stored in an auxiliary collection if caching is desired
  • Debugging misbehaving generators is non-trivial

The most common way to deal with these shortcomings is to just unroll the generator into a collection if the element count yielded by the generator is manageable. There are many ways to accomplish this, but all of them have different strengths and shortcomings.

Benchmark code
from timeit import timeit
from functools import partial
from array import array
from sys import getsizeof

RUNS = 1000
RANGE = 10000

for exprs, desc in (
    (lambda gen: list(gen), "List constructor"),
    (
        lambda gen: (lambda gen, list: [list.append(num) for num in gen])(gen, list()),
        "List append",
    ),
    (lambda gen: [num for num in gen], "List comprehension"),
    (lambda gen: tuple(gen), "Tuple constructor"),
    (lambda gen: set(gen), "Set constructor"),
    (lambda gen: frozenset(gen), "Frozenset constructor"),
    (lambda gen: {num: None for num in gen}, "Dictionary nonsense"),
    (lambda gen: array("h", gen), "Array constructor"),
    (
        lambda gen: (lambda gen, arr: arr.extend(gen) or arr)(gen, array("h")),
        "Array extension",
    ),
    (
        lambda gen: (lambda gen, arr: [arr.append(num) for num in gen])(
            gen, array("h")
        ),
        "Array append",
    ),
):
    bound_expr = partial(exprs, (lambda: range(RANGE))())
    print(f"{timeit(bound_expr, number=RUNS):.5f}s: {desc} ({getsizeof(bound_expr())})")
Method Description Speed (sec) Size (byte)
list(gen) List constructor 0.27175 90112
tuple(gen) Tuple constructor 0.35145 80048
frozenset(gen) Frozenset constructor 0.47581 524512
set(gen) Set constructor 0.47707 524512
[num for num in gen] List comprehension 0.52155 87624
{num: None for num in gen} Dictionary nonsense 0.74752 295008
array("h", gen) Array constructor (from array import array) 0.90895 20234
arr.extend(gen) Array extension 0.94037 20234
[list.append(num) for num in gen] List append 1.19855 87624
[arr.append(num) for num in gen] Array append 1.77882 87624

Conclusion

  • The list constructor is the definitive all-rounder and the best choice in most cases.
  • Typed arrays shine when memory usage is key.
  • The tuple constructor is about 20% slower than the list constructor, while taking up ~10% less space.
Attribution-NonCommercial 4.0 International (only applies to text, code license: MIT)