Low memory options

There are several reasons why you could run into memory issues while using thingsvision:

Running out of GPU memory

When extracting features on a GPU, the default batch size might be too large for the GPU's memory. Try reducing it by passing a smaller value for the batch_size parameter of the DataLoader.

Alternatively, you can run the extraction on the CPU. This is slower, but system RAM is typically larger than GPU memory, so you can use a larger batch size. To do so, set the device parameter of the get_extractor function to 'cpu'.
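One way to find a batch size that fits is to halve it after every out-of-memory failure. The sketch below is a minimal illustration and not part of thingsvision: find_batch_size and the run_extraction callback are hypothetical names, and it assumes that the failure surfaces either as a MemoryError or as a RuntimeError whose message contains "out of memory" (which is how PyTorch typically reports CUDA out-of-memory conditions).

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def is_oom(err: BaseException) -> bool:
    """Heuristic: treat MemoryError and 'out of memory' RuntimeErrors as OOM."""
    if isinstance(err, MemoryError):
        return True
    return isinstance(err, RuntimeError) and "out of memory" in str(err).lower()


def find_batch_size(
    run_extraction: Callable[[int], T],
    start: int = 256,
    minimum: int = 1,
) -> Tuple[int, T]:
    """Call run_extraction(batch_size), halving batch_size after each OOM failure.

    run_extraction is a hypothetical callback; in practice it would build a
    DataLoader with the given batch_size and run the feature extraction.
    Returns the batch size that worked together with the callback's result.
    """
    batch_size = start
    while True:
        try:
            return batch_size, run_extraction(batch_size)
        except (MemoryError, RuntimeError) as err:
            # Re-raise unrelated errors, and give up once the minimum is reached.
            if not is_oom(err) or batch_size <= minimum:
                raise
            batch_size = max(minimum, batch_size // 2)
```

If even the minimum batch size fails on the GPU, running the extraction with device set to 'cpu' is the remaining option.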

Running out of RAM

Because all features are held in RAM while the extraction is running, you might run out of RAM when extracting features for a large number of images. To avoid this, write the features directly to disk by setting the output_dir parameter of the extract_features function to a directory of your choice. The features are then written to disk as they are extracted, which frees up RAM. The step_size parameter specifies how many batches are extracted before the features are written to disk. Its default is chosen so that the extraction uses about 8 GB of RAM.

Usage example:

# get extractor and dataloader 
extractor = ...
batches = ...

output_dir = '/path/to/output/directory'
extractor.extract_features(
    batches=batches,
    module_name=...,
    flatten_acts=True,
    output_dir=output_dir
)  # returns None when output_dir is set, since the features are written to disk
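
To get a feel for how step_size relates to memory, the following back-of-the-envelope sketch estimates how many batches fit into a given RAM budget. It is not the formula thingsvision uses for its default: the function name, the float32 assumption (4 bytes per value), and the example layer size are all illustrative assumptions.

```python
def batches_within_budget(
    batch_size: int,
    features_per_image: int,
    budget_bytes: int = 8 * 1024**3,  # 8 GiB, mirroring the "about 8 GB" default
    bytes_per_value: int = 4,         # assumes float32 features
) -> int:
    """Estimate how many batches of flattened features fit into budget_bytes."""
    bytes_per_batch = batch_size * features_per_image * bytes_per_value
    # Always accumulate at least one batch, even if it exceeds the budget.
    return max(1, budget_bytes // bytes_per_batch)


# Example: 32 images per batch, 4096 features per image (a hypothetical layer size).
step_size = batches_within_budget(batch_size=32, features_per_image=4096)
print(step_size)  # 16384
```

A smaller step_size lowers peak RAM usage at the cost of more frequent disk writes.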

Running into MemoryError while storing features to disk

If you extract activations for many images, it is still possible to run into a MemoryError when saving the extracted features to disk, even if the features fit into RAM. To circumvent this problem, a helper function called split_activations splits the activation matrix into several batches and stores them in separate files. For now, the split parameter is set to 10, so the function splits the activation matrix into 10 files. This parameter can easily be changed if you need more (or fewer) splits. To merge the separate activation batches back into a single activation matrix, call merge_activations when loading the activations (e.g., activations = merge_activations(PATH)).
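
The idea behind splitting and merging can be illustrated with plain NumPy. This is a minimal sketch and not the thingsvision implementation: the function names split_to_files and merge_from_files and the part_XX.npy file naming are hypothetical, and the real split_activations and merge_activations helpers may differ in signature and file layout.

```python
import os
import tempfile

import numpy as np


def split_to_files(activations: np.ndarray, out_dir: str, n_splits: int = 10) -> list:
    """Save row-wise chunks of `activations` as separate .npy files."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # array_split tolerates row counts that are not divisible by n_splits.
    for i, chunk in enumerate(np.array_split(activations, n_splits, axis=0)):
        # Zero-padded index keeps lexicographic order correct for up to 100 parts.
        path = os.path.join(out_dir, f"part_{i:02d}.npy")
        np.save(path, chunk)
        paths.append(path)
    return paths


def merge_from_files(out_dir: str) -> np.ndarray:
    """Load all part_XX.npy files in order and stack them back into one matrix."""
    names = sorted(
        n for n in os.listdir(out_dir) if n.startswith("part_") and n.endswith(".npy")
    )
    return np.concatenate([np.load(os.path.join(out_dir, n)) for n in names], axis=0)


# Round trip on a small random matrix (25 images x 8 features).
features = np.random.rand(25, 8).astype(np.float32)
with tempfile.TemporaryDirectory() as tmp:
    files = split_to_files(features, tmp, n_splits=10)
    restored = merge_from_files(tmp)

print(len(files), np.array_equal(features, restored))  # 10 True
```

Note that merging loads the full matrix back into RAM, so splitting only helps with the saving step and not with overall memory limits.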