Available models and their sources
thingsvision currently supports many models from several different sources, where a source denotes the library or repository from which a model's architecture and weights are loaded. On this page you can find which models are available from which source, along with several notes on their usage.
torchvision
thingsvision supports all models from the torchvision.models module. You can find a list of all available torchvision models here.
Example:
import torch
from thingsvision import get_extractor
model_name = 'alexnet'
source = 'torchvision'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {'weights': 'DEFAULT'}
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters,
)
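The extractor returned above can then be used for feature extraction. Here is a minimal sketch, assuming thingsvision's ImageDataset and DataLoader utilities; the paths and the module name features.10 are placeholders you should adapt to your own data and model:
from thingsvision.utils.data import ImageDataset, DataLoader
from thingsvision.utils.storing import save_features

# build a dataset and batch loader that use the extractor's own preprocessing
dataset = ImageDataset(
  root='path/to/images',
  out_path='path/to/features',
  backend=extractor.get_backend(),
  transforms=extractor.get_transformations(),
)
batches = DataLoader(
  dataset=dataset,
  batch_size=32,
  backend=extractor.get_backend(),
)

# extract activations from a chosen module (placeholder name) and save them to disk
features = extractor.extract_features(
  batches=batches,
  module_name='features.10',
  flatten_acts=True,
)
save_features(features, out_path='path/to/features', file_format='npy')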
Model names are case-sensitive and must be spelled exactly as they are in the torchvision documentation (e.g., alexnet, resnet18, vgg16, …).
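If you are unsure about the exact spelling, recent torchvision versions can enumerate the registered model names for you; a minimal sketch, assuming torchvision >= 0.14:
import torchvision

# print all registered torchvision model names (exact, case-sensitive spellings)
print(torchvision.models.list_models(module=torchvision.models))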
If you use pretrained=True, the model weights will by default be pretrained on ImageNet; otherwise the model is initialized randomly. For some models, torchvision provides multiple weight initializations, in which case you can pass the name of the weights via the model_parameters argument. For example, to get the extractor for a RegNet Y 32GF model pretrained with SWAG and fine-tuned on ImageNet, do the following:
import torch
from thingsvision import get_extractor
model_name = 'regnet_y_32gf'
source = 'torchvision'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {'weights': 'IMAGENET1K_SWAG_LINEAR_V1'}
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters,
)
For a list of all available weights, please refer to the torchvision documentation.
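You can also enumerate the available weight variants programmatically; a minimal sketch, again assuming torchvision >= 0.14 and the model name used above:
import torchvision

# list all weight variants registered for a given architecture
for weights in torchvision.models.get_model_weights('regnet_y_32gf'):
  print(weights)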
timm
thingsvision supports all models from the timm module. You can find a list of all available timm models here.
Example:
import torch
from thingsvision import get_extractor
model_name = 'tf_efficientnet_b0'
source = 'timm'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)
Model names are case-sensitive and must be spelled exactly as they are in the timm documentation (e.g., tf_efficientnet_b0, densenet121, mixnet_l, …).
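To look up the exact names, you can query timm directly; a minimal sketch:
import timm

# list all timm models matching a pattern that ship with pretrained weights
print(timm.list_models('*efficientnet*', pretrained=True))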
If you use pretrained=True, the model will be pretrained according to the model documentation; otherwise it is initialized randomly.
ssl
thingsvision provides various self-supervised learning models that are loaded from the VISSL library or the Torch Hub:
- SimCLR (simclr-rn50)
- MoCo V2 (mocov2-rn50)
- Jigsaw (jigsaw-rn50)
- RotNet (rotnet-rn50)
- SwAV (swav-rn50)
- PIRL (pirl-rn50)
- BarlowTwins (barlowtwins-rn50)
- VicReg (vicreg-rn50)
- DINO (dino-rn50)
All models have the ResNet50 architecture and are pretrained on ImageNet-1K. Here, the model name describes the pre-training objective rather than the model architecture.
DINO models are available in ViT (Vision Transformer) and XCiT (Cross-Covariance Image Transformer) variants. For ViT models trained with DINO, the following models are available: dino-vit-small-p8, dino-vit-small-p16, dino-vit-base-p8, dino-vit-base-p16, where the trailing number describes the image patch resolution in the ViT (i.e., either 8x8 or 16x16 pixels). For the XCiT models, we have dino-xcit-small-12-p16, dino-xcit-small-12-p8, dino-xcit-medium-24-p16, dino-xcit-medium-24-p8, where the penultimate number represents the model depth (12 = small, 24 = medium). MAE (Masked Autoencoder) ViT models such as mae-vit-large-p16 are also available from this source (see the example below).
Example SimCLR:
import torch
from thingsvision import get_extractor
model_name = 'simclr-rn50'
source = 'ssl'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)
Example DINO:
import torch
from thingsvision import get_extractor
model_name = 'dino-vit-base-p16'
source = 'ssl'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {"token_extraction": "cls_token"}  # extract DINO features exclusively for the [cls] token
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters,
)
Example MAE:
import torch
from thingsvision import get_extractor
model_name = 'mae-vit-large-p16'
source = 'ssl'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {"token_extraction": "avg_pool"}  # average-pool the tokens before extracting the MAE features
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters,
)
keras
thingsvision supports all models from the keras.applications module. You can find a list of all available models here.
Example:
import torch
from thingsvision import get_extractor
model_name = 'VGG16'
source = 'keras'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)
Model names are case-sensitive and must be spelled exactly as they are in the keras.applications documentation (e.g., VGG16, ResNet50, InceptionV3, …).
If you use pretrained=True, the model will be pretrained on ImageNet; otherwise it is initialized randomly.
custom
In addition, we provide several custom models, which are not available from the other sources, under the custom source. These models are:
Official CLIP and OpenCLIP
We provide CLIP models from the official CLIP repo and from OpenCLIP. Available model_names are:
clip
OpenCLIP
Both provide multiple model architectures and, in the case of OpenCLIP, also different training datasets; both can be specified via the model_parameters argument. For example, to get a ViT-B/32 model from the official CLIP repo (trained on WIT), you would do the following:
import torch
from thingsvision import get_extractor
model_name = 'clip'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {
  'variant': 'ViT-B/32'
}
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters
)
ViT-B/32 is the default model architecture, so you can also leave out the model_parameters argument. For a list of all available architectures and datasets, please refer to the CLIP repo.
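If you have the official clip package installed, you can also list the available architectures directly; a minimal sketch:
import clip

# list the model architectures provided by the official CLIP repo
print(clip.available_models())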
In the case of OpenCLIP, you can additionally specify the training dataset for most models. For example, to get a ViT-B/32 model trained on the LAION-400M dataset, you would do the following:
import torch
from thingsvision import get_extractor
model_name = 'OpenCLIP'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {
  'variant': 'ViT-B/32',
  'dataset': 'laion400m_e32'
}
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters
)
For a list of all available architectures and datasets, please refer to the OpenCLIP repo.
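Alternatively, the open_clip package can enumerate all architecture/dataset pairs with pretrained weights; a minimal sketch:
import open_clip

# list all (architecture, pretraining dataset) combinations with available weights
print(open_clip.list_pretrained())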
DreamSim
In thingsvision you can extract representations from DreamSim. See the official DreamSim repo for more information. To extract features, install the dreamsim package with the following pip command (ideally, into your thingsvision environment):
$ pip install dreamsim==0.1.2
The base model name is DreamSim. We provide four DreamSim variants: clip_vitb32, open_clip_vitb32, dino_vitb16, and a DreamSim ensemble. Specify the variant using the model_parameters argument. For instance, to get the OpenCLIP variant of DreamSim, do the following:
import torch
from thingsvision import get_extractor
model_name = 'DreamSim'
module_name = 'model.mlp'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {
  'variant': 'open_clip_vitb32'
}
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters
)
To load the CLIP ViT-B/32 version of DreamSim, pass 'clip_vitb32' to the variant parameter instead. Caution (!): for the DreamSim dino_vitb16 and ensemble variants, features can only be extracted from the model.mlp module and not from the model block. We are currently working on a version that allows feature extraction from the model block. Please be patient until then.
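In practice, this means passing module_name='model.mlp' to the feature extraction call; a minimal sketch, assuming batches have been built with thingsvision's data utilities as in the torchvision example above:
# extract DreamSim features from the 'model.mlp' module
features = extractor.extract_features(
  batches=batches,
  module_name='model.mlp',
  flatten_acts=True,
)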
Harmonization
If you want to extract features for harmonized models from the Harmonization repo, you have to run the following pip
command in your thingsvision
environment (FYI: as of now, this seems to be working smoothly only on Ubuntu but not on macOS),
$ pip install git+https://github.com/serre-lab/Harmonization.git
$ pip install "keras-cv-attention-models>=1.3.5"
The following models from here are available for feature extraction:
ViT_B16
ResNet50
VGG16
EfficientNetB0
tiny_ConvNeXT
tiny_MaxViT
LeViT_small
Example:
import torch
from thingsvision import get_extractor
model_name = 'Harmonization'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {
  'variant': 'ViT_B16'
}
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters
)
CORnet
We provide all CORnet models from this paper. Available model names are:
cornet-s
cornet-r
cornet-rt
cornet-z
Example:
import torch
from thingsvision import get_extractor
model_name = 'cornet-s'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)
Models trained on Ecoset
We also provide models trained on the Ecoset dataset, which contains 1.5 million images from 565 categories selected to be both frequent in linguistic use and rated as concrete by human observers. Available model_names are:
Alexnet_ecoset
Resnet50_ecoset
VGG16_ecoset
Inception_ecoset
Example:
import torch
from thingsvision import get_extractor
model_name = 'Alexnet_ecoset'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True
)
Segment Anything
We provide all models from Segment Anything.
Example:
import torch
from thingsvision import get_extractor
model_name = 'SegmentAnything'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {
  'variant': 'vit_h'  # also vit_l and vit_b
}
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters
)
ALIGN model
We provide Kakaobrain's reproduction of the original ALIGN model, loaded from Hugging Face.
Example:
import torch
from thingsvision import get_extractor
model_name = 'Kakaobrain_Align'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
)