pynif3d.models

class pynif3d.models.ConvolutionalOccupancyNetworksModel(input_channels=3, output_channels=1, block_depth=5, block_channels=256, linear_channels=128, is_linear_active=True, encoding_fn=None)

Bases: torch.nn.modules.module.Module

The model for Convolutional Occupancy Networks (CON) as described in: https://arxiv.org/abs/2003.04618

Note

This implementation is based on the original one, which can be found at: https://github.com/autonomousvision/convolutional_occupancy_networks

This class is the neural implicit function (NIF) model of CON. It takes the query points as input, along with optional plane or grid features at the query locations, and outputs the occupancy probability of each input point. If encoding_fn is provided, the input points are processed with encoding_fn before being supplied to the model.

Usage:

model = ConvolutionalOccupancyNetworksModel()
occupancies = model(query_points, query_features)
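
A self-contained sketch with dummy inputs (the shapes follow the forward() documentation below; the batch size and point count are arbitrary):

import torch

query_points = torch.randn(2, 1024, 3)      # (batch_size, n_points, input_channels)
query_features = torch.randn(2, 1024, 128)  # (batch_size, n_points, linear_channels)

model = ConvolutionalOccupancyNetworksModel()
occupancies = model(query_points, query_features)  # (batch_size, n_points)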
Parameters
  • input_channels (int) – The input layer’s channel size. If encoding_fn is provided, this value will be overridden by the value returned by encoding_fn.get_dimensions(). Default is 3.

  • output_channels (int) – The output channel size. Default is 1.

  • block_depth (int) – The number of ResNet blocks that are connected sequentially. Default is 5.

  • block_channels (int) – The channel size of each Fully-Connected ResNet block. Default is 256.

  • linear_channels (int) – The channel size of the linear layers that bind plane features to the Fully-Connected ResNet blocks. This value shall be equal to the channel dimension of the input features. Default is 128.

  • is_linear_active (bool) – Boolean flag indicating whether linear layers are enabled for the plane features. If True, query_features shall be provided during inference. Default is True.

  • encoding_fn – The function instance that is called in order to apply encoding to the input point coordinates. It has to provide a callable get_dimensions property which returns the resulting dimensions. Default is None.

forward(query_points, query_features=None)
Parameters
  • query_points (torch.Tensor) – The points to provide as input to the network. Its shape is (batch_size, n_points, input_channels).

  • query_features (torch.Tensor) – The plane or grid features related to query_points locations. Its shape is (batch_size, n_points, linear_channels). Optional.

Returns

Tensor which holds the occupancy probabilities of query locations. Its shape is (batch_size, n_points).

Return type

torch.Tensor

training: bool
class pynif3d.models.IDRNIFModel(input_channels=3, output_channels=257, base_network_depth=8, base_network_channels=512, skip_layers=None, encoding_fn=None, is_encoding_active=True, normalize_weights=True, geometric_init=True, **kwargs)

Bases: torch.nn.modules.module.Module

The multi-layer MLP model for the NIF representation used in IDR. If an encoding function is provided, it applies positional encoding to the inputs and overrides the input channel information accordingly.

Note

Please check the paper for more information: https://arxiv.org/abs/2003.09852

Usage:

model = IDRNIFModel()
pred_dict = model(points)
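
A self-contained sketch with dummy inputs (the shapes follow the forward() documentation below; the batch size and ray count are arbitrary):

import torch

points = torch.randn(2, 2048, 3)  # (batch_size, n_rays, 3)

model = IDRNIFModel()
pred_dict = model(points)
# The returned dictionary holds the SDF values, of shape (2, 2048), and the
# feature vectors, of shape (2, 2048, output_channels - 1).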

forward(points)
Parameters

points (torch.Tensor) – Tensor containing the points that are processed. Its shape is (batch_size, n_rays, 3) or (n_rays, 3).

Returns

Dictionary containing the computed SDF values (as torch.Tensor of shape (*points.shape[:-1])) and feature vectors (as torch.Tensor of shape (*points.shape[:-1], output_channels - 1)).

Return type

dict

init_geometric(size_in, size_out, layer_index, n_layers)

Applies the geometric initialization scheme to the given linear layer (used when geometric_init is True).
training: bool
class pynif3d.models.IDRRenderingModel(input_channel_points=3, input_channel_view_dirs=3, input_channel_normals=3, input_channel_features=256, output_channels=3, base_network_depth=4, base_network_channels=512, is_use_view_directions=True, is_use_normals=True, encoding_viewdir_fn=None, is_input_encoding_active=True)

Bases: pynif3d.models.nerf_model.NeRFModel

The multi-layer MLP model for IDR rendering. If an encoding function is provided, it applies positional encoding to the view directions and overrides the input channel information accordingly.

Note

Please check the paper for more information: https://arxiv.org/abs/2003.09852

Usage:

model = IDRRenderingModel()
rgb_values = model(points, features, normals, view_dirs)
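
A self-contained sketch with dummy inputs (the shapes follow the forward() documentation below; the feature size matches the default input_channel_features, while the batch and ray counts are arbitrary):

import torch

points = torch.randn(2, 1024, 3)      # (batch_size, n_rays, input_channel_points)
features = torch.randn(2, 1024, 256)  # (batch_size, n_rays, input_channel_features)
normals = torch.randn(2, 1024, 3)     # (batch_size, n_rays, input_channel_normals)
view_dirs = torch.randn(2, 1024, 3)   # (batch_size, n_rays, input_channel_view_dirs)

model = IDRRenderingModel()
rgb_values = model(points, features, normals, view_dirs)  # (2, 1024, 3)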
Parameters
  • input_channel_points (int) – The input channel dimension for the points. If positional encoding is used, this value will be overridden. Default is 3 (XYZ).

  • input_channel_view_dirs (int) – The input channel dimension for viewing directions. If positional encoding is used, this value will be overridden. Default is 3 (XYZ).

  • input_channel_normals (int) – The input channel dimension for surface normals. Default is 3 (XYZ).

  • input_channel_features (int) – The input channel dimension for the extracted features. Default value is 256.

  • output_channels (int) – The output channel dimension. Default is 3 (RGB).

  • base_network_depth (int) – The depth of the network MLP layers. One linear layer will be added to the base network for each increment. Default is 4.

  • base_network_channels (int) – The output dimension of each inner linear layer of the MLP model. A positive integer value is expected. Default is 512.

  • is_use_view_directions (bool) – Boolean flag indicating whether to use view direction (True) or not (False). If True, the view direction block will be added on top of the base MLP layers. Default is True.

  • is_use_normals (bool) – Boolean flag indicating whether to use surface normals (True) or not (False). Default is True.

  • encoding_viewdir_fn – The function that is called in order to apply encoding to the view directions input. Default is PositionalEncoding().

  • is_input_encoding_active (bool) – Boolean flag indicating whether encoding shall be applied to both the base network input and view directions. Default is True.

forward(points, features, normals=None, view_dirs=None)
Parameters
  • points (torch.Tensor) – Tensor containing the points that are processed. Its shape is (batch_size, n_rays, input_channel_points) or (n_rays, input_channel_points).

  • features (torch.Tensor) – Tensor containing the features that are processed. Its shape is (batch_size, n_rays, input_channel_features) or (n_rays, input_channel_features).

  • normals (torch.Tensor) – (Optional) Tensor containing the normals that are processed. Its shape is (batch_size, n_rays, input_channel_normals) or (n_rays, input_channel_normals).

  • view_dirs (torch.Tensor) – (Optional) Tensor containing the view directions that are processed. Its shape is (batch_size, n_rays, input_channel_view_dirs) or (n_rays, input_channel_view_dirs).

Returns

Tensor containing the rendered RGB values. Its shape is (*points.shape[:-1], 3).

Return type

torch.Tensor

training: bool
class pynif3d.models.NeRFModel(input_channels=3, input_channel_view_dirs=3, output_channels=4, base_network_depth=8, base_network_channels=256, skip_layers=None, is_use_view_directions=True, view_dir_network_depth=1, view_dir_network_channels=256, encoding_fn=None, encoding_viewdir_fn=None, is_input_encoding_active=True, init_kwargs=None, normalize_weights=False)

Bases: torch.nn.modules.module.Module

The multi-layer MLP model for NeRF rendering. If an encoding function is provided, it applies positional encoding to the inputs and overrides the input channel information accordingly. It can also integrate view direction information into the network.

Usage:

model = NeRFModel()
prediction = model(points, view_dirs)
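
A self-contained sketch with dummy inputs (the shapes follow the forward() documentation below; the ray and sample counts are arbitrary):

import torch

n_rays, n_points_per_ray = 1024, 64
query_points = torch.randn(n_rays, n_points_per_ray, 3)
view_dirs = torch.randn(n_rays, n_points_per_ray, 3)

model = NeRFModel()
prediction = model(query_points, view_dirs)  # (n_rays, n_points_per_ray, 4)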
Parameters
  • input_channels (int) – The input channel dimension to the model. If positional encoding is used, this value will be overridden. Default is 3 (XYZ).

  • input_channel_view_dirs (int) – The input channel dimension for viewing directions. If positional encoding is used, this value will be overridden. Default is 3 (XYZ).

  • output_channels (int) – The output channel dimension. Default is 4 (RGBA).

  • base_network_depth (int) – The depth of the network MLP layers. One linear layer will be added to the base network for each increment. Default is 8.

  • base_network_channels (int) – The output dimension of each inner linear layer of the MLP model. A positive integer value is expected. Default is 256.

  • skip_layers (Iterable) – The layers at which to add skip connections. It shall be an iterable of positive integers. Values larger than base_network_depth will be discarded. Default is [4,].

  • is_use_view_directions (bool) – Boolean flag indicating whether to use view direction (True) or not (False). If True, the view direction block will be added on top of the base MLP layers. Default is True.

  • view_dir_network_depth (int) – The depth of the network that processes view directions. One linear layer for processing view direction will be added to the network for each increment. Default value is 1.

  • view_dir_network_channels (int) – The output dimension of each inner linear layer of the MLP model which processes view directions. A positive integer is expected. Default value is 256.

  • encoding_fn (torch.nn.Module) – The function that is called in order to apply encoding to the NIF model input. Default is PositionalEncoding().

  • encoding_viewdir_fn (torch.nn.Module) – The function that is called in order to apply encoding to the view directions input. Default is PositionalEncoding().

  • is_input_encoding_active (bool) – Boolean flag indicating whether encoding shall be applied to both the base network input and view directions. Default is True.

  • init_kwargs (dict) – Dictionary containing the initialization parameters for the linear layers.

  • normalize_weights (bool) – Boolean flag indicating whether to normalize the linear layer’s weights (True) or not (False).

forward(query_points, view_dirs=None)
Parameters
  • query_points (torch.Tensor) – Tensor containing the points that are queried. Its shape is (number_of_rays, number_of_points_per_ray, point_dims).

  • view_dirs (torch.Tensor) – (Optional) Tensor containing the view directions. Its shape is (number_of_rays, number_of_points_per_ray, point_dims).

Returns

Tensor containing the prediction result of the model. Its shape is (number_of_rays, number_of_points_per_ray, output_channels).

Return type

torch.Tensor

training: bool
class pynif3d.models.PixelNeRFNIFModel(input_channel_points: int = 3, input_channel_view_dirs: int = 3, output_channels: int = 4, hidden_channels: int = 128, is_use_view_directions: bool = True, is_point_encoding_active: bool = True, is_view_encoding_active: bool = False, encoding_fn: torch.nn.modules.module.Module = None, encoding_viewdir_fn: torch.nn.modules.module.Module = None, n_resnet_blocks: int = 5, reduce_block_index: int = 3, activation_fn: torch.nn.modules.module.Module = None, init_kwargs: dict = None)

Bases: torch.nn.modules.module.Module

The multi-layer MLP model for PixelNeRF rendering. If an encoding function is provided, it applies positional encoding to the inputs and overrides the input channel information accordingly. It can also integrate view direction information into the network.
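
Usage (a minimal sketch with dummy inputs; the shapes follow the forward() documentation below, and the sample count and feature size are arbitrary):

import torch

ray_points = torch.randn(4096, 3)     # (n_ray_samples, input_channel_points)
camera_poses = torch.randn(1, 3, 4)   # (n_objects, 3, 4)
features = torch.randn(1, 4096, 512)  # (n_views, n_ray_samples, feature_size)
view_dirs = torch.randn(4096, 3)      # (n_ray_samples, input_channel_view_dirs)

model = PixelNeRFNIFModel()
prediction = model(ray_points, camera_poses, features, view_dirs)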

Parameters
  • input_channel_points (int) – The input channel dimension to the model. If positional encoding is used, this value will be overridden. Default is 3 (XYZ).

  • input_channel_view_dirs (int) – The input channel dimension for viewing directions. If positional encoding is used, this value will be overridden. Default is 3 (XYZ).

  • output_channels (int) – The output channel dimension. Default is 4 (RGBA).

  • hidden_channels (int) – The number of hidden channels contained within each ResNetBlockFC block. Default is 128.

  • is_use_view_directions (bool) – Boolean flag indicating whether to use view direction (True) or not (False). If True, the view direction block will be added on top of the base MLP layers. Default is True.

  • is_point_encoding_active (bool) – Boolean flag indicating whether encoding shall be applied to the input points. Default is True.

  • is_view_encoding_active (bool) – Boolean flag indicating whether encoding shall be applied to the viewing directions. Default is False.

  • encoding_fn (torch.nn.Module) – The function that is called in order to apply encoding to the NIF model input. Default is PositionalEncoding().

  • encoding_viewdir_fn (torch.nn.Module) – The function that is called in order to apply encoding to the view directions input. Default is PositionalEncoding().

  • n_resnet_blocks (int) – The number of ResNetBlockFC blocks that are contained within the base network. Default value is 5.

  • reduce_block_index (int) – The index of the ResNetBlockFC block at which the reduce operation is going to be applied (along the dimension that is related to the number of objects). Default value is 3.

  • activation_fn (torch.nn.Module) – The activation function. Default is ReLU.

  • init_kwargs (dict) – Dictionary containing the initialization parameters for the linear layers.

forward(ray_points: torch.Tensor, camera_poses: torch.Tensor, features: torch.Tensor, view_dirs: Optional[torch.Tensor] = None, **kwargs: dict) torch.Tensor
Parameters
  • ray_points (torch.Tensor) – Tensor containing the ray points that are going to be processed. Its shape is (n_ray_samples, input_channel_points).

  • camera_poses (torch.Tensor) – Tensor containing the camera poses. Its shape is (n_objects, 3, 4).

  • features (torch.Tensor) – Tensor containing the feature vectors. Its shape is (n_views, n_ray_samples, feature_size).

  • view_dirs (torch.Tensor) – (Optional) Tensor containing the view directions. Its shape is (n_ray_samples, input_channel_view_dirs).

  • kwargs (dict) –

    • reduce (str): The reduce operation that is applied to the ResNetBlockFC block at reduce_block_index. Currently supported options are “average” and “max”.

Returns

Tensor containing the prediction result of the model. Its shape is (n_ray_samples, output_channels).

Return type

torch.Tensor

training: bool
class pynif3d.models.PointNet_LocalPool(input_channels=3, point_network_depth=5, point_feature_channels=128, scatter_type='max', feature_grids=None, feature_processing_fn=None, feature_grid_resolution=32, feature_grid_channels=128, padding=0.1, encoding_fn=None)

Bases: torch.nn.modules.module.Module

The point encoder model for Convolutional Occupancy Networks (CON) as described in: https://arxiv.org/abs/2003.04618

Note

This implementation is based on the original one, which can be found at: https://github.com/autonomousvision/convolutional_occupancy_networks

PointNet-based encoder network with ResNet blocks for each point. It takes the input points, applies a variation of PointNet and projects each point onto the defined plane(s). It returns the plane features. The number of input points is fixed.

Usage:

plane = "xz"
plane_resolution = 256
feature_channels = 32

model = PointNet_LocalPool(
    feature_grids=[plane],
    feature_grid_resolution=plane_resolution,
    feature_grid_channels=feature_channels,
)

points = torch.randn(1, 3000, 3)  # dummy input: (batch_size, n_points, input_channels)
features = model(points)
Parameters
  • input_channels (int) – The input layer’s channel size. If encoding_fn is provided, this value will be overridden by encoding_fn.get_dimensions(). Default is 3.

  • point_network_depth (int) – The number of ResNet blocks that are connected sequentially. Default is 5.

  • point_feature_channels (int) – The channel size of each Fully-Connected ResNet block. Default is 128.

  • scatter_type (str) – The type of the scattering operation. Options are (“mean”, “max”). Default is “max”.

  • feature_grids (iterable) – The iterable defining the plane(s) onto which the points are projected. The options are (“xy”, “yz”, “xz”, “grid”). “grid” cannot be used in combination with the other options. Default is [“xz”].

  • feature_processing_fn (instance) – (Optional) The model that processes the point features projected onto the 2D planes. An instance of a pre-initialized model has to be provided (see the sketch after this parameter list). If not provided, the 2D plane processing step will be skipped.

  • feature_grid_resolution (int) – The resolution of the 2D planes that the points are projected to. It has to be a positive integer. Only square planes are supported for now. Default is 32.

  • feature_grid_channels (int) – The channel size of the 2D plane features. It has to be the same as the input channel dimension of the model provided through feature_processing_fn. Default is 128.

  • padding (float) – Padding variable used during the normalization operations. Set it to 0 to disable padding. Default is 0.1.

  • encoding_fn (instance) – The function instance that applies encoding to the input point coordinates. It has to provide a callable get_dimensions property which returns the resulting dimensions. Default is None.
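
A minimal sketch of plugging a pre-initialized UNet in as feature_processing_fn (the UNet input channel size is chosen to match feature_grid_channels, as required above; the grid resolution and point count are arbitrary):

import torch

unet = UNet(output_channels=32, input_channels=32, network_depth=4)
encoder = PointNet_LocalPool(
    feature_grids=["xz"],
    feature_grid_resolution=64,
    feature_grid_channels=32,
    feature_processing_fn=unet,
)
points = torch.randn(1, 3000, 3)  # (batch_size, n_points, input_channels)
features = encoder(points)        # dictionary of plane features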

forward(input_points)
Parameters

input_points (torch.Tensor) – Tensor containing the points that are provided as input to the network. Its shape is (batch_size, n_points, input_channels).

Returns

Dictionary containing the feature tensors for each of the defined planes.

Return type

dict

generate_coordinate_features(p, c, feature_grid='xz')

Scatters the given features (c) based on the given coordinates (p) by using the grid resolution. This is the orthographic point-to-plane projection function.

Parameters
  • p (torch.Tensor) – Tensor containing the locations of the points. Its shape is (batch_size, number_of_points, 3).

  • c (torch.Tensor) – Tensor containing the point features. Its shape is (batch_size, number_of_points, feature_dimensions).

Returns

Tensor containing the scattered features of the points. Its shape is (batch_size, feature_dimensions, grid_resolution, grid_resolution).

Return type

torch.Tensor

pool_local(keys, indices, features)

Applies the max pooling operation to the point features, based on the grid resolution. After pooling, the points within the same pooling region are assigned the same pooled features.

Parameters
  • keys (list) – List containing the plane IDs. It is expected that these IDs exist as keys in indices.

  • indices (dict) – Dictionary containing (plane_id, point_indices) pairs mapping each point’s 1D index to a 2D plane. point_indices is a torch.Tensor with shape (batch_size, 1, point_count).

  • features (torch.Tensor) – Tensor containing the point features. Its shape is (batch_size, point_count, feature_size).

Returns

Tensor with shape (batch_size, point_count, feature_size).

Return type

torch.Tensor

training: bool
class pynif3d.models.ResnetBlockFC(size_in: int, size_out: int, size_inner: int, activation_fn: torch.nn.modules.module.Module = None, init_fc_0_kwargs: dict = None, init_fc_1_kwargs: dict = None, init_fc_s_kwargs: dict = None)

Bases: torch.nn.modules.module.Module

Implementation of the fully-connected ResNet block for Convolutional Occupancy Networks (CON), as described in: https://arxiv.org/abs/2003.04618

Note

This implementation is based on the original one, which can be found at: https://github.com/autonomousvision/convolutional_occupancy_networks

It replaces the convolutional layers of vanilla ResNet blocks with linear layers.

Usage:

input_channels = 32
hidden_channels = 32
output_channels = 32

model = ResnetBlockFC(input_channels, output_channels, hidden_channels)
x = torch.randn(1, 1024, input_channels)  # dummy input: (batch_size, n_points, size_in)
features = model(x)

forward(x)
Parameters

x (torch.Tensor) – Tensor with shape (batch_size, n_points, size_in).

Returns

Tensor with shape (batch_size, n_points, size_out).

Return type

torch.Tensor

training: bool
class pynif3d.models.SpatialEncoder(backbone_fn: torch.nn.modules.module.Module = None, backbone_fn_kwargs: dict = None, n_layers: int = 4, pretrained: bool = True)

Bases: torch.nn.modules.module.Module

The spatial image encoder model. It extracts feature maps from the input images using a 2D backbone network, optionally initialized with pretrained weights.
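
Usage (a minimal sketch with a dummy image batch; the 3-channel input and the image resolution are assumptions based on the default, pretrained backbone):

import torch

encoder = SpatialEncoder()
images = torch.randn(1, 3, 128, 128)  # (batch_size, channels, height, width)
features = encoder(images)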

forward(images, **kwargs)

Parameters

images (torch.Tensor) – Tensor containing the input images that are processed by the backbone network.

Returns

Tensor containing the extracted image features.

Return type

torch.Tensor

training: bool
class pynif3d.models.UNet(output_channels, input_channels=3, network_depth=5, first_layer_channels=64, upconv_mode='transpose', merge_mode='concat', **kwargs)

Bases: torch.nn.modules.module.Module

UNet class for Convolutional Occupancy Networks (CON), as described in: https://arxiv.org/abs/2003.04618

Note

This implementation is based on the original one, which can be found at: https://github.com/autonomousvision/convolutional_occupancy_networks

The U-Net is a convolutional encoder-decoder neural network. Contextual spatial information (from the decoding, expansive pathway) related to the input tensor is merged with information representing the localization of details (from the encoding, compressive pathway).

Usage:

input_channels = 3
output_channels = 1
x = torch.randn(1, input_channels, 64, 64)  # dummy input: (batch_size, input_channels, height, width)

model = UNet(output_channels, input_channels)
features = model(x)
Parameters
  • output_channels (int) – The number of channels for the output tensor.

  • input_channels (int) – The number of channels in the input tensor. Default is 3.

  • network_depth (int) – The number of convolution blocks. Default is 5.

  • first_layer_channels (int) – The number of convolutional filters for the first convolution. For each depth level, the channel size is multiplied by 2. Default is 64.

  • upconv_mode (str) – The type of up-convolution (“transpose”, “upsample”). Default is “transpose”.

  • merge_mode (str) – The type of the merge operation (“concat”, “add”). Default is “concat”.

  • kwargs (dict) –

    • w_init_fn: Callback function for parameter initialization. Default is xavier_normal.

    • w_init_fn_args: The arguments to pass to the w_init_fn function. Optional.

    • b_init_fn: Callback function for bias initialization. Default is constant 0.

    • b_init_fn_args: The arguments to pass to the b_init_fn function. Optional.

forward(h)

Parameters

h (torch.Tensor) – Tensor containing the input feature map. Its shape is (batch_size, input_channels, height, width).

Returns

Tensor containing the output feature map. Its shape is (batch_size, output_channels, height, width).

Return type

torch.Tensor

training: bool
class pynif3d.models.UNet3D(output_channels, input_channels, final_sigmoid=True, feature_maps=64, num_groups=8, num_levels=4, encoder_pool_type='max', is_segmentation=False, testing=False)

Bases: torch.nn.modules.module.Module

The 3D U-Net class. It applies a convolutional encoder-decoder (3D U-Net) architecture to the input volume.

Usage:

input_channels = 16
output_channels = 32
x = torch.randn(1, input_channels, 32, 32, 32)  # dummy input volume

model = UNet3D(output_channels, input_channels)
features = model(x)
Parameters
  • output_channels (int) – The number of output segmentation masks. Note that this value may correspond either to different semantic classes or to different binary segmentation masks. It is up to the user of the class to interpret output_channels and to use the proper loss criterion during training (i.e. CrossEntropyLoss for the multi-class case or BCEWithLogitsLoss for the two-class case, respectively).

  • input_channels (int) – The number of input channels.

  • final_sigmoid (bool) – If True, element-wise nn.Sigmoid is applied after the final 1x1 convolution; otherwise, nn.Softmax is applied. Must be True if nn.BCELoss (two-class) is used to train the model and False if nn.CrossEntropyLoss (multi-class) is used. Default is True.

  • feature_maps (int, tuple) – If int, the number of feature maps in the first conv layer of the encoder (default: 64). If tuple, the number of feature maps at each level.

  • num_groups (int) – The number of groups for the GroupNorm layers. Default is 8.

  • num_levels (int) – The number of levels in the encoder/decoder path (applied only if feature_maps is an int). Default is 4.

  • encoder_pool_type (str) – The type of the pooling operation used in the encoder. Default is “max”.

  • is_segmentation (bool) – If True (semantic segmentation problem), Sigmoid/Softmax normalization is applied after the final convolution. If False (regression problem), the normalization layer is skipped. Default is False.

  • testing (bool) – If True (testing mode), the final activation (if present, i.e. is_segmentation=True) will be applied as the last operation during the forward pass. If False (training mode), the final activation won’t be applied, even if present. Default is False.

forward(x)

Parameters

x (torch.Tensor) – Tensor containing the input volume. Its shape is (batch_size, input_channels, depth, height, width).

Returns

Tensor containing the output volume. Its shape is (batch_size, output_channels, depth, height, width).

Return type

torch.Tensor

training: bool