Predictor¶

`TensorPredictions`¶

class flat_bug.predictor.TensorPredictions(predictions: list[Prepared_Results] | None = None, image: Tensor | None = None, image_path: str | None = None, time: bool = False, **kwargs)¶

Result handling class for combining the results from multiple YOLOv8 detections at different scales into a single object.

TensorPredictions handles a rather complex merging procedure, resizing to remove image padding and scaling effects on the masks and boxes, and non-maximum suppression using mask-IoU or mask-IoS.

TensorPredictions also allows for easy conversion from mask to contours and back, plotting of the results, and (de-)serialization to save and load the results to/from disk.

contour_to_image_coordinates(contour: Tensor, scale: float = 1) → Tensor¶

Converts a contour from mask coordinates to image coordinates.

Parameters:

contour (torch.Tensor) – The contour to convert.
scale (float, optional) – The scale factor to apply to the contour. Defaults to 1.

Returns:

The contour in image coordinates.

Return type:

out (torch.Tensor)

property contours: List[Tensor]¶: This property wraps the OpenCV function cv2.findContours and uses cv2.contourArea to select the largest contour for each mask.
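The "largest contour" selection can be illustrated without OpenCV. The following is a minimal sketch, not the library's implementation: `polygon_area` and `largest_contour` are hypothetical helper names, and the shoelace formula stands in for cv2.contourArea, which OpenCV documents as computing the area with Green's formula.

```python
def polygon_area(contour):
    """Absolute area of a closed polygon given as (x, y) points (shoelace formula)."""
    twice_area = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def largest_contour(contours):
    """Pick the contour with the largest enclosed area (hypothetical helper)."""
    return max(contours, key=polygon_area)


# A mask may produce several contours, e.g. the object plus a small speck.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
speck = [(10, 10), (11, 10), (11, 11)]
selected = largest_contour([speck, square])
```

Here `selected` is the square, since its area exceeds that of the speck.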

fix_boxes() → Self¶

This function simply sets the boxes to match the masks.

It is not strictly needed, but can be used as a sanity check to see if the boxes match the masks. Discrepancies between the boxes and the masks arise from the scaling and smoothing applied to the masks.

TODO: Should probably be removed.

flip(direction: str = 'vertical') → Self¶

Flips the masks, polygons and boxes along the specified axis.

Parameters:

direction (str, optional) – The axis to flip the masks, polygons and boxes along. Defaults to “vertical”. Should be one of “vertical”, “y”, “horizontal” or “x”.

Returns:

The TensorPredictions object with the masks, polygons and boxes flipped.

Return type:

out (Self)
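As an illustration of what flipping does to bounding boxes, here is a minimal sketch. It is not the library's code: `flip_boxes` is a hypothetical helper, boxes are assumed to be in (x1, y1, x2, y2) format, and it is an assumption that “vertical”/“y” mirrors the y coordinates (top to bottom) while “horizontal”/“x” mirrors the x coordinates.

```python
def flip_boxes(boxes, image_width, image_height, direction="vertical"):
    """Mirror (x1, y1, x2, y2) boxes inside an image of the given size."""
    if direction in ("vertical", "y"):
        # Assumption: a vertical flip swaps top and bottom, so y changes.
        return [(x1, image_height - y2, x2, image_height - y1)
                for x1, y1, x2, y2 in boxes]
    if direction in ("horizontal", "x"):
        # Assumption: a horizontal flip swaps left and right, so x changes.
        return [(image_width - x2, y1, image_width - x1, y2)
                for x1, y1, x2, y2 in boxes]
    raise ValueError(f"Unknown direction: {direction!r}")


boxes = [(10, 20, 30, 50)]
flipped_vertically = flip_boxes(boxes, image_width=100, image_height=200)
flipped_horizontally = flip_boxes(boxes, 100, 200, direction="horizontal")
```

Applying the same flip twice returns the original boxes, which is a convenient check for any flip implementation.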

Deserializes a TensorPredictions object from a .pt or .json file, or a dictionary. Note: Mutates and returns the current object.

Parameters:

data (Union[str, dict]) – The path to the file to load or a dictionary with the deserialized json data.
device (Optional[DeviceLikeType], optional) – The device to load the data to. Defaults to None. If None, the device is set to “cpu”.
dtype (Optional[torch.types._dtype], optional) – The data type to load the data as. Defaults to None. If None, the data type is set to torch.float32.

Returns:

This object with the deserialized data.

Return type:

Self

non_max_suppression(iou_threshold: float, **kwargs) → Self¶: Simply wraps the nms_masks function from yolo_helpers.py, and removes the duplicates from the TensorPredictions object.
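To show how mask-based suppression behaves, the sketch below implements a greedy NMS over masks represented as sets of pixel coordinates. It is not the nms_masks function from yolo_helpers.py; all names are hypothetical, and it assumes that IoS means the intersection divided by the area of the smaller mask.

```python
def mask_iou(a, b):
    """Intersection over union of two masks given as sets of (x, y) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mask_ios(a, b):
    """Intersection over the smaller mask (assumed meaning of IoS)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def greedy_nms(masks, scores, threshold, overlap=mask_iou):
    """Return indices of kept masks, visiting them by descending score."""
    order = sorted(range(len(masks)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a mask only if it does not overlap too much with a kept one.
        if all(overlap(masks[i], masks[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


def rect(x0, y0, x1, y1):
    """Filled rectangle as a set of pixels (test data helper)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


masks = [rect(0, 0, 10, 10), rect(2, 2, 6, 6), rect(20, 20, 25, 25)]
scores = [0.9, 0.8, 0.7]
kept_iou = greedy_nms(masks, scores, threshold=0.5, overlap=mask_iou)
kept_ios = greedy_nms(masks, scores, threshold=0.5, overlap=mask_ios)
```

The second mask lies entirely inside the first. Its IoU with the first is only 0.16, so IoU-based suppression keeps it, whereas its IoS is 1.0, so IoS-based suppression removes it. This is the practical difference between the two criteria for nested detections.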

offset_scale_pad(offset: Tensor, scale: float, pad: int = 0) → Self¶

Since the image may be padded, the masks and boxes should be offset by the padding-width and scaled by the scale_before factor to match the original image size. Also pads the boxes by pad pixels to be safe.

Parameters:

offset (torch.Tensor) – A vector of length 2 containing the x and y offset of the image. Useful for removing image-padding effects.
scale (float) – The scale factor of the image.
pad (int, optional) – The number of pixels to pad the boxes by. Defaults to 0. (Not to be confused with image-padding, this is about expanding the boxes a bit to ensure they cover the entire mask)

Returns:

The TensorPredictions object with the masks, polygons and boxes offset, scaled and padded.

Return type:

out (Self)
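The arithmetic for a single box might look like the sketch below. This is an illustration under stated assumptions, not the library's implementation: `offset_scale_pad_box` is a hypothetical helper, boxes are assumed to be (x1, y1, x2, y2), and both the order of operations (subtract the offset, then divide by the scale) and the direction of the scaling are assumptions.

```python
def offset_scale_pad_box(box, offset, scale, pad=0):
    """Map a box from padded, scaled coordinates back to the original image."""
    x1, y1, x2, y2 = box
    offset_x, offset_y = offset
    # Assumption: remove the image-padding offset first ...
    x1, x2 = x1 - offset_x, x2 - offset_x
    y1, y2 = y1 - offset_y, y2 - offset_y
    # ... then undo the scaling that was applied before inference.
    x1, y1, x2, y2 = (v / scale for v in (x1, y1, x2, y2))
    # Finally expand the box by `pad` pixels on every side.
    return (x1 - pad, y1 - pad, x2 + pad, y2 + pad)


# A box found in an image that was downscaled by 0.5 and padded by 10 px.
restored = offset_scale_pad_box((20, 30, 60, 70), offset=(10, 10), scale=0.5, pad=2)
```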

plot(linewidth: int = 2, masks: bool = True, boxes: bool = True, confidence: bool = True, outpath: str | None = None, scale: float = 1, contour_color: Tuple[int, int, int] = (255, 0, 0), box_color: Tuple[int, int, int] = (0, 0, 0), alpha: float = 0.3)¶

Visualizes flatbug predictions from a TensorPredictions object.

Parameters:

linewidth (int, optional) – Linewidth of the segmentation contours and bounding boxes. Defaults to 2.
masks (bool, optional) – Flag to indicate whether segmentation contours should be included. Defaults to True.
boxes (bool, optional) – Flag to indicate whether bounding boxes should be included. If False, confidences are also omitted. Defaults to True.
confidence (bool, optional) – Flag to indicate whether detection confidences should be included. Ignored if boxes is False. Defaults to True.
outpath (Optional[str], optional) – Where should the visualization be saved? If outpath is None, then the rasterized visualization is returned as a cv2.UMat/np.ndarray (shape: HWC, colors: BGR). Defaults to None.
scale (float, optional) – Render the visualization at a scale relative to the image size (from which the predictions originate). Note: Large images and/or scales can be very slow to render. Defaults to 1.
contour_color (Tuple[int, int, int], optional) – RGB color ([0, 255]) to use for contour border and fill. Defaults to (255, 0, 0) (red).
box_color (Tuple[int, int, int], optional) – RGB color ([0, 255]) to use for bounding box and confidence text color. Defaults to (0, 0, 0) (black).
alpha (float, optional) – Transparency of the contour fill ([0, 1]). Defaults to 0.3.

Returns:

If outpath is supplied, it is returned; otherwise the rasterized visualization is returned as a cv2.UMat/np.ndarray (shape: HWC, colors: BGR).

Return type:

out (Union[cv2.UMat, str])

Saves the serialized prediction results, crops, and overview to the given output directory.

TODO: Add the identifier to the names of the files, so that we can save multiple predictions for the same image or images with the same name.

Parameters:

output_directory (str) – The directory to save the prediction results to.
overview (bool | str, optional) – Whether to save the overview image. Defaults to True. If a string is given, it is interpreted as a path to a directory to save the overview image to.
crops (bool | str, optional) – Whether to save the crops. Defaults to True. If a string is given, it is interpreted as a path to a directory to save the crops to.
metadata (bool | str, optional) – Whether to save the metadata. Defaults to True. If a string is given, it is interpreted as a path to a directory to save the metadata to.
fast (bool, optional) – Whether to use the fast version of the overview image. Defaults to False. Saves the overview image at half the resolution.
mask_crops (bool, optional) – Whether to mask the crops. Defaults to False.
identifier (str | None, optional) – An identifier for the serialized data. Defaults to None.
basename (str | None, optional) – The base name of the image. Defaults to None. If None, the base name is extracted from the image path, which must be set in this case.

Returns:

The path to the directory containing the serialized data. By default, the crops and overview image(s) are also saved here. If the standard location is not used at all, the directory is not created and None is returned instead.

Return type:

str | None

serialize(outpath: str, save_json: bool = True, save_pt: bool = False, identifier: str = None) → None¶

This function serializes the TensorPredictions object to a .pt file and/or a .json file. The .pt file contains an exact copy of the TensorPredictions object, while the .json file contains the data in a more human-readable format, which can be deserialized into a TensorPredictions object using the ‘load’ function.

Parameters:

outpath (str) – The path to save the serialized data to.
save_json (bool, optional) – Whether to save the .json file. Defaults to True. Recommended.
save_pt (bool, optional) – Whether to save the .pt file. Defaults to False. Rather disk space wasteful.
identifier (str, optional) – An identifier for the serialized data. Defaults to None.

`Predictor`¶

BATCH_SIZE: int = None¶: The batch size to use for the prediction. This determines how many tiles are processed in parallel. Increasing this value may improve performance, but will also increase memory usage.

EDGE_CASE_MARGIN: int = None¶: The margin to add to the edge of the image to catch instances that are split between tiles. The margin is added to the edge of the image, such that instances on the true edge of the images are not removed.

EXPERIMENTAL_NMS_OPTIMIZATION: bool = None¶: Enables an experimental optimization for the NMS step. This optimization improves the performance of the NMS step when there are many instances in a large image and CUDA is available.

HYPERPARAMETERS: List[str] = ['SCORE_THRESHOLD', 'IOU_THRESHOLD', 'MINIMUM_TILE_OVERLAP', 'EDGE_CASE_MARGIN', 'MIN_MAX_OBJ_SIZE', 'MAX_MASK_SIZE', 'PREFER_POLYGONS', 'EXPERIMENTAL_NMS_OPTIMIZATION', 'TIME', 'TILE_SIZE', 'BATCH_SIZE']¶: The available hyperparameters for the predictor. These can be set using the set_hyperparameters method.

IOU_THRESHOLD: float = None¶: The IOU threshold used to determine if two instances are duplicates.

MAX_MASK_SIZE: int = None¶: Defines the maximum size of the segmentation masks. Only applies if PREFER_POLYGONS is False.

MINIMUM_TILE_OVERLAP: int = None¶: The minimum overlap between tiles in a single layer of the pyramid. The actual overlap may be larger. Increasing this value will increase the computation time, but may improve the detection of large instances.
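One way to see why the overlap is a minimum rather than an exact value is to place tiles evenly along an axis. The sketch below is an assumption about how such a layout could be computed, not the predictor's actual tiling code; `tile_offsets` is a hypothetical helper and the numbers are example values.

```python
import math


def tile_offsets(length, tile_size, minimum_overlap):
    """Start positions of evenly spaced tiles covering `length` pixels."""
    if length <= tile_size:
        return [0]  # a single tile covers the whole axis
    stride = tile_size - minimum_overlap
    if stride <= 0:
        raise ValueError("minimum_overlap must be smaller than tile_size")
    # Fewest tiles that still guarantee the requested overlap.
    n_tiles = math.ceil((length - minimum_overlap) / stride)
    last = length - tile_size  # the final tile ends exactly at the edge
    # Spread the tiles evenly; integer division keeps pixel positions whole.
    return [(i * last) // (n_tiles - 1) for i in range(n_tiles)]


offsets = tile_offsets(length=2500, tile_size=1024, minimum_overlap=384)
overlaps = [a + 1024 - b for a, b in zip(offsets, offsets[1:])]
```

With these example values four tiles are needed, and because they are spread evenly the resulting overlap (532 pixels) exceeds the requested minimum of 384.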

MIN_MAX_OBJ_SIZE: Tuple[int, int] = None¶: Defines the minimum and maximum object size as seen in a single tile. Size is defined as the square root of the pixel area of the bounding box.
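The size definition above translates directly into code. This is a minimal sketch with hypothetical helper names; boxes are assumed to be (x1, y1, x2, y2) in tile pixels, the limits (32, 512) are example values only, and treating both limits as inclusive is an assumption.

```python
import math


def object_size(box):
    """Square root of the bounding-box pixel area."""
    x1, y1, x2, y2 = box
    return math.sqrt(max(x2 - x1, 0) * max(y2 - y1, 0))


def within_size_limits(box, min_max_obj_size):
    """Check a box against (minimum, maximum) size limits (assumed inclusive)."""
    minimum, maximum = min_max_obj_size
    return minimum <= object_size(box) <= maximum


elongated = (0, 0, 16, 64)  # 16 x 64 px box, size sqrt(1024) = 32
tiny = (0, 0, 4, 4)         # 4 x 4 px box, size 4
results = [within_size_limits(b, (32, 512)) for b in (elongated, tiny)]
```

Using the square root of the area means an elongated 16 x 64 box and a 32 x 32 box are treated as the same size.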

PREFER_POLYGONS: bool = None¶: Whether to prefer representing the instance segmentation using polygons instead of masks. This is a much more compact representation, but it cannot represent complex shapes such as masks with holes, only simple outlines (which may be concave).

SCORE_THRESHOLD: float = None¶: The score threshold for the predictions. TODO: This should be called CONFIDENCE_THRESHOLD.

TILE_SIZE: int = None¶: The size of the tiles to split the image into. This is defined by the model and should probably not be changed.

TIME: bool = None¶: Whether to time the different parts of the prediction process. Enabling this will print a verbose output of the timing of the different parts of the prediction process.

pyramid_predictions(image: Tensor | str, path: str | None = None, scale_increment: float = 0.6666666666666666, scale_before: float | int = 1, single_scale: bool = False) → TensorPredictions¶

Performs inference on an image at multiple scales and returns the predictions.

Parameters:

image (Union[torch.Tensor, str]) – The image to run inference on. If a string is given, the image is read from the path. If it is a torch.Tensor, the path must be provided. We assume that floating point images are in the range [0, 1] and integer images are in the range [0, integer_type_max]. (see https://github.com/pytorch/vision/blob/6d7851bd5e2bedc294e40e90532f0e375fcfee04/torchvision/transforms/_functional_tensor.py#L66)
path (Optional[str], optional) – The path to the image. Defaults to None. Must be provided if image is a torch.Tensor.
scale_increment (float, optional) – The scale increment to use when resizing the image. Defaults to 2/3.
scale_before (Union[float, int], optional) – The scale to apply before running inference. Defaults to 1.
single_scale (bool, optional) – Whether to run inference on a single scale. Defaults to False.

Returns:

The predictions for the image.

Return type:

TensorPredictions
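The roles of scale_before, scale_increment and single_scale can be sketched as a sequence of scales. This is an illustration only: `pyramid_scales` is a hypothetical helper, and the stopping rule (stop once the longest image side fits within one tile) is an assumption, not something the documentation states.

```python
def pyramid_scales(image_size, tile_size, scale_increment=2 / 3,
                   scale_before=1.0, single_scale=False):
    """List the scales at which an image would be processed."""
    if not 0 < scale_increment < 1:
        raise ValueError("scale_increment must be between 0 and 1")
    scales = [scale_before]
    if single_scale:
        return scales  # only the initial scale is used
    longest_side = max(image_size)
    # Assumed stopping rule: shrink until the whole image fits in one tile.
    while longest_side * scales[-1] > tile_size:
        scales.append(scales[-1] * scale_increment)
    return scales


scales = pyramid_scales(image_size=(4000, 3000), tile_size=1024)
only_one = pyramid_scales((4000, 3000), 1024, single_scale=True)
```

For a 4000 x 3000 image and a tile size of 1024, this yields five scales, from 1 down to (2/3)^4. Each smaller scale makes large objects fit within a single tile at the cost of resolution.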

set_hyperparameters(**kwargs) → Self¶

Mutably set the hyperparameters for the predictor.

Parameters:

**kwargs – The hyperparameters to set.

Returns:

This object (mutated with the new hyperparameters).

Return type:

Self
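The pattern of mutating and returning the same object can be sketched as follows. `SketchPredictor` is a hypothetical stand-in with a shortened hyperparameter list, not the real Predictor class; in particular, raising a KeyError for unknown names is an assumption about the validation behaviour.

```python
class SketchPredictor:
    """Hypothetical stand-in illustrating the set_hyperparameters pattern."""

    HYPERPARAMETERS = ["SCORE_THRESHOLD", "IOU_THRESHOLD", "TILE_SIZE"]
    SCORE_THRESHOLD = None
    IOU_THRESHOLD = None
    TILE_SIZE = None

    def set_hyperparameters(self, **kwargs):
        for name, value in kwargs.items():
            # Assumption: names outside HYPERPARAMETERS are rejected.
            if name not in self.HYPERPARAMETERS:
                raise KeyError(f"Unknown hyperparameter: {name}")
            setattr(self, name, value)
        return self  # returning self allows call chaining


predictor = SketchPredictor()
same_object = predictor.set_hyperparameters(SCORE_THRESHOLD=0.5, IOU_THRESHOLD=0.25)
```

Because the method returns the object itself, construction and configuration can be chained in a single expression.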