echofilter.data package
Dataset creation and manipulation.
Submodules
echofilter.data.dataset module
Convert echograms into a PyTorch dataset.
Tools for converting a dataset of echograms (transects) into a PyTorch dataset and sampling from it.
- class echofilter.data.dataset.ConcatDataset(datasets: Iterable[torch.utils.data.dataset.Dataset])
Bases: torch.utils.data.dataset.ConcatDataset
Dataset as a concatenation of multiple TransectDatasets.
This class is useful to assemble different existing datasets.
- Parameters
datasets (sequence) – List of datasets to be concatenated.
Notes
A subclass of torch.utils.data.ConcatDataset which supports the initialise_datapoints method.
- datasets: List[torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]]
- class echofilter.data.dataset.StratifiedRandomSampler(data_source)
Bases: torch.utils.data.sampler.Sampler
Sample elements randomly without repetition, stratified across datasets.
- Parameters
data_source (torch.utils.data.ConcatDataset) – Dataset to sample from. Must possess a cumulative_sizes attribute.
- property num_samples
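One way to read "stratified across datasets" is that every index is visited exactly once per pass, while samples from each constituent dataset are spread evenly through the ordering. The sketch below illustrates that reading in plain Python, using only a cumulative_sizes list of the kind a ConcatDataset exposes. It is not the echofilter implementation: the function name stratified_random_order and the jittered-rank interleaving are assumptions made for illustration.

```python
import random


def stratified_random_order(cumulative_sizes, rng=None):
    """Return every index once, with each dataset spread through the order.

    Hypothetical helper (not part of echofilter). ``cumulative_sizes`` is the
    running total of dataset lengths, e.g. [3, 10] for lengths 3 and 7.
    """
    rng = rng or random.Random()
    keyed = []
    start = 0
    for end in cumulative_sizes:
        indices = list(range(start, end))
        rng.shuffle(indices)  # random order within this dataset
        n = len(indices)
        for rank, idx in enumerate(indices):
            # Assign a position in [0, 1): evenly spaced slots plus jitter,
            # so each dataset contributes samples throughout the epoch.
            keyed.append(((rank + rng.random()) / n, idx))
        start = end
    keyed.sort()
    return [idx for _, idx in keyed]


order = stratified_random_order([3, 10], rng=random.Random(0))
```

Because each index appears exactly once, the result is a permutation of range(cumulative_sizes[-1]), which is the "without repetition" property.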
- class echofilter.data.dataset.TransectDataset(transect_paths, window_len=128, p_scale_window=0, window_sf=2, num_windows_per_transect=0, use_dynamic_offsets=True, crop_depth=None, transform=None, remove_nearfield=True, nearfield_distance=1.7, nearfield_visible_dist=0.0, remove_offset_turbulence=0, remove_offset_bottom=0)
Bases: torch.utils.data.dataset.Dataset
Load a collection of transects as a PyTorch dataset.
- Parameters
transect_paths (list) – Absolute paths to transects.
window_len (int) – Width (number of timestamps) to load. Default is 128.
p_scale_window (float, optional) – Probability of rescaling window. Default is 0, which results in no randomization of the window widths.
window_sf (float, optional) – Maximum window scale factor. Scale factors will be log-uniformly sampled in the range 1/window_sf to window_sf. Default is 2.
num_windows_per_transect (int) – Number of windows to extract for each transect. Start indices for the windows will be equally spaced across the total width of the transect. If this is 0, the number of windows will be inferred automatically based on window_len and the total width of the transect, resulting in a different number of windows for each transect. Default is 0.
use_dynamic_offsets (bool) – Whether starting indices for each window should be randomly offset. Set to True for training and False for testing. Default is True.
crop_depth (float) – Maximum depth to include, in metres. Deeper data will be cropped away. Default is None.
transform (callable) – Operations to perform on the dictionary containing a single sample. These are performed before generating the turbulence/bottom/overall mask. Default is None.
remove_nearfield (bool, optional) – Whether to remove turbulence and bottom lines affected by nearfield removal. If True (default), targets for the line nearest the sounder (bottom if upward facing, turbulence otherwise) which are closer than or equal to a distance of nearfield_distance are reduced to nearfield_visible_dist.
nearfield_distance (float, optional) – Nearfield distance in metres. Regions closer than the nearfield may have been masked out from the dataset, but their effect will be removed from the targets if remove_nearfield=True. Default is 1.7.
nearfield_visible_dist (float, optional) – The distance at which the effect of being too close to the sounder is obvious to the naked eye, and hence the distance which nearfield will be mapped to if remove_nearfield=True. Default is 0.0.
remove_offset_turbulence (float, optional) – Line offset built into the turbulence line. If given, this will be removed from the samples within the dataset. Default is 0.
remove_offset_bottom (float, optional) – Line offset built into the bottom line. If given, this will be removed from the samples within the dataset. Default is 0.
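The window parameters above describe two small calculations: log-uniform sampling of a scale factor between 1/window_sf and window_sf, and equally spaced window start indices. The following is a minimal sketch of both in plain Python. The helper names sample_window_scale and window_starts are hypothetical, and the rule used to infer the window count when it is 0 (ceiling of width divided by window_len) is an assumption, not something the documentation states.

```python
import math
import random


def sample_window_scale(window_sf=2.0, p_scale_window=0.0, rng=random):
    """Sample a window scale factor log-uniformly in [1/window_sf, window_sf].

    With probability 1 - p_scale_window the window is left unscaled (1.0).
    """
    if rng.random() >= p_scale_window:
        return 1.0
    log_sf = math.log(window_sf)
    # Uniform in log-space, so halving and doubling are equally likely.
    return math.exp(rng.uniform(-log_sf, log_sf))


def window_starts(transect_width, window_len=128, num_windows=0):
    """Equally spaced start indices covering the transect width."""
    if num_windows == 0:
        # Assumption: infer just enough windows to tile the transect.
        num_windows = max(1, math.ceil(transect_width / window_len))
    last_start = max(0, transect_width - window_len)
    if num_windows == 1:
        return [0]
    return [round(i * last_start / (num_windows - 1)) for i in range(num_windows)]


starts = window_starts(1000, window_len=128, num_windows=5)
scale = sample_window_scale(window_sf=2.0, p_scale_window=0.5)
```

Sampling in log-space is what makes a factor of 1/2 as likely as a factor of 2; sampling uniformly between 0.5 and 2 would favour enlargement.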
- echofilter.data.dataset.fixup_dataset_sample(sample, remove_nearfield=True, nearfield_distance=1.7, nearfield_visible_dist=0.0, remove_offset_turbulence=0.0, remove_offset_bottom=0.0, crop_depth=None, transform=None)
Handle a dataset transect sample.
- Parameters
sample (dict) – Transect dictionary.
remove_nearfield (bool, default=True) – Whether to remove turbulence and bottom lines affected by nearfield removal. If True (default), targets for the line nearest the sounder (bottom if upward facing, turbulence otherwise) which are closer than or equal to a distance of nearfield_distance are reduced to nearfield_visible_dist.
nearfield_distance (float, default=1.7) – Nearfield distance in metres. Regions closer than the nearfield may have been masked out from the dataset, but their effect will be removed from the targets if remove_nearfield=True.
nearfield_visible_dist (float, default=0) – The distance at which the effect of being too close to the sounder is obvious to the naked eye, and hence the distance which nearfield will be mapped to if remove_nearfield=True.
remove_offset_turbulence (float, default=0) – Line offset built into the turbulence line. If given, this will be removed from the samples within the dataset.
remove_offset_bottom (float, default=0) – Line offset built into the bottom line. If given, this will be removed from the samples within the dataset.
crop_depth (float) – Maximum depth to include, in metres. Deeper data will be cropped away. Default is None.
transform (callable, optional) – Operations to perform on the dictionary containing a single sample. These are performed before generating the turbulence/bottom/overall mask.
- Returns
Like sample, but contents fixed.
- Return type
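The nearfield rule described for remove_nearfield can be illustrated with a few lines of plain Python. This is a sketch only: the helper name fix_near_line is hypothetical, and it assumes a downward-facing sounder so that a line's depth equals its distance from the sounder. The actual function operates on a transect dictionary and handles more cases.

```python
def fix_near_line(depths, nearfield_distance=1.7, nearfield_visible_dist=0.0):
    """Map line depths within the nearfield onto nearfield_visible_dist.

    Assumes depth equals distance from the sounder (downward-facing case).
    Values closer than or equal to nearfield_distance are replaced; deeper
    values are left untouched.
    """
    return [
        nearfield_visible_dist if d <= nearfield_distance else d
        for d in depths
    ]


turbulence_line = [0.5, 1.7, 2.0, 10.0]
fixed = fix_near_line(turbulence_line)
```

The comparison is inclusive, matching the phrase "closer than or equal to" in the parameter description.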
echofilter.data.transforms module
Transformations and augmentations to be applied to echogram transects.
- class echofilter.data.transforms.ColorJitter(brightness=0, contrast=0)
Bases: object
Randomly change the brightness and contrast of a normalized image.
Note that changes are made in place.
- Parameters
brightness (float or tuple of float (min, max)) – How much to jitter brightness. brightness_factor is chosen uniformly from [-brightness, brightness] or the given [min, max]. brightness_factor is then added to the image.
contrast (float or tuple of float (min, max)) – How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.
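The sampling ranges for the two factors can be written out as a short sketch. The helper names sample_range and sample_jitter_factors are hypothetical, and only the sampling of the factors is shown; how contrast_factor is applied to the image is not stated above, so it is left out rather than guessed.

```python
import random


def sample_range(value, center, lower_bound=None):
    """Return the (min, max) interval implied by a float or (min, max) pair."""
    if isinstance(value, (tuple, list)):
        lo, hi = value
    else:
        lo, hi = center - value, center + value
    if lower_bound is not None:
        lo = max(lower_bound, lo)
    return lo, hi


def sample_jitter_factors(brightness=0.0, contrast=0.0, rng=random):
    """Sample (brightness_factor, contrast_factor) uniformly from their ranges."""
    # Brightness is centred on 0 because the factor is added to the image.
    b_lo, b_hi = sample_range(brightness, center=0.0)
    # Contrast is centred on 1 and clipped so the factor is never negative.
    c_lo, c_hi = sample_range(contrast, center=1.0, lower_bound=0.0)
    return rng.uniform(b_lo, b_hi), rng.uniform(c_lo, c_hi)


brightness_factor, contrast_factor = sample_jitter_factors(0.5, 0.3)
```

With both arguments left at 0 the ranges collapse to a single point, so the factors are 0 and 1.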
- class echofilter.data.transforms.Normalize(center, deviation, robust2stdev=True)
Bases: object
Normalize offset and scaling of image (mean and standard deviation).
Note that changes are made in place.
- Parameters
center ({"mean", "median", "pc10"} or float) – If a float, a pre-computed centroid measure of the distribution of samples, such as the pixel mean. If a string, a method to use to determine the center value.
deviation ({"stdev", "mad", "iqr", "idr", "i7r"} or float) – If a float, a pre-computed deviation measure of the distribution of samples. If a string, a method to use to determine the deviation.
robust2stdev (bool, optional) – Whether to convert robust measures to estimates of the standard deviation. Default is True.
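To show what converting a robust measure to a standard-deviation estimate means, here is a minimal sketch for one combination: median centring with a median-absolute-deviation scale. It assumes "mad" denotes the median absolute deviation, and uses the standard consistency constant 1.4826 that relates the MAD to the standard deviation of normally distributed data. The function name normalize_robust is hypothetical, and the real transform works in place on a sample rather than on a flat list.

```python
import statistics

# For normally distributed data, stdev is approximately 1.4826 * MAD.
MAD_TO_STDEV = 1.4826


def normalize_robust(values, robust2stdev=True):
    """Centre on the median and scale by the (optionally rescaled) MAD."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    deviation = mad * MAD_TO_STDEV if robust2stdev else mad
    return [(v - center) / deviation for v in values]


normalized = normalize_robust([1, 2, 3, 4, 5])
```

Rescaling makes the output comparable in magnitude to a mean/standard-deviation normalization while remaining insensitive to outliers.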
- class echofilter.data.transforms.OptimalCropDepth
Bases: object
A transform which crops a sample depthwise to focus on the water column.
The output contains only the space between highest surface and deepest seafloor line measurements.
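The crop described above amounts to finding the depth range spanned by the shallowest surface measurement and the deepest seafloor measurement. The sketch below shows that calculation in plain Python under stated assumptions: depths is sorted in increasing order, and the names optimal_depth_slice, surface_line and bottom_line are hypothetical rather than keys or functions taken from echofilter.

```python
from bisect import bisect_left, bisect_right


def optimal_depth_slice(depths, surface_line, bottom_line):
    """Slice of depth indices between the highest surface and deepest bottom.

    ``depths`` must be sorted ascending. The "highest" surface is the one
    with the smallest depth value.
    """
    shallowest = min(surface_line)
    deepest = max(bottom_line)
    lo = bisect_left(depths, shallowest)   # first sample at or below the surface
    hi = bisect_right(depths, deepest)     # one past the last sample above the bottom
    return slice(lo, hi)


depths = [0, 1, 2, 3, 4, 5, 6]
crop = optimal_depth_slice(depths, surface_line=[1.0, 1.5], bottom_line=[4.2, 5.0])
cropped_depths = depths[crop]
```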
- class echofilter.data.transforms.RandomCropDepth(p_crop_is_none=0.1, p_crop_is_optimal=0.1, p_crop_is_close=0.4, p_nearfield_side_crop=0.5, fraction_close=0.25)
Bases: object
Randomly crop a sample depthwise.
- Parameters
p_crop_is_none (float, optional) – Probability of not doing any crop. Default is 0.1.
p_crop_is_optimal (float, optional) – Probability of doing an “optimal” crop, running optimal_crop_depth. Default is 0.1.
p_crop_is_close (float, optional) – Probability of doing a crop which is zoomed in and close to the “optimal” crop, running optimal_crop_depth. Default is 0.4. If none of the no-crop, optimal, or close-to-optimal options is selected, the crop is randomly sized over the full extent of the range of depths.
p_nearfield_side_crop (float, optional) – Probability that the nearfield side is cropped. Default is 0.5.
fraction_close (float, optional) – Fraction by which crop is increased/decreased in either direction when doing a close-to-optimal crop. Default is 0.25.
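The three probabilities partition the unit interval, with the remainder falling through to a fully random crop. A minimal sketch of that selection logic follows; the function name choose_crop_mode and the mode labels are hypothetical, and the order in which the options are tested is an assumption.

```python
import random


def choose_crop_mode(u, p_crop_is_none=0.1, p_crop_is_optimal=0.1, p_crop_is_close=0.4):
    """Pick a crop mode from a uniform draw ``u`` in [0, 1).

    The leftover probability (0.4 with the defaults) selects a crop that is
    randomly sized over the full depth range.
    """
    if u < p_crop_is_none:
        return "none"
    u -= p_crop_is_none
    if u < p_crop_is_optimal:
        return "optimal"
    u -= p_crop_is_optimal
    if u < p_crop_is_close:
        return "close"
    return "random"


mode = choose_crop_mode(random.random())
```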
- class echofilter.data.transforms.RandomCropWidth(max_crop_fraction)
Bases: object
Randomly crop a sample in the width dimension.
- Parameters
max_crop_fraction (float) – Maximum amount of material to crop away, as a fraction of the total width. The crop_fraction will be sampled uniformly from the range [0, max_crop_fraction]. The crop is always centred.
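A centred crop removes roughly equal amounts from both ends of the width dimension. The sketch below shows the index arithmetic on a single row of values. The helper names centred_crop_bounds and random_centred_crop are hypothetical, and the rounding convention (an odd remainder is taken from the right-hand side) is an assumption.

```python
import random


def centred_crop_bounds(width, crop_fraction):
    """Return (left, right) indices that remove crop_fraction of the width."""
    n_remove = int(round(width * crop_fraction))
    left = n_remove // 2
    right = width - (n_remove - left)
    return left, right


def random_centred_crop(row, max_crop_fraction, rng=random):
    """Crop a sequence by a fraction sampled from [0, max_crop_fraction]."""
    crop_fraction = rng.uniform(0.0, max_crop_fraction)
    left, right = centred_crop_bounds(len(row), crop_fraction)
    return row[left:right]


cropped = random_centred_crop(list(range(100)), max_crop_fraction=0.2)
```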
- class echofilter.data.transforms.RandomElasticGrid(output_size, p=0.5, sigma=8.0, alpha=0.05, order=1)
Bases: echofilter.data.transforms.Rescale
Resample data onto a new grid, elastically deformed from the original grid.
- Parameters
output_size (tuple or int or None) – Desired output size. If tuple, output is matched to output_size. If int, output is square. If None, the size remains unchanged from the input.
p (float, optional) – Probability of performing the RandomGrid operation. Default is 0.5.
sigma (float, optional) – Gaussian filter kernel size. Default is 8.0.
alpha (float, optional) – Maximum size of image distortions, relative to the length of the side of the image. Default is 0.05.
order (int or None, optional) – Order of the interpolation, for both image and vector elements. For image-like components, the interpolation is 2d. The following values are supported:
0: Nearest-neighbor
1: Linear (default)
2: Quadratic
3: Cubic
If None, the order is randomly selected from the set {1, 2, 3}.
- class echofilter.data.transforms.RandomGridSampling(*args, p=0.5, **kwargs)
Bases: echofilter.data.transforms.Rescale
Resample data onto a new grid, which is randomly resampled.
- Parameters
output_size (tuple or int) – Desired output size. If tuple, output is matched to output_size. If int, output is square.
p (float, optional) – Probability of performing the RandomGrid operation. Default is 0.5.
order (int or None, optional) – Order of the interpolation, for both image and vector elements. For image-like components, the interpolation is 2d. The following values are supported:
0: Nearest-neighbor
1: Linear (default)
2: Quadratic
3: Cubic
If None, the order is randomly selected from the set {0, 1, 3}.
- class echofilter.data.transforms.RandomReflection(axis=0, p=0.5)
Bases: object
Randomly reflect a sample.
- class echofilter.data.transforms.ReplaceNan(nan_val=0.0)
Bases: object
Replace NaNs with a finite float value.
- Parameters
nan_val (float, optional) – Value to replace NaNs with. Default is 0.0.
- class echofilter.data.transforms.Rescale(output_size, order=1)
Bases: object
Rescale the image(s) in a sample to a given size.
- Parameters
output_size (tuple or int) – Desired output size. If tuple, output is matched to output_size. If int, output is square.
order (int or None, optional) – Order of the interpolation, for both image and vector elements. For image-like components, the interpolation is 2d. The following values are supported:
0: Nearest-neighbor
1: Linear (default)
2: Quadratic
3: Cubic
If None, the order is randomly selected as either 0 or 1.
- order2kind = {0: 'nearest', 1: 'linear', 2: 'quadratic', 3: 'cubic'}
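The order2kind mapping translates an integer interpolation order into a named interpolation kind. Those names match the kind strings accepted by scipy.interpolate.interp1d, although whether the class passes them to that function is an assumption. The sketch below shows how an order of None might be resolved before the lookup; the function resolve_kind and its random_orders argument are hypothetical.

```python
import random

# Mirrors the order2kind class attribute documented above.
ORDER2KIND = {0: "nearest", 1: "linear", 2: "quadratic", 3: "cubic"}


def resolve_kind(order=1, random_orders=(0, 1), rng=random):
    """Translate an interpolation order into its kind name.

    If ``order`` is None, one of ``random_orders`` is chosen at random first;
    (0, 1) matches the behaviour documented for Rescale.
    """
    if order is None:
        order = rng.choice(random_orders)
    return ORDER2KIND[order]


kind = resolve_kind(None)
```

The subclasses document different candidate sets for the None case, which would correspond to passing a different random_orders tuple in this sketch.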
echofilter.data.utils module
Utility functions for dataset.
- echofilter.data.utils.worker_seed_fn(worker_id)
Seed builtin random and numpy with torch.randint().
A worker initialization function for torch.utils.data.DataLoader objects which seeds builtin random and numpy with torch.randint() (which is stable if torch is manually seeded in the main program).
- Parameters
worker_id (int) – The ID of the worker.
- echofilter.data.utils.worker_staticseed_fn(worker_id)
Seed builtin random, numpy, and torch with worker_id.
A worker initialization function for torch.utils.data.DataLoader objects which produces the same seed for builtin random, numpy, and torch every time, so it is the same for every epoch.
- Parameters
worker_id (int) – The ID of the worker.
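The effect of a static, worker_id-based seed can be demonstrated with the standard library alone. The sketch below seeds only the builtin random module; numpy.random.seed and torch.manual_seed are the corresponding calls for the other two libraries, and they are omitted here to keep the example dependency-free. The names staticseed_sketch and draws_for_worker are hypothetical, and this is not the echofilter implementation.

```python
import random


def staticseed_sketch(worker_id):
    """Seed the builtin RNG from the worker ID alone (same seed every epoch)."""
    random.seed(worker_id)


def draws_for_worker(worker_id, n=3):
    """Simulate a worker starting up and drawing n random numbers."""
    staticseed_sketch(worker_id)
    return [random.random() for _ in range(n)]


# The same worker reproduces its draws on every epoch...
epoch_1 = draws_for_worker(0)
epoch_2 = draws_for_worker(0)
# ...while different workers see different streams.
other_worker = draws_for_worker(1)
```

Repeating the same random stream every epoch suits evaluation, where reproducibility matters more than variety. For training, a seed that changes between epochs is usually preferable so that augmentations differ.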