echofilter package#

Subpackages#

Submodules#

echofilter.ev2csv module#

Export raw EV files in CSV format.

echofilter.ev2csv.ev2csv(input, destination, variable_name='Fileset1: Sv pings T1', export_raw=True, ev_app=None, verbose=0)[source]#

Export a single EV file to CSV.

Parameters
  • input (str) – Path to input file.

  • destination (str) – Filename of output destination.

  • variable_name (str, optional) – Name of the Echoview acoustic variable to export. Default is “Fileset1: Sv pings T1”.

  • export_raw (bool, optional) – If True (default), exclusion and threshold settings in the EV file are temporarily disabled before exporting the CSV, in order to ensure all raw data is exported.

  • ev_app (win32com.client.Dispatch object or None, optional) – An object which can be used to interface with the Echoview application, as returned by win32com.client.Dispatch. If None (default), a new instance of the application is opened (and closed on completion).

  • verbose (int, optional) – Level of verbosity. Default is 0.

Returns

destination – Absolute path to destination.

Return type

str

echofilter.ev2csv.get_parser()[source]#

Build parser for ev2csv command line interface.

Returns

parser – CLI argument parser for ev2csv.

Return type

argparse.ArgumentParser

echofilter.ev2csv.main(args=None)[source]#

Run ev2csv command line interface.

echofilter.ev2csv.run_ev2csv(paths, variable_name='Fileset1: Sv pings T1', export_raw=True, source_dir='.', recursive_dir_search=True, output_dir='', suffix=None, keep_ext=False, skip_existing=False, overwrite_existing=False, minimize_echoview=False, hide_echoview='new', verbose=1, dry_run=False)[source]#

Export EV files to raw CSV files.

Parameters
  • paths (iterable) – Paths to input EV files to process, or directories containing EV files. These may be full paths or paths relative to source_dir. For each folder specified, any files with extension "ev" within the folder and all its tree of subdirectories will be processed.

  • variable_name (str, optional) – Name of the Echoview acoustic variable to export. Default is “Fileset1: Sv pings T1”.

  • export_raw (bool, optional) – If True (default), exclusion and threshold settings in the EV file are temporarily disabled before exporting the CSV, in order to ensure all raw data is exported. If False, thresholds and exclusions are used as per the EV file.

  • source_dir (str, optional) – Path to directory where files are found. Default is ".".

  • recursive_dir_search (bool, optional) – How to handle directory inputs in paths. If False, only files (with the correct extension) in the directory will be included. If True, subdirectories will also be walked through to find input files. Default is True.

  • output_dir (str, optional) – Directory where output files will be written. If this is an empty string ("", default), outputs are written to the same directory as each input file. Otherwise, they are written to output_dir, preserving their path relative to source_dir if relative paths were used.

  • suffix (str, optional) – Output filename suffix. Default is "_Sv_raw.csv" if keep_ext=False, or ".Sv_raw.csv" if keep_ext=True. The "_raw" component is excluded if export_raw is False.

  • keep_ext (bool, optional) – Whether to preserve the file extension in the input file name when generating output file name. Default is False, removing the extension.

  • skip_existing (bool, optional) – Whether to skip processing files whose destination paths already exist. If False (default), an error is raised if the destination file already exists.

  • overwrite_existing (bool, optional) – Whether to overwrite existing output files. If False (default), an error is raised if the destination file already exists.

  • minimize_echoview (bool, optional) – If True, the Echoview window being used will be minimized while this function is running. Default is False.

  • hide_echoview ({"never", "new", "always"}, optional) – Whether to hide the Echoview window entirely while the code runs. If hide_echoview="new", the application is only hidden if it was created by this function, and not if it was already running. If hide_echoview="always", the application is hidden even if it was already running. In the latter case, the window will be revealed again when this function is completed. Default is "new".

  • verbose (int, optional) – Level of verbosity. Default is 1.

  • dry_run (bool, optional) – If True, perform a trial run with no changes made. Default is False.

Returns

Paths to generated CSV files.

Return type

list of str
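
The default suffix rule described for suffix, keep_ext and export_raw can be illustrated with a short sketch. The helper names below (sketch_default_suffix, sketch_output_name) are hypothetical and are not part of echofilter; they only reproduce the rule as stated in the parameter list above, and do not handle output_dir.

```python
import os


def sketch_default_suffix(keep_ext=False, export_raw=True):
    """Hypothetical helper: the documented default for ``suffix``."""
    # "." joins the suffix to a kept extension, "_" to a bare stem.
    delimiter = "." if keep_ext else "_"
    # The "_raw" component is dropped when export_raw is False.
    return delimiter + "Sv" + ("_raw" if export_raw else "") + ".csv"


def sketch_output_name(input_name, suffix=None, keep_ext=False, export_raw=True):
    """Hypothetical helper: build an output file name from an input name."""
    if suffix is None:
        suffix = sketch_default_suffix(keep_ext=keep_ext, export_raw=export_raw)
    # Remove the input extension unless keep_ext is requested.
    stem = input_name if keep_ext else os.path.splitext(input_name)[0]
    return stem + suffix


print(sketch_output_name("survey01.EV"))                    # survey01_Sv_raw.csv
print(sketch_output_name("survey01.EV", keep_ext=True))     # survey01.EV.Sv_raw.csv
print(sketch_output_name("survey01.EV", export_raw=False))  # survey01_Sv.csv
```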

echofilter.generate_shards module#

Convert dataset of CSV exports from Echoview into shards.

echofilter.generate_shards.generate_shard(transect_pth, verbose=False, fail_gracefully=True, **kwargs)[source]#

Shard a single transect.

Wrapper around echofilter.raw.shardloader.segment_and_shard_transect which adds verbosity and graceful failure options.

Parameters
  • transect_pth (str) – Relative path to transect.

  • verbose (bool, optional) – Whether to print which transect is being processed. Default is False.

  • fail_gracefully (bool, optional) – If True, any transect which triggers an error during processing will be printed out, but processing the rest of the transects will continue. If False, the process will halt with an error as soon as any single transect hits an error. Default is True.

  • **kwargs – See echofilter.raw.shardloader.segment_and_shard_transect().
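
The fail_gracefully behaviour can be pictured as a thin try/except wrapper. The sketch below is an assumption about the general pattern, not the package's code: sketch_generate_shard and fake_process are hypothetical names, and fake_process stands in for echofilter.raw.shardloader.segment_and_shard_transect.

```python
import traceback


def sketch_generate_shard(process, transect_pth, verbose=False,
                          fail_gracefully=True, **kwargs):
    """Hypothetical wrapper: run ``process`` on one transect."""
    if verbose:
        print("Sharding {}".format(transect_pth))
    try:
        return process(transect_pth, **kwargs)
    except Exception:
        if not fail_gracefully:
            # Halt immediately with the original error.
            raise
        # Report the failing transect, then let the caller carry on.
        print("Error sharding {}".format(transect_pth))
        traceback.print_exc()
        return None


def fake_process(transect_pth, **kwargs):
    """Stand-in for the real sharding routine (illustration only)."""
    if "bad" in transect_pth:
        raise ValueError("could not parse " + transect_pth)
    return "sharded " + transect_pth


results = [sketch_generate_shard(fake_process, p) for p in ["t1", "bad_t2", "t3"]]
print(results)  # ['sharded t1', None, 'sharded t3']
```

With fail_gracefully=False, the same call on "bad_t2" would re-raise the ValueError and stop the loop.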

echofilter.generate_shards.generate_shards(partition, dataset, partitioning_version='firstpass', progress_bar=False, ncores=None, verbose=False, fail_gracefully=True, root_data_dir='/data/dsforce/surveyExports', **kwargs)[source]#

Shard all transects in one partition of a dataset.

Wrapper around echofilter.raw.shardloader.segment_and_shard_transect which adds verbosity and graceful failure options.

Parameters
  • partition (str) – Name of the partition to process ('train', 'validate', 'test', etc).

  • dataset (str) – Name of the dataset to process ('mobile', 'MinasPassage', etc).

  • partitioning_version (str, optional) – Name of the partitioning version to use. Default is 'firstpass'.

  • progress_bar (bool, optional) – Whether to output a progress bar using tqdm. Default is False.

  • ncores (int, optional) – Number of cores to use for multiprocessing. To disable multiprocessing, set to 1. Set to None to use all available cores. Default is None.

  • verbose (bool, optional) – Whether to print which transect is being processed. Default is False.

  • fail_gracefully (bool, optional) – If True, any transect which triggers an error during processing will be printed out, but processing the rest of the transects will continue. If False, the process will halt with an error as soon as any single transect hits an error. Default is True.

  • **kwargs – See echofilter.raw.shardloader.segment_and_shard_transect().

echofilter.generate_shards.get_parser()[source]#

Build parser for command line interface for generating shards.

Returns

parser – CLI argument parser for generating shards.

Return type

argparse.ArgumentParser

echofilter.generate_shards.main(args=None)[source]#

Command line interface for generating dataset shards from CSV files.

echofilter.inference module#

Inference routine.

echofilter.inference.get_color_palette(include_xkcd=True, sort_colors=True)[source]#

Provide a mapping of named colors from matplotlib.

Parameters
  • include_xkcd (bool, default=True) – Whether to include the XKCD color palette in the output. Note that XKCD colors have "xkcd:" prepended to their names to prevent collisions with official named colors from CSS4. See https://xkcd.com/color/rgb/ and https://blog.xkcd.com/2010/05/03/color-survey-results/ for the XKCD colors.

  • sort_colors (bool, default=True) – Whether to sort the colors by hue. Otherwise the colors are grouped together by source, and maintain their default ordering (alphabetized).

Returns

colors – Mapping from names of colors as strings to color value, either as an RGB tuple (fractional, 0 to 1 range) or a hexadecimal string.

Return type

dict

echofilter.inference.hexcolor2rgb8(color)[source]#

Map hexadecimal colors to uint8 RGB.

Parameters

color (str) – A hexadecimal color string, with leading "#". If the input is not a string beginning with "#", it is returned as-is without raising an error.

Returns

RGB color tuple, in uint8 format (0–255).

Return type

tuple
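
As an illustration of the mapping described above, here is a minimal re-implementation sketch. It is not the package's source: sketch_hexcolor2rgb8 is a hypothetical name, and the sketch assumes the six-digit "#rrggbb" form.

```python
def sketch_hexcolor2rgb8(color):
    """Hypothetical stand-in: map "#rrggbb" to a uint8-range RGB tuple."""
    # Anything that is not a string starting with "#" is returned untouched.
    if not isinstance(color, str) or not color.startswith("#"):
        return color
    digits = color[1:]
    # Read each pair of hexadecimal digits as one 0-255 channel.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(sketch_hexcolor2rgb8("#ff8000"))    # (255, 128, 0)
print(sketch_hexcolor2rgb8("orangered"))  # orangered
```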

echofilter.inference.import_lines_regions_to_ev(ev_fname, files, target_names=None, nearfield_depth=None, add_nearfield_line=True, lines_cutoff_at_nearfield=None, offsets=None, line_colors=None, line_thicknesses=None, ev_app=None, overwrite=False, common_notes='', verbose=1)[source]#

Write lines and regions to EV file.

Parameters
  • ev_fname (str) – Path to Echoview file to import variables into.

  • files (dict) – Mapping from output keys to filenames.

  • target_names (dict, optional) – Mapping from output keys to output variable names.

  • nearfield_depth (float, optional) – Depth at which nearfield line will be placed. By default, no nearfield line will be added, irrespective of add_nearfield_line.

  • add_nearfield_line (bool, default=True) – Whether to add a nearfield line.

  • lines_cutoff_at_nearfield (list of str, optional) – Which lines (if any) should be clipped at the nearfield depth. By default, no lines will be clipped.

  • offsets (dict, optional) – Amount of offset for each line.

  • line_colors (dict, optional) – Mapping from output keys to line colours.

  • line_thicknesses (dict, optional) – Mapping from output keys to line thicknesses.

  • ev_app (win32com.client.Dispatch object, optional) – An object which can be used to interface with the Echoview application, as returned by win32com.client.Dispatch. By default, a new instance of the application is opened (and closed on completion).

  • overwrite (bool, default=False) – Whether existing lines with target names should be replaced. If a line with the target name already exists and overwrite=False, the line is named with the current datetime to prevent collisions.

  • common_notes (str, default="") – Notes to include for every region.

  • verbose (int, default=1) – Verbosity level.

echofilter.inference.inference_transect(model, timestamps, depths, signals, device, image_height, facing='auto', crop_min_depth=None, crop_max_depth=None, autocrop_threshold=0.35, force_unconditioned=False, data_center='mean', data_deviation='stdev', prenorm_nan_value=None, postnorm_nan_value=-3, dtype=torch.float32, verbose=0)[source]#

Run inference on a single transect.

Parameters
  • model (echofilter.wrapper.Echofilter) – A pytorch Module wrapped in an Echofilter UI layer.

  • timestamps (array_like) – Sample recording timestamps (in seconds since Unix epoch). Must be a vector.

  • depths (array_like) – Recording depths from the surface (in metres). Must be a vector.

  • signals (array_like) – Echogram Sv data. Must be a matrix shaped (len(timestamps), len(depths)).

  • image_height (int) – Height to resize echogram before passing through model.

  • facing ({"downward", "upward", "auto"}, default="auto") – Orientation in which the echosounder is facing. Default is "auto", in which case the orientation is determined from the ordering of the depth values in the data (increasing = "upward", decreasing = "downward").

  • crop_min_depth (float, optional) – Minimum depth to include in input. By default, there is no minimum depth.

  • crop_max_depth (float, optional) – Maximum depth to include in input. By default, there is no maximum depth.

  • autocrop_threshold (float, default=0.35) – Minimum fraction of input height which must be found to be removable for the model to be re-run with an automatically cropped input.

  • force_unconditioned (bool, optional) – Whether to always use unconditioned logit outputs when determining the new depth range for automatic cropping.

  • data_center (float or str, default="mean") – Center point to use, which will be subtracted from the Sv signals (i.e. the overall sample mean). If data_center is a string, it specifies the method to use to determine the center value from the distribution of intensities seen in this sample transect.

  • data_deviation (float or str, default="stdev") – Deviation to use to normalise the Sv signals in a divisive manner (i.e. the overall sample standard deviation). If data_deviation is a string, it specifies the method to use to determine the deviation value from the distribution of intensities seen in this sample transect.

  • prenorm_nan_value (float, optional) – If this is set, replace NaN values with a given Sv value before the data normalisation (Gaussian standardisation) step. By default, NaNs are left as they are until after standardising the data.

  • postnorm_nan_value (float, default=-3) – Placeholder value to replace NaNs with. Does nothing if prenorm_nan_value is set.

  • dtype (torch.dtype, default=torch.float) – Datatype to use for model input.

  • verbose (int, default=0) – Level of verbosity.

Returns

Dictionary with fields as output by echofilter.wrapper.Echofilter, plus timestamps and depths.

Return type

dict
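
The interaction between data_center, data_deviation, prenorm_nan_value and postnorm_nan_value can be sketched as follows. This is a simplified illustration on a flat list of Sv values rather than a 2-D echogram, using only the standard library. The function name sketch_standardise is hypothetical, and computing the statistics over non-NaN values with the population standard deviation is an assumption of this sketch, not something the documentation states.

```python
import math
import statistics


def sketch_standardise(values, center="mean", deviation="stdev",
                       prenorm_nan_value=None, postnorm_nan_value=-3.0):
    """Hypothetical sketch of Gaussian standardisation with NaN handling."""
    values = [float(v) for v in values]
    if prenorm_nan_value is not None:
        # NaNs are filled before normalisation, so none remain afterwards.
        values = [prenorm_nan_value if math.isnan(v) else v for v in values]
    finite = [v for v in values if not math.isnan(v)]
    if isinstance(center, str):
        if center != "mean":
            raise ValueError("This sketch only supports center='mean'")
        center = statistics.mean(finite)
    if isinstance(deviation, str):
        if deviation != "stdev":
            raise ValueError("This sketch only supports deviation='stdev'")
        deviation = statistics.pstdev(finite)  # assumption: population stdev
    # Subtract the center, then divide by the deviation.
    standardised = [(v - center) / deviation for v in values]
    # Any NaNs still present are replaced with the placeholder value.
    return [postnorm_nan_value if math.isnan(v) else v for v in standardised]


sv = [-80.0, -60.0, float("nan")]
print(sketch_standardise(sv, center=-70.0, deviation=10.0))
# [-1.0, 1.0, -3.0]
print(sketch_standardise(sv, center=-70.0, deviation=10.0, prenorm_nan_value=-70.0))
# [-1.0, 1.0, 0.0]
```

The second call shows why postnorm_nan_value does nothing when prenorm_nan_value is set: no NaNs are left to replace after standardisation.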

echofilter.inference.run_inference(paths, source_dir='.', recursive_dir_search=True, extensions='csv', skip_existing=False, skip_incompatible=False, output_dir='', dry_run=False, continue_on_error=False, overwrite_existing=False, overwrite_ev_lines=False, import_into_evfile=True, generate_turbulence_line=True, generate_bottom_line=True, generate_surface_line=True, add_nearfield_line=True, suffix_file='', suffix_var=None, color_turbulence='orangered', color_turbulence_offset=None, color_bottom='orangered', color_bottom_offset=None, color_surface='green', color_surface_offset=None, color_nearfield='mediumseagreen', thickness_turbulence=2, thickness_turbulence_offset=None, thickness_bottom=2, thickness_bottom_offset=None, thickness_surface=1, thickness_surface_offset=None, thickness_nearfield=1, cache_dir=None, cache_csv=None, suffix_csv='', keep_ext=False, line_status=3, offset_turbulence=1.0, offset_bottom=1.0, offset_surface=1.0, nearfield=1.7, cutoff_at_nearfield=None, lines_during_passive='interpolate-time', collate_passive_length=10, collate_removed_length=10, minimum_passive_length=10, minimum_removed_length=-1, minimum_patch_area=-1, patch_mode=None, variable_name='Fileset1: Sv pings T1', export_raw_csv=True, row_len_selector='mode', facing='auto', use_training_standardization=False, prenorm_nan_value=None, postnorm_nan_value=None, crop_min_depth=None, crop_max_depth=None, autocrop_threshold=0.35, image_height=None, checkpoint=None, force_unconditioned=False, logit_smoothing_sigma=0, device=None, hide_echoview='new', minimize_echoview=False, verbose=2)[source]#

Perform inference on input files, and generate output files.

Outputs are written as lines in EVL and regions in EVR file formats.

Parameters
  • paths (iterable or str) – Files and folders to be processed. These may be full paths or paths relative to source_dir. For each folder specified, any files with extension "csv" within the folder and all its tree of subdirectories will be processed.

  • source_dir (str, default=".") – Path to directory where files are found.

  • recursive_dir_search (bool, default=True) – How to handle directory inputs in paths. If False, only files (with the correct extension) in the directory will be included. If True, subdirectories will also be walked through to find input files.

  • extensions (iterable or str, default="csv") – File extensions to detect when running on a directory.

  • skip_existing (bool, default=False) – Skip processing files which already have all outputs present.

  • skip_incompatible (bool, default=False) – Skip processing CSV files which do not seem to contain an exported Echoview transect. If False, an error is raised.

  • output_dir (str, default="") – Directory where output files will be written. If this is an empty string, outputs are written to the same directory as each input file. Otherwise, they are written to output_dir, preserving their path relative to source_dir if relative paths were used.

  • dry_run (bool, default=False) – If True, perform a trial run with no changes made.

  • continue_on_error (bool, default=False) – Continue running on remaining files if one file hits an error.

  • overwrite_existing (bool, default=False) – Overwrite existing outputs without producing a warning message. If False, an error is generated if files would be overwritten.

  • overwrite_ev_lines (bool, default=False) – Overwrite existing lines within the Echoview file without warning. If False (default), the current datetime will be appended to line variable names in the event of a collision.

  • import_into_evfile (bool, default=True) – Whether to import the output lines and regions into the EV file, whenever the file being processed is an EV file.

  • generate_turbulence_line (bool, default=True) – Whether to output an evl file for the turbulence line. If this is False, the turbulence line is also never imported into Echoview.

  • generate_bottom_line (bool, default=True) – Whether to output an evl file for the bottom line. If this is False, the bottom line is also never imported into Echoview.

  • generate_surface_line (bool, default=True) – Whether to output an evl file for the surface line. If this is False, the surface line is also never imported into Echoview.

  • add_nearfield_line (bool, default=True) – Whether to add a nearfield line to the EV file in Echoview.

  • suffix_file (str, default="") – Suffix to append to output artifacts (evl and evr files), between the name of the file and the extension. If suffix_file begins with an alphanumeric character, "-" is prepended.

  • suffix_var (str, optional) – Suffix to append to line and region names when imported back into EV file. If suffix_var begins with an alphanumeric character, "-" is prepended. By default, suffix_var will match suffix_file if it is set, and will be “_echofilter” otherwise.

  • color_turbulence (str, default="orangered") – Color to use for the turbulence line when it is imported into Echoview. This can either be the name of a supported color from matplotlib.colors, or a hexadecimal color, or a string representation of an RGB color to supply directly to Echoview (such as “(0,255,0)”).

  • color_turbulence_offset (str, optional) – Color to use for the offset turbulence line when it is imported into Echoview. By default, color_turbulence is used.

  • color_bottom (str, default="orangered") – Color to use for the bottom line when it is imported into Echoview. This can either be the name of a supported color from matplotlib.colors, or a hexadecimal color, or a string representation of an RGB color to supply directly to Echoview (such as “(0,255,0)”).

  • color_bottom_offset (str, optional) – Color to use for the offset bottom line when it is imported into Echoview. By default, color_bottom is used.

  • color_surface (str, default="green") – Color to use for the surface line when it is imported into Echoview. This can either be the name of a supported color from matplotlib.colors, or a hexadecimal color, or a string representation of an RGB color to supply directly to Echoview (such as “(0,255,0)”).

  • color_surface_offset (str, optional) – Color to use for the offset surface line when it is imported into Echoview. By default, color_surface is used.

  • color_nearfield (str, default="mediumseagreen") – Color to use for the nearfield line when it is created in Echoview. This can either be the name of a supported color from matplotlib.colors, or a hexadecimal color, or a string representation of an RGB color to supply directly to Echoview (such as “(0,255,0)”).

  • thickness_turbulence (int, default=2) – Thickness with which the turbulence line will be displayed in Echoview.

  • thickness_turbulence_offset (int, optional) – Thickness with which the offset turbulence line will be displayed in Echoview. By default, thickness_turbulence is used.

  • thickness_bottom (int, default=2) – Thickness with which the bottom line will be displayed in Echoview.

  • thickness_bottom_offset (int, optional) – Thickness with which the offset bottom line will be displayed in Echoview. By default, thickness_bottom is used.

  • thickness_surface (int, default=1) – Thickness with which the surface line will be displayed in Echoview.

  • thickness_surface_offset (int, optional) – Thickness with which the offset surface line will be displayed in Echoview. By default, thickness_surface is used.

  • thickness_nearfield (int, default=1) – Thickness with which the nearfield line will be displayed in Echoview.

  • cache_dir (str, optional) – Path to directory where downloaded checkpoint files should be cached. By default, an OS-appropriate application-specific default cache directory is used.

  • cache_csv (str, optional) – Path to directory where CSV files generated from EV inputs should be cached. By default, EV files which are exported to CSV files are temporary files, deleted after this program has completed. If cache_csv="", the CSV files are cached in the same directory as the input EV files.

  • suffix_csv (str, default="") – Suffix used for cached CSV files which are exported from EV files. If suffix_csv begins with an alphanumeric character, a delimiter is prepended. The delimiter is "." if keep_ext=True or "-" if keep_ext=False.

  • keep_ext (bool, default=False) – Whether to preserve the file extension in the input file name when generating output file name. Default is False, removing the extension.

  • line_status (int, default=3) –

    Status to use for the lines. Must be one of:

    • 0 : none

    • 1 : unverified

    • 2 : bad

    • 3 : good

  • offset_turbulence (float, default=1.0) – Offset for turbulence line, which moves the turbulence line deeper.

  • offset_bottom (float, default=1.0) – Offset for bottom line, which moves the line to become more shallow.

  • offset_surface (float, default=1.0) – Offset for surface line, which moves the surface line deeper.

  • nearfield (float, default=1.7) – Nearfield approach distance, in metres. If the echogram is downward facing, the nearfield cutoff depth will be at a depth equal to the nearfield distance. If the echogram is upward facing, the nearfield cutoff will be nearfield metres above the deepest depth recorded in the input data. When processing an EV file, by default a nearfield line will be added at the nearfield cutoff depth. To prevent this behaviour, use the --no-nearfield-line argument.

  • cutoff_at_nearfield (bool, optional) – Whether to cut-off the turbulence line (for downfacing data) or bottom line (for upfacing) when it is closer to the echosounder than the nearfield distance. By default, the bottom line is clipped (for upfacing data), but the turbulence line is not clipped (even with downfacing data).

  • lines_during_passive (str, default="interpolate-time") –

    Method used to handle line depths during collection periods determined to be passive recording instead of active recording. Options are:

    "interpolate-time"

    depths are linearly interpolated from active recording periods, using the time at which recordings were made.

    "interpolate-index"

    depths are linearly interpolated from active recording periods, using the index of the recording.

    "predict"

    the model’s prediction for the lines during passive data collection will be kept; the nature of the prediction depends on how the model was trained.

    "redact"

    no depths are provided during periods determined to be passive data collection.

    "undefined"

    depths are replaced with the placeholder value used by Echoview to denote undefined values, which is -10000.99.

  • collate_passive_length (int, default=10) – Maximum interval, in ping indices, between detected passive regions which will be removed to merge consecutive passive regions together into a single, collated, region.

  • collate_removed_length (int, default=10) – Maximum interval, in ping indices, between detected blocks (vertical rectangles) marked for removal which will also be removed to merge consecutive removed blocks together into a single, collated, region.

  • minimum_passive_length (int, default=10) – Minimum length, in ping indices, which a detected passive region must have to be included in the output. Set to -1 to omit all detected passive regions from the output.

  • minimum_removed_length (int, default=-1) – Minimum length, in ping indices, which a detected removal block (vertical rectangle) must have to be included in the output. Set to -1 to omit all detected removal blocks from the output (default). Recommended minimum length is 10.

  • minimum_patch_area (int, default=-1) – Minimum area, in pixels, which a detected removal patch (contour/polygon) region must have to be included in the output. Set to -1 to omit all detected patches from the output (default). Recommended minimum area is 25.

  • patch_mode (str, optional) –

    Type of mask patches to use. Must be supported by the model checkpoint used. Should be one of:

    "merged"

    Target patches for training were determined after merging as much as possible into the turbulence and bottom lines.

    "original"

    Target patches for training were determined using original lines, before expanding the turbulence and bottom lines.

    "ntob"

    Target patches for training were determined using the original bottom line and the merged turbulence line.

    By default, "merged" is used if downfacing and "ntob" is used if upfacing.

  • variable_name (str, default="Fileset1: Sv pings T1") – Name of the Echoview acoustic variable to load from EV files.

  • export_raw_csv (bool, default=True) – If True (default), exclusion and threshold settings in the EV file are temporarily disabled before exporting the CSV, in order to ensure all raw data is exported. If False, thresholds and exclusions are used as per the EV file.

  • row_len_selector (str, default="mode") – Method used to handle input csv files with different number of Sv values across time (i.e. a non-rectangular input). See echofilter.raw.loader.transect_loader() for options.

  • facing ({"downward", "upward", "auto"}, default="auto") – Orientation in which the echosounder is facing. Default is "auto", in which case the orientation is determined from the ordering of the depth values in the data (increasing = "upward", decreasing = "downward").

  • use_training_standardization (bool, default=False) – Whether to use the exact normalization center and deviation values as used during training. If False (default), the center and deviation are determined per sample, using the same methodology as used to determine the center and deviation values for training.

  • prenorm_nan_value (float, optional) – If this is set, replace NaN values with a given Sv value before the data normalisation (Gaussian standardisation) step. By default, NaNs are left as they are until after standardising the data.

  • postnorm_nan_value (float, optional) – Placeholder value to replace NaNs with. Does nothing if prenorm_nan_value is set. By default this is set to the value used to train the model.

  • crop_min_depth (float, optional) – Minimum depth to include in input. By default, there is no minimum depth.

  • crop_max_depth (float, optional) – Maximum depth to include in input. By default, there is no maximum depth.

  • autocrop_threshold (float, default=0.35) – Minimum fraction of input height which must be found to be removable for the model to be re-run with an automatically cropped input.

  • image_height (int, optional) – Height in pixels of input to model. The data loaded from the csv will be resized to this height (the width of the image is unchanged). By default, the height matches that used when the model was trained.

  • checkpoint (str, optional) – A path to a checkpoint file, or name of a checkpoint known to this package (listed in echofilter/checkpoints.yaml). By default, the first checkpoint in checkpoints.yaml is used.

  • force_unconditioned (bool, default=False) – Whether to always use unconditioned logit outputs. If False (default) conditional logits will be used if the checkpoint loaded is for a conditional model.

  • logit_smoothing_sigma (float, optional) – Standard deviation over which logits will be smoothed before being converted into output. Disabled by default.

  • device (str or torch.device, optional) – Name of device on which the model will be run. By default, the first available CUDA GPU is used if any are found, and otherwise the CPU is used. Set to "cpu" to use the CPU even if a CUDA GPU is available.

  • hide_echoview ({"never", "new", "always"}, default="new") – Whether to hide the Echoview window entirely while the code runs. If hide_echoview="new", the application is only hidden if it was created by this function, and not if it was already running. If hide_echoview="always", the application is hidden even if it was already running. In the latter case, the window will be revealed again when this function is completed.

  • minimize_echoview (bool, default=False) – If True, the Echoview window being used will be minimized while this function is running.

  • verbose (int, default=2) – Verbosity level. Set to 0 to disable print statements, or elevate to a higher number to increase verbosity.
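
The facing="auto" rule and the nearfield cutoff depth described in the parameter list above can be sketched together. The function names below are hypothetical and the code is an illustration of the documented rules only, not the package's implementation. Comparing the first and last depth values is an assumption about how "ordering" is checked.

```python
def sketch_resolve_facing(depths, facing="auto"):
    """Hypothetical sketch: resolve facing="auto" from the depth ordering."""
    if facing != "auto":
        return facing
    # As documented: increasing depths = "upward", decreasing = "downward".
    return "upward" if depths[-1] > depths[0] else "downward"


def sketch_nearfield_cutoff(depths, nearfield=1.7, facing="downward"):
    """Hypothetical sketch: depth (in metres) of the nearfield cutoff."""
    if facing == "downward":
        # The cutoff sits at a depth equal to the nearfield distance.
        return nearfield
    if facing == "upward":
        # The cutoff sits `nearfield` metres above the deepest recorded depth.
        return max(depths) - nearfield
    raise ValueError("facing must be 'downward' or 'upward' here")


depths = [0.0, 25.0, 50.0]
facing = sketch_resolve_facing(depths)
print(facing)                                                # upward
print(sketch_nearfield_cutoff(depths, nearfield=1.5, facing=facing))  # 48.5
```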

echofilter.path module#

Path utilities.

echofilter.path.check_if_windows()[source]#

Check if the operating system is Windows.

Returns

Whether the OS is Windows.

Return type

bool

echofilter.path.determine_destination(fname, fname_full, source_dir, output_dir)[source]#

Determine where destination should be placed for a file, preserving subtree paths.

Parameters
  • fname (str) – Original input path.

  • fname_full (str) – Path to file, either absolute or relative; possibly containing source_dir.

  • source_dir (str) – Path to a directory where the file bearing name fname is expected to be located.

  • output_dir (str) – Path to root output directory.

Returns

Path to where the destination file should be placed, either absolute or relative.

Return type

str
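
One plausible reading of "preserving subtree paths" is sketched below. The function sketch_destination is hypothetical and its handling of absolute inputs (keep the path relative to source_dir when the file lies inside it, otherwise keep only the file name) is an assumption of this sketch, not confirmed behaviour of echofilter.path.determine_destination.

```python
import os


def sketch_destination(fname, fname_full, source_dir, output_dir):
    """Hypothetical sketch: choose an output location for one input file."""
    if not output_dir:
        # Empty output_dir: write alongside the input file.
        return fname_full
    if not os.path.isabs(fname):
        # Relative inputs keep their path relative to source_dir.
        return os.path.join(output_dir, fname)
    # Assumption: absolute inputs inside source_dir keep their subtree path.
    rel = os.path.relpath(fname, os.path.abspath(source_dir))
    if rel.startswith(os.pardir):
        # Outside source_dir: fall back to the bare file name.
        rel = os.path.basename(fname)
    return os.path.join(output_dir, rel)


print(sketch_destination("a/b.csv", "src/a/b.csv", "src", ""))  # src/a/b.csv
print(sketch_destination("a/b.csv", "src/a/b.csv", "src", "out"))
```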

echofilter.path.determine_file_path(fname, source_dir)[source]#

Determine the path to use for an input file.

Parameters
  • fname (str) – Path to an input file. Either an absolute path, or a path relative to source_dir, or a path relative to the working directory.

  • source_dir (str) – Path to a directory where the file bearing name fname is expected to be located.

Returns

Path to where file can be found, either absolute or relative.

Return type

str

echofilter.path.parse_files_in_folders(files_or_folders, source_dir, extension, recursive=True)[source]#

Walk through folders and find suitable files.

Parameters
  • files_or_folders (iterable) – List of files and folders.

  • source_dir (str) – Root directory within which elements of files_or_folders may be found.

  • extension (str or Collection) – Extension (or list of extensions) which files within directories must bear to be included, without leading '.', for instance 'csv'. Note that explicitly given files are always used.

  • recursive (bool, optional) – Whether to walk through the tree of files in the subfolders of a directory input. If False, only files in the folder itself and not its child folders will be included.

Yields

str – Paths to explicitly given files and files within directories with extension extension.
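
A generator with the behaviour described above could look roughly like the following sketch, built on os.walk. The name sketch_parse_files_in_folders is hypothetical. Case-insensitive extension matching, the sorted traversal order, yielding paths relative to source_dir and raising FileNotFoundError for missing inputs are all assumptions of this sketch.

```python
import os


def sketch_parse_files_in_folders(files_or_folders, source_dir, extension,
                                  recursive=True):
    """Hypothetical sketch: yield explicit files and matching files in folders."""
    if isinstance(extension, str):
        extension = [extension]
    # Extensions are given without the leading "."; lower-casing is an assumption.
    suffixes = tuple("." + ext.lower() for ext in extension)
    for path in files_or_folders:
        full_path = os.path.join(source_dir, path)
        if os.path.isfile(full_path):
            # Explicitly given files are always used, whatever their extension.
            yield path
            continue
        if not os.path.isdir(full_path):
            # Assumption: a missing input is treated as an error.
            raise FileNotFoundError("No such file or directory: " + full_path)
        for dirpath, dirnames, filenames in os.walk(full_path):
            if recursive:
                dirnames.sort()  # visit subfolders in a stable order
            else:
                del dirnames[:]  # do not descend into child folders
            for name in sorted(filenames):
                if name.lower().endswith(suffixes):
                    yield os.path.relpath(os.path.join(dirpath, name), source_dir)
```

Called as list(sketch_parse_files_in_folders(["data", "notes.txt"], "/some/root", "csv")), this would return notes.txt (if it exists) together with every .csv file found under /some/root/data; the paths in this call are placeholders.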

echofilter.plotting module#

Plotting utilities.

echofilter.plotting.ensure_axes_inverted(axes=None, dir='y')[source]#

Invert axis direction, if not already inverted.

Parameters
  • axes (matplotlib.axes or None) – The axes to invert. If None, the current axes are used (default).

  • dir ({"x", "y", "xy"}) – The axis to invert. Default is "y".

echofilter.plotting.plot_indicator_hatch(indicator, xx=None, ymin=None, ymax=None, hatch='//', color='k')[source]#

Plot a hatch across indicated segments along the x-axis of a plot.

Parameters
  • indicator (numpy.ndarray vector) – Whether to include or exclude each column along the x-axis. Included columns are indicated with non-zero values.

  • xx (numpy.ndarray vector, optional) – Values taken by indicator along the x-axis. If None (default), the indices of indicator are used: arange(len(indicator)).

  • ymin (float, optional) – The lower y-value of the extent of the hatching. If None (default), the minimum y-value of the current axes is used.

  • ymax (float, optional) – The upper y-value of the extent of the hatching. If None (default), the maximum y-value of the current axes is used.

  • hatch (str, optional) – Hatching pattern to use. Default is "//".

  • color (color, optional) – Color of the hatching pattern. Default is black.

echofilter.plotting.plot_mask_hatch(*args, hatch='//', color='k', border=False)[source]#

Plot hatching according to a mask shape.

Parameters
  • X (array-like, optional) –

    The coordinates of the values in Z.

    X and Y must both be 2-D with the same shape as Z (e.g. created via numpy.meshgrid), or they must both be 1-D such that len(X) == M is the number of columns in Z and len(Y) == N is the number of rows in Z.

    If not given, they are assumed to be integer indices, i.e. X = range(M), Y = range(N).

  • Y (array-like, optional) –

    The coordinates of the values in Z.

    X and Y must both be 2-D with the same shape as Z (e.g. created via numpy.meshgrid), or they must both be 1-D such that len(X) == M is the number of columns in Z and len(Y) == N is the number of rows in Z.

    If not given, they are assumed to be integer indices, i.e. X = range(M), Y = range(N).

  • Z (array-like(N, M)) – Indicator for which locations should be hatched. If Z is not a boolean array, any location where Z > 0 will be hatched.

  • hatch (str, optional) – The hatching pattern to apply. Default is “//”.

  • color (color, optional) – The color of the hatch. Default is black.

  • border (bool, optional) – Whether to include border around hatch. Default is False.

echofilter.plotting.plot_transect(transect, signal_type=None, x_scale='index', show_regions=True, turbulence_color='#a6cee3', bottom_color='#b2df8a', surface_color='#4ba82a', passive_color=[0.4, 0.4, 0.4], removed_color=None, linewidth=1, cmap=None)[source]#

Plot a transect.

Parameters
  • transect (dict) – Transect values.

  • signal_type (str, optional) – The signal to plot as a heatmap. Default is "Sv" if present, or “signals” if not. If this is "Sv_masked", the mask (given by transect["mask"]) is used to mask transect["Sv"] before plotting.

  • x_scale ({"index", "timestamp", "time"}, optional) – Scaling for x-axis. If "timestamp", the number of seconds since the Unix epoch is shown; if "time", the amount of time in seconds since the start of the transect is shown. Default is "index".

  • show_regions (bool, optional) – Whether to show segments of data marked as removed or passive with hatching. Passive data is shown with "/" oriented lines, other removed timestamps with "\" oriented lines. Default is True.

  • turbulence_color (color, optional) – Color of turbulence line. Default is "#a6cee3".

  • bottom_color (color, optional) – Color of bottom line. Default is "#b2df8a".

  • surface_color (color, optional) – Color of surface line. Default is "#4ba82a".

  • passive_color (color, optional) – Color of passive segment hatching. Default is [.4, .4, .4].

  • removed_color (color, optional) – Color of removed segment hatching. Default is "r" if cmap is "viridis", and "b" otherwise.

  • linewidth (int) – Width of lines. Default is 1.

  • cmap (str, optional) – Name of a registered matplotlib colormap. If None (default), the current default colormap is used.
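The three x_scale options differ only in which values are placed along the x-axis. A minimal sketch of that mapping is given below; the helper x_values is hypothetical, and it assumes the ping timestamps are available as a sequence of seconds since the Unix epoch.

```python
def x_values(timestamps, x_scale="index"):
    """Return x-axis values for each ping under the chosen scale."""
    if x_scale == "index":
        # Ping number, counting from zero.
        return list(range(len(timestamps)))
    if x_scale == "timestamp":
        # Seconds since the Unix epoch, unchanged.
        return list(timestamps)
    if x_scale == "time":
        # Seconds elapsed since the first ping of the transect.
        return [t - timestamps[0] for t in timestamps]
    raise ValueError(f"Unsupported x_scale: {x_scale!r}")
```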

echofilter.plotting.plot_transect_predictions(transect, prediction, linewidth=1, cmap=None)[source]#

Plot the generated output for a transect against its ground truth data.

  • Ground truth data is shown in black, predictions in white.

  • Passive regions are hatched in / direction for ground truth, \ for prediction.

  • Removed regions are hatched in \ direction for ground truth, / for prediction.

Parameters
  • transect (dict) – Ground truth data for the transect.

  • prediction (dict) – Predictions for the transect.

  • linewidth (int) – Width of lines. Default is 1.

  • cmap (str, optional) – Name of a registered matplotlib colormap. If None (default), the current default colormap is used.

echofilter.train module#

Model training routine.

echofilter.train.build_dataset(dataset_name, data_dir, sample_shape, train_partition=None, val_partition=None, crop_depth=None, random_crop_args=None)[source]#

Construct a pytorch Dataset.

Parameters
  • dataset_name (str) – Name of the dataset. This can optionally be a list of multiple datasets joined with "+".

  • data_dir (str) – Path to root data directory, containing the dataset.

  • sample_shape (iterable of length 2) – The shape which will be used for training.

  • train_partition (str, optional) – Name of the partition to use for training. Can optionally be a list of multiple partitions joined with "+". Default is "train" (except for stationary2 where it is mixed).

  • val_partition (str, optional) – Name of the partition to use for validation. Can optionally be a list of multiple partitions joined with "+". Default is "validate" (except for stationary2 where it is mixed).

  • crop_depth (float or None, optional) – Depth at which to crop samples. Default is None.

  • random_crop_args (dict, optional) – Arguments to control the random crop used during training. Default is an empty dict, which uses the default arguments of echofilter.data.transforms.RandomCropDepth.

Returns

  • dataset_train (echofilter.data.dataset.TransectDataset) – Dataset of training samples.

  • dataset_val (echofilter.data.dataset.TransectDataset) – Dataset of validation samples.

  • dataset_augval (echofilter.data.dataset.TransectDataset) – Dataset of validation samples, applying the training augmentation stack.

echofilter.train.generate_from_file(fname, *args, **kwargs)[source]#

Generate an output for a sample transect, specified by its file path.

echofilter.train.generate_from_shards(fname, *args, **kwargs)[source]#

Generate an output for a sample transect, specified by a path to sharded data.

echofilter.train.generate_from_transect(model, transect, sample_shape, device, dtype=torch.float32)[source]#

Generate an output for a sample transect.

echofilter.train.meters_to_csv(meters, is_best, dirname='.', filename='meters.csv')[source]#

Export performance metrics to CSV format.

Parameters
  • meters (dict of dict) – Collection of output meters, as a nested dictionary.

  • is_best (bool) – Whether this model state is the best so far. If True, the CSV file will be copied to "model_best.meters.csv".

  • dirname (str, optional) – Path to directory in which the checkpoint will be saved. Default is "." (current directory of the executed script).

  • filename (str, optional) – Format for the output file. Default is "meters.csv".

echofilter.train.save_checkpoint(state, is_best, dirname='.', fname_fmt='checkpoint{}.pt', dup=None)[source]#

Save a model checkpoint, using torch.save().

Parameters
  • state (dict) – Model checkpoint state to record.

  • is_best (bool) – Whether this model state is the best so far. If True, the best checkpoint (by default named "checkpoint_best.pt") will be overwritten with this state.

  • dirname (str, optional) – Path to directory in which the checkpoint will be saved. Default is "." (current directory of the executed script).

  • fname_fmt (str, optional) – Format for the file name(s) of the saved checkpoint(s). Must include one string argument output. Default is "checkpoint{}.pt".

  • dup (str or None) – If this is not None, a duplicate copy of the checkpoint is recorded in accordance with fname_fmt. By default the duplicate output file name will be styled as "checkpoint_<dup>.pt".
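The naming scheme implied by fname_fmt can be illustrated with plain string formatting. This is only a sketch: checkpoint_paths is a hypothetical helper, and filling the placeholder with an empty string for the regular checkpoint is an assumption; the "_best" and "_<dup>" tags follow the description above.

```python
import os


def checkpoint_paths(is_best, dirname=".", fname_fmt="checkpoint{}.pt", dup=None):
    """List the file paths a checkpoint would be written to (sketch only)."""
    # Assumption: the regular checkpoint fills the placeholder with "".
    tags = [""]
    if is_best:
        tags.append("_best")
    if dup is not None:
        tags.append("_" + str(dup))
    return [os.path.join(dirname, fname_fmt.format(tag)) for tag in tags]
```

With the defaults, is_best=True and dup="ep10" would give checkpoint.pt, checkpoint_best.pt and checkpoint_ep10.pt inside dirname.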

echofilter.train.train(data_dir='/data/dsforce/surveyExports', dataset_name='mobile', train_partition=None, val_partition=None, sample_shape=(128, 512), crop_depth=None, resume='', restart='', log_name=None, log_name_append=None, conditional=False, n_block=6, latent_channels=32, expansion_factor=1, expand_only_on_down=False, blocks_per_downsample=(2, 1), blocks_before_first_downsample=(2, 1), always_include_skip_connection=True, deepest_inner='horizontal_block', intrablock_expansion=6, se_reduction=4, downsampling_modes='max', upsampling_modes='bilinear', depthwise_separable_conv=True, residual=True, actfn='InplaceReLU', kernel_size=5, use_mixed_precision=None, amp_opt='O1', device='cuda', multigpu=False, n_worker=8, batch_size=16, stratify=True, n_epoch=20, seed=None, print_freq=50, optimizer='adam', schedule='constant', lr=0.1, momentum=0.9, base_momentum=None, weight_decay=1e-05, warmup_pct=0.2, warmdown_pct=0.7, anneal_strategy='cos', overall_loss_weight=0.0)[source]#

Train a model.

echofilter.train.train_epoch(loader, model, criterion, optimizer, device, epoch, dtype=torch.float32, print_freq=10, schedule_data=None, use_mixed_precision=False, continue_through_error=True)[source]#

Train a model through a single epoch of the dataset.

Parameters
  • loader (iterable, torch.utils.data.DataLoader) – Dataloader.

  • model (callable, echofilter.nn.wrapper.Echofilter) – Model.

  • criterion (callable, torch.nn.modules.loss._Loss) – Loss function.

  • device (str or torch.device) – Which device the data should be loaded onto.

  • epoch (int) – Which epoch is being performed.

  • dtype (str or torch.dtype) – Datatype with which the data should be loaded.

  • print_freq (int, optional) – Number of batches between reporting progress. Default is 10.

  • schedule_data (dict or None) – If a learning rate schedule is being used, this may be passed as a dictionary with the key "scheduler" mapping to the learning rate schedule as a callable.

  • use_mixed_precision (bool) – Whether to use apex.amp.scale_loss() to automatically scale the loss. Default is False.

  • continue_through_error (bool) – Whether to catch errors within an individual batch, ignore them and continue running training on the rest of the batches. If there are five or more errors while processing the batch, training will halt regardless of continue_through_error. Default is True.

Returns

  • average_loss (float) – Average loss as given by criterion (weighted equally for each sample in loader).

  • meters (dict of dict) – Each key is a stratum of the model output, each mapping to its own dictionary of evaluation criteria: “Accuracy”, “Precision”, “Recall”, “F1 Score”, “Jaccard”.

  • examples (tuple of torch.Tensor) – Tuple of (example_input, example_data, example_output).

  • timing (tuple of floats) – Tuple of (batch_time, data_time).
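The error tolerance controlled by continue_through_error can be pictured as a try/except around each batch with a cap on the number of failures. The sketch below is an illustration under assumptions, not the training loop itself: run_batches, step and max_errors are hypothetical names, and errors are counted across all batches in the epoch.

```python
def run_batches(batches, step, continue_through_error=True, max_errors=5):
    """Apply step to each batch, tolerating a limited number of failures."""
    n_errors = 0
    results = []
    for i_batch, batch in enumerate(batches):
        try:
            results.append(step(batch))
        except Exception as err:
            n_errors += 1
            # Re-raise when tolerance is disabled or the error budget is spent.
            if not continue_through_error or n_errors >= max_errors:
                raise
            print(f"Skipping batch {i_batch} after error: {err}")
    return results, n_errors
```

Capping the number of tolerated errors keeps a systematic fault, such as a corrupt dataset, from being silently skipped for the whole epoch.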

echofilter.train.validate(loader, model, criterion, device, dtype=torch.float32, print_freq=10, prefix='Test', num_examples=32)[source]#

Validate the model’s performance on the validation partition.

Parameters
  • loader (iterable, torch.utils.data.DataLoader) – Dataloader.

  • model (callable, echofilter.nn.wrapper.Echofilter) – Model.

  • criterion (callable, torch.nn.modules.loss._Loss) – Loss function.

  • device (str or torch.device) – Which device the data should be loaded onto.

  • dtype (str or torch.dtype) – Datatype with which the data should be loaded.

  • print_freq (int, optional) – Number of batches between reporting progress. Default is 10.

  • prefix (str, optional) – Prefix string to prepend to progress meter names. Default is "Test".

  • num_examples (int, optional) – Number of example inputs to return. Default is 32.

Returns

  • average_loss (float) – Average loss as given by criterion (weighted equally for each sample in loader).

  • meters (dict of dict) – Each key is a stratum of the model output, each mapping to its own dictionary of evaluation criteria: “Accuracy”, “Precision”, “Recall”, “F1 Score”, “Jaccard”.

  • examples (tuple of torch.Tensor) – Tuple of (example_input, example_data, example_output).

echofilter.utils module#

General utility functions.

echofilter.utils.first_nonzero(arr, axis=-1, invalid_val=-1)[source]#

Find the index of the first non-zero element in an array.

Parameters
  • arr (numpy.ndarray) – Array to search.

  • axis (int, optional) – Axis along which to search for a non-zero element. Default is -1.

  • invalid_val (any, optional) – Value to return if all elements are zero. Default is -1.

echofilter.utils.get_indicator_onoffsets(indicator)[source]#

Find the onsets and offsets of nonzero entries in an indicator.

Parameters

indicator (1d numpy.ndarray) – Input vector, which is sometimes zero and sometimes nonzero.

Returns

  • onsets (list) – Onset indices, where each entry is the start of a sequence of nonzero values in the input indicator.

  • offsets (list) – Offset indices, where each entry is the last in a sequence of nonzero values in the input indicator, such that indicator[onsets[i] : offsets[i] + 1] != 0.
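The onset and offset convention can be made concrete with a short pure-Python sketch. The function indicator_onoffsets below is a hypothetical stand-in written for illustration; it is not the library's implementation, but it follows the inclusive-offset convention described above.

```python
def indicator_onoffsets(indicator):
    """Return (onsets, offsets) of each run of nonzero values."""
    onsets, offsets = [], []
    previous_on = False
    for i, value in enumerate(indicator):
        is_on = value != 0
        if is_on and not previous_on:
            onsets.append(i)  # a run starts here
        if previous_on and not is_on:
            offsets.append(i - 1)  # the run ended on the previous index
        previous_on = is_on
    if previous_on:
        # A run reaching the end of the vector closes on the final index.
        offsets.append(len(indicator) - 1)
    return onsets, offsets
```

For the input [0, 1, 1, 0, 0, 2, 0, 3] this gives onsets [1, 5, 7] and offsets [2, 5, 7], so each slice indicator[onset : offset + 1] contains only nonzero values.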

echofilter.utils.last_nonzero(arr, axis=-1, invalid_val=-1)[source]#

Find the index of the last non-zero element in an array.

Parameters
  • arr (numpy.ndarray) – Array to search.

  • axis (int, optional) – Axis along which to search for a non-zero element. Default is -1.

  • invalid_val (any, optional) – Value to return if all elements are zero. Default is -1.
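One common way to obtain the behaviour documented for first_nonzero and last_nonzero is to combine a boolean mask with argmax. The sketch below uses that technique under hypothetical function names; whether echofilter implements it this way is an assumption.

```python
import numpy as np


def first_nonzero_sketch(arr, axis=-1, invalid_val=-1):
    """Index of the first non-zero entry along axis, or invalid_val if none."""
    mask = np.asarray(arr) != 0
    # argmax returns the first True along the axis, but also 0 for all-False
    # slices, so those slices are replaced with invalid_val.
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)


def last_nonzero_sketch(arr, axis=-1, invalid_val=-1):
    """Index of the last non-zero entry along axis, or invalid_val if none."""
    mask = np.asarray(arr) != 0
    # Search the reversed mask, then convert back to an index from the start.
    reversed_first = np.flip(mask, axis=axis).argmax(axis=axis)
    last = mask.shape[axis] - 1 - reversed_first
    return np.where(mask.any(axis=axis), last, invalid_val)
```

Applied to the rows [0, 0, 3, 0], [0, 0, 0, 0] and [5, 0, 1, 0], the first sketch returns [2, -1, 0] and the second returns [2, -1, 2].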

echofilter.utils.mode(a, axis=None, keepdims=False, **kwargs)[source]#

Return an array of the modal (most common) value in the passed array.

If there is more than one such value, only the smallest is returned.

Parameters
  • a (array_like) – n-dimensional array of which to find mode(s).

  • axis (int or None, optional) – Axis along which the mode is computed. The default, axis=None, computes the mode over all of the elements of the input array. If axis is negative it counts from the last to the first axis.

  • keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. Default is False.

  • **kwargs – Additional arguments as per scipy.stats.mode().

Returns

mode_along_axis – An array with the same shape as a, with the specified axis removed. If keepdims=True and either a is a 0-d array or axis is None, a scalar is returned.

Return type

numpy.ndarray
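The tie-breaking rule (the smallest of several equally common values is returned) can be illustrated for a flat sequence with the standard library. The helper smallest_mode is hypothetical and does not handle the axis or keepdims arguments.

```python
from collections import Counter


def smallest_mode(values):
    """Most common value of a flat sequence; ties go to the smallest value."""
    counts = Counter(values)
    top = max(counts.values())
    return min(value for value, count in counts.items() if count == top)
```

For instance, in [3, 1, 3, 1, 2] both 1 and 3 occur twice, so the sketch returns 1.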

See also

scipy.stats.mode