Inference operations#

This section describes the inference process, its inputs, and its outputs. Inference is the process of generating predictions from the model, and is the principal functionality of echofilter.

Processing overview#

This is an overview of how files are processed in the inference pipeline.

First, the setup:

  • If a directory input was given, determine the list of files to process.

  • Download the model checkpoint, if necessary.

  • Load the model from the checkpoint into memory.

  • If any file to process is an EV file, open Echoview.

  • If it was not already open, hide the Echoview window.

After the model is loaded from its checkpoint, each file is processed in turn. The processing time for an individual file scales linearly with the number of pings in the file (twice as many pings = twice as long to process).

Each file is processed in the following steps:

  • If the input is an EV file, export the Sv data to CSV format.

    • By default, the Sv data is taken from "Fileset1: Sv pings T1".

    • Unless --cache-csv is provided, the CSV file is output to a temporary file, which is deleted after the CSV file is imported.

  • Import the Sv data from the CSV file. (If the input was a CSV file, this is the input; if the input was an EV file this is the CSV file generated from the EV file in the preceding step.)

  • Rescale the height of the Sv input to have the number of pixels expected by the model.

  • Automatically determine whether the echosounder recording is upfacing or downfacing, based on the order of the Depths data in the CSV file.

    • If the orientation was manually specified, issue a warning if it does not match the detected orientation.

    • Reflect the data in the Depth dimension if it is upfacing, so that the shallowest samples always occur first, and deepest last.

  • Normalise the distribution of the Sv intensities to match that expected by the model.

  • Split the input data into segments.

    • Detect temporal discontinuities between pings.

    • Split the input Sv data into segments such that each segment contains contiguous pings.

  • Pass each segment of the input through the model to generate output probabilities.

  • Crop the depth dimension down to zoom in on the most salient data.

    • If upfacing, crop the top off the echogram to show only 2m above the shallowest estimated surface line depth.

    • If downfacing, crop the bottom off the echogram to show only 2m below the deepest estimated bottom line depth.

    • If more than 35% of the echogram’s height was cropped away (threshold value set with --autocrop-threshold), pass the cropped Sv data through the model again to get better predictions based on the zoomed-in data.

  • Line boundary probabilities are converted into output depths.

    • The boundary probabilities at each pixel are integrated to make a cumulative probability distribution across depth, \(p(\text{depth} > \text{boundary location})\).

    • The output boundary depth is estimated as the depth at which the cumulative probability distribution first exceeds 50%.

  • Bottom, surface, and turbulence lines are output to EVL files.

    • Note: there is no EVL file for the nearfield line since it is at a constant depth as provided by the user and not generated by the model.

  • Regions are generated:

    • Regions are collated if there is a small gap between consecutive passive data or bad data regions.

    • Regions which are too small (fewer than 10 pings for rectangles) are dropped.

    • All regions are written to a single EVR file.

  • If the input was an EV file, the lines and regions are imported into the EV file, and a nearfield line is added.
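
Two of the steps above, splitting pings into contiguous segments and converting boundary probabilities into a depth, can be illustrated with a minimal pure-Python sketch. This is not echofilter's implementation: the function names, the median-based gap rule (max_gap_factor), and the normalisation of the probabilities so that they sum to one are all assumptions made for illustration.

```python
def split_contiguous(timestamps, max_gap_factor=2.0):
    """Group ping indices into segments separated by temporal discontinuities.

    Assumption: a gap counts as a discontinuity when it exceeds
    ``max_gap_factor`` times the median interval between pings.
    """
    if not timestamps:
        return []
    if len(timestamps) == 1:
        return [[0]]
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    median_gap = sorted(gaps)[len(gaps) // 2]
    segments, current = [], [0]
    for i, gap in enumerate(gaps, start=1):
        if gap > max_gap_factor * median_gap:
            # Discontinuity found: close the current segment, start a new one.
            segments.append(current)
            current = []
        current.append(i)
    segments.append(current)
    return segments


def boundary_depth(depths, probs, threshold=0.5):
    """Return the first depth where the cumulative probability exceeds threshold.

    ``depths`` must be ordered shallowest first. Assumption: the per-pixel
    probabilities are normalised to sum to one before being accumulated.
    """
    total = sum(probs)
    if total <= 0:
        return None  # no boundary evidence in this ping
    cumulative = 0.0
    for depth, p in zip(depths, probs):
        cumulative += p / total
        if cumulative > threshold:
            return depth
    return depths[-1]
```

For example, ping times [0, 1, 2, 3, 10, 11, 12] split into two segments at the jump from 3 to 10, and a ping whose probability mass is concentrated at the fourth depth sample yields that sample's depth as the boundary.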

Simulating processing#

To see which files will be processed by a command and what the output will be, run echofilter with the --dry-run argument.

Input#

Echofilter can process two types of file as its input: .EV files and .CSV files. The EV file input is more user-friendly, but requires the Windows operating system, and a fully operational Echoview application (i.e. with an Echoview dongle). The CSV file format can be processed without Echoview, but must be generated in advance from the .EV file on a system with Echoview. The CSV files must contain raw Sv data (without thresholding or masking) and in the format produced by exporting Sv data from Echoview. These raw CSV files can be exported using the utility ev2csv, which is provided as a separate executable in the echofilter package.

If the input path is a directory, all files in the directory with an appropriate file extension are processed. By default, all subdirectories are searched recursively; this behaviour can be disabled with the --no-recursive-dir-search argument. By default, files with a .CSV or .EV file extension (case insensitive) will be processed. The file extensions to include can be set with the --extension argument.
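
The directory search described above can be sketched as follows. This is an illustrative approximation, not echofilter's own code: the function name find_input_files and its parameters are hypothetical, and returning an explicitly named file regardless of its extension is an assumption.

```python
from pathlib import Path


def find_input_files(path, extensions=("csv", "ev"), recursive=True):
    """List the files under ``path`` whose extension matches, ignoring case."""
    path = Path(path)
    if path.is_file():
        # Assumption: a file named explicitly is always accepted.
        return [path]
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    # rglob descends into subdirectories; glob stays in the top level.
    candidates = path.rglob("*") if recursive else path.glob("*")
    return sorted(
        p for p in candidates if p.is_file() and p.suffix.lower() in wanted
    )
```

Calling find_input_files(directory, recursive=False) mirrors the effect of --no-recursive-dir-search, and passing a different extensions tuple mirrors --extension.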

Multiple input files or directories can also be specified (each separated by a space).

By default, when processing an EV file, the Sv data is taken from the "Fileset1: Sv pings T1" variable. This can be changed with the --variable-name argument.

Loading model#

The model used to process the data is loaded from a checkpoint file. The executable echofilter.exe comes with its default model checkpoint bundled as part of the release. Aside from this, the first time a particular model is used, the checkpoint file will be downloaded over the internet. The checkpoint file will be cached on your system and will not need to be downloaded again unless you clear your cache.

Multiple models are available to select from. These can be shown by running the command echofilter --list-checkpoints. The default model will be highlighted in the output. In general, it is recommended to use the default checkpoint. See Model checkpoints below for more details.

When running echofilter for inference, the checkpoint can be specified with the --checkpoint argument.

If you wish to use a custom model which is not built in to echofilter, specify a path to the checkpoint file using the --checkpoint argument.

Output#

Output files#

For each input file, echofilter produces the following output files:

<input>.bottom.evl

An Echoview line file containing the depth of the bottom line.

<input>.regions.evr

An Echoview region file containing spatiotemporal definitions of passive recording rectangle regions, bad data full-vertical depth rectangle regions, and bad data anomaly polygonal (contour) regions.

<input>.surface.evl

An Echoview line file containing the depth of the surface line.

<input>.turbulence.evl

An Echoview line file containing the depth of the turbulence line.

where <input> is the path to an input file, stripped of its file extension. There is no EVL file for the nearfield line, since it is a virtual line of fixed depth added to the EV file during the Importing outputs into EV file step.
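
As a rough illustration of this naming scheme, the sketch below derives the four default output paths from an input path. The helper output_paths is hypothetical, and it deliberately ignores the --output-dir and --suffix arguments.

```python
from pathlib import Path

# Suffixes of the four output files listed above.
OUTPUT_KINDS = ("bottom.evl", "regions.evr", "surface.evl", "turbulence.evl")


def output_paths(input_path):
    """Map each output kind to its default path, next to the input file."""
    stem = Path(input_path).with_suffix("")  # strip only the final extension
    return {kind: Path(f"{stem}.{kind}") for kind in OUTPUT_KINDS}
```

For an input of data/survey01.EV, this gives data/survey01.bottom.evl, data/survey01.regions.evr, and so on.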

By default, the output files are located in the same directory as the file being processed. The output directory can be changed with the --output-dir argument, and a user-defined suffix can be added to the output file names using the --suffix argument.

If the output files already exist, by default echofilter will stop running and raise an error. If you want to overwrite output files which already exist, supply the --overwrite-files argument. If you want to skip inputs whose output files all already exist, supply the --skip argument. Note: if both --skip and --overwrite-files are supplied, inputs whose outputs all exist will be skipped and those inputs for which only some of the outputs exist will have existing outputs overwritten.
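
The interaction between --skip and --overwrite-files can be summarised as a small decision function. This is a sketch of the rules as described above, not echofilter's code: the function name and return values are hypothetical, and treating the case of --skip alone with only some outputs present as an error is an assumption, since that case is not spelled out.

```python
def plan_action(existing, skip=False, overwrite=False):
    """Decide what to do with one input file.

    ``existing`` holds one boolean per expected output file, True if that
    file is already present on disk.
    """
    if not any(existing):
        return "process"  # nothing to clash with
    if skip and all(existing):
        return "skip"  # every output already exists
    if overwrite:
        return "overwrite"  # replace whichever outputs exist
    return "error"  # default: stop rather than clobber files
```

With both options set, an input whose outputs all exist is skipped, while one with only some outputs present is reprocessed and its existing outputs are overwritten.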

Individual line outputs can be omitted by supplying the corresponding argument: --no-bottom-line, --no-surface-line, or --no-turbulence-line. To drop a particular type of region entirely from the EVR output, use --minimum-passive-length -1, --minimum-removed-length -1, or --minimum-patch-area -1 as appropriate. By default, bad data regions (rectangles and contours) are not included in the EVR file. To include these, set --minimum-removed-length and --minimum-patch-area to non-negative values.

The lines written to the EVL files are the raw output from the model and do not include any offset.

Importing outputs into EV file#

If the input file is an Echoview EV file, by default echofilter will import the output files into the EV file and save the EV file (overwriting the original EV file). The behaviour can be disabled by supplying the --no-ev-import argument.

All lines will be imported twice: once at the original depth and a second time with an offset included. This offset ensures the exclusion of data biased by the acoustic deadzone, and provides a margin of safety at the bottom depth of the entrained air. The offset moves the surface and turbulence lines downwards (deeper), and the bottom line upwards (shallower). The default offset is 1m for all three lines, and can be set using the --offset argument. A different offset can be used for each line by providing the --offset-bottom, --offset-surface, and --offset-turbulence arguments.
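
The direction of each offset can be made concrete with a short sketch. It assumes depths are in metres and increase downwards; the function offset_lines and its per_line parameter are hypothetical stand-ins for the --offset and --offset-bottom, --offset-surface, --offset-turbulence arguments.

```python
# Surface and turbulence lines move deeper; the bottom line moves shallower.
OFFSET_SIGNS = {"surface": +1, "turbulence": +1, "bottom": -1}


def offset_lines(depths, offset=1.0, per_line=None):
    """Return offset copies of each line.

    ``depths`` maps a line name to its list of depths (metres, positive
    downwards). ``per_line`` optionally overrides the offset for named lines.
    """
    per_line = per_line or {}
    shifted = {}
    for name, values in depths.items():
        delta = OFFSET_SIGNS[name] * per_line.get(name, offset)
        shifted[name] = [d + delta for d in values]
    return shifted
```

With the default 1m offset, a surface line at 0.5m moves to 1.5m, while a bottom line at 40m moves to 39m.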

The names of the objects imported into the EV file have the suffix "_echofilter" appended to them, to indicate the source of the line/region. However, if the --suffix argument was provided, that suffix is used instead. A custom suffix for the variable names within the EV file can be specified using the --suffix-var argument.

If the variable name to be used for a line is already in use, the default behaviour is to append the current datetime to the new variable name. To instead overwrite existing line variables, supply the --overwrite-ev-lines argument. Note that existing regions will not be overwritten (only lines).

By default, a nearfield line is also added to the EV file at a fixed range of 1.7m from the transducer position. The nearfield distance can be changed as appropriate for the echosounder in use by setting the --nearfield parameter.

The colour and thickness of the lines can be customised using the --color-surface, --thickness-surface (etc.) arguments. Run echofilter --list-colors to see the list of supported colour names.