Inference operations
--------------------

In this section, we describe the :term:`inference` process, its outputs and inputs.
Inference is the process of generating predictions from the :term:`model`, and is the
principal functionality of :ref:`echofilter`.

Processing overview
~~~~~~~~~~~~~~~~~~~

This is an overview of how files are processed in the :term:`inference` pipeline.

First, the setup:

- If a directory input was given, determine the list of files to process.

- Download the model :term:`checkpoint`, if necessary.

- Load the :term:`model` from the :term:`checkpoint` into memory.

- If any file to process is an :term:`EV file`, open :term:`Echoview`.

  - If it was not already open, hide the Echoview window.

After the :term:`model` is loaded from its checkpoint, each file is processed in turn.
The processing time for an individual file scales linearly with the number of
:term:`pings` in the file (a file with twice as many pings takes twice as long to
process).

Each file is processed in the following steps:

- If the input is an :term:`EV file`, export the :term:`Sv` data to :term:`CSV` format.

  - By default, the :term:`Sv` data is taken from ``"Fileset1: Sv pings T1"``.

  - Unless ``--cache-csv`` is provided, the :term:`CSV file` is output to a temporary
    file, which is deleted after the :term:`CSV file` is imported.

- Import the :term:`Sv` data from the :term:`CSV file`. (If the input was a
  :term:`CSV file`, this is the input; if the input was an :term:`EV file`, this is the
  :term:`CSV file` generated from the :term:`EV file` in the preceding step.)

- Rescale the height of the :term:`Sv` input to have the number of pixels expected by
  the :term:`model`.

- Automatically determine whether the :term:`echosounder` recording is
  :term:`upfacing` or :term:`downfacing`, based on the order of the Depths data in the
  :term:`CSV file`.

  - If the orientation was manually specified, issue a warning if it does not match
    the detected orientation.


- Reflect the data in the Depth dimension if it is :term:`upfacing`, so that the
  shallowest :term:`samples` always occur first, and the deepest last.

- Normalise the distribution of the :term:`Sv` intensities to match that expected by
  the :term:`model`.

- Split the input data into segments.

  - Detect temporal discontinuities between :term:`pings`.

  - Split the input :term:`Sv` data into segments such that each segment contains
    contiguous :term:`pings`.

- Pass each segment of the input through the :term:`model` to generate output
  probabilities.

- Crop the depth dimension down to zoom in on the most salient data.

  - If :term:`upfacing`, crop the top off the echogram to show only 2m above the
    shallowest estimated :term:`surface line` depth.

  - If :term:`downfacing`, crop the bottom off the echogram to show only 2m below the
    deepest estimated :term:`bottom line` depth.

  - If more than 35% of the echogram's height (threshold value set with
    ``--autocrop-threshold``) was cropped away, pass the cropped :term:`Sv` data
    through the :term:`model` to get better predictions based on the zoomed-in data.

- Line boundary probabilities are converted into output depths.

  - The boundary probabilities at each pixel are integrated to make a cumulative
    probability distribution across depth,
    :math:`p(\text{depth} > \text{boundary location})`.

  - The output boundary depth is estimated as the depth at which the cumulative
    probability distribution first exceeds 50%.

- Bottom, surface, and turbulence lines are output to :term:`EVL` files.

  - Note: there is no EVL file for the :term:`nearfield line` since it is at a
    constant depth as provided by the user and not generated by the :term:`model`.

- Regions are generated:

  - Regions are collated if there is a small gap between consecutive
    :term:`passive data` or :term:`bad data regions`.

  - Regions which are too small (fewer than 10 pings for rectangles) are dropped.

  - All regions are written to a single :term:`EVR` file.


- If the input was an :term:`EV file`, the lines and regions are imported into the
  :term:`EV file`, and a :term:`nearfield line` is added.

Simulating processing
~~~~~~~~~~~~~~~~~~~~~

To see which files will be processed by a command and what the output will be, run
:ref:`echofilter` with the ``--dry-run`` argument.

Input
~~~~~

:ref:`Echofilter` can process two types of file as its input: .EV files and .CSV files.
The :term:`EV file` input is more user-friendly, but requires the Windows operating
system, and a fully operational :term:`Echoview` application (i.e. with an Echoview
dongle). The :term:`CSV file` format can be processed without Echoview, but must be
generated in advance from the .EV file on a system with Echoview. The
:term:`CSV files` must contain raw :term:`Sv` data (without thresholding or masking)
and be in the format produced by exporting :term:`Sv` data from Echoview. These raw
:term:`CSV files` can be exported using the utility :ref:`ev2csv`, which is provided
as a separate executable in the :ref:`echofilter` package.

If the input path is a directory, all files in the directory are processed. By
default, all subdirectories are recursively processed; this behaviour can be disabled
with the ``--no-recursive-dir-search`` argument. All files in the directory (and
subdirectories) with an appropriate file extension will be processed. By default,
files with a .CSV or .EV file extension (case insensitive) will be processed. The file
extensions to include can be set with the ``--extension`` argument.

Multiple input files or directories can also be specified (each separated by a space).

By default, when processing an :term:`EV file`, the :term:`Sv` data is taken from the
``"Fileset1: Sv pings T1"`` variable. This can be changed with the
``--variable-name`` argument.

Loading model
~~~~~~~~~~~~~

The :term:`model` used to process the data is loaded from a :term:`checkpoint` file.
The executable :term:`echofilter.exe` comes with its default model checkpoint bundled
as part of the release. Aside from this, the first time a particular model is used,
the checkpoint file will be downloaded over the internet. The checkpoint file will be
cached on your system and will not need to be downloaded again unless you clear your
cache.

Multiple models are available to select from. These can be shown by running the
command ``echofilter --list-checkpoints``. The default model will be highlighted in
the output. In general, it is recommended to use the default checkpoint. See
:ref:`Model checkpoints` below for more details.

When running :ref:`echofilter` for :term:`inference`, the checkpoint can be specified
with the ``--checkpoint`` argument.

If you wish to use a custom model which is not built into :term:`echofilter`, specify
a path to the checkpoint file using the ``--checkpoint`` argument.

Output
~~~~~~

Output files
^^^^^^^^^^^^

For each input file, :ref:`echofilter` produces the following output files:

<input_path>.bottom.evl
    An Echoview line file containing the depth of the :term:`bottom line`.

<input_path>.regions.evr
    An Echoview region file containing spatiotemporal definitions of :term:`passive`
    recording rectangle regions, :term:`bad data` full-vertical depth rectangle
    regions, and :term:`bad data` anomaly polygonal (contour) regions.

<input_path>.surface.evl
    An Echoview line file containing the depth of the :term:`surface line`.

<input_path>.turbulence.evl
    An Echoview line file containing the depth of the :term:`turbulence line`.

where <input_path> is the path to an input file, stripped of its file extension.
There is no :term:`EVL` file for the :term:`nearfield line`, since it is a virtual
line of fixed depth added to the :term:`EV file` during the
:ref:`Importing outputs into EV file` step.

By default, the output files are located in the same directory as the file being
processed.
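
The default naming scheme for the four output files can be sketched as follows. This
is only an illustration, assuming each output name is the input path with its final
file extension removed and the listed suffix appended; the function name
``default_output_paths`` is hypothetical and is not part of echofilter.

.. code-block:: python

    from pathlib import PurePosixPath

    # Output suffixes as listed above. The function below is a hypothetical
    # helper for illustration only, not part of the echofilter package.
    OUTPUT_SUFFIXES = (".bottom.evl", ".regions.evr", ".surface.evl", ".turbulence.evl")


    def default_output_paths(input_path):
        """Return the default output paths for a single input file."""
        path = PurePosixPath(input_path)
        # Strip only the final file extension, e.g. "transect01.EV" -> "transect01".
        stem = path.stem
        # By default, outputs are placed in the same directory as the input file.
        return [path.parent / (stem + suffix) for suffix in OUTPUT_SUFFIXES]


    for output_path in default_output_paths("survey/transect01.EV"):
        print(output_path)
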
The output directory can be changed with the ``--output-dir`` argument, and a
user-defined suffix can be added to the output file names using the ``--suffix``
argument.

If the output files already exist, by default :ref:`echofilter` will stop running and
raise an error. If you want to overwrite output files which already exist, supply the
``--overwrite-files`` argument. If you want to skip inputs whose output files all
already exist, supply the ``--skip`` argument. Note: if both ``--skip`` and
``--overwrite-files`` are supplied, inputs whose outputs all exist will be skipped and
those inputs for which only some of the outputs exist will have existing outputs
overwritten.

Specific outputs can be dropped by supplying the corresponding argument
``--no-bottom-line``, ``--no-surface-line``, or ``--no-turbulence-line`` respectively.
To drop particular types of region entirely from the :term:`EVR` output, use
``--minimum-passive-length -1``, ``--minimum-removed-length -1``, or
``--minimum-patch-area -1`` respectively. By default, :term:`bad data` regions
(rectangles and contours) are not included in the :term:`EVR` file. To include these,
set ``--minimum-removed-length`` and ``--minimum-patch-area`` to non-negative values.

The lines written to the :term:`EVL` files are the raw output from the model and do
not include any offset.

.. _Importing outputs into EV file:

Importing outputs into EV file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the input file is an Echoview :term:`EV file`, by default :ref:`echofilter` will
import the output files into the :term:`EV file` and save the :term:`EV file`
(overwriting the original :term:`EV file`). The behaviour can be disabled by supplying
the ``--no-ev-import`` argument.

All lines will be imported twice: once at the original depth and a second time with
an offset included. This offset ensures the exclusion of data biased by the acoustic
deadzone, and provides a margin of safety at the bottom depth of the
:term:`entrained air`.
The offset moves the :term:`surface` and :term:`turbulence` lines downwards (deeper),
and the :term:`bottom line` upwards (shallower). The default offset is 1m for all
three lines, and can be set using the ``--offset`` argument. A different offset can be
used for each line by providing the ``--offset-bottom``, ``--offset-surface``, and
``--offset-turbulence`` arguments.

The names of the objects imported into the :term:`EV file` have the suffix
``"_echofilter"`` appended to them, to indicate the source of the line/region.
However, if the ``--suffix`` argument was provided, that suffix is used instead. A
custom suffix for the variable names within the EV file can be specified using the
``--suffix-var`` argument.

If the variable name to be used for a line is already in use, the default behaviour
is to append the current datetime to the new variable name. To instead overwrite
existing line variables, supply the ``--overwrite-ev-lines`` argument. Note that
existing regions will not be overwritten (only lines).

By default, a :term:`nearfield line` is also added to the :term:`EV file` at a fixed
range of 1.7m from the :term:`transducer` position. The :term:`nearfield distance`
can be changed as appropriate for the :term:`echosounder` in use by setting the
``--nearfield`` parameter.

The colour and thickness of the lines can be customised using the
``--color-surface``, ``--thickness-surface`` (etc) arguments. See
``echofilter --list-colors`` to see the list of supported colour names.

.. raw:: latex

    \clearpage
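
The direction of the line offsets described in this section can be sketched as
follows. This is a minimal illustration, assuming depths are given in metres and
increase downwards; the function ``apply_offsets`` and its arguments are hypothetical
and do not correspond to echofilter's internal code.

.. code-block:: python

    def apply_offsets(surface, turbulence, bottom, offset=1.0,
                      offset_surface=None, offset_turbulence=None, offset_bottom=None):
        """Return offset copies of three lines, each a list of depths in metres.

        Hypothetical helper for illustration; depths are assumed to increase downwards.
        """
        # Fall back to the shared offset when no per-line offset is given.
        offset_surface = offset if offset_surface is None else offset_surface
        offset_turbulence = offset if offset_turbulence is None else offset_turbulence
        offset_bottom = offset if offset_bottom is None else offset_bottom
        # Surface and turbulence lines move downwards (deeper): add the offset.
        surface_out = [depth + offset_surface for depth in surface]
        turbulence_out = [depth + offset_turbulence for depth in turbulence]
        # The bottom line moves upwards (shallower): subtract the offset.
        bottom_out = [depth - offset_bottom for depth in bottom]
        return surface_out, turbulence_out, bottom_out


    surface, turbulence, bottom = apply_offsets([0.5], [3.0], [40.0])
    print(surface, turbulence, bottom)

With the default offset of 1m, a bottom line at 40m is moved up to 39m, while the
surface and turbulence lines are each moved 1m deeper.
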