echofilter.raw package#
Echoview output file loading and generation, post-processing and shard generation.
Submodules#
echofilter.raw.loader module#
Input/Output handling for raw Echoview files.
- echofilter.raw.loader.evl_loader(fname, special_to_nan=True, return_status=False)[source]#
EVL file loader
- Parameters
fname (str) – Path to .evl file.
special_to_nan (bool, optional) – Whether to replace the special value, -10000.99, which indicates no depth value, with NaN. https://support.echoview.com/WebHelp/Reference/File_formats/Export_file_formats/Special_Export_Values.htm
- Returns
numpy.ndarray of floats – Timestamps, in seconds.
numpy.ndarray of floats – Depth, in metres.
numpy.ndarray of ints, optional – Status codes.
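As an illustration of the special_to_nan behaviour, the following minimal sketch replaces the special export value with NaN in a plain list of depths. The helper name replace_special_values is hypothetical, and the tolerance-based comparison is an assumption rather than what evl_loader necessarily does.

```python
import math

# Echoview's "no depth value" marker, as quoted in the parameter description.
SPECIAL_NO_DEPTH = -10000.99


def replace_special_values(depths, special=SPECIAL_NO_DEPTH):
    """Return a copy of depths with the special marker replaced by NaN."""
    # Assumption: a small tolerance guards against round-off when parsing text.
    return [
        math.nan if math.isclose(d, special, abs_tol=1e-6) else d
        for d in depths
    ]


cleaned = replace_special_values([12.5, -10000.99, 13.0])
```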
- echofilter.raw.loader.evl_reader(fname)[source]#
EVL file reader
- Parameters
fname (str) – Path to .evl file.
- Returns
A generator which yields the timestamp (in seconds), depth (in metres), and status (int) for each entry. Note that the timestamp is not corrected for timezone (so make sure your timezones are internally consistent).
- Return type
generator
- echofilter.raw.loader.evl_writer(fname, timestamps, depths, status=1, line_ending='\r\n', pad=False)[source]#
EVL file writer
- Parameters
fname (str) – Destination of output file.
timestamps (array_like) – Timestamps for each node in the line.
depths (array_like) – Depths (in metres) for each node in the line.
status (0, 1, 2, or 3; optional) –
Status for the line.
0 : none
1 : unverified
2 : bad
3 : good
Default is 1 (unverified). For more details on line status, see https://support.echoview.com/WebHelp/Using_Echoview/Echogram/Lines/About_Line_Status.htm
pad (bool, optional) – Whether to pad the line with an extra datapoint half a pixel before the first and after the last given timestamp. Default is False.
line_ending (str, optional) – Line ending. Default is “\r\n”, the standard line ending on Windows/DOS, as per the specification for the file format: https://support.echoview.com/WebHelp/Using_Echoview/Exporting/Exporting_data/Exporting_line_data.htm Set to “\n” to get Unix-style line endings instead.
Notes
For more details on the format specification, see https://support.echoview.com/WebHelp/Using_Echoview/Exporting/Exporting_data/Exporting_line_data.htm#Line_definition_file_format
- echofilter.raw.loader.evr_writer(fname, rectangles=[], contours=[], common_notes='', default_region_type=0, line_ending='\r\n')[source]#
EVR file writer.
Writes regions to an Echoview region file.
- Parameters
fname (str) – Destination of output file.
rectangles (list of dictionaries, optional) – Rectangle region definitions. Default is an empty list. Each rectangle region must implement fields “depths” and “timestamps”, which indicate the extent of the rectangle. Optionally, “creation_type”, “region_name”, “region_type”, and “notes” may be set. If these are not given, the default creation_type is 4 and region_type is set by default_region_type.
contours (list of dictionaries, optional) – Contour region definitions. Default is an empty list. Each contour region must implement a “points” field containing a numpy.ndarray shaped (n, 2) defining the co-ordinates of nodes along the (open) contour in units of timestamp and depth. Optionally, “creation_type”, “region_name”, “region_type”, and “notes” may be set. If these are not given, the default creation_type is 2 and region_type is set by default_region_type.
common_notes (str, optional) – Notes to include for every region. Default is “”, an empty string.
default_region_type (int, optional) –
The region type to use for rectangles and contours which do not define a “region_type” field. Possible region types are
0 : bad (no data)
1 : analysis
2 : marker
3 : fishtracks
4 : bad (empty water)
Default is 0.
line_ending (str, optional) – Line ending. Default is “\r\n”, the standard line ending on Windows/DOS, as per the specification for the file format: https://support.echoview.com/WebHelp/Using_Echoview/Exporting/Exporting_data/Exporting_line_data.htm Set to “\n” to get Unix-style line endings instead.
Notes
For more details on the format specification, see: https://support.echoview.com/WebHelp/Reference/File_formats/Export_file_formats/2D_Region_definition_file_format.htm
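The region dictionaries described above can be sketched as follows. This is only an illustration of the documented fields and defaults: the helper with_region_defaults is hypothetical, the (min, max) layout of “depths” and “timestamps” is an assumption, and the empty defaults for “region_name” and “notes” are assumptions too. A plain list of (timestamp, depth) pairs stands in for the (n, 2) numpy.ndarray to keep the sketch dependency-free.

```python
def with_region_defaults(region, creation_type, default_region_type=0):
    """Return a copy of a region dict with the optional fields filled in."""
    filled = {
        "creation_type": creation_type,
        "region_type": default_region_type,
        "region_name": "",  # assumption: empty when not given
        "notes": "",  # assumption: empty when not given
    }
    filled.update(region)  # fields given by the caller take precedence
    return filled


# A rectangle needs "depths" and "timestamps" (assumed here to be extents).
rectangle = {"depths": (0.0, 25.0), "timestamps": (1500000000.0, 1500000060.0)}
# A contour needs "points": nodes in units of (timestamp, depth).
contour = {
    "points": [(1500000000.0, 5.0), (1500000030.0, 7.5), (1500000060.0, 5.5)],
    "region_type": 4,
}

rect_full = with_region_defaults(rectangle, creation_type=4)
cont_full = with_region_defaults(contour, creation_type=2)
```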
- echofilter.raw.loader.get_partition_data(partition, dataset='mobile', partitioning_version='firstpass', root_data_dir='/data/dsforce/surveyExports')[source]#
Loads partition metadata.
- Parameters
- Returns
Metadata for all transects in the partition. Each row is a single sample.
- Return type
pandas.DataFrame
- echofilter.raw.loader.get_partition_list(partition, dataset='mobile', full_path=False, partitioning_version='firstpass', root_data_dir='/data/dsforce/surveyExports', sharded=False)[source]#
Get a list of transects in a single partition.
- Parameters
transect_pth (str) – Relative path to transect, excluding “_Sv_raw.csv”.
dataset (str, optional) – Name of dataset. Default is “mobile”.
full_path (bool, optional) – Whether to return the full path to the sample. If False, only the relative path (from the dataset directory) is returned. Default is False.
partitioning_version (str, optional) – Name of partitioning method.
root_data_dir (str, optional) – Path to root directory where data is located.
sharded (bool, optional) – Whether to return path to sharded version of data. Default is False.
- Returns
Path for each sample in the partition.
- Return type
- echofilter.raw.loader.load_transect_data(transect_pth, dataset='mobile', root_data_dir='/data/dsforce/surveyExports')[source]#
Load all data for one transect.
- Parameters
- Returns
timestamps (numpy.ndarray) – Timestamps (in seconds since Unix epoch), with each entry corresponding to each row in the signals data.
depths (numpy.ndarray) – Depths from the surface (in metres), with each entry corresponding to each column in the signals data.
signals (numpy.ndarray) – Echogram Sv data, shaped (num_timestamps, num_depths).
turbulence (numpy.ndarray) – Depth of turbulence line, shaped (num_timestamps, ).
bottom (numpy.ndarray) – Depth of bottom line, shaped (num_timestamps, ).
- echofilter.raw.loader.remove_trailing_slash(s)[source]#
Remove trailing forward slashes from a string.
- echofilter.raw.loader.timestamp2evdtstr(timestamp)[source]#
Converts a timestamp into an Echoview-compatible datetime string, in the format “CCYYMMDD HHmmSSssss”, where:
CC: century
YY: year
MM: month
DD: day
HH: hour
mm: minute
SS: second
ssss: 0.1 milliseconds
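A minimal sketch of this conversion using only the standard library is given below. The function name timestamp_to_evdtstr is hypothetical; interpreting the timestamp as UTC and truncating (rather than rounding) to 0.1 ms are assumptions.

```python
from datetime import datetime, timezone


def timestamp_to_evdtstr(timestamp):
    """Format seconds since the Unix epoch as "CCYYMMDD HHmmSSssss"."""
    # Assumption: the timestamp is treated as UTC, with no timezone correction.
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    # ssss counts units of 0.1 ms, i.e. 100 microseconds.
    return dt.strftime("%Y%m%d %H%M%S") + "{:04d}".format(dt.microsecond // 100)


epoch_str = timestamp_to_evdtstr(0)
later_str = timestamp_to_evdtstr(1.5)
```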
- echofilter.raw.loader.transect_loader(fname, skip_lines=0, warn_row_overflow=None, row_len_selector='mode')[source]#
Loads an entire survey transect CSV.
- Parameters
fname (str) – Path to survey CSV file.
skip_lines (int, optional) – Number of initial entries to skip. Default is 0.
warn_row_overflow (bool or int, optional) – Whether to print a warning message if the number of elements in a row exceeds the expected number. If this is an int, this is the number of times to display the warnings before they are suppressed. If this is True, the number of outputs is unlimited. If None, the maximum number of underflow and overflow warnings differ: if row_len_selector is “init” or “min”, underflow always produces a message and the overflow messages stop at 2; otherwise the values are reversed. Default is None.
row_len_selector ({"init", "min", "max", "median", "mode"}, optional) – The method used to determine which row length (number of depth samples) to use. Default is “mode”, the most common row length across all the measurement timepoints.
- Returns
numpy.ndarray – Timestamps for each row, in seconds. Note: not corrected for timezone (so make sure your timezones are internally consistent).
numpy.ndarray – Depth of each column, in metres.
numpy.ndarray – Survey signal (Sv, for instance). Units match that of the file.
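The row_len_selector options can be illustrated with the sketch below, which picks a single row length from the lengths observed across all rows. The helper select_row_length is hypothetical; treating “init” as the length of the first row and using the lower median are assumptions.

```python
import statistics
from collections import Counter


def select_row_length(row_lengths, selector="mode"):
    """Pick one row length from the lengths observed across all rows."""
    if selector == "init":
        # Assumption: "init" means the length of the first row read.
        return row_lengths[0]
    if selector == "min":
        return min(row_lengths)
    if selector == "max":
        return max(row_lengths)
    if selector == "median":
        # Assumption: the lower median, so the result is an observed length.
        return statistics.median_low(row_lengths)
    if selector == "mode":
        return Counter(row_lengths).most_common(1)[0][0]
    raise ValueError("Unsupported row_len_selector: {}".format(selector))


lengths = [10, 12, 12, 12, 11]
chosen = {
    s: select_row_length(lengths, s)
    for s in ("init", "min", "max", "median", "mode")
}
```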
- echofilter.raw.loader.transect_reader(fname)[source]#
Creates a generator which iterates through a survey csv file.
- Parameters
fname (str) – Path to survey CSV file.
- Returns
Yields a tuple of (metadata, data), where metadata is a dict, and data is a numpy.ndarray. Each yield corresponds to a single row in the data. Every row (except for the header) is yielded.
- Return type
generator
- echofilter.raw.loader.write_transect_regions(fname, transect, depth_range=None, passive_key='is_passive', removed_key='is_removed', patches_key='mask_patches', collate_passive_length=0, collate_removed_length=0, minimum_passive_length=0, minimum_removed_length=0, minimum_patch_area=0, name_suffix='', common_notes='', line_ending='\r\n', verbose=0, verbose_indent=0)[source]#
Convert a transect dictionary to a set of regions and write as an EVR file.
- Parameters
fname (str) – Destination of output file.
transect (dict) – Transect dictionary.
depth_range (array_like or None, optional) – The minimum and maximum depth extents (in any order) of the passive and removed block regions. If this is None (default), the minimum and maximum of transect[“depths”] is used.
passive_key (str, optional) – Field name to use for passive data identification. Default is “is_passive”.
removed_key (str, optional) – Field name to use for removed blocks. Default is “is_removed”.
patches_key (str, optional) – Field name to use for the mask of patch regions. Default is “mask_patches”.
collate_passive_length (int, optional) – Maximum distance (in indices) over which passive regions should be merged together, closing small gaps between them. Default is 0.
collate_removed_length (int, optional) – Maximum distance (in indices) over which removed blocks should be merged together, closing small gaps between them. Default is 0.
minimum_passive_length (int, optional) – Minimum length (in indices) a passive region must have to be included in the output. Set to -1 to omit all passive regions from the output. Default is 0.
minimum_removed_length (int, optional) – Minimum length (in indices) a removed block must have to be included in the output. Set to -1 to omit all removed regions from the output. Default is 0.
minimum_patch_area (float, optional) – Minimum amount of area (in input pixel space) that a patch must occupy in order to be included in the output. Set to 0 to include all patches, no matter their area. Set to -1 to omit all patches. Default is 0.
name_suffix (str, optional) – Suffix to append to variable names. Default is “”, an empty string.
common_notes (str, optional) – Notes to include for every region. Default is “”, an empty string.
line_ending (str, optional) – Line ending. Default is “\r\n”, the standard line ending on Windows/DOS, as per the specification for the file format: https://support.echoview.com/WebHelp/Using_Echoview/Exporting/Exporting_data/Exporting_line_data.htm Set to “\n” to get Unix-style line endings instead.
verbose (int, optional) – Verbosity level. Default is 0.
verbose_indent (int, optional) – Level of indentation (number of preceding spaces) before verbosity messages. Default is 0.
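The collate and minimum length parameters can be pictured with the sketch below, which merges nearby regions and then drops short ones. It is not the library's implementation: the helper collate_and_filter is hypothetical, regions are assumed to be (start, end) index pairs with an exclusive end, and the inclusive comparisons are assumptions.

```python
def collate_and_filter(regions, collate_length=0, minimum_length=0):
    """Merge regions separated by small gaps, then drop short regions."""
    if minimum_length < 0:
        # As documented, -1 omits all regions of this kind from the output.
        return []
    merged = []
    for start, end in sorted(regions):
        # Assumption: the gap is measured from the previous (exclusive) end.
        if merged and start - merged[-1][1] <= collate_length:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged if e - s >= minimum_length]


regions = [(0, 5), (7, 10), (30, 31)]
kept = collate_and_filter(regions, collate_length=2, minimum_length=2)
```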
echofilter.raw.manipulate module#
Manipulating lines and masks contained in Echoview files.
- echofilter.raw.manipulate.find_nonzero_region_boundaries(v)[source]#
Find the start and end indices for nonzero regions of a vector.
- Parameters
v (array_like) – A vector.
- Returns
starts (numpy.ndarray) – Indices for start of regions of nonzero elements in vector v
ends (numpy.ndarray) – Indices for end of regions of nonzero elements in vector v (exclusive).
Notes
For i in range(len(starts)), the set of values v[starts[i]:ends[i]] are nonzero. Values in the range v[ends[i]:starts[i+1]] are zero.
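The start/end convention in the note above can be sketched in plain Python as follows. The function nonzero_runs is a hypothetical stand-in written for illustration, not the library's NumPy implementation.

```python
def nonzero_runs(v):
    """Return (starts, ends) of runs of nonzero values; ends are exclusive."""
    starts, ends = [], []
    inside = False
    for i, x in enumerate(v):
        if x and not inside:
            starts.append(i)  # a nonzero run begins here
            inside = True
        elif not x and inside:
            ends.append(i)  # first zero after a run: exclusive end
            inside = False
    if inside:
        ends.append(len(v))  # run extends to the end of the vector
    return starts, ends


v = [0, 1, 1, 0, 0, 3, 0, 2]
starts, ends = nonzero_runs(v)
```

With this convention, v[starts[i]:ends[i]] contains only nonzero values for every i.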
- echofilter.raw.manipulate.find_passive_data(signals, n_depth_use=38, threshold=25.0, deviation=None)[source]#
Find segments of Sv recording which correspond to passive recording.
- Parameters
signals (array_like) – Two-dimensional array of Sv values, shaped [timestamps, depths].
n_depth_use (int, optional) – How many Sv depths to use, starting with the first depths (closest to the sounder device). If None all depths are used. Default is 38.
threshold (float, optional) – Threshold for start/end of passive regions. Default is 25.
deviation (float, optional) – Threshold for start/end of passive regions is deviation times the interquartile-range of the difference between samples at neighbouring timestamps. Default is None. Only one of threshold and deviation should be set.
- Returns
passive_start (numpy.ndarray) – Indices of rows of signals at which passive segments start.
passive_end (numpy.ndarray) – Indices of rows of signals at which passive segments end.
Notes
Works by looking at the difference between consecutive recordings and finding large deviations.
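The idea in the note above can be sketched as below. This is a simplified illustration only: the function sketch_find_passive is hypothetical, and it assumes that passive recording shows up as a large drop in Sv (median across depths) between consecutive pings, with a matching rise when active recording resumes. It does not handle passive segments which touch the start or end of the recording.

```python
import statistics


def sketch_find_passive(signals, n_depth_use=None, threshold=25.0):
    """Locate large drops/rises in Sv between consecutive pings."""
    # Keep only the first n_depth_use depths of each ping (all if None).
    rows = [row[:n_depth_use] for row in signals]
    starts, ends = [], []
    for i in range(1, len(rows)):
        # Median change across depths between ping i-1 and ping i.
        step = statistics.median(b - a for a, b in zip(rows[i - 1], rows[i]))
        if step < -threshold:
            starts.append(i)  # signal drops: passive segment begins
        elif step > threshold:
            ends.append(i)  # signal recovers: passive segment ends
    return starts, ends


active = [-60.0, -62.0, -61.0]
passive = [-110.0, -112.0, -111.0]
signals = [active] * 3 + [passive] * 2 + [active] * 2
passive_start, passive_end = sketch_find_passive(signals)
```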
- echofilter.raw.manipulate.find_passive_data_v2(signals, n_depth_use=38, threshold_inner=None, threshold_init=None, deviation=None, sigma_depth=0, sigma_time=1)[source]#
Find segments of Sv recording which correspond to passive recording.
- Parameters
signals (array_like) – Two-dimensional array of Sv values, shaped [timestamps, depths].
n_depth_use (int, optional) – How many Sv depths to use, starting with the first depths (closest to the sounder device). If None all depths are used. Default is 38. The median is taken across the depths, after taking the temporal derivative.
threshold_inner (float, optional) – Threshold to apply to the temporal derivative of the signal when detecting the fine-tuned start/end of passive regions. Default behaviour is to use a threshold automatically determined using deviation if it is set, and otherwise use a threshold of 35.0.
threshold_init (float, optional) – Threshold to apply during the initial scan of the start/end of passive regions, which seeds the fine-tuning search. Default behaviour is to use a threshold automatically determined using deviation if it is set, and otherwise use a threshold of 12.0.
deviation (float, optional) – Set threshold_inner to be deviation times the standard deviation of the temporal derivative of the signal. The standard deviation is robustly estimated based on the interquartile range. If this is set, threshold_inner must not be None. Default is None.
sigma_depth (float, optional) – Width of kernel for filtering signals across second dimension (depth). Default is 0 (no filter).
sigma_time (float, optional) – Width of kernel for filtering signals across first dimension (time). Default is 1. Set to 0 to not filter.
- Returns
passive_start (numpy.ndarray) – Indices of rows of signals at which passive segments start.
passive_end (numpy.ndarray) – Indices of rows of signals at which passive segments end.
Notes
Works by looking at the difference between consecutive recordings and finding large deviations.
- echofilter.raw.manipulate.fix_surface_line(timestamps, d_surface, is_passive)[source]#
Fix anomalies in the surface line.
- Parameters
timestamps (array_like sized (N, )) – Timestamps for each ping.
d_surface (array_like sized (N, )) – Surface line depths.
is_passive (array_like sized (N, )) – Indicator for passive data. Values for the surface line during passive data collection will not be used.
- Returns
fixed_surface (numpy.ndarray) – Surface line depths, with anomalies replaced with median filtered values and passive data replaced with linear interpolation. Has the same size and dtype as d_surface.
is_replaced (boolean numpy.ndarray sized (N, )) – Indicates which datapoints were replaced. Note that passive data is always replaced and is marked as such.
- echofilter.raw.manipulate.fixup_lines(timestamps, depths, mask, t_turbulence=None, d_turbulence=None, t_bottom=None, d_bottom=None)[source]#
Extend existing turbulence/bottom lines based on masked target Sv output.
- Parameters
timestamps (array_like) – Shaped (num_timestamps, ).
depths (array_like) – Shaped (num_depths, ).
mask (array_like) – Boolean array, where True denotes kept entries. Shaped (num_timestamps, num_depths).
t_turbulence (array_like, optional) – Sampling times for existing turbulence line.
d_turbulence (array_like, optional) – Depth of existing turbulence line.
t_bottom (array_like, optional) – Sampling times for existing bottom line.
d_bottom (array_like, optional) – Depth of existing bottom line.
- Returns
d_turbulence_new (numpy.ndarray) – Depth of new turbulence line.
d_bottom_new (numpy.ndarray) – Depth of new bottom line.
- echofilter.raw.manipulate.join_transect(transects)[source]#
Joins segmented transects together into a single dictionary.
- Parameters
transects (iterable of dict) – Transect segments, each with the same fields and compatible shapes.
- Yields
dict – Transect data.
- echofilter.raw.manipulate.load_decomposed_transect_mask(sample_path)[source]#
Loads a raw and masked transect and decomposes the mask into turbulence and bottom lines, and passive and removed regions.
- Parameters
sample_path (str) – Path to sample, without extension. The raw data should be located at sample_path + "_Sv_raw.csv".
- Returns
A dictionary with keys:
- “timestamps” (numpy.ndarray)
Timestamps (in seconds since Unix epoch), for each recording timepoint.
- “depths” (numpy.ndarray)
Depths from the surface (in metres), with each entry corresponding to each column in the signals data.
- “Sv” (numpy.ndarray)
Echogram Sv data, shaped (num_timestamps, num_depths).
- “mask” (numpy.ndarray)
Logical array indicating which datapoints were kept (True) and which removed (False) for the masked Sv output. Shaped (num_timestamps, num_depths).
- “turbulence” (numpy.ndarray)
For each timepoint, the depth of the shallowest datapoint which should be included for the mask. Shaped (num_timestamps, ).
- “bottom” (numpy.ndarray)
For each timepoint, the depth of the deepest datapoint which should be included for the mask. Shaped (num_timestamps, ).
- “is_passive” (numpy.ndarray)
Logical array showing whether a timepoint is of passive data. Shaped (num_timestamps, ). All passive recording data should be excluded by the mask.
- “is_removed” (numpy.ndarray)
Logical array showing whether a timepoint is entirely removed by the mask. Shaped (num_timestamps, ). Does not include periods of passive recording.
- “is_upward_facing” (bool)
Indicates whether the recording source is located at the deepest depth (i.e. the seabed), facing upwards. Otherwise, the recording source is at the shallowest depth (i.e. the surface), facing downwards.
- Return type
- echofilter.raw.manipulate.make_lines_from_mask(mask, depths=None, max_gap_squash=1.0)[source]#
Determines turbulence and bottom lines for a mask array.
- Parameters
mask (array_like) – A two-dimensional logical array where, for each row, the values along dimension 1 are False for some unknown contiguous stretch at the start and at the end of the row, with True values between these two masked-out regions.
depths (array_like, optional) – Depth of each sample point along dim 1 of mask. Must be either monotonically increasing or monotonically decreasing. Default is the index of mask, arange(mask.shape[1]).
max_gap_squash (float, optional) – Maximum gap to merge together, in metres. Default is 1.0.
- Returns
d_turbulence (numpy.ndarray) – Depth of turbulence line. This is the line of smaller depth which separates the False region of mask from the central region of True values. (If depths is monotonically increasing, this is for the start of the columns of mask, otherwise it is at the end.)
d_bottom (numpy.ndarray) – Depth of bottom line. As for d_turbulence, but for the other end of the array.
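For a single row of the mask, the relationship between the mask and the two lines can be sketched as below. The helper lines_from_mask_row is hypothetical and simplified: it reports the shallowest and deepest kept sample depths, and it ignores gap squashing and any placement of the line between neighbouring samples, both of which the library may handle differently.

```python
import math


def lines_from_mask_row(mask_row, depths):
    """Return (d_turbulence, d_bottom) for one timestamp of the mask."""
    kept = [d for keep, d in zip(mask_row, depths) if keep]
    if not kept:
        # Nothing kept at this timestamp, so the lines are undefined here.
        return math.nan, math.nan
    # Works for both increasing and decreasing depth vectors.
    return min(kept), max(kept)


d_turbulence, d_bottom = lines_from_mask_row(
    [False, True, True, False], [0.0, 1.0, 2.0, 3.0]
)
```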
- echofilter.raw.manipulate.make_lines_from_masked_csv(fname)[source]#
Load a masked csv file output from Echoview and generate lines which reproduce the mask.
- Parameters
fname (str) – Path to file containing masked Echoview output data in csv format.
- Returns
timestamps (numpy.ndarray) – Sample timestamps.
d_turbulence (numpy.ndarray) – Depth of turbulence line.
d_bottom (numpy.ndarray) – Depth of bottom line.
- echofilter.raw.manipulate.remove_anomalies_1d(signal, thr=5, thr2=4, kernel=201, kernel2=31, return_filtered=False)[source]#
Remove anomalies from a temporal signal.
Applies a median filter to the data, and replaces datapoints which deviate from the median filtered signal by more than some threshold with the median filtered data. This process is repeated until no datapoints deviate from the filtered line by more than the threshold.
- Parameters
signal (array_like) – The signal to filter.
thr (float, optional) – The initial threshold will be thr times the standard deviation of the residuals. The standard deviation is robustly estimated from the interquartile range. Default is 5.
thr2 (float, optional) – The threshold for repeated iterations will be thr2 times the standard deviation of the remaining residuals. The standard deviation is robustly estimated from the interdecile range. Default is 4.
kernel (int, optional) – The kernel size for the initial median filter. Default is 201.
kernel2 (int, optional) – The kernel size for subsequent median filters. Default is 31.
return_filtered (bool, optional) – If True, the median filtered signal is also returned. Default is False.
- Returns
signal (numpy.ndarray like signal) – The input signal with anomalies replaced with median values.
is_replaced (bool numpy.ndarray shaped like signal) – Indicator for which datapoints were replaced.
filtered (numpy.ndarray like signal, optional) – The final median filtered signal. Returned if return_filtered=True.
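The iterative procedure described above can be sketched in plain Python as follows. This is a simplified illustration, not the library's code: the function names are hypothetical, a single threshold and kernel are used for every iteration, the window is truncated at the edges rather than padded, the interquartile range is used throughout, and a max_iter guard is added as an assumption.

```python
import statistics

# Ratio of the interquartile range to the standard deviation for a normal
# distribution (approximately 1.349).
IQR_PER_SIGMA = 2 * statistics.NormalDist().inv_cdf(0.75)


def median_filter(signal, kernel):
    """Median filter with the window truncated at the edges."""
    half = kernel // 2
    return [
        statistics.median(signal[max(0, i - half) : i + half + 1])
        for i in range(len(signal))
    ]


def robust_std(values):
    """Estimate the standard deviation from the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / IQR_PER_SIGMA


def sketch_remove_anomalies(signal, thr=5.0, kernel=5, max_iter=10):
    """Replace outliers with median-filtered values until none remain."""
    signal = list(signal)
    is_replaced = [False] * len(signal)
    for _ in range(max_iter):
        filtered = median_filter(signal, kernel)
        residuals = [s - f for s, f in zip(signal, filtered)]
        limit = thr * robust_std(residuals)
        bad = [abs(r) > limit for r in residuals]
        if not any(bad):
            break
        for i, is_bad in enumerate(bad):
            if is_bad:
                signal[i] = filtered[i]
                is_replaced[i] = True
    return signal, is_replaced


noisy = [1.0] * 10 + [50.0] + [1.0] * 10
fixed, is_replaced = sketch_remove_anomalies(noisy)
```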
See also
- echofilter.raw.manipulate.split_transect(timestamps=None, threshold=20, percentile=97.5, **transect)[source]#
Splits a transect into segments each containing contiguous recordings.
- Parameters
timestamps (array_like) – A 1-d array containing the timestamp at which each recording was measured. The sampling is assumed to be high-frequency with occasional gaps.
threshold (int, optional) – Threshold for splitting timestamps into segments. Any timepoints further apart than threshold times the percentile-th percentile of the differences between consecutive timepoints will be split apart into new segments. Default is 20.
percentile (float, optional) – The percentile at which to sample the timestamp intervals to establish a baseline typical interval. Default is 97.5.
**transect – Arbitrary additional transect variables, which will be split into segments as appropriate in accordance with timestamps.
- Yields
dict – Containing segmented data, with key/value pairs as given in **transect, in addition to timestamps.
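The splitting rule can be sketched as below, returning (start, end) index pairs for each segment. The helpers are hypothetical, and the linearly interpolated percentile is an assumption about how the baseline interval is computed.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def split_indices(timestamps, threshold=20, pct=97.5):
    """Return (start, end) index pairs, end exclusive, for each segment."""
    diffs = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Baseline "typical" interval, scaled up by the threshold.
    limit = threshold * percentile(diffs, pct)
    # A gap after sample i starts a new segment at sample i + 1.
    bounds = [0] + [i + 1 for i, d in enumerate(diffs) if d > limit]
    bounds.append(len(timestamps))
    return list(zip(bounds[:-1], bounds[1:]))


# Two bursts of 1 Hz sampling, separated by a long pause.
timestamps = list(range(0, 100)) + list(range(1000, 1100))
segments = split_indices(timestamps)
```

Note that with very few samples a single gap can dominate the percentile, in which case no split occurs; the rule relies on gaps being rare relative to ordinary intervals.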
- echofilter.raw.manipulate.write_lines_for_masked_csv(fname_mask, fname_turbulence=None, fname_bottom=None)[source]#
Write new turbulence and bottom lines based on csv containing masked Echoview output.
- Parameters
fname_mask (str) – Path to input file containing masked Echoview output data in csv format.
fname_turbulence (str, optional) – Destination of generated turbulence line, written in evl format. If None (default), the output name is <fname_base>_mask-turbulence.evl, where <fname_base> is fname_mask without extension and without any occurrence of the substrings _Sv_raw or _Sv in the base file name.
fname_bottom (str, optional) – Destination of generated bottom line, written in evl format. If None (default), the output name is <fname_base>_mask-bottom.evl.
echofilter.raw.metadata module#
Dataset metadata, relevant for loading correct data.
- echofilter.raw.metadata.recall_passive_edges(sample_path, timestamps)[source]#
Defines passive data edges for samples within known datasets.
- Parameters
sample_path (str) – Path to sample.
timestamps (array_like vector) – Vector of timestamps in sample.
- Returns
passive_starts (numpy.ndarray or None) – Indices indicating the onset of passive data collection periods, or None if passive metadata is unavailable for this sample.
passive_ends (numpy.ndarray or None) – Indices indicating the offset of passive data collection periods, or None if passive metadata is unavailable for this sample.
finder_version (absent or str) – If passive_starts and passive_ends are available, this string may be present to indicate which passive finder algorithm works best for this dataset.
echofilter.raw.shardloader module#
Converting raw data into shards, and loading data from shards.
- echofilter.raw.shardloader.load_transect_from_shards(transect_rel_pth, i1=0, i2=None, dataset='mobile', segment=0, root_data_dir='/data/dsforce/surveyExports', **kwargs)#
Load transect data from shard files.
- Parameters
transect_rel_pth (str) – Relative path to transect.
i1 (int, optional) – Index of first sample to retrieve. Default is 0, the first sample.
i2 (int, optional) – Index of last sample to retrieve. As per Python convention, the range i1 to i2 is inclusive on the left and exclusive on the right, so datapoint i2 - 1 is the right-most datapoint loaded. Default is None, which loads everything up to and including the last sample.
dataset (str, optional) – Name of dataset. Default is “mobile”.
segment (int, optional) – Which segment to load. Default is 0.
root_data_dir (str) – Path to root directory where data is located.
**kwargs – As per load_transect_from_shards_abs().
- Returns
- Return type
- echofilter.raw.shardloader.load_transect_from_shards_abs(transect_abs_pth, i1=0, i2=None, pad_mode='edge')[source]#
Load transect data from shard files.
- Parameters
transect_abs_pth (str) – Absolute path to transect shard directory.
i1 (int, optional) – Index of first sample to retrieve. Default is 0, the first sample.
i2 (int, optional) – Index of last sample to retrieve. As per Python convention, the range i1 to i2 is inclusive on the left and exclusive on the right, so datapoint i2 - 1 is the right-most datapoint loaded. Default is None, which loads everything up to and including the last sample.
pad_mode (str, optional) – Padding method for out-of-bounds inputs. Must be supported by numpy.pad(), such as “constant”, “reflect”, or “edge”. If the mode is “constant”, the array will be padded with zeros. Default is “edge”.
- Returns
A dictionary with keys:
- “timestamps” (numpy.ndarray)
Timestamps (in seconds since Unix epoch), for each recording timepoint. The number of entries, num_timestamps, is equal to i2 - i1.
- “depths” (numpy.ndarray)
Depths from the surface (in metres), with each entry corresponding to each column in the signals data.
- “Sv” (numpy.ndarray)
Echogram Sv data, shaped (num_timestamps, num_depths).
- “mask” (numpy.ndarray)
Logical array indicating which datapoints were kept (True) and which removed (False) for the masked Sv output. Shaped (num_timestamps, num_depths).
- “turbulence” (numpy.ndarray)
For each timepoint, the depth of the shallowest datapoint which should be included for the mask. Shaped (num_timestamps, ).
- “bottom” (numpy.ndarray)
For each timepoint, the depth of the deepest datapoint which should be included for the mask. Shaped (num_timestamps, ).
- “is_passive” (numpy.ndarray)
Logical array showing whether a timepoint is of passive data. Shaped (num_timestamps, ). All passive recording data should be excluded by the mask.
- “is_removed” (numpy.ndarray)
Logical array showing whether a timepoint is entirely removed by the mask. Shaped (num_timestamps, ). Does not include periods of passive recording.
- “is_upward_facing” (bool)
Indicates whether the recording source is located at the deepest depth (i.e. the seabed), facing upwards. Otherwise, the recording source is at the shallowest depth (i.e. the surface), facing downwards.
- Return type
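The half-open range i1 to i2 and the “edge” padding mode can be illustrated with the sketch below. Both helpers are hypothetical and operate on a plain list rather than on shard files; how the library maps sample indices onto shard directories is an assumption based on the shard_len parameter.

```python
def shards_covering(i1, i2, n_samples, shard_len=128):
    """Indices of the shards holding the in-bounds part of [i1, i2)."""
    lo = max(i1, 0)
    hi = min(i2, n_samples)
    if hi <= lo:
        return []
    # Sample i is assumed to live in shard number i // shard_len.
    return list(range(lo // shard_len, (hi - 1) // shard_len + 1))


def take_with_edge_padding(samples, i1, i2):
    """Take samples[i1:i2], repeating the edge values when out of bounds."""
    n = len(samples)
    # Clamping each index to [0, n - 1] mimics numpy.pad with mode="edge".
    return [samples[min(max(i, 0), n - 1)] for i in range(i1, i2)]


needed = shards_covering(100, 300, n_samples=1000)
padded = take_with_edge_padding([10, 20, 30], -2, 5)
```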
- echofilter.raw.shardloader.load_transect_from_shards_rel(transect_rel_pth, i1=0, i2=None, dataset='mobile', segment=0, root_data_dir='/data/dsforce/surveyExports', **kwargs)[source]#
Load transect data from shard files.
- Parameters
transect_rel_pth (str) – Relative path to transect.
i1 (int, optional) – Index of first sample to retrieve. Default is 0, the first sample.
i2 (int, optional) – Index of last sample to retrieve. As per Python convention, the range i1 to i2 is inclusive on the left and exclusive on the right, so datapoint i2 - 1 is the right-most datapoint loaded. Default is None, which loads everything up to and including the last sample.
dataset (str, optional) – Name of dataset. Default is “mobile”.
segment (int, optional) – Which segment to load. Default is 0.
root_data_dir (str) – Path to root directory where data is located.
**kwargs – As per load_transect_from_shards_abs().
- Returns
- Return type
- echofilter.raw.shardloader.load_transect_segments_from_shards_abs(transect_abs_pth, segments=None)[source]#
Load transect data from shard files.
- Parameters
transect_abs_pth (str) – Absolute path to transect shard segments directory.
segments (iterable or None) – Which segments to load. If None (default), all segments are loaded.
- Returns
- Return type
- echofilter.raw.shardloader.load_transect_segments_from_shards_rel(transect_rel_pth, dataset='mobile', segments=None, root_data_dir='/data/dsforce/surveyExports')[source]#
Load transect data from shard files.
- Parameters
transect_rel_pth (str) – Relative path to transect.
dataset (str, optional) – Name of dataset. Default is “mobile”.
segments (iterable or None) – Which segments to load. If None (default), all segments are loaded.
root_data_dir (str) – Path to root directory where data is located.
**kwargs – As per load_transect_from_shards_abs().
- Returns
- Return type
- echofilter.raw.shardloader.segment_and_shard_transect(transect_pth, dataset='mobile', max_depth=None, shard_len=128, root_data_dir='/data/dsforce/surveyExports')[source]#
Creates a sharded copy of a transect, with the transect cut into segments based on recording starts/stops. Each segment is split across multiple files (shards) for efficient loading.
- Parameters
transect_pth (str) – Relative path to transect, excluding “_Sv_raw.csv”.
dataset (str, optional) – Name of dataset. Default is “mobile”.
max_depth (float or None, optional) – The maximum depth to include in the saved shard. Data corresponding to deeper locations is omitted to save on load time and memory when the shard is loaded. If None, no cropping is applied. Default is None.
shard_len (int, optional) – Number of timestamp samples to include in each shard. Default is 128.
root_data_dir (str) – Path to root directory where data is located.
Notes
The segments will be written to the directories <root_data_dir>_sharded/<dataset>/transect_path/<segment>/. For the contents of each directory, see write_transect_shards.
- echofilter.raw.shardloader.shard_transect(transect_pth, dataset='mobile', max_depth=None, shard_len=128, root_data_dir='/data/dsforce/surveyExports')#
Creates a sharded copy of a transect, with the transect cut into segments based on recording starts/stops. Each segment is split across multiple files (shards) for efficient loading.
- Parameters
transect_pth (str) – Relative path to transect, excluding “_Sv_raw.csv”.
dataset (str, optional) – Name of dataset. Default is “mobile”.
max_depth (float or None, optional) – The maximum depth to include in the saved shard. Data corresponding to deeper locations is omitted to save on load time and memory when the shard is loaded. If None, no cropping is applied. Default is None.
shard_len (int, optional) – Number of timestamp samples to include in each shard. Default is 128.
root_data_dir (str) – Path to root directory where data is located.
Notes
The segments will be written to the directories <root_data_dir>_sharded/<dataset>/transect_path/<segment>/. For the contents of each directory, see write_transect_shards.
- echofilter.raw.shardloader.write_transect_shards(dirname, transect, max_depth=None, shard_len=128)[source]#
Creates a sharded copy of a transect, with the transect cut by timestamp and split across multiple files.
- Parameters
dirname (str) – Path to output directory.
transect (dict) – Observed values for the transect. Should already be segmented.
max_depth (float or None, optional) – The maximum depth to include in the saved shard. Data corresponding to deeper locations is omitted to save on load time and memory when the shard is loaded. If None, no cropping is applied. Default is None.
shard_len (int, optional) – Number of timestamp samples to include in each shard. Default is 128.
Notes
The output will be written to the directory dirname, and will contain:
a file named “shard_size.txt”, which contains the sharding metadata: total number of samples, and shard size;
a directory for each shard, named 0, 1, … Each shard directory will contain files:
depths.npy
timestamps.npy
Sv.npy
mask.npy
turbulence.npy
bottom.npy
is_passive.npy
is_removed.npy
is_upward_facing.npy
which contain pickled numpy dumps of the matrices for each shard.
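How a transect of a given length is divided into shards can be sketched as below. The helper shard_slices is hypothetical; it assumes shards are consecutive blocks of shard_len timestamps, with a shorter final shard when the length is not an exact multiple.

```python
def shard_slices(n_samples, shard_len=128):
    """Return (start, end) sample ranges, end exclusive, for each shard."""
    return [
        (start, min(start + shard_len, n_samples))
        for start in range(0, n_samples, shard_len)
    ]


slices = shard_slices(300)
# Shard directories are named 0, 1, ... in order.
shard_names = [str(i) for i in range(len(slices))]
```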
echofilter.raw.utils module#
Loader utility functions.
- echofilter.raw.utils.integrate_area_of_contour(x, y, closed=None, preserve_sign=False)[source]#
Compute the area within a contour, using Green’s theorem.
- Parameters
x (array_like vector) – x co-ordinates of nodes along the contour.
y (array_like vector) – y co-ordinates of nodes along the contour.
closed (bool or None, optional) – Whether the contour is already closed. If False, it will be closed before determining the area. If None (default), it is automatically determined as to whether the contour is already closed, and is closed if necessary.
preserve_sign (bool, optional) – Whether to preserve the sign of the area. If True, the area is positive if the contour is anti-clockwise and negative if it is clockwise oriented. Default is False, which always returns a positive area.
- Returns
area – The integral of the area within the contour.
- Return type
Notes
https://en.wikipedia.org/wiki/Green%27s_theorem#Area_calculation
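For a polygonal contour, the area calculation from Green's theorem reduces to the shoelace formula, sketched below in plain Python. The function contour_area is a hypothetical stand-in for illustration. It wraps around from the last node to the first, so it gives the same result whether or not the first point is repeated at the end.

```python
def contour_area(x, y, preserve_sign=False):
    """Area enclosed by a polygon with nodes (x[i], y[i])."""
    n = len(x)
    # Sum of cross products of consecutive nodes, wrapping at the end.
    twice_area = sum(
        x[i] * y[(i + 1) % n] - x[(i + 1) % n] * y[i] for i in range(n)
    )
    area = twice_area / 2.0
    # Positive for anti-clockwise contours, negative for clockwise ones.
    return area if preserve_sign else abs(area)


# Unit square, traversed anti-clockwise.
square_x = [0.0, 1.0, 1.0, 0.0]
square_y = [0.0, 0.0, 1.0, 1.0]
area_ccw = contour_area(square_x, square_y, preserve_sign=True)
area_cw = contour_area(square_x[::-1], square_y[::-1], preserve_sign=True)
```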
- echofilter.raw.utils.interp1d_preserve_nan(x, y, x_samples, nan_threshold=0.0, bounds_error=False, **kwargs)[source]#
Interpolate a 1-D function, preserving NaNs.
x and y are arrays of values used to approximate some function f: y = f(x). We exclude NaNs for the interpolation and then mask out entries which are adjacent (or close to) a NaN in the input.
- Parameters
x ((N,) array_like) – A 1-D array of real values. Must not contain NaNs.
y ((...,N,...) array_like) – A N-D array of real values. The length of y along the interpolation axis must be equal to the length of x. May contain NaNs.
x_samples (array_like) – A 1-D array of real values at which the interpolation function will be sampled.
nan_threshold (float, optional) – Minimum amount of influence a NaN must have on an output sample for it to become a NaN. Default is 0, i.e. any influence.
bounds_error (bool, optional) – If True, a ValueError is raised any time interpolation is attempted on a value outside of the range of x (where extrapolation is necessary). If False (default), out of bounds values are assigned value fill_value (whose default is NaN).
**kwargs – Additional keyword arguments are as per scipy.interpolate.interp1d().
- Returns
y_samples – The result of interpolating, with sample points close to NaNs in the input returned as NaN.
- Return type
(…,N,…) np.ndarray
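The NaN-preserving behaviour for the default nan_threshold of 0 can be sketched for 1-D linear interpolation as below. The function interp_preserve_nan is hypothetical and is not how the library is implemented; it assumes x is strictly increasing with at least two entries, and it returns NaN outside the range of x.

```python
import math
from bisect import bisect_right


def interp_preserve_nan(x, y, x_samples):
    """Linear interpolation where any influence from a NaN yields NaN."""
    out = []
    for xs in x_samples:
        if xs < x[0] or xs > x[-1]:
            out.append(math.nan)  # out of bounds: filled with NaN
            continue
        hi = min(bisect_right(x, xs), len(x) - 1)
        lo = hi - 1
        w = (xs - x[lo]) / (x[hi] - x[lo])  # weight given to y[hi]
        if w == 0.0:
            out.append(y[lo])  # exactly on a node: the other node has no say
        elif w == 1.0:
            out.append(y[hi])
        else:
            # A NaN at either neighbour propagates through the arithmetic.
            out.append((1.0 - w) * y[lo] + w * y[hi])
    return out


x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 10.0, math.nan, 30.0]
y_samples = interp_preserve_nan(x, y, [0.5, 1.0, 1.5, 3.0, 4.0])
```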
- echofilter.raw.utils.medfilt1d(signal, kernel_size, axis=-1, pad_mode='reflect')[source]#
Median filter in 1d, with support for selecting padding mode.
- Parameters
- Returns
filtered – The filtered signal.
- Return type
array_like
See also
- echofilter.raw.utils.pad1d(array, pad_width, axis=0, **kwargs)[source]#
Pad an array along a single axis only.
- Parameters
- Returns
Padded array.
- Return type
numpy.ndarray
See also
- echofilter.raw.utils.squash_gaps(mask, max_gap_squash, axis=-1, inplace=False)[source]#
Merge small gaps between zero values in a boolean array.
- Parameters
mask (boolean array) – The input mask, with small gaps between zero values which will be squashed with zeros.
max_gap_squash (int) – Maximum length of gap to squash.
axis (int, optional) – Axis on which to operate. Default is -1.
inplace (bool, optional) – Whether to operate on the original array. If False, a copy is created and returned.
- Returns
merged_mask – Mask as per the input, but with small gaps squashed.
- Return type
boolean array
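For a one-dimensional mask, the squashing operation can be sketched as below. The function squash_gaps_1d is hypothetical; it assumes a gap is a run of True values bounded on both sides by False values, that a gap is squashed when its length is at most max_gap_squash, and that runs touching either end of the array are left alone.

```python
def squash_gaps_1d(mask, max_gap_squash):
    """Set short runs of True, bounded by False on both sides, to False."""
    out = list(mask)  # work on a copy, as with inplace=False
    n = len(out)
    i = 0
    while i < n:
        if not out[i]:
            i += 1
            continue
        # Find the end of this run of True values, which spans [i, j).
        j = i
        while j < n and out[j]:
            j += 1
        bounded = i > 0 and j < n
        if bounded and (j - i) <= max_gap_squash:
            out[i:j] = [False] * (j - i)
        i = j
    return out


mask = [False, True, False, False, True, True, True, False, True]
squashed = squash_gaps_1d(mask, max_gap_squash=1)
```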