echofilter.raw package#
Echoview output file loading and generation, post-processing and shard generation.
Submodules#
echofilter.raw.loader module#
Input/Output handling for raw Echoview files.
- echofilter.raw.loader.evdtstr2timestamp(datestr, timestr=None)[source]#
Convert an Echoview-compatible datetime string into a Unix epoch timestamp.
- Parameters
datestr (str) – Datetime string, or the date part if timestr is also given.
timestr (str, optional) – Time string, to be combined with datestr.
- Returns
timestamp – Number of seconds since the Unix epoch.
- Return type
float
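As an illustration of the conversion (a hypothetical re-implementation, not the package's code; the UTC assumption here is ours — as noted elsewhere in these docs, Echoview timestamps are not corrected for timezone), the "CCYYMMDD HHmmSSssss" format described under timestamp2evdtstr() below can be parsed with the standard library:

```python
from datetime import datetime, timezone

def ev_datetime_to_timestamp(datestr, timestr):
    """Parse "CCYYMMDD" and "HHmmSSssss" strings into a Unix epoch timestamp.

    Illustrative sketch only; assumes the strings denote UTC.
    """
    # The first six digits of the time string are HHmmSS.
    dt = datetime.strptime(datestr + " " + timestr[:6], "%Y%m%d %H%M%S")
    dt = dt.replace(tzinfo=timezone.utc)
    # Any remaining digits are in units of 0.1 milliseconds.
    fraction = int(timestr[6:]) / 10_000 if len(timestr) > 6 else 0.0
    return dt.timestamp() + fraction

ev_datetime_to_timestamp("19700101", "0000010000")  # 1.0 (one second past the epoch)
```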
- echofilter.raw.loader.evl_loader(fname, special_to_nan=True, return_status=False)[source]#
EVL file loader.
- Parameters
fname (str) – Path to .evl file.
special_to_nan (bool, optional) – Whether to replace the special value, -10000.99, which indicates no depth value, with NaN. See https://support.echoview.com/WebHelp/Reference/File_formats/Export_file_formats/Special_Export_Values.htm
return_status (bool, optional) – Whether to also return the status codes. Default is False.
- Returns
numpy.ndarray of floats – Timestamps, in seconds.
numpy.ndarray of floats – Depths, in metres.
numpy.ndarray of ints, optional – Status codes, returned if return_status is True.
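The special-value substitution that special_to_nan performs can be reproduced with NumPy (a sketch of the documented behaviour; the exact comparison used internally is an assumption):

```python
import numpy as np

# Depths exported by Echoview, where -10000.99 marks "no depth value".
depths = np.array([9.25, -10000.99, 11.5])
# Replace the special value with NaN, as special_to_nan=True does.
depths = np.where(np.isclose(depths, -10000.99), np.nan, depths)
```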
- echofilter.raw.loader.evl_reader(fname)[source]#
EVL file reader.
- Parameters
fname (str) – Path to .evl file.
- Returns
A generator which yields the timestamp (in seconds), depth (in metres), and status (int) for each entry. Note that the timestamp is not corrected for timezone (so make sure your timezones are internally consistent).
- Return type
generator
- echofilter.raw.loader.evl_writer(fname, timestamps, depths, status=1, line_ending='\r\n', pad=False)[source]#
EVL file writer.
- Parameters
fname (str) – Destination of output file.
timestamps (array_like) – Timestamps for each node in the line.
depths (array_like) – Depths (in meters) for each node in the line.
status (0, 1, 2, or 3; optional) – Status for the line: 0 (none), 1 (unverified), 2 (bad), or 3 (good). Default is 1 (unverified). For more details on line status, see https://support.echoview.com/WebHelp/Using_Echoview/Echogram/Lines/About_Line_Status.htm
pad (bool, optional) – Whether to pad the line with an extra datapoint half a pixel before the first and after the last given timestamp. Default is False.
line_ending (str, optional) – Line ending. Default is "\r\n", the standard line ending on Windows/DOS, as per the specification for the file format: https://support.echoview.com/WebHelp/Using_Echoview/Exporting/Exporting_data/Exporting_line_data.htm Set to "\n" to get Unix-style line endings instead.
Notes
For more details on the format specification, see https://support.echoview.com/WebHelp/Using_Echoview/Exporting/Exporting_data/Exporting_line_data.htm#Line_definition_file_format
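A sketch of how the documented status codes, datetime format, and line endings come together when formatting data rows (a hypothetical helper for illustration; the real writer also emits EVL header lines, which are omitted here — consult the format specification linked above):

```python
import datetime

def format_evl_rows(timestamps, depths, status=1, line_ending="\r\n"):
    """Format (timestamp, depth) pairs as EVL-style data rows.

    Illustrative only: the header lines required by the EVL format are
    omitted, and UTC is assumed.
    Status codes: 0 = none, 1 = unverified, 2 = bad, 3 = good.
    """
    rows = []
    for ts, depth in zip(timestamps, depths):
        dt = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
        # "CCYYMMDD HHmmSSssss", with ssss in units of 0.1 milliseconds.
        dtstr = dt.strftime("%Y%m%d %H%M%S") + f"{dt.microsecond // 100:04d}"
        rows.append(f"{dtstr} {depth:.4f} {status}")
    return line_ending.join(rows) + line_ending

format_evl_rows([1.0], [10.5])  # "19700101 0000010000 10.5000 1\r\n"
```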
- echofilter.raw.loader.evr_reader(fname, parse_echofilter_regions=True)[source]#
Echoview region file (EVR) reader.
- Parameters
fname (str) – Path to .evr file.
parse_echofilter_regions (bool, optional) – Whether to separate the regions into the echofilter categories (passive, removed, patch) listed below. Default is True.
- Returns
regions_passive (list of tuples, optional) – Start and end timestamps for passive regions.
regions_removed (list of tuples, optional) – Start and end timestamps for removed vertical bands.
regions_patch (list of lists, optional) – Start and end timestamps for bad data patches.
regions_other (list of dicts) – Dictionary mapping creation type to points defining each region.
- echofilter.raw.loader.evr_writer(fname, rectangles=None, contours=None, common_notes='', default_region_type=0, line_ending='\r\n')[source]#
EVR file writer.
Writes regions to an Echoview region file.
- Parameters
fname (str) – Destination of output file.
rectangles (list of dictionaries, optional) – Rectangle region definitions. Default is an empty list. Each rectangle region must implement fields "depths" and "timestamps", which indicate the extent of the rectangle. Optionally, "creation_type", "region_name", "region_type", and "notes" may be set. If these are not given, the default creation_type is 4 and region_type is set by default_region_type.
contours (list of dictionaries, optional) – Contour region definitions. Default is an empty list. Each contour region must implement a "points" field containing a numpy.ndarray shaped (n, 2) defining the co-ordinates of nodes along the (open) contour in units of timestamp and depth. Optionally, "creation_type", "region_name", "region_type", and "notes" may be set. If these are not given, the default creation_type is 2 and region_type is set by default_region_type.
common_notes (str, optional) – Notes to include for every region. Default is "", an empty string.
default_region_type (int, optional) – The region type to use for rectangles and contours which do not define a "region_type" field. Possible region types are: 0 (bad, no data), 1 (analysis), 2 (marker), 3 (fishtracks), and 4 (bad, empty water). Default is 0.
line_ending (str, optional) – Line ending. Default is "\r\n", the standard line ending on Windows/DOS, as per the specification for the file format: https://support.echoview.com/WebHelp/Using_Echoview/Exporting/Exporting_data/Exporting_line_data.htm Set to "\n" to get Unix-style line endings instead.
Notes
For more details on the format specification, see: https://support.echoview.com/WebHelp/Reference/File_formats/Export_file_formats/2D_Region_definition_file_format.htm
- echofilter.raw.loader.get_partition_data(partition, dataset='mobile', partitioning_version='firstpass', root_data_dir='/data/dsforce/surveyExports')[source]#
Load partition metadata.
- Parameters
partition (str) – Name of the partition.
dataset (str, optional) – Name of dataset. Default is "mobile".
partitioning_version (str, optional) – Name of partitioning method. Default is "firstpass".
root_data_dir (str, optional) – Path to root directory where data is located.
- Returns
Metadata for all transects in the partition. Each row is a single sample.
- Return type
pandas.DataFrame
- echofilter.raw.loader.get_partition_list(partition, dataset='mobile', full_path=False, partitioning_version='firstpass', root_data_dir='/data/dsforce/surveyExports', sharded=False)[source]#
Get a list of transects in a single partition.
- Parameters
partition (str) – Name of the partition.
dataset (str, optional) – Name of dataset. Default is "mobile".
full_path (bool, optional) – Whether to return the full path to the sample. If False, only the relative path (from the dataset directory) is returned. Default is False.
partitioning_version (str, optional) – Name of partitioning method. Default is "firstpass".
root_data_dir (str, optional) – Path to root directory where data is located.
sharded (bool, optional) – Whether to return the path to the sharded version of the data. Default is False.
- Returns
Path for each sample in the partition.
- Return type
list of str
- echofilter.raw.loader.load_transect_data(transect_pth, dataset='mobile', root_data_dir='/data/dsforce/surveyExports')[source]#
Load all data for one transect.
- Parameters
transect_pth (str) – Relative path to transect, excluding "_Sv_raw.csv".
dataset (str, optional) – Name of dataset. Default is "mobile".
root_data_dir (str, optional) – Path to root directory where data is located.
- Returns
timestamps (numpy.ndarray) – Timestamps (in seconds since Unix epoch), with each entry corresponding to each row in the signals data.
depths (numpy.ndarray) – Depths from the surface (in metres), with each entry corresponding to each column in the signals data.
signals (numpy.ndarray) – Echogram Sv data, shaped (num_timestamps, num_depths).
turbulence (numpy.ndarray) – Depth of turbulence line, shaped (num_timestamps, ).
bottom (numpy.ndarray) – Depth of bottom line, shaped (num_timestamps, ).
- echofilter.raw.loader.regions2mask(timestamps, depths, regions_passive=None, regions_removed=None, regions_patch=None, regions_other=None)[source]#
Convert regions to mask.
Takes the output from evr_reader() and returns a set of masks.
- Parameters
timestamps (array_like) – Timestamps for each node in the line.
depths (array_like) – Depths (in meters) for each node in the line.
regions_passive (list of tuples, optional) – Start and end timestamps for passive regions.
regions_removed (list of tuples, optional) – Start and end timestamps for removed vertical bands.
regions_patch (list of lists, optional) – Start and end timestamps for bad data patches.
regions_other (list of dicts) – Dictionary mapping creation type to points defining each region.
- Returns
transect – A dictionary with the following keys:
- "is_passive" (numpy.ndarray) – Logical array showing whether a timepoint is of passive data. Shaped (num_timestamps, ). All passive recording data should be excluded by the mask.
- "is_removed" (numpy.ndarray) – Logical array showing whether a timepoint is entirely removed by the mask. Shaped (num_timestamps, ).
- "mask_patches" (numpy.ndarray) – Logical array indicating which datapoints are inside a patch from regions_patch (True) and should be excluded by the mask. Shaped (num_timestamps, num_depths).
- "mask" (numpy.ndarray) – Logical array indicating which datapoints should be kept (True) and which are marked as removed (False) by one of the other three outputs. Shaped (num_timestamps, num_depths).
- Return type
dict
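The gist of the conversion can be sketched with NumPy (a simplified illustration; treating passive regions as inclusive timestamp intervals is an assumption, and the package's actual boundary handling may differ):

```python
import numpy as np

timestamps = np.arange(10.0)        # (num_timestamps, )
depths = np.linspace(0.0, 50.0, 6)  # (num_depths, )
regions_passive = [(2.0, 4.0)]      # start/end timestamp pairs

# Mark timepoints falling inside any passive region.
is_passive = np.zeros(timestamps.shape, dtype=bool)
for start, end in regions_passive:
    is_passive |= (timestamps >= start) & (timestamps <= end)

# Combined keep-mask: True where data is kept, False where excluded.
mask = np.broadcast_to(
    ~is_passive[:, np.newaxis], (len(timestamps), len(depths))
).copy()
```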
- echofilter.raw.loader.remove_trailing_slash(s)[source]#
Remove trailing forward slashes from a string.
- echofilter.raw.loader.timestamp2evdtstr(timestamp)[source]#
Convert a timestamp into an Echoview-compatible datetime string.
The output is in the format “CCYYMMDD HHmmSSssss”, where:
CC: century; YY: year; MM: month; DD: day; HH: hour; mm: minute; SS: second; ssss: 0.1 milliseconds.
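The formatting can be sketched with the standard library (an illustrative re-implementation, not the package's code; UTC and the rounding of the 0.1 ms field are assumptions):

```python
from datetime import datetime, timezone

def timestamp_to_ev_string(timestamp):
    """Format a Unix epoch timestamp as "CCYYMMDD HHmmSSssss" (illustrative)."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    # ssss is in units of 0.1 milliseconds (100 microseconds).
    return dt.strftime("%Y%m%d %H%M%S") + f"{dt.microsecond // 100:04d}"

timestamp_to_ev_string(1.5)  # "19700101 0000015000"
```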
- echofilter.raw.loader.transect_loader(fname, skip_lines=0, warn_row_overflow=None, row_len_selector='mode')[source]#
Load an entire survey transect CSV.
- Parameters
fname (str) – Path to survey CSV file.
skip_lines (int, optional) – Number of initial entries to skip. Default is 0.
warn_row_overflow (bool or int, optional) – Whether to print a warning message if the number of elements in a row exceeds the expected number. If this is an int, this is the number of times to display the warnings before they are suppressed. If this is True, the number of outputs is unlimited. If None, the maximum number of underflow and overflow warnings differ: if row_len_selector is "init" or "min", underflow always produces a message and the overflow messages stop at 2; otherwise the values are reversed. Default is None.
row_len_selector ({"init", "min", "max", "median", "mode"}, optional) – The method used to determine which row length (number of depth samples) to use. Default is "mode", the most common row length across all the measurement timepoints.
- Returns
numpy.ndarray – Timestamps for each row, in seconds. Note: not corrected for timezone (so make sure your timezones are internally consistent).
numpy.ndarray – Depth of each column, in metres.
numpy.ndarray – Survey signal (Sv, for instance). Units match that of the file.
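The "mode" selection rule amounts to taking the most common row length, which can be illustrated with the standard library (the row lengths here are hypothetical):

```python
from collections import Counter

# Number of depth samples found in each row of a hypothetical survey CSV.
row_lengths = [512, 512, 511, 512, 513]
# "mode": the most common row length across all measurement timepoints.
expected_len = Counter(row_lengths).most_common(1)[0][0]
```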
- echofilter.raw.loader.transect_reader(fname)[source]#
Create a generator which iterates through a survey csv file.
- Parameters
fname (str) – Path to survey CSV file.
- Returns
Yields a tuple of (metadata, data), where metadata is a dict, and data is a numpy.ndarray. Each yield corresponds to a single row in the data. Every row (except for the header) is yielded.
- Return type
generator
- echofilter.raw.loader.write_transect_regions(fname, transect, depth_range=None, passive_key='is_passive', removed_key='is_removed', patches_key='mask_patches', collate_passive_length=0, collate_removed_length=0, minimum_passive_length=0, minimum_removed_length=0, minimum_patch_area=0, name_suffix='', common_notes='', line_ending='\r\n', verbose=0, verbose_indent=0)[source]#
Convert a transect dictionary to a set of regions and write as an EVR file.
- Parameters
fname (str) – Destination of output file.
transect (dict) – Transect dictionary.
depth_range (array_like or None, optional) – The minimum and maximum depth extents (in any order) of the passive and removed block regions. If this is None (default), the minimum and maximum of transect["depths"] are used.
passive_key (str, optional) – Field name to use for passive data identification. Default is "is_passive".
removed_key (str, optional) – Field name to use for removed blocks. Default is "is_removed".
patches_key (str, optional) – Field name to use for the mask of patch regions. Default is "mask_patches".
collate_passive_length (int, optional) – Maximum distance (in indices) over which passive regions should be merged together, closing small gaps between them. Default is 0.
collate_removed_length (int, optional) – Maximum distance (in indices) over which removed blocks should be merged together, closing small gaps between them. Default is 0.
minimum_passive_length (int, optional) – Minimum length (in indices) a passive region must have to be included in the output. Set to -1 to omit all passive regions from the output. Default is 0.
minimum_removed_length (int, optional) – Minimum length (in indices) a removed block must have to be included in the output. Set to -1 to omit all removed regions from the output. Default is 0.
minimum_patch_area (float, optional) – Minimum amount of area (in input pixel space) that a patch must occupy in order to be included in the output. Set to 0 to include all patches, no matter their area. Set to -1 to omit all patches. Default is 0.
name_suffix (str, optional) – Suffix to append to variable names. Default is "", an empty string.
common_notes (str, optional) – Notes to include for every region. Default is "", an empty string.
line_ending (str, optional) – Line ending. Default is "\r\n", the standard line ending on Windows/DOS, as per the specification for the file format: https://support.echoview.com/WebHelp/Using_Echoview/Exporting/Exporting_data/Exporting_line_data.htm Set to "\n" to get Unix-style line endings instead.
verbose (int, optional) – Verbosity level. Default is 0.
verbose_indent (int, optional) – Level of indentation (number of preceding spaces) before verbosity messages. Default is 0.
echofilter.raw.manipulate module#
Manipulating lines and masks contained in Echoview files.
- echofilter.raw.manipulate.find_nonzero_region_boundaries(v)[source]#
Find the start and end indices for nonzero regions of a vector.
- Parameters
v (array_like) – A vector.
- Returns
starts (numpy.ndarray) – Indices for the start of regions of nonzero elements in vector v.
ends (numpy.ndarray) – Indices for the end of regions of nonzero elements in vector v (exclusive).
Notes
For i in range(len(starts)), the values v[starts[i]:ends[i]] are nonzero. Values in the range v[ends[i]:starts[i+1]] are zero.
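The invariant in the Notes can be reproduced with a standard diff-of-booleans sketch (a hypothetical re-implementation for illustration, not the package's code):

```python
import numpy as np

def nonzero_region_boundaries(v):
    """Return start (inclusive) and end (exclusive) indices of nonzero runs."""
    nonzero = np.concatenate(([False], np.asarray(v) != 0, [False]))
    edges = np.diff(nonzero.astype(int))
    starts = np.nonzero(edges == 1)[0]   # transitions zero -> nonzero
    ends = np.nonzero(edges == -1)[0]    # transitions nonzero -> zero
    return starts, ends

nonzero_region_boundaries([0, 1, 1, 0, 2])  # (array([1, 4]), array([3, 5]))
```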
- echofilter.raw.manipulate.find_passive_data(signals, n_depth_use=38, threshold=25.0, deviation=None)[source]#
Find segments of Sv recording which correspond to passive recording.
- Parameters
signals (array_like) – Two-dimensional array of Sv values, shaped [timestamps, depths].
n_depth_use (int, optional) – How many Sv depths to use, starting with the first depths (closest to the sounder device). If None, all depths are used. Default is 38.
threshold (float, optional) – Threshold for start/end of passive regions. Default is 25.
deviation (float, optional) – Threshold for start/end of passive regions is deviation times the interquartile range of the difference between samples at neighbouring timestamps. Default is None. Only one of threshold and deviation should be set.
- Returns
passive_start (numpy.ndarray) – Indices of rows of signals at which passive segments start.
passive_end (numpy.ndarray) – Indices of rows of signals at which passive segments end.
Notes
Works by looking at the difference between consecutive recordings and finding large deviations.
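A toy version of this idea (greatly simplified relative to the actual algorithm; the pairing of consecutive threshold crossings into a start/end pair here is an assumption):

```python
import numpy as np

# Synthetic Sv column: active recording near 0 dB, passive near -60 dB.
col = np.array([0.0, 0.0, 0.0, -60.0, -60.0, -60.0, 0.0, 0.0])
step = np.diff(col)  # difference between consecutive recordings
crossings = np.nonzero(np.abs(step) > 25.0)[0]
# Alternate crossings mark passive starts and ends (simplification).
passive_start, passive_end = crossings[0] + 1, crossings[1] + 1
```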
- echofilter.raw.manipulate.find_passive_data_v2(signals, n_depth_use=38, threshold_inner=None, threshold_init=None, deviation=None, sigma_depth=0, sigma_time=1)[source]#
Find segments of Sv recording which correspond to passive recording.
- Parameters
signals (array_like) – Two-dimensional array of Sv values, shaped [timestamps, depths].
n_depth_use (int, optional) – How many Sv depths to use, starting with the first depths (closest to the sounder device). If None, all depths are used. Default is 38. The median is taken across the depths, after taking the temporal derivative.
threshold_inner (float, optional) – Threshold to apply to the temporal derivative of the signal when detecting the fine-tuned start/end of passive regions. Default behaviour is to use a threshold automatically determined using deviation if it is set, and otherwise use a threshold of 35.0.
threshold_init (float, optional) – Threshold to apply during the initial scan for the start/end of passive regions, which seeds the fine-tuning search. Default behaviour is to use a threshold automatically determined using deviation if it is set, and otherwise use a threshold of 12.0.
deviation (float, optional) – Set threshold_inner to be deviation times the standard deviation of the temporal derivative of the signal. The standard deviation is robustly estimated based on the interquartile range. If this is set, threshold_inner must be None. Default is None.
sigma_depth (float, optional) – Width of kernel for filtering signals across the second dimension (depth). Default is 0 (no filter).
sigma_time (float, optional) – Width of kernel for filtering signals across the first dimension (time). Default is 1. Set to 0 to not filter.
- Returns
passive_start (numpy.ndarray) – Indices of rows of signals at which passive segments start.
passive_end (numpy.ndarray) – Indices of rows of signals at which passive segments end.
Notes
Works by looking at the difference between consecutive recordings and finding large deviations.
- echofilter.raw.manipulate.fix_surface_line(timestamps, d_surface, is_passive)[source]#
Fix anomalies in the surface line.
- Parameters
timestamps (array_like sized (N, )) – Timestamps for each ping.
d_surface (array_like sized (N, )) – Surface line depths.
is_passive (array_like sized (N, )) – Indicator for passive data. Values for the surface line during passive data collection will not be used.
- Returns
fixed_surface (numpy.ndarray) – Surface line depths, with anomalies replaced with median filtered values and passive data replaced with linear interpolation. Has the same size and dtype as d_surface.
is_replaced (boolean numpy.ndarray sized (N, )) – Indicates which datapoints were replaced. Note that passive data is always replaced and is marked as such.
- echofilter.raw.manipulate.fixup_lines(timestamps, depths, mask, t_turbulence=None, d_turbulence=None, t_bottom=None, d_bottom=None)[source]#
Extend existing turbulence/bottom lines based on masked target Sv output.
- Parameters
timestamps (array_like) – Shaped (num_timestamps, ).
depths (array_like) – Shaped (num_depths, ).
mask (array_like) – Boolean array, where True denotes kept entries. Shaped (num_timestamps, num_depths).
t_turbulence (array_like, optional) – Sampling times for existing turbulence line.
d_turbulence (array_like, optional) – Depth of existing turbulence line.
t_bottom (array_like, optional) – Sampling times for existing bottom line.
d_bottom (array_like, optional) – Depth of existing bottom line.
- Returns
d_turbulence_new (numpy.ndarray) – Depth of new turbulence line.
d_bottom_new (numpy.ndarray) – Depth of new bottom line.
- echofilter.raw.manipulate.join_transect(transects)[source]#
Join segmented transects together into a single dictionary.
- Parameters
transects (iterable of dict) – Transect segments, each with the same fields and compatible shapes.
- Yields
dict – Transect data.
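Conceptually this is a per-field concatenation along the time axis, as in this sketch (which assumes every segment shares the same fields; the field names shown are examples):

```python
import numpy as np

segments = [
    {"timestamps": np.array([0.0, 1.0]), "Sv": np.zeros((2, 4))},
    {"timestamps": np.array([2.0, 3.0, 4.0]), "Sv": np.ones((3, 4))},
]
# Concatenate each time-like field across the segments.
joined = {
    key: np.concatenate([seg[key] for seg in segments], axis=0)
    for key in segments[0]
}
```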
- echofilter.raw.manipulate.load_decomposed_transect_mask(sample_path)[source]#
Load a raw and masked transect and decompose the mask.
The mask is decomposed into turbulence and bottom lines, and passive and removed regions.
- Parameters
sample_path (str) – Path to sample, without extension. The raw data should be located at sample_path + "_Sv_raw.csv".
- Returns
A dictionary with the following keys:
- "timestamps" (numpy.ndarray) – Timestamps (in seconds since Unix epoch), for each recording timepoint.
- "depths" (numpy.ndarray) – Depths from the surface (in metres), with each entry corresponding to each column in the signals data.
- "Sv" (numpy.ndarray) – Echogram Sv data, shaped (num_timestamps, num_depths).
- "mask" (numpy.ndarray) – Logical array indicating which datapoints were kept (True) and which were removed (False) for the masked Sv output. Shaped (num_timestamps, num_depths).
- "turbulence" (numpy.ndarray) – For each timepoint, the depth of the shallowest datapoint which should be included for the mask. Shaped (num_timestamps, ).
- "bottom" (numpy.ndarray) – For each timepoint, the depth of the deepest datapoint which should be included for the mask. Shaped (num_timestamps, ).
- "is_passive" (numpy.ndarray) – Logical array showing whether a timepoint is of passive data. Shaped (num_timestamps, ). All passive recording data should be excluded by the mask.
- "is_removed" (numpy.ndarray) – Logical array showing whether a timepoint is entirely removed by the mask. Shaped (num_timestamps, ). Does not include periods of passive recording.
- "is_upward_facing" (bool) – Indicates whether the recording source is located at the deepest depth (i.e. the seabed), facing upwards. Otherwise, the recording source is at the shallowest depth (i.e. the surface), facing downwards.
- Return type
dict
- echofilter.raw.manipulate.make_lines_from_mask(mask, depths=None, max_gap_squash=1.0)[source]#
Determine turbulence and bottom lines for a mask array.
- Parameters
mask (array_like) – A two-dimensional logical array, where each row (along dimension 1) takes the value False for some unknown continuous stretch at its start and end, with True values between these two masked-out regions.
depths (array_like, optional) – Depth of each sample point along dim 1 of mask. Must be either monotonically increasing or monotonically decreasing. Default is the index of mask, arange(mask.shape[1]).
max_gap_squash (float, optional) – Maximum gap to merge together, in metres. Default is 1.0.
- Returns
d_turbulence (numpy.ndarray) – Depth of turbulence line. This is the line of smaller depth which separates the False region of mask from the central region of True values. (If depths is monotonically increasing, this is at the start of the columns of mask, otherwise it is at the end.)
d_bottom (numpy.ndarray) – Depth of bottom line. As for d_turbulence, but for the other end of the array.
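The core of the line extraction can be sketched as finding the first and last True entry in each row and mapping those indices to depths (an illustration that ignores the gap-squashing step):

```python
import numpy as np

mask = np.array([
    [False, True,  True,  False],
    [False, False, True,  False],
])
depths = np.array([0.0, 10.0, 20.0, 30.0])  # monotonically increasing

first = np.argmax(mask, axis=1)                              # first True per row
last = mask.shape[1] - 1 - np.argmax(mask[:, ::-1], axis=1)  # last True per row
d_turbulence = depths[first]
d_bottom = depths[last]
```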
- echofilter.raw.manipulate.make_lines_from_masked_csv(fname)[source]#
Load a masked csv file and convert its mask to lines.
- Parameters
fname (str) – Path to file containing masked Echoview output data in csv format.
- Returns
timestamps (numpy.ndarray) – Sample timestamps.
d_turbulence (numpy.ndarray) – Depth of turbulence line.
d_bottom (numpy.ndarray) – Depth of bottom line.
- echofilter.raw.manipulate.pad_transect(transect, pad=32, pad_mode='reflect', previous_padding='diff')[source]#
Pad a transect in the timestamps dimension (axis 0).
- Parameters
transect (dict) – A dictionary of transect data.
pad (int, default=32) – Amount of padding to add.
pad_mode (str, default="reflect") – Padding method for out-of-bounds inputs. Must be supported by numpy.pad(), such as "constant", "reflect", or "edge". If the mode is "constant", the array will be padded with zeros.
previous_padding ({"diff", "add", "noop"}, default="diff") – How to handle this padding if the transect has already been padded:
"diff" – Extend the padding up to the target pad value.
"add" – Add this padding irrespective of pre-existing padding.
"noop" – Don't add any new padding if previously padded.
- Returns
transect – Like the input transect, but with all time-like dimensions extended with padding and the fields "_pad_start" and "_pad_end" changed to indicate the total padding (including any pre-existing padding).
- Return type
dict
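The out-of-bounds behaviour follows numpy.pad; for instance, "reflect" padding on a time-like vector looks like the following (illustration of the numpy.pad modes only — the package may treat some fields, such as timestamps, specially):

```python
import numpy as np

timestamps = np.arange(5.0)  # [0, 1, 2, 3, 4]
# Pad by 2 on each side, mirroring interior values at the boundaries.
padded = np.pad(timestamps, (2, 2), mode="reflect")
# padded: [2, 1, 0, 1, 2, 3, 4, 3, 2]
```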
- echofilter.raw.manipulate.remove_anomalies_1d(signal, thr=5, thr2=4, kernel=201, kernel2=31, return_filtered=False)[source]#
Remove anomalies from a temporal signal.
Applies a median filter to the data, and replaces datapoints which deviate from the median filtered signal by more than some threshold with the median filtered data. This process is repeated until no datapoints deviate from the filtered line by more than the threshold.
- Parameters
signal (array_like) – The signal to filter.
thr (float, optional) – The initial threshold will be thr times the standard deviation of the residuals. The standard deviation is robustly estimated from the interquartile range. Default is 5.
thr2 (float, optional) – The threshold for repeated iterations will be thr2 times the standard deviation of the remaining residuals. The standard deviation is robustly estimated from the interdecile range. Default is 4.
kernel (int, optional) – The kernel size for the initial median filter. Default is 201.
kernel2 (int, optional) – The kernel size for subsequent median filters. Default is 31.
return_filtered (bool, optional) – If True, the median filtered signal is also returned. Default is False.
- Returns
signal (numpy.ndarray like signal) – The input signal with anomalies replaced with median values.
is_replaced (bool numpy.ndarray shaped like signal) – Indicator for which datapoints were replaced.
filtered (numpy.ndarray like signal, optional) – The final median filtered signal. Returned if return_filtered=True.
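A single iteration of this scheme, with a naive median filter and a fixed threshold standing in for the robust standard-deviation estimate (illustration only):

```python
import numpy as np

def median_filter_1d(signal, kernel):
    """Naive sliding-window median filter with edge padding."""
    half = kernel // 2
    padded = np.pad(signal, half, mode="edge")
    return np.array(
        [np.median(padded[i : i + kernel]) for i in range(len(signal))]
    )

signal = np.zeros(11)
signal[5] = 100.0                   # a lone anomaly
filtered = median_filter_1d(signal, 5)
# Fixed threshold for the sketch; the real routine derives it robustly.
is_replaced = np.abs(signal - filtered) > 10.0
cleaned = np.where(is_replaced, filtered, signal)
```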
- echofilter.raw.manipulate.split_transect(timestamps=None, threshold=20, percentile=97.5, max_length=-1, pad_length=32, pad_on='max', **transect)[source]#
Split a transect into segments each containing contiguous recordings.
- Parameters
timestamps (array_like) – A 1-d array containing the timestamp at which each recording was measured. The sampling is assumed to be high-frequency with occasional gaps.
threshold (int, optional) – Threshold for splitting timestamps into segments. Any timepoints further apart than threshold times the percentile-th percentile of the differences between timepoints will be split apart into new segments. Default is 20.
percentile (float, optional) – The percentile at which to sample the timestamp intervals to establish a baseline typical interval. Default is 97.5.
max_length (int, default=-1) – Maximum length of each segment. Set to 0 or -1 to disable (default).
pad_length (int, default=32) – Amount of overlap between the segments. Set to 0 to disable.
pad_on ({"max", "thr", "all", "none"}, default="max") – Apply overlap padding when the transect is split due to either the total length exceeding the maximum ("max"), the time delta exceeding the threshold ("thr"), or both ("all").
**transect – Arbitrary additional transect variables, which will be split into segments as appropriate in accordance with timestamps.
- Yields
dict – Containing segmented data, with key/value pairs as given in **transect, in addition to timestamps.
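The split criterion can be sketched as follows (only the gap detection; segment slicing, max_length, and padding are omitted):

```python
import numpy as np

# 100 recordings at 1 s intervals, with one long gap in the middle.
timestamps = np.concatenate([np.arange(50.0), np.arange(50.0) + 1050.0])

intervals = np.diff(timestamps)
typical = np.percentile(intervals, 97.5)   # baseline typical interval
# Split wherever the gap exceeds threshold * typical (threshold=20).
split_after = np.nonzero(intervals > 20 * typical)[0]
```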
- echofilter.raw.manipulate.write_lines_for_masked_csv(fname_mask, fname_turbulence=None, fname_bottom=None)[source]#
Write turbulence and bottom lines based on masked csv file.
- Parameters
fname_mask (str) – Path to input file containing masked Echoview output data in csv format.
fname_turbulence (str, optional) – Destination of generated turbulence line, written in EVL format. If None (default), the output name is <fname_base>_mask-turbulence.evl, where <fname_base> is fname_mask without its extension and without any occurrence of the substrings _Sv_raw or _Sv in the base file name.
fname_bottom (str, optional) – Destination of generated bottom line, written in EVL format. If None (default), the output name is <fname_base>_mask-bottom.evl.
echofilter.raw.metadata module#
Dataset metadata, relevant for loading correct data.
- echofilter.raw.metadata.recall_passive_edges(sample_path, timestamps)[source]#
Define passive data edges for samples within known datasets.
- Parameters
sample_path (str) – Path to sample.
timestamps (array_like vector) – Vector of timestamps in sample.
- Returns
passive_starts (numpy.ndarray or None) – Indices indicating the onset of passive data collection periods, or None if passive metadata is unavailable for this sample.
passive_ends (numpy.ndarray or None) – Indices indicating the offset of passive data collection periods, or None if passive metadata is unavailable for this sample.
finder_version (absent or str) – If passive_starts and passive_ends are None, this string may be present to indicate which passive finder algorithm works best for this dataset.
echofilter.raw.shardloader module#
Converting raw data into shards, and loading data from shards.
- echofilter.raw.shardloader.load_transect_from_shards(transect_rel_pth, i1=0, i2=None, dataset='mobile', segment=0, root_data_dir='/data/dsforce/surveyExports', **kwargs)#
Load transect data from shard files.
- Parameters
transect_rel_pth (str) – Relative path to transect.
i1 (int, optional) – Index of first sample to retrieve. Default is 0, the first sample.
i2 (int, optional) – Index of last sample to retrieve. As per Python convention, the range i1 to i2 is inclusive on the left and exclusive on the right, so datapoint i2 - 1 is the right-most datapoint loaded. Default is None, which loads everything up to and including the last sample.
dataset (str, optional) – Name of dataset. Default is "mobile".
segment (int, optional) – Which segment to load. Default is 0.
root_data_dir (str) – Path to root directory where data is located.
**kwargs – As per load_transect_from_shards_abs().
- Returns
transect – As per load_transect_from_shards_abs().
- Return type
dict
- echofilter.raw.shardloader.load_transect_from_shards_abs(transect_abs_pth, i1=0, i2=None, pad_mode='edge')[source]#
Load transect data from shard files.
- Parameters
transect_abs_pth (str) – Absolute path to transect shard directory.
i1 (int, optional) – Index of first sample to retrieve. Default is 0, the first sample.
i2 (int, optional) – Index of last sample to retrieve. As per Python convention, the range i1 to i2 is inclusive on the left and exclusive on the right, so datapoint i2 - 1 is the right-most datapoint loaded. Default is None, which loads everything up to and including the last sample.
pad_mode (str, optional) – Padding method for out-of-bounds inputs. Must be supported by numpy.pad(), such as "constant", "reflect", or "edge". If the mode is "constant", the array will be padded with zeros. Default is "edge".
- Returns
A dictionary with the following keys:
- "timestamps" (numpy.ndarray) – Timestamps (in seconds since Unix epoch), for each recording timepoint. The number of entries, num_timestamps, is equal to i2 - i1.
- "depths" (numpy.ndarray) – Depths from the surface (in metres), with each entry corresponding to each column in the signals data.
- "Sv" (numpy.ndarray) – Echogram Sv data, shaped (num_timestamps, num_depths).
- "mask" (numpy.ndarray) – Logical array indicating which datapoints were kept (True) and which were removed (False) for the masked Sv output. Shaped (num_timestamps, num_depths).
- "turbulence" (numpy.ndarray) – For each timepoint, the depth of the shallowest datapoint which should be included for the mask. Shaped (num_timestamps, ).
- "bottom" (numpy.ndarray) – For each timepoint, the depth of the deepest datapoint which should be included for the mask. Shaped (num_timestamps, ).
- "is_passive" (numpy.ndarray) – Logical array showing whether a timepoint is of passive data. Shaped (num_timestamps, ). All passive recording data should be excluded by the mask.
- "is_removed" (numpy.ndarray) – Logical array showing whether a timepoint is entirely removed by the mask. Shaped (num_timestamps, ). Does not include periods of passive recording.
- "is_upward_facing" (bool) – Indicates whether the recording source is located at the deepest depth (i.e. the seabed), facing upwards. Otherwise, the recording source is at the shallowest depth (i.e. the surface), facing downwards.
- Return type
dict
- echofilter.raw.shardloader.load_transect_from_shards_rel(transect_rel_pth, i1=0, i2=None, dataset='mobile', segment=0, root_data_dir='/data/dsforce/surveyExports', **kwargs)[source]#
Load transect data from shard files.
- Parameters
transect_rel_pth (str) – Relative path to transect.
i1 (int, optional) – Index of first sample to retrieve. Default is 0, the first sample.
i2 (int, optional) – Index of last sample to retrieve. As per Python convention, the range i1 to i2 is inclusive on the left and exclusive on the right, so datapoint i2 - 1 is the right-most datapoint loaded. Default is None, which loads everything up to and including the last sample.
dataset (str, optional) – Name of dataset. Default is "mobile".
segment (int, optional) – Which segment to load. Default is 0.
root_data_dir (str) – Path to root directory where data is located.
**kwargs – As per load_transect_from_shards_abs().
- Returns
transect – As per load_transect_from_shards_abs().
- Return type
dict
- echofilter.raw.shardloader.load_transect_segments_from_shards_abs(transect_abs_pth, segments=None)[source]#
Load transect data from shard files.
- Parameters
transect_abs_pth (str) – Absolute path to transect shard segments directory.
segments (iterable or None) – Which segments to load. If None (default), all segments are loaded.
- Returns
- Return type
- echofilter.raw.shardloader.load_transect_segments_from_shards_rel(transect_rel_pth, dataset='mobile', segments=None, root_data_dir='/data/dsforce/surveyExports')[source]#
Load transect data from shard files.
- Parameters
transect_rel_pth (str) – Relative path to transect.
dataset (str, optional) – Name of dataset. Default is "mobile".
segments (iterable or None) – Which segments to load. If None (default), all segments are loaded.
root_data_dir (str) – Path to root directory where data is located.
**kwargs – As per load_transect_from_shards_abs().
- Returns
- Return type
- echofilter.raw.shardloader.segment_and_shard_transect(transect_pth, dataset='mobile', max_depth=None, shard_len=128, root_data_dir='/data/dsforce/surveyExports')[source]#
Create a sharded copy of a transect.
The transect is cut into segments based on recording starts/stops. Each segment is split across multiple files (shards) for efficient loading.
- Parameters
transect_pth (str) – Relative path to transect, excluding "_Sv_raw.csv".
dataset (str, optional) – Name of dataset. Default is "mobile".
max_depth (float or None, optional) – The maximum depth to include in the saved shard. Data corresponding to deeper locations is omitted to save on load time and memory when the shard is loaded. If None, no cropping is applied. Default is None.
shard_len (int, optional) – Number of timestamp samples to include in each shard. Default is 128.
root_data_dir (str) – Path to root directory where data is located.
Notes
The segments will be written to the directories <root_data_dir>_sharded/<dataset>/transect_path/<segment>/. For the contents of each directory, see write_transect_shards.
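The output location described in the Notes can be constructed mechanically. A minimal sketch assuming the layout above; "surveyA/transect01" is a made-up transect path, and posixpath is used only so the example is platform-independent:

```python
import posixpath

# Sketch of the sharded output layout: <root_data_dir>_sharded/<dataset>/<transect_pth>/<segment>/
def sharded_segment_dir(root_data_dir, dataset, transect_pth, segment):
    return posixpath.join(root_data_dir + "_sharded", dataset, transect_pth, str(segment))

print(sharded_segment_dir("/data/dsforce/surveyExports", "mobile", "surveyA/transect01", 0))
```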
- echofilter.raw.shardloader.shard_transect(transect_pth, dataset='mobile', max_depth=None, shard_len=128, root_data_dir='/data/dsforce/surveyExports')#
Create a sharded copy of a transect.
The transect is cut into segments based on recording starts/stops. Each segment is split across multiple files (shards) for efficient loading.
- Parameters
transect_pth (str) – Relative path to transect, excluding "_Sv_raw.csv".
dataset (str, optional) – Name of dataset. Default is "mobile".
max_depth (float or None, optional) – The maximum depth to include in the saved shard. Data corresponding to deeper locations is omitted to save on load time and memory when the shard is loaded. If None, no cropping is applied. Default is None.
shard_len (int, optional) – Number of timestamp samples to include in each shard. Default is 128.
root_data_dir (str) – Path to root directory where data is located.
Notes
The segments will be written to the directories <root_data_dir>_sharded/<dataset>/transect_path/<segment>/. For the contents of each directory, see write_transect_shards.
- echofilter.raw.shardloader.write_transect_shards(dirname, transect, max_depth=None, shard_len=128)[source]#
Create a sharded copy of a transect.
The transect is cut by timestamp and split across multiple files.
- Parameters
dirname (str) – Path to output directory.
transect (dict) – Observed values for the transect. Should already be segmented.
max_depth (float or None, optional) – The maximum depth to include in the saved shard. Data corresponding to deeper locations is omitted to save on load time and memory when the shard is loaded. If None, no cropping is applied. Default is None.
shard_len (int, optional) – Number of timestamp samples to include in each shard. Default is 128.
Notes
The output will be written to the directory dirname, and will contain:
a file named "shard_size.txt", which contains the sharding metadata: the total number of samples, and the shard size;
a directory for each shard, named 0, 1, …. Each shard directory will contain the files:
depths.npy
timestamps.npy
Sv.npy
mask.npy
turbulence.npy
bottom.npy
is_passive.npy
is_removed.npy
is_upward_facing.npy
which contain pickled numpy dumps of the matrices for each shard.
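The shard layout above can be written and read back with plain numpy. A minimal sketch, not echofilter's own implementation: the assumption that shard_size.txt holds two comma-separated integers ("num_samples,shard_len") and that only timestamps.npy is written here are illustrative simplifications:

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as dirname:
    timestamps = np.arange(300, dtype=float)
    shard_len = 128
    # Metadata file: total number of samples and shard size
    # (the exact "num_samples,shard_len" format is an assumption).
    with open(os.path.join(dirname, "shard_size.txt"), "w") as f:
        f.write("{},{}".format(len(timestamps), shard_len))
    # One numbered directory per shard, each holding numpy dumps.
    for k, start in enumerate(range(0, len(timestamps), shard_len)):
        shard_dir = os.path.join(dirname, str(k))
        os.makedirs(shard_dir)
        np.save(os.path.join(shard_dir, "timestamps.npy"), timestamps[start : start + shard_len])
    # Read the metadata back, then reload the middle shard.
    with open(os.path.join(dirname, "shard_size.txt")) as f:
        num_samples, shard_len = (int(x) for x in f.read().split(","))
    middle = np.load(os.path.join(dirname, "1", "timestamps.npy"))
    print(num_samples, shard_len, middle[0], middle[-1])
```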
echofilter.raw.utils module#
Loader utility functions.
- echofilter.raw.utils.fillholes2d(arr, nan_thr=2, interp_method='linear', inplace=False)[source]#
Interpolate to replace NaN values in 2d gridded array data.
- Parameters
arr (2d numpy.ndarray) – A 2-d array, which may contain NaNs.
nan_thr (int, default=2) – Minimum number of NaN values needed in a row/column for it to be included in the (rectangular) area where NaNs are fixed.
interp_method (str, default="linear") – Interpolation method.
inplace (bool, default=False) – Whether to update arr in place instead of returning a modified copy.
- Returns
arr – Like input arr, but with NaN values replaced with interpolated values.
- Return type
2d numpy.ndarray
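The nan_thr parameter bounds a rectangular region in which NaNs are repaired. A sketch of that row/column selection as described above (an assumption about the behaviour, not echofilter's code); the interpolation step itself, e.g. via scipy.interpolate.griddata, is omitted:

```python
import numpy as np

# Find the rectangular region bounded by rows/columns that contain at
# least nan_thr NaNs; returns None when no row or column qualifies.
def nan_fix_region(arr, nan_thr=2):
    isnan = np.isnan(arr)
    rows = np.nonzero(isnan.sum(axis=1) >= nan_thr)[0]
    cols = np.nonzero(isnan.sum(axis=0) >= nan_thr)[0]
    if len(rows) == 0 or len(cols) == 0:
        return None
    return (int(rows.min()), int(rows.max())), (int(cols.min()), int(cols.max()))

arr = np.ones((5, 5))
arr[1:3, 1:3] = np.nan  # a 2x2 block of NaNs
print(nan_fix_region(arr, nan_thr=2))  # ((1, 2), (1, 2))
```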
- echofilter.raw.utils.integrate_area_of_contour(x, y, closed=None, preserve_sign=False)[source]#
Compute the area within a contour, using Green’s algorithm.
- Parameters
x (array_like vector) – x co-ordinates of nodes along the contour.
y (array_like vector) – y co-ordinates of nodes along the contour.
closed (bool or None, optional) – Whether the contour is already closed. If False, it will be closed before determining the area. If None (default), it is automatically determined whether the contour is already closed, and it is closed if necessary.
preserve_sign (bool, optional) – Whether to preserve the sign of the area. If True, the area is positive if the contour is oriented anti-clockwise and negative if it is oriented clockwise. Default is False, which always returns a positive area.
- Returns
area – The integral of the area within the contour.
- Return type
float
Notes
https://en.wikipedia.org/wiki/Green%27s_theorem#Area_calculation
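Green's theorem reduces the area integral to the shoelace formula over the contour nodes. A minimal pure-Python sketch (not echofilter's implementation) illustrating the closing and preserve_sign behaviour described above:

```python
# Area within a contour via Green's theorem (shoelace formula).
def contour_area(x, y, preserve_sign=False):
    # Close the contour if it is not already closed.
    if x[0] != x[-1] or y[0] != y[-1]:
        x = list(x) + [x[0]]
        y = list(y) + [y[0]]
    # Green's theorem: area = 0.5 * sum(x_i * y_{i+1} - x_{i+1} * y_i)
    area = 0.5 * sum(x[i] * y[i + 1] - x[i + 1] * y[i] for i in range(len(x) - 1))
    return area if preserve_sign else abs(area)

# Unit square traversed clockwise: the signed area is negative.
print(contour_area([0, 0, 1, 1], [0, 1, 1, 0]))                      # 1.0
print(contour_area([0, 0, 1, 1], [0, 1, 1, 0], preserve_sign=True))  # -1.0
```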
- echofilter.raw.utils.interp1d_preserve_nan(x, y, x_samples, nan_threshold=0.0, bounds_error=False, **kwargs)[source]#
Interpolate a 1-D function, preserving NaNs.
Inputs x and y are arrays of values used to approximate some function f: y = f(x). We exclude NaNs for the interpolation and then mask out entries which are adjacent to (or close to) a NaN in the input.
- Parameters
x ((N,) array_like) – A 1-D array of real values. Must not contain NaNs.
y ((…,N,…) array_like) – An N-D array of real values. The length of y along the interpolation axis must be equal to the length of x. May contain NaNs.
x_samples (array_like) – A 1-D array of real values at which the interpolation function will be sampled.
nan_threshold (float, optional) – Minimum amount of influence a NaN must have on an output sample for it to become a NaN. Default is 0., i.e. any influence.
bounds_error (bool, optional) – If True, a ValueError is raised any time interpolation is attempted on a value outside the range of x (where extrapolation is necessary). If False (default), out of bounds values are assigned the value fill_value (whose default is NaN).
**kwargs – Additional keyword arguments are as per scipy.interpolate.interp1d().
- Returns
y_samples – The result of interpolating, with sample points close to NaNs in the input returned as NaN.
- Return type
(…,N,…) np.ndarray
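The NaN-preserving idea can be sketched with np.interp instead of scipy.interpolate.interp1d: interpolate with NaNs excluded, then interpolate a NaN-indicator signal and mask any output sample it influences beyond nan_threshold. A simplified 1-d sketch, not the library's implementation:

```python
import numpy as np

def interp1d_preserve_nan_sketch(x, y, x_samples, nan_threshold=0.0):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~np.isnan(y)
    # Interpolate using only the non-NaN datapoints.
    y_samples = np.interp(x_samples, x[ok], y[ok])
    # Influence of NaNs on each output sample, in [0, 1].
    nan_influence = np.interp(x_samples, x, np.isnan(y).astype(float))
    y_samples[nan_influence > nan_threshold] = np.nan
    return y_samples

y_out = interp1d_preserve_nan_sketch([0, 1, 2, 3], [0.0, np.nan, 2.0, 3.0], [0.5, 2.5])
print(y_out)  # the first sample is adjacent to the NaN, so it stays NaN
```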
- echofilter.raw.utils.medfilt1d(signal, kernel_size, axis=-1, pad_mode='reflect')[source]#
Median filter in 1d, with support for selecting padding mode.
- Parameters
- Returns
filtered – The filtered signal.
- Return type
array_like
See also
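A padded median filter can be sketched with np.pad followed by a sliding median. This simplified version is 1-d only and assumes an odd kernel_size; the real function also supports selecting the axis:

```python
import numpy as np

def medfilt1d_sketch(signal, kernel_size, pad_mode="reflect"):
    half = kernel_size // 2
    # Pad so edge samples get full-width windows, using the chosen mode.
    padded = np.pad(signal, half, mode=pad_mode)
    # Stack shifted views so column j holds the window centred at j.
    windows = np.stack([padded[i : i + len(signal)] for i in range(kernel_size)])
    return np.median(windows, axis=0)

out = medfilt1d_sketch(np.array([1.0, 1.0, 9.0, 1.0, 1.0]), kernel_size=3)
print(out)  # the lone spike at index 2 is removed
```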
- echofilter.raw.utils.pad1d(array, pad_width, axis=0, **kwargs)[source]#
Pad an array along a single axis only.
- Parameters
- Returns
Padded array.
- Return type
numpy.ndarray
See also
- echofilter.raw.utils.squash_gaps(mask, max_gap_squash, axis=-1, inplace=False)[source]#
Merge small gaps between zero values in a boolean array.
- Parameters
mask (boolean array) – The input mask, in which small gaps between zero values will be squashed (filled with zeros).
max_gap_squash (int) – Maximum length of gap to squash.
axis (int, optional) – Axis on which to operate. Default is -1.
inplace (bool, optional) – Whether to operate on the original array. If False, a copy is created and returned.
- Returns
merged_mask – Mask as per the input, but with small gaps squashed.
- Return type
boolean array
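The gap-squashing can be sketched as a run-length scan: short runs of nonzero values that are bounded by zeros on both sides are zeroed out. A 1-d sketch of the behaviour described above (the real function operates along any axis of an n-d array):

```python
import numpy as np

def squash_gaps_sketch(mask, max_gap_squash):
    out = np.asarray(mask, dtype=bool).copy()
    n = len(out)
    i = 0
    while i < n:
        if out[i]:
            j = i
            while j < n and out[j]:
                j += 1
            # Squash only interior runs no longer than max_gap_squash;
            # runs touching either end of the array are left alone.
            if i > 0 and j < n and (j - i) <= max_gap_squash:
                out[i:j] = False
            i = j
        else:
            i += 1
    return out

mask = np.array([0, 1, 1, 0, 1, 1, 1, 1, 0], dtype=bool)
print(squash_gaps_sketch(mask, max_gap_squash=2).astype(int))
```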