By default, mc_prep_clean runs automatically when mc_read_files() or mc_read_data() is called.
mc_prep_clean checks the time series in a Raw-format myClim object for missing, duplicated, and disordered records.
With resolve_conflicts=TRUE, the function regularizes the microclimatic time series to a constant time-step,
removes duplicated records, and fills missing values with NA.
With resolve_conflicts=FALSE, it instead inserts states (tags, see mc_states_insert) that highlight records with conflicts,
i.e. duplicated datetime but different measurement values, and does not perform the cleaning itself.
When there are no conflicts, cleaning is performed in both cases (resolve_conflicts=TRUE or FALSE). See details.
mc_prep_clean(data, silent = FALSE, resolve_conflicts = TRUE, tolerance = NULL)
data: myClim object in Raw-format, see myClim-package.
silent: if TRUE, the cleaning log table and progress bar are not printed in the console (default FALSE), see mc_info_clean().
resolve_conflicts: if TRUE (default), the object is cleaned automatically, and among conflicting measurements the one whose original datetime is closest to the rounded datetime is selected, see details. If FALSE and conflicting records exist, the function returns the original, uncleaned object with "clean_conflict" tags (states) highlighting records with duplicated datetime but different measurement values. When no conflicting records exist, the object is cleaned in both the TRUE and FALSE cases.
tolerance: list of tolerance values for each physical unit, see mc_data_physical. The format is list(unit_name=tolerance_value). If the maximal difference between conflicting values is lower than the tolerance, the conflict is resolved without a warning. If NULL (default), no tolerance is applied, see details.
cleaned myClim object in Raw-format when resolve_conflicts=TRUE (default), or when resolve_conflicts=FALSE but no conflicts exist
the cleaning log is printed in the console by default, but can also be called later with mc_info_clean()
non-cleaned myClim object in Raw-format with "clean_conflict" tags when resolve_conflicts=FALSE and conflicts exist
The function mc_prep_clean can be used in two different ways depending on the parameter resolve_conflicts.
When resolve_conflicts=TRUE, the function performs automatic cleaning and returns a cleaned myClim object.
When resolve_conflicts=FALSE and the myClim object contains conflicts (rows with identical time but different measured values),
the function returns the original, uncleaned object with tags (states, see mc_states_insert)
highlighting records with duplicated datetime but different measured values.
When there are no conflicts, cleaning is performed in both cases (resolve_conflicts=TRUE or FALSE).
Processing the data with mc_prep_clean and resolving the conflicts is a mandatory step
required for further data handling in the myClim library.
This function guarantees that all time series are in chronological order,
have a regular time-step, and contain no duplicated records.
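The three guarantees above can be checked on any vector of timestamps. The following is a minimal base R sketch, not myClim code; the helper name check_series and the sample data are hypothetical and serve only as an illustration.

```r
# Hypothetical helper (not part of myClim): tests the three properties
# that a cleaned time series is expected to have.
check_series <- function(datetimes) {
  secs <- as.numeric(datetimes)   # seconds since 1970-01-01
  steps <- diff(secs)             # gaps between consecutive records
  list(
    chronological = !is.unsorted(secs),
    no_duplicates = anyDuplicated(secs) == 0,
    regular_step  = length(unique(steps)) <= 1
  )
}

start <- as.POSIXct("2024-01-01 00:00", tz = "UTC")
clean <- start + seq(0, by = 900, length.out = 5)  # regular 15-minute series
messy <- clean[c(1, 3, 2, 2, 5)]                   # disordered, duplicated, with a gap

str(check_series(clean))
str(check_series(messy))
```

All three checks pass for the regular series and fail for the disordered one.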
mc_prep_clean uses either the time-step provided by the user during data import with the mc_read functions
(the time-step used is permanently stored in the logger metadata, mc_LoggerMetadata)
or, if the user does not provide a time-step (NA), a time-step that myClim detects automatically
from the input time series based on the last 100 records.
For an irregular time series, the function returns a warning and skips (does not read) the file.
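One plausible way to detect a step from the tail of a series is to take the most frequent gap between consecutive records. The sketch below is base R and rests on that assumption; the text above states only that the last 100 records are used, so the helper detect_step and its "most frequent gap" rule are hypothetical, not the myClim implementation.

```r
# Hypothetical helper (not the myClim implementation): guesses the time-step
# as the most frequent gap among the last `n` records.
detect_step <- function(datetimes, n = 100) {
  secs <- as.numeric(utils::tail(datetimes, n))
  gaps <- table(diff(secs))                   # frequency of each gap in seconds
  as.numeric(names(gaps)[which.max(gaps)])
}

start <- as.POSIXct("2024-01-01 00:00", tz = "UTC")
series <- start + seq(0, by = 900, length.out = 200)
series <- series[-150]                        # drop one record to create a gap

detect_step(series)
```

A single missing record does not change the result here, because the 900-second gap still dominates the last 100 records.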
When the user provides a time-step during data import in the mc_read functions
instead of relying on automatic step detection, and the provided step does not correspond
to the actual records (e.g., the logger records data every 900 seconds but the user
provides a step of 3600 seconds), the myClim rounding routine consolidates multiple
records into an identical datetime. The resulting value is taken from the record closest
to the provided step (e.g., in an original series like ...9:50, 10:05, 10:20, 10:35, 10:50, 11:05...,
the new record would be 10:00, and the value would be taken from the original record at 10:05).
This process generates numerous warnings with resolve_conflicts=TRUE
and a multitude of tags with resolve_conflicts=FALSE.
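The consolidation described above can be reproduced in a few lines of base R. This is a sketch under stated assumptions, not the myClim routine: the dates and measurement values are made up, and rounding to the nearest multiple of the step is assumed because it matches the 10:00 example.

```r
# Original 15-minute records (times from the example above), in UTC
orig <- as.POSIXct(c("2024-01-01 09:50", "2024-01-01 10:05", "2024-01-01 10:20",
                     "2024-01-01 10:35", "2024-01-01 10:50", "2024-01-01 11:05"),
                   tz = "UTC")
value <- c(1.0, 1.1, 1.2, 1.3, 1.4, 1.5)  # made-up measurements
step <- 3600                              # user-provided step in seconds

# Round every timestamp to the nearest multiple of the step
secs <- as.numeric(orig)
rounded <- round(secs / step) * step
dist <- abs(secs - rounded)               # distance to the rounded datetime

# Within each rounded datetime keep the record with the smallest distance
groups <- split(seq_along(secs), rounded)
keep <- unname(unlist(lapply(groups, function(i) i[which.min(dist[i])])))

result <- data.frame(
  datetime   = as.POSIXct(rounded[keep], origin = "1970-01-01", tz = "UTC"),
  value      = value[keep],
  taken_from = format(orig[keep], "%H:%M")
)
print(result)
```

Three original records collapse into each hourly record, and the value for 10:00 comes from the record at 10:05, as in the text.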
The tolerance parameter is designed for situations where the logger does not perform optimally,
but the user still needs to extract and analyze the data. In some cases, loggers may record
multiple rows with identical timestamps but slightly different microclimate values,
due to the limitations of sensor resolution and precision.
When the tolerance parameter is set, myClim automatically selects one of these values
and resolves the conflict without generating additional warnings. It is strongly recommended
to set the tolerance value based on the sensor's resolution and precision.
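The tolerance rule can be illustrated with base R. In this sketch the sample data are invented, and the unit name T_C and the tolerance of 0.5 are assumptions chosen for the example; the comparison itself follows the description of the tolerance argument (maximal difference lower than the tolerance).

```r
# Invented records: two timestamps carry duplicated rows with different values
records <- data.frame(
  datetime = as.POSIXct(c("2024-01-01 10:00", "2024-01-01 10:00",
                          "2024-01-01 10:15", "2024-01-01 10:15",
                          "2024-01-01 10:30"), tz = "UTC"),
  T_C = c(5.0, 5.0625, 6.0, 8.5, 6.5)
)
tolerance <- list(T_C = 0.5)  # assumed format: list(unit_name = tolerance_value)

# Maximal difference between the values recorded at each timestamp
spread <- tapply(records$T_C, format(records$datetime),
                 function(v) max(v) - min(v))

conflict <- spread > 0                         # same datetime, different values
silent   <- conflict & spread < tolerance$T_C  # would be resolved without a warning
warned   <- conflict & !silent                 # would still raise a warning

data.frame(spread, conflict, silent, warned)
```

Only the 10:00 conflict (difference 0.0625) falls under the tolerance; the 10:15 conflict (difference 2.5) does not.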
When the time-step is regular but not nicely rounded, the function rounds
the time series to the closest nice time and shifts the original data.
E.g., original records in a regular 10 min step c(11:58, 12:08, 12:18, 12:28)
are shifted to the newly generated nice sequence c(12:00, 12:10, 12:20, 12:30).
Note that the microclimatic records are not modified, only shifted in time.
The maximum allowed shift of a time series is 30 minutes. For example, when the time-step
is 2h (e.g. 13:33, 15:33, 17:33), the measurement times are shifted to (13:30, 15:30, 17:30).
If you have a 2h time-step and wish to go to the whole hour
(13:33 -> 14:00, 15:33 -> 16:00), the only way is aggregation:
use the mc_agg(period="2 hours") command after data cleaning.
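The 10 min example can be reproduced in base R by rounding each timestamp to the nearest multiple of the step. This is only a sketch of that one case, with an arbitrary date. It assumes the rounding unit equals the step, which does not hold for long steps such as the 2h example (13:33 becomes 13:30 there, not 14:00), so it is not the rule myClim applies in general.

```r
# Regular 10-minute series that is not aligned to round clock times
orig <- as.POSIXct(c("2024-01-01 11:58", "2024-01-01 12:08",
                     "2024-01-01 12:18", "2024-01-01 12:28"), tz = "UTC")
step <- 600  # 10 minutes in seconds

# Assumption for this sketch: round to the nearest multiple of the step
shifted <- as.POSIXct(round(as.numeric(orig) / step) * step,
                      origin = "1970-01-01", tz = "UTC")

format(shifted, "%H:%M")
# Shift applied to each record, in seconds
as.numeric(shifted) - as.numeric(orig)
```

Every record moves by the same 120 seconds, so the measured values stay attached to their records and only the time axis changes.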
cleaned_data <- mc_prep_clean(mc_data_example_raw)
#> 5 loggers
#> datetime range: 2020-10-06 09:00:00 - 2021-02-01
#> detected steps: (900s = 15min)
#> locality_id serial_number logger_name start_date end_date
#> 1 A1E05 91184101 Thermo_1 2020-10-28 08:45:00 2021-02-01
#> 2 A1E05 92201058 Dendro_1 2020-10-31 12:00:00 2021-02-01
#> 3 A2E32 94184103 TMS_1 2020-10-16 06:15:00 2021-02-01
#> 4 A2E32 20024338 HOBO_U23-001A_1 2020-10-09 08:00:00 2021-02-01
#> 5 A6W79 94184102 TMS_1 2020-10-06 09:00:00 2021-02-01
#> step_seconds count_duplicities count_missing count_disordered rounded
#> 1 900 0 0 0 FALSE
#> 2 900 0 0 0 FALSE
#> 3 900 0 0 0 FALSE
#> 4 900 0 0 0 FALSE
#> 5 900 0 0 0 FALSE