| Title: | Accelerated Estimation of Robust Location and Scale |
|---|---|
| Description: | Estimates robust location and scale parameters using platform-specific Single Instruction, Multiple Data (SIMD) vectorization and Intel Threading Building Blocks (TBB) for parallel processing. Implements a novel variance-weighted ensemble estimator that adaptively combines all available statistics. Methods include logistic M-estimators, the estimators of Rousseeuw and Croux (1993), the Gini mean difference, the scaled Median Absolute Deviation (MAD), the scaled Interquartile Range (IQR), and unbiased standard deviations. Achieves substantial speedups over existing implementations through an 'Rcpp' backend with fused single-buffer algorithms that halve memory traffic for MAD and M-scale estimation, and a unified dispatcher that automatically selects the optimal estimator based on sample size. |
| Authors: | Dennis Alexis Valin Dittrich [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-4438-8276>) |
| Maintainer: | Dennis Alexis Valin Dittrich <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.5.4 |
| Built: | 2026-05-30 09:13:55 UTC |
| Source: | https://github.com/davdittrich/robscale |
Computes the mean absolute deviation from the median, scaled by a consistency constant for asymptotic normality under the Gaussian model.
adm( x, center = NULL, constant = 1.2533141373155, na.rm = FALSE, ci = FALSE, level = 0.95 )adm( x, center = NULL, constant = 1.2533141373155, na.rm = FALSE, ci = FALSE, level = 0.95 )
x |
A numeric vector. |
center |
Optional numeric scalar giving the central value from which to
measure the average absolute distance. Defaults to the median of
|
constant |
Consistency constant for asymptotic normality at the
Gaussian. Defaults to |
na.rm |
Logical. If |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
The average distance to the median (ADM) is defined as
where is the consistency constant and
is the sample median. When center is supplied, it replaces the
sample median.
Statistical Properties.
The default constant ensures that the
ADM is a consistent estimator of the standard deviation under
the Gaussian model. At the normal distribution, the ADM achieves an
asymptotic relative efficiency (ARE) of 0.88 compared to the sample
standard deviation.
While the ADM is less efficient than the standard deviation for purely
Gaussian data, it offers superior resistance to "implosion" (the estimate
collapsing to zero). Its implosion breakdown point is
, meaning it only collapses if all but one
observation are identical. Conversely, its explosion breakdown point is
, similar to the sample mean. These properties make the ADM the
ideal implosion fallback for the M-estimator of scale in
robScale.
Computational Performance.
This implementation employs a tiered selection strategy: optimal sorting
networks for and adaptive selection
(Floyd–Rivest or pdqselect, depending on cache-derived thresholds) for
larger datasets. This avoids the full sort required by the standard
median function.
A single numeric value: the scaled mean absolute deviation from the
center. Returns NA if x has length zero after removal of
NAs.
Nair, K. R. (1947) A Note on the Mean Deviation from the Median. Biometrika, 34(3/4), 360–362. doi:10.2307/2332448
Rousseeuw, P. J. and Verboven, S. (2002) Robust estimation in very small samples. Computational Statistics & Data Analysis, 40(4), 741–758. doi:10.1016/S0167-9473(02)00078-6
mad for the median absolute deviation from the
median;
robScale for the M-estimator of scale that uses the ADM as
an implosion fallback.
adm(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) adm(x) # with consistency constant adm(x, constant = 1) # raw mean absolute deviation # Supply a known center adm(x, center = 4.0)adm(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) adm(x) # with consistency constant adm(x, constant = 1) # raw mean absolute deviation # Supply a known center adm(x, center = 4.0)
Returns the consistency constant or finite-sample correction factor for a given scale estimator and sample size.
get_consistency_constant(method, n = NULL)get_consistency_constant(method, n = NULL)
method |
Character string specifying the estimator.
Options: |
n |
Integer. The sample size (used for |
For "c4", "sn", and "qn", the returned value depends
on n. For "gmd", "mad", and "iqr", the
returned value is the asymptotic constant (independent of n).
A single numeric value: the consistency constant or correction factor.
get_consistency_constant("mad") # 1.4826 get_consistency_constant("c4", 5) # c4(5) get_consistency_constant("qn", 10) # finite-sample Qn factorget_consistency_constant("mad") # 1.4826 get_consistency_constant("c4", 5) # c4(5) get_consistency_constant("qn", 10) # finite-sample Qn factor
Computes the Gini mean difference, scaled by a consistency constant for asymptotic normality under the Gaussian model.
gmd(x, constant = 0.886226925452758, na.rm = FALSE, ci = FALSE, level = 0.95)gmd(x, constant = 0.886226925452758, na.rm = FALSE, ci = FALSE, level = 0.95)
x |
A numeric vector. |
constant |
Consistency constant for asymptotic normality at the
Gaussian. Defaults to |
na.rm |
Logical. If |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
The Gini mean difference is defined as
where are the order statistics and
is the consistency constant. The computation requires a full sort
(O(n log n)).
Statistical Properties.
The default constant ensures that the
GMD is a consistent estimator of under the Gaussian model. The
GMD achieves an asymptotic relative efficiency (ARE) of 0.98 compared
to the sample standard deviation, making it the most efficient robust
alternative in this package. Its breakdown point is , approximately 29.3%.
If ci = FALSE (default), a single numeric value: the scaled
Gini mean difference. Returns 0 if n < 2; returns NA
if x has length zero after removal of NAs.
If ci = TRUE, an object of class "robscale_ci".
Gini, C. (1912) Variabilita e mutabilita. Bologna.
scale_robust for the unified dispatcher;
qn and sn for high-breakdown scale estimators.
gmd(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) gmd(x) # with consistency constant gmd(x, constant = 1) # raw Gini mean difference # Asymptotic confidence interval gmd(x, ci = TRUE)gmd(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) gmd(x) # with consistency constant gmd(x, constant = 1) # raw Gini mean difference # Asymptotic confidence interval gmd(x, ci = TRUE)
Computes the interquartile range, scaled by a consistency constant for asymptotic normality under the Gaussian model.
iqr_scaled( x, constant = 0.741301109252801, na.rm = FALSE, ci = FALSE, level = 0.95 )iqr_scaled( x, constant = 0.741301109252801, na.rm = FALSE, ci = FALSE, level = 0.95 )
x |
A numeric vector. |
constant |
Consistency constant for asymptotic normality at the
Gaussian. Defaults to
|
na.rm |
Logical. If |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
The scaled IQR is defined as
where denotes the Type 7 quantile (R default) and is
the consistency constant.
Statistical Properties.
The default constant ensures consistency for under the Gaussian
model. The IQR achieves an asymptotic relative efficiency (ARE) of
0.37 compared to the sample standard deviation. Its breakdown point is 25%.
Computational Performance.
Unlike IQR, which requires a full sort, this
implementation uses O(n) selection via the pdqselect algorithm for each
quantile, providing a substantial speedup for large datasets.
If ci = FALSE (default), a single numeric value: the scaled
interquartile range. Returns 0 if n < 2; returns NA
if x has length zero after removal of NAs.
If ci = TRUE, an object of class "robscale_ci".
IQR for the base R (unscaled) interquartile range;
scale_robust for the unified dispatcher;
mad_scaled for the scaled median absolute deviation.
iqr_scaled(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) iqr_scaled(x) # with consistency constant iqr_scaled(x, constant = 1) # raw IQR # Asymptotic confidence interval iqr_scaled(x, ci = TRUE)iqr_scaled(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) iqr_scaled(x) # with consistency constant iqr_scaled(x, constant = 1) # raw IQR # Asymptotic confidence interval iqr_scaled(x, ci = TRUE)
Computes the median absolute deviation from the median (or a user-supplied center), scaled by a consistency constant for asymptotic normality under the Gaussian model.
mad_scaled( x, center = NULL, constant = 1.4826022185056, na.rm = FALSE, ci = FALSE, level = 0.95 )mad_scaled( x, center = NULL, constant = 1.4826022185056, na.rm = FALSE, ci = FALSE, level = 0.95 )
x |
A numeric vector. |
center |
Optional numeric scalar giving the central value from which to
measure absolute deviations. Defaults to the median of |
constant |
Consistency constant for asymptotic normality at the
Gaussian. Defaults to |
na.rm |
Logical. If |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
The scaled MAD is defined as
Statistical Properties. The MAD achieves a 50% breakdown point, meaning it tolerates up to half the data being contaminated. Its asymptotic relative efficiency (ARE) is 36.8% compared to the sample standard deviation.
Computational Performance.
Unlike mad, this implementation uses O(n) selection
with adaptive algorithm dispatch: Floyd-Rivest for moderate n, pdqselect
for large n (threshold derived from per-core L2 cache size at startup),
and sorting networks for n 16.
If ci = FALSE (default), a single numeric value: the scaled
median absolute deviation. Returns 0 if n = 1; returns
NA if x has length zero after removal of NAs.
If ci = TRUE, an object of class "robscale_ci".
mad for the base R implementation;
scale_robust for the unified dispatcher;
robScale for the M-estimate of scale that uses the MAD as
a starting value.
mad_scaled(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) mad_scaled(x) # with consistency constant mad_scaled(x, constant = 1) # raw median absolute deviation # Supply a known center mad_scaled(x, center = 4.0) # Asymptotic confidence interval mad_scaled(x, ci = TRUE)mad_scaled(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) mad_scaled(x) # with consistency constant mad_scaled(x, constant = 1) # raw median absolute deviation # Supply a known center mad_scaled(x, center = 4.0) # Asymptotic confidence interval mad_scaled(x, ci = TRUE)
Print a robscale_ci object
## S3 method for class 'robscale_ci' print(x, digits = 4, ...)## S3 method for class 'robscale_ci' print(x, digits = 4, ...)
x |
A |
digits |
Number of significant digits to display. Default: 4. |
... |
Further arguments passed to or from other methods. |
The object x, invisibly.
Print a robscale_ensemble_ci object
## S3 method for class 'robscale_ensemble_ci' print(x, digits = 4, ...)## S3 method for class 'robscale_ensemble_ci' print(x, digits = 4, ...)
x |
A |
digits |
Number of significant digits to display. Default: 4. |
... |
Further arguments passed to or from other methods. |
The object x, invisibly.
Computes the robust estimator of scale proposed by Rousseeuw and Croux (1993).
qn( x, constant = 2.21914446598508, finite.corr = TRUE, na.rm = FALSE, ci = FALSE, level = 0.95 )qn( x, constant = 2.21914446598508, finite.corr = TRUE, na.rm = FALSE, ci = FALSE, level = 0.95 )
x |
A numeric vector of observations. |
constant |
Consistency constant. Default is |
finite.corr |
Logical; if |
na.rm |
Logical; if |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
The estimator is defined as the -th order statistic of the
absolute pairwise differences for
. Specifically, where is the
set of absolute differences and with
.
Statistical Properties.
is a highly robust estimator with a 50% breakdown point.
Unlike the Median Absolute Deviation (MAD), does not require a
prior location estimate, making it more robust in asymmetric distributions.
At the Gaussian distribution, it achieves an asymptotic relative
efficiency (ARE) of 0.82, significantly higher than the 0.37 achieved by the
MAD.
Computational Performance.
While the naive calculation of requires space and
time, this implementation employs a specialized algorithm.
The implementation uses a tiered execution strategy:
Optimal sorting networks for very small samples ().
These networks eliminate branch misprediction in the target regimes of
extremely small samples.
A Croux–Rousseeuw weighted-median refinement algorithm for medium and large datasets.
Cache-aware parallelization via Intel TBB (Threading Building Blocks) for large-scale data, with thresholds derived from the detected per-core L2 cache size.
If ci = FALSE (default), the estimator of scale
(a scalar). If ci = TRUE, an object of class "robscale_ci".
Rousseeuw, P. J., and Croux, C. (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association, 88(424), 1273–1283. doi:10.1080/01621459.1993.10476408
Akinshin, A. (2022). Finite-sample Rousseeuw-Croux scale estimators. arXiv preprint arXiv:2209.12268.
sn for the scale estimator;
robScale for the M-estimator of scale;
adm for the average distance to median.
qn(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) qn(x) # Asymptotic confidence interval qn(x, ci = TRUE)qn(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) qn(x) # Asymptotic confidence interval qn(x, ci = TRUE)
Computes the robust M-estimate of location for very small samples using the
logistic function of Rousseeuw & Verboven (2002).
robLoc( x, scale = NULL, na.rm = FALSE, maxit = 80L, tol = sqrt(.Machine$double.eps) )robLoc( x, scale = NULL, na.rm = FALSE, maxit = 80L, tol = sqrt(.Machine$double.eps) )
x |
A numeric vector. |
scale |
Optional numeric scalar giving a known scale. When supplied, the MAD is replaced by this value and the minimum sample size for iteration is lowered from 4 to 3 (see ‘Details’). |
na.rm |
Logical. If |
maxit |
Maximum number of Newton–Raphson iterations. Defaults to 80. |
tol |
Convergence tolerance. Iteration stops when the Newton step
satisfies |
The M-estimate of location is defined as the solution to the
estimating equation
where is a fixed auxiliary scale (defaulting to the MAD) and
is the logistic psi function:
Statistical Properties.
The logistic psi function is bounded, smooth (), and
strictly monotone. These properties ensure that the resulting M-estimator is
both robust to outliers and numerically stable. At the Gaussian distribution,
the logistic M-estimator of location achieves high efficiency, with an
asymptotic relative efficiency (ARE) of 0.98 compared to the sample
mean.
Small-Sample Strategy.
Following Rousseeuw & Verboven (2002), location and scale are estimated
separately. In robLoc, the auxiliary scale remains fixed
throughout the Newton–Raphson iteration. This "decoupled" approach avoids
the instabilities often encountered in small samples when using simultaneous
location–scale iteration (e.g., Huber's Proposal 2).
Numerical Computation.
The estimating equation is solved via Newton–Raphson iteration starting from
the sample median. Because the derivative of the logistic psi satisfies
,
the Newton step is computationally efficient, requiring no additional
transcendental calls beyond the tanh evaluations used for the psi
function itself.
The denominator of the Newton step is the observed Fisher information
recomputed at each iteration using the current estimate . This
distinguishes the implementation from IRLS (iteratively reweighted least
squares), which uses a fixed expected-information denominator and converges
linearly. Using the observed Hessian yields true Newton–Raphson: the
fixed-point derivative at the solution satisfies , giving quadratic local convergence. On typical data,
convergence is achieved in two to four iterations; the maxit bound
of 80 is a conservative safety limit.
Performance and SIMD.
The C++ kernel dispatches tanh to the fastest available backend:
Apple Accelerate on macOS, glibc libmvec (AVX2 4-wide)
on Linux x86-64, SLEEF when libmvec is absent, or #pragma omp simd
as a portable fallback. A fused AVX2 kernel accumulates and
in a single pass, halving memory reads.
Fallback Mechanism.
For extremely small samples where iteration may be unreliable, the function
returns the median directly:
If scale is unknown: (since the MAD of 3 points is
insufficiently robust).
If scale is supplied: .
A single numeric value: the robust M-estimate of location.
Returns NA if x has length zero (after removal of
NAs when na.rm = TRUE).
Rousseeuw, P. J. and Verboven, S. (2002) Robust estimation in very small samples. Computational Statistics & Data Analysis, 40(4), 741–758. doi:10.1016/S0167-9473(02)00078-6
median for the starting value;
mad for the auxiliary scale;
robScale for the companion scale estimator;
qn and sn for high-efficiency scale
estimators.
robLoc(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) robLoc(x) # Known scale lowers the minimum sample size to 3 robLoc(c(1, 2, 3), scale = 1.5) # Outlier resistance x_clean <- c(2.0, 3.1, 2.7, 2.9, 3.3) x_dirty <- replace(x_clean, 5, 100) c(robLoc(x_clean), robLoc(x_dirty)) # barely moves c(mean(x_clean), mean(x_dirty)) # destroyedrobLoc(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) robLoc(x) # Known scale lowers the minimum sample size to 3 robLoc(c(1, 2, 3), scale = 1.5) # Outlier resistance x_clean <- c(2.0, 3.1, 2.7, 2.9, 3.3) x_dirty <- replace(x_clean, 5, 100) c(robLoc(x_clean), robLoc(x_dirty)) # barely moves c(mean(x_clean), mean(x_dirty)) # destroyed
Computes the robust M-estimate of scale for very small samples using the
function of Rousseeuw & Verboven (2002).
robScale( x, loc = NULL, fallback = c("adm", "na"), implbound = 1e-04, na.rm = FALSE, maxit = 80L, tol = sqrt(.Machine$double.eps), ci = FALSE, level = 0.95 )robScale( x, loc = NULL, fallback = c("adm", "na"), implbound = 1e-04, na.rm = FALSE, maxit = 80L, tol = sqrt(.Machine$double.eps), ci = FALSE, level = 0.95 )
x |
A numeric vector. |
loc |
Optional numeric scalar giving a known location. When supplied,
the observations are centered at |
fallback |
Character string specifying the fallback behavior when the MAD
collapses to zero or the sample size is too small for iteration.
Must be one of |
implbound |
Numeric scalar specifying the threshold for MAD implosion.
Defaults to |
na.rm |
Logical. If |
maxit |
Maximum number of Newton–Raphson iterations. Defaults to 80. |
tol |
Convergence tolerance. Iteration stops when the relative
change in the scale estimate falls below |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
The M-estimate of scale is defined as the solution to the
estimating equation
where the location is fixed at the sample median,
is the expected value of under the Gaussian model, and
is a smooth rho function (Rousseeuw & Verboven, 2002, Sec. 4.2):
The tuning constant is chosen to satisfy
.
Statistical Properties. This estimator is designed for high robustness and efficiency. It achieves a 50% breakdown point, meaning the estimate remains reliable even if half the sample is contaminated by outliers. At the Gaussian distribution, the logistic M-estimator of scale achieves an asymptotic relative efficiency (ARE) of 0.55 compared to the sample standard deviation.
Numerical Computation.
The M-scale estimating equation is solved
by Newton–Raphson iteration, starting from the MAD. Each step computes
and accumulates two sums in a single pass:
(the rho sum) and
(the derivative sum). The NR update is
with convergence test .
When the derivative sum degenerates, the iteration falls back to a
multiplicative half-step. Quadratic convergence yields 3–4 iterations
on typical data. Location is held fixed at the sample median, following
the decoupled approach of Rousseeuw and Verboven (2002) that avoids the
positive-feedback instabilities of simultaneous location–scale estimation.
Performance and SIMD.
The C++ kernel dispatches tanh to the fastest available backend:
Apple Accelerate on macOS, glibc libmvec (AVX2 4-wide)
on Linux x86-64, SLEEF when libmvec is absent, or #pragma omp simd
as a portable fallback.
Known location.
When loc is supplied, the observations are centered as
and the initial scale is set to
rather than the MAD. This lowers the minimum sample size from 4 to 3
(Rousseeuw & Verboven, 2002, Sec. 5).
Fallback Mechanism and Implosion.
Robust scale estimators like the MAD can "implode" (collapse to zero) if more
than 50
or the sample size is too small for reliable iteration (, or
if location is known):
If fallback = "adm" (default), the function returns the
scaled Average Distance to the Median (adm). The ADM
is highly resistant to implosion (breakdown point ).
If fallback = "na", the function returns NA.
If ci = FALSE (default), a single numeric value: the robust
M-estimate of scale. Returns NA if x has length zero (after
removal of NAs when na.rm = TRUE) or if the MAD collapses
and fallback = "na".
If ci = TRUE, an object of class "robscale_ci".
Rousseeuw, P. J. and Verboven, S. (2002) Robust estimation in very small samples. Computational Statistics & Data Analysis, 40(4), 741–758. doi:10.1016/S0167-9473(02)00078-6
adm for the implosion fallback;
mad for the starting value and classical alternative;
robLoc for the companion location estimator;
qn and sn for high-efficiency scale
estimators.
robScale(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) robScale(x) robScale(x, loc = 5) # known location # Outlier resistance x_clean <- c(2.0, 3.1, 2.7, 2.9, 3.3) x_dirty <- replace(x_clean, 5, 100) c(robScale(x_clean), robScale(x_dirty)) # barely moves c(sd(x_clean), sd(x_dirty)) # destroyed # MAD implosion: identical values cause MAD = 0 robScale(c(5, 5, 5, 5, 6)) # falls back to adm() # Asymptotic confidence interval robScale(x, ci = TRUE)robScale(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) robScale(x) robScale(x, loc = 5) # known location # Outlier resistance x_clean <- c(2.0, 3.1, 2.7, 2.9, 3.3) x_dirty <- replace(x_clean, 5, 100) c(robScale(x_clean), robScale(x_dirty)) # barely moves c(sd(x_clean), sd(x_dirty)) # destroyed # MAD implosion: identical values cause MAD = 0 robScale(c(5, 5, 5, 5, 6)) # falls back to adm() # Asymptotic confidence interval robScale(x, ci = TRUE)
Unified dispatcher for robust scale estimation. Automatically selects between a variance-weighted ensemble of 7 estimators (for small samples) and the Gini Mean Difference (for large samples), or returns a specific estimator by name.
scale_robust( x, method = c("ensemble", "gmd", "sd", "mad", "iqr", "sn", "qn", "robScale"), auto_switch = TRUE, threshold = 20L, n_boot = 200L, na.rm = FALSE, ci = FALSE, level = 0.95, boot_method = c("auto", "bca", "percentile", "parametric", "analytical") )scale_robust( x, method = c("ensemble", "gmd", "sd", "mad", "iqr", "sn", "qn", "robScale"), auto_switch = TRUE, threshold = 20L, n_boot = 200L, na.rm = FALSE, ci = FALSE, level = 0.95, boot_method = c("auto", "bca", "percentile", "parametric", "analytical") )
x |
A numeric vector of observations. |
method |
Character string specifying the estimation method.
Options: |
auto_switch |
Logical. If |
threshold |
Integer. Sample size at which the ensemble auto-switches to GMD. Default: 20 (research-backed; see ‘Details’). |
n_boot |
Integer. Number of bootstrap replicates for the ensemble weighting or bootstrap CI. Default: 200. |
na.rm |
Logical. If |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
boot_method |
CI method. For single named methods: |
Ensemble method.
When method = "ensemble", the function computes a
variance-weighted combination of 7 scale estimators:
sd_c4 — unbiased standard deviation
gmd — Gini mean difference
mad_scaled — median absolute deviation
iqr_scaled — scaled interquartile range
sn — Sn estimator of Rousseeuw & Croux
qn — Qn estimator of Rousseeuw & Croux
robScale — logistic M-estimator of scale
The weights are determined by bootstrap resampling: each estimator's
inverse variance across n_boot resamples determines its
contribution. Estimators with lower sampling variance receive higher
weight.
Automatic switching.
When auto_switch = TRUE and method = "ensemble" and
n >= threshold, the function returns gmd(x) directly.
The GMD achieves 98% asymptotic relative efficiency at the Gaussian while
being computationally cheaper than the ensemble. Named methods (e.g.\
method = "qn") are always dispatched as requested; auto_switch
never overrides an explicit method choice.
Individual methods.
When a specific method is requested, scale_robust bypasses the
ensemble and calls the corresponding C++ entry point directly. With
ci = TRUE, the default boot_method = "auto" returns the
analytical interval: chi-squared for "sd", ARE-based normal
approximation for all others. Pass boot_method = "bca",
"percentile", or "parametric" to obtain a bootstrap CI
instead.
If ci = FALSE (default), a single numeric value: the scale
estimate (NA when n < 2). If ci = TRUE with a single
method, a "robscale_ci" object. If ci = TRUE with
method = "ensemble", a "robscale_ensemble_ci" object
containing the ensemble estimate, bootstrap CI, and a table of
per-estimator results.
Individual estimators: sd_c4, gmd,
mad_scaled, iqr_scaled, sn,
qn, robScale;
get_consistency_constant for the underlying constants.
x <- c(1, 2, 3, 5, 7, 8) scale_robust(x) # ensemble (n < 20) scale_robust(x, method = "qn") # specific method (always qn) set.seed(42) y <- rnorm(50) scale_robust(y) # ensemble auto-switches to GMD scale_robust(y, method = "qn") # qn regardless of n scale_robust(y, auto_switch = FALSE) # forces ensemble # Analytical CI for a named method (default) scale_robust(x, method = "sn", ci = TRUE) # Bootstrap CI for a named method scale_robust(x, method = "qn", ci = TRUE, boot_method = "percentile") # Ensemble with bootstrap CIs scale_robust(x, ci = TRUE)x <- c(1, 2, 3, 5, 7, 8) scale_robust(x) # ensemble (n < 20) scale_robust(x, method = "qn") # specific method (always qn) set.seed(42) y <- rnorm(50) scale_robust(y) # ensemble auto-switches to GMD scale_robust(y, method = "qn") # qn regardless of n scale_robust(y, auto_switch = FALSE) # forces ensemble # Analytical CI for a named method (default) scale_robust(x, method = "sn", ci = TRUE) # Bootstrap CI for a named method scale_robust(x, method = "qn", ci = TRUE, boot_method = "percentile") # Ensemble with bootstrap CIs scale_robust(x, ci = TRUE)
Computes the sample standard deviation corrected by the
factor to remove the small-sample bias of the square-root estimator.
sd_c4(x, na.rm = FALSE, ci = FALSE, level = 0.95)sd_c4(x, na.rm = FALSE, ci = FALSE, level = 0.95)
x |
A numeric vector. |
na.rm |
Logical. If |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
The standard sd function computes
where is the unbiased
sample variance. However, for finite
; the square root introduces a downward bias.
The correction factor removes this bias:
Statistical Properties. This is a classical (non-robust) estimator with 100% ARE by construction, but 0% breakdown point — a single outlier can make it arbitrarily large.
Numerical Stability. Uses Welford's online algorithm for numerically stable variance computation, avoiding catastrophic cancellation that affects naive two-pass formulas.
If ci = FALSE (default), a single numeric value: the
bias-corrected standard deviation. Returns NA if n < 2.
If ci = TRUE, an object of class "robscale_ci" with an
exact chi-squared confidence interval.
sd for the (biased) sample standard deviation;
scale_robust for the unified dispatcher;
robScale for the robust M-estimate of scale.
sd_c4(c(1:9)) # Compare with base sd() --- difference is small for large n x <- rnorm(1000) c(sd_c4 = sd_c4(x), sd = sd(x)) # Exact chi-squared confidence interval sd_c4(c(1:9), ci = TRUE)sd_c4(c(1:9)) # Compare with base sd() --- difference is small for large n x <- rnorm(1000) c(sd_c4 = sd_c4(x), sd = sd(x)) # Exact chi-squared confidence interval sd_c4(c(1:9), ci = TRUE)
Computes the robust estimator of scale proposed by Rousseeuw and Croux (1993).
sn( x, constant = 1.19259855312321, finite.corr = TRUE, na.rm = FALSE, ci = FALSE, level = 0.95 )sn( x, constant = 1.19259855312321, finite.corr = TRUE, na.rm = FALSE, ci = FALSE, level = 0.95 )
x |
A numeric vector of observations. |
constant |
Consistency constant. Default is |
finite.corr |
Logical; if |
na.rm |
Logical; if |
ci |
Logical. If |
level |
Confidence level for the interval (default 0.95). |
The estimator is defined as the median of medians:
Statistical Properties.
achieves a 50% breakdown point and provides superior
statistical efficiency compared to the Median Absolute Deviation (MAD). At
the Gaussian distribution, it has an asymptotic relative efficiency
(ARE) of 0.58, which is significantly higher than the 0.37 of the MAD.
Unlike M-estimators of scale, is an explicit function of the
data and does not require an iterative solution or a prior location
estimate.
Computational Performance.
This implementation provides a highly optimized
algorithm, avoiding the complexity of the naive definition.
The execution strategy is tiered for maximum efficiency:
Optimal sorting networks are used for the target regime of
very small samples (). These networks minimize control
flow overhead and maximize pipeline utilization.
Highly tuned kernels are employed for general datasets, leveraging C++17 features for cache-aware computation.
Cache-aware parallelization via Intel TBB (Threading Building Blocks) for large-scale data, with thresholds derived from the detected per-core L2 cache size.
If ci = FALSE (default), the estimator of scale
(a scalar). If ci = TRUE, an object of class "robscale_ci".
Rousseeuw, P. J., and Croux, C. (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association, 88(424), 1273–1283. doi:10.1080/01621459.1993.10476408
Akinshin, A. (2022). Finite-sample Rousseeuw-Croux scale estimators. arXiv preprint arXiv:2209.12268.
qn for the scale estimator;
robScale for the M-estimator of scale;
adm for the average distance to median.
sn(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) sn(x) # Asymptotic confidence interval sn(x, ci = TRUE)sn(c(1:9)) x <- c(1, 2, 3, 5, 7, 8) sn(x) # Asymptotic confidence interval sn(x, ci = TRUE)