vignettes/migration_guide.Rmd
migration_guide.Rmd
Version 2.x released September 2020 offers many new features and improved speed. These changes make pense 2.x incompatible with code written for pense 1.x and results will not be identical!
The most visible changes are to functions pense()
and
pensem()
, which now only fit models but do not evaluate
prediction performance anymore. Instead, the new functions
pense_cv()
and pensem_cv()
are now used for
fitting models and estimating their prediction performance.
First we need to load the new version of the pense package:
library(pense)
#> Loading required package: Matrix
packageVersion("pense")
#> [1] '2.2.2'
This guide uses the following simulated data:
set.seed(1234)
x <- matrix(rt(50 * 10, df = 5), ncol = 10)
y <- 0.5 * x[, 1] - 2 * x[, 2] + 1.5 * x[, 3] + rt(nrow(x), df = 3)
_cv()
family of functions
The most basic usage in old versions was to call function
pense()
to fit models with nlambda
different
penalization levels and evaluate each model’s prediction performance
with K-fold cross-validation. In the new version, this will now raise an
error:
set.seed(1234)
fitted_with_cv <- pense(x, y, alpha = 0.6, nlambda = 40, warm_reset = 5L, cv_k = 5)
#> Error:
#> ! The `cv_k` argument of `pense()` was deprecated in pense 2.0.0 and is
#> now defunct.
#> ℹ Please use `pense_cv()` instead.
As the error message says, if model fitting and
cross-validation is required, use pense_cv()
instead. The
simple solution is to replace pense()
with
pense_cv()
:
set.seed(1234)
fitted_with_cv <- pense_cv(x, y, alpha = 0.6, nlambda = 40, warm_reset = 5L, cv_k = 5)
#> Error:
#> ! The `warm_reset` argument of `pense()` was deprecated in pense 2.0.0
#> and is now defunct.
#> ℹ Please use the `nlambda_enpy` argument instead.
However, this raises a warning that the argument
warm_reset=
is also deprecated in favor of argument
nlambda_enpy=
. The new version uses more consistent and
self-explaining naming for arguments. Therefore, the most basic way for
computing regularized S-estimates of linear regression and
estimating their prediction performance via CV is the following:
set.seed(1234)
fitted_with_cv <- pense_cv(x, y, alpha = 0.6, nlambda = 40, nlambda_enpy = 5L, cv_k = 5)
To only fit the models, without estimating prediction performance,
use pense()
(note the absence of the cv_k=
argument):
fitted_no_cv <- pense(x, y, alpha = 0.6, nlambda = 40, nlambda_enpy = 5L)
The structure of the returned objects is different from pense
versions 1.x. In old versions, the estimated coefficients were stored as
a sparse matrix object, with one column per fit (i.e., per penalization
level). In new versions, estimates are stored as a list with one entry
per penalization level. For fitted models with prediction performance,
as in fitted_with_cv
, the returned object additionally
contains information on the prediction performance of all fitted
models:
str(fitted_no_cv, max.level = 1)
#> List of 6
#> $ call : language pense(x = x, y = y, alpha = 0.6, nlambda = 40, nlambda_enpy = 5L)
#> $ bdp : num 0.255
#> $ lambda :List of 1
#> $ metrics :List of 1
#> $ estimates:List of 40
#> $ alpha : num 0.6
#> - attr(*, "class")= chr [1:2] "pense" "pense_fit"
str(fitted_with_cv, max.level = 1)
#> List of 9
#> $ call : language pense_cv(x = x, y = y, cv_k = 5, alpha = 0.6, nlambda = 40, nlambda_enpy = 5L)
#> $ bdp : num 0.255
#> $ lambda :List of 1
#> $ alpha : num 0.6
#> $ cvres :'data.frame': 40 obs. of 4 variables:
#> $ cv_measure: chr "tau_size"
#> $ cv_repl : int 1
#> $ metrics :List of 1
#> $ estimates :List of 40
#> - attr(*, "class")= chr [1:2] "pense" "pense_cvfit"
As in previous versions, the coefficient estimates are best accessed
via the coefficients()
method. For fits with estimated
prediction performance, the method returns the coefficients of the model
yielding the lowest prediction error:
coefficients(fitted_with_cv)
#> (Intercept) X1 X2 X3 X4
#> 0.0764169254 0.5424968148 -1.7469201595 1.4314931674 -0.1059407235
#> X5 X6 X7 X8 X9
#> 0.1410239524 0.0470255357 -0.0456642474 0.0005307165 0.2026268551
#> X10
#> 0.0733546445
Fits computed with pense()
, however, do not have
information about prediction performance, hence
coefficients(fitted_no_cv)
would not know what penalization
level to use. In this case, the desired penalization level must be
specified manually:
coefficients(fitted_no_cv, lambda = fitted_no_cv$lambda[10])
#> Error in lambda[[1L]]: subscript out of bounds
An important difference to previous versions is that new versions
do not correct for the bias introduced by the Ridge
penalty. The correction, which was adopted from the original LS elastic
net paper (Zou and Hastie 2005), was
dropped as it does not adequately counter the effects of the Ridge
penalty on the S-estimates. Specifying the correction=
argument results in an error in new versions of the package:
coefficients(fitted_with_cv, correction = TRUE)
#> Error:
#> ! The `correction` argument of `coef()` was deprecated in pense 2.0.0
#> and is now defunct.
The same applies to functions residuals()
and
predict()
for extracting residuals of the fits and using
the estimated coefficients for prediction, respectively.
Many of the optional arguments to pense()
and
pensem()
have been renamed. Options to control the
algorithms have been regrouped to remove ambiguity and redundancies. For
example, previous versions had several options to specify the numerical
tolerance used in different parts of the algorithm. This could have lead
to unstable algorithms and undesired results. In new versions, the
numerical tolerance is only specified on the top level in the call to
pense()
and friends.
In previous versions of the package, all options for the PENSE
algorithm have been set via the pense(options=)
argument
and the pense_options()
function.
In new versions, the options are either given directly to the
pense()
function, or supplied via arguments
algorithm_opts
or mscale_opts
:
pense versions 1.x | pense versions 2.x and above |
---|---|
pense_options(eps=) |
pense(eps=) |
pense_options(delta=) |
pense(bdp=) |
pense_options(cc=) |
pense(cc=) |
pense_options(maxit=) |
mm_algorithm_options(max_it=) |
pense_options(mscale_maxit=) |
mscale_algorithm_options(max_it=) |
pense_options(mscale_eps=) |
mscale_algorithm_options(eps=) |
all other options | unsupported |
For example, in previous versions the breakdown point of the PENSE
estimate was set with pense_options(delta=)
:
pense(x, y, alpha = 0.6, nlambda = 40, nlambda_enpy = 5L, options = pense_options(delta = 0.33))
In the new versions, the breakdown point is set directly in the call
to pense()
:
pense(x, y, alpha = 0.6, nlambda = 40, nlambda_enpy = 5L, bdp = 0.33)
As with pense()
, options for controlling the algorithm
used by pensem_cv()
are moved from
mstep_options()
to:
pense versions 1.x | pense versions 2.x and above |
---|---|
mstep_options(cc=) |
pensem_cv(cc=) |
mstep_options(eps=) |
pensem_cv(eps=) |
mstep_options(maxit=) |
mm_algorithm_options(max_it=) |
all other options | unsupported |
In previous versions, the EN algorithm (the workhorse to compute
PENSE estimates), was specified via pense(en_options=)
. In
new versions, the user can select the EN algorithm separately for
computing PENSE estimates and for computing starting points via ENPY
separately. Therefore, the algorithm and it’s options are now set
through arguments algorithm_opts=
and
enpy_opts=
.
pense(x, y, alpha = 0.6, nlambda = 40, nlambda_enpy = 5L,
algorithm_opts = mm_algorithm_options(en_algorithm_opts = en_lars_options()),
enpy_opts = enpy_options(en_algorithm_opts = en_admm_options()))
Moreover, functions en_options_aug_lars()
and
en_options_dal()
are replaced by
en_lars_options()
and en_dal_options()
,
respectively. The new functions accept similar arguments, but with
slightly different names. They do not accept the numerical tolerance
eps
anymore, as this is now set directly in
pense()
and friends. The LARS algorithm now always uses the
Gram matrix, afforded by a more efficient implementation (argument
use_gram
is now ignored). Similarly, the DAL algorithm
always uses a conjugate gradient pre-conditioner, doesn’t print status
information, and adaptively chooses the initial step size as either
eta_start_conservative
or
eta_start_aggressive
, based on the change in penalization
level. Hence, arguments verbosity
,
preconditioner
, and eta_start
are now
defunct:
pense versions 1.x | pense versions 2.x and above |
---|---|
en_options_dal(eps=) |
pense(eps=) |
en_options_dal(maxit=) |
en_dal_options(max_it=) |
en_options_dal(eta_mult=) |
en_dal_options(eta_multiplier=) |
en_options_dal(eta_start=) |
en_dal_options(eta_start_conservative=, eta_start_aggressive=) |
all other options | unsupported |