| Title: | Capushe, Data-Driven Slope Estimation and Dimension Jump |
|---|---|
| Description: | Calibration of penalized criteria for model selection. The calibration methods available in this package are based on the slope heuristics. |
| Authors: | Vincent Brault [cre, aut] (ORCID: <https://orcid.org/0000-0002-3629-3429>), Sylvain Arlot [ctb], Jean-Patrick Baudry [ctb], Cathy Maugis [ctb], Bertrand Michel [ctb] |
| Maintainer: | Vincent Brault <[email protected]> |
| License: | GPL (>= 2.0) |
| Version: | 1.1.3 |
| Built: | 2026-05-09 06:53:06 UTC |
| Source: | https://github.com/cran/capushe |
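AICcapushe returns the model selected by the Akaike Information Criterion (AIC).
AICcapushe(data, n)
data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
n |
The size of the sample used to compute the contrast values. |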
The penalty shape value should be increasing with respect to the complexity value (column 3).
The complexity values have to be positive.
n is necessary to compute the AIC and BIC criteria. n is the size of the
sample used to compute the contrast values given in the data matrix.
Do not confuse n with the size of the model collection, which is the number
of rows of the data matrix.
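The constraints above can be checked before any selection function is called. The following is a minimal sketch in base R. check_capushe_input is a hypothetical helper, not part of the capushe package, and it assumes a data frame with the four-column layout (model, pen, complexity, contrast) used by datacapushe.

```r
# Hypothetical helper (not part of capushe): checks the constraints stated
# above on a four-column data frame (names, penalty shape, complexity, contrast).
check_capushe_input <- function(data, n) {
  stopifnot(is.data.frame(data), ncol(data) == 4, n > 0)
  pen <- data[[2]]
  complexity <- data[[3]]
  # The complexity values have to be positive.
  positive <- all(complexity > 0)
  # The penalty shape should be increasing with respect to the complexity.
  increasing <- !is.unsorted(pen[order(complexity)])
  # n is the sample size, not the number of candidate models (rows).
  list(positive_complexity = positive,
       increasing_penalty = increasing,
       n_models = nrow(data),
       n = n)
}

toy <- data.frame(model = c("m1", "m2", "m3"),
                  pen = c(0.002, 0.005, 0.008),
                  complexity = c(2, 5, 8),
                  contrast = c(3.10, 2.95, 2.93),
                  stringsAsFactors = FALSE)
chk <- check_capushe_input(toy, n = 1000)
```

Here n_models is 3 while n is 1000, which illustrates the distinction between the size of the model collection and the sample size.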
model: The model selected by AIC.
data(datacapushe)
AICcapushe(datacapushe, n = 1000)
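BICcapushe returns the model selected by the Bayesian Information Criterion (BIC).
BICcapushe(data, n)
data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
n |
The size of the sample used to compute the contrast values. |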
The penalty shape value should be increasing with respect to the complexity value (column 3).
The complexity values have to be positive.
n is necessary to compute the AIC and BIC criteria. n is the size of the
sample used to compute the contrast values given in the data matrix.
Do not confuse n with the size of the model collection, which is the number
of rows of the data matrix.
model: The model selected by BIC.
data(datacapushe)
BICcapushe(datacapushe, n = 1000)
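To make the role of n in the two criteria concrete, here is a small base R sketch. It rests on an assumption that is not stated in this manual: the contrast is taken to be the negative log-likelihood divided by n. aic_bic_sketch is a hypothetical name, and the scaling used inside AICcapushe and BICcapushe may differ.

```r
# Sketch under an assumption: contrast = -(log-likelihood) / n.
# aic_bic_sketch is a hypothetical name, not a capushe function.
aic_bic_sketch <- function(data, n) {
  loglik <- -n * data$contrast                 # recover the log-likelihood
  aic <- -2 * loglik + 2 * data$complexity     # classical AIC
  bic <- -2 * loglik + log(n) * data$complexity  # classical BIC
  list(AIC = data$model[which.min(aic)],
       BIC = data$model[which.min(bic)])
}

toy <- data.frame(model = c("m1", "m2", "m3"),
                  pen = c(0.002, 0.005, 0.008),
                  complexity = c(2, 5, 8),
                  contrast = c(3.100, 2.950, 2.945),
                  stringsAsFactors = FALSE)
sel <- aic_bic_sketch(toy, n = 1000)
```

On this toy collection the heavier log(n) penalty leads BIC to a smaller model than AIC. Note that n = 1000 is the sample size, whereas the collection has only three rows.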
The capushe function proposes two algorithms based on the slope heuristics
to calibrate penalties in the context of model selection via penalization.
capushe(data, n = 0, pct = 0.15, point = 0, psi.rlm = psi.bisquare, scoef = 2, Careajump = 0, Ctresh = 0)
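data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
n |
The size of the sample used to compute the contrast values. It is necessary to compute the AIC and BIC criteria. Default value is 0. |
pct |
Minimum percentage of points for the plateau selection. Default value is 0.15. See DDSE for more details. |
point |
Minimum number of points for the plateau selection. Default value is 0. See DDSE for more details. |
psi.rlm |
Weight function used by rlm (package MASS) for the robust linear regression. Default value is psi.bisquare. |
scoef |
Ratio parameter. Default value is 2. |
Careajump |
Constant of jump area. Default value is 0. See Djump for more details. |
Ctresh |
Maximal threshold for the complexity associated to the penalty coefficient. Default value is 0. See Djump for more details. |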
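The model selected by the procedure fulfills
\(\hat{m} = \operatorname{argmin}_{m} \; \gamma_n(\hat{s}_m) + scoef \times \kappa \times pen_{shape}(m)\)
where
\(\kappa\) is the penalty coefficient.
\(\gamma_n\) is the empirical contrast.
\(\hat{s}_m\) is the estimator for the model \(m\).
\(scoef\) is the ratio parameter.
\(pen_{shape}\) is the penalty shape.
The capushe function calls the functions DDSE and
Djump to calibrate \(\kappa\); see the description of these functions
for more details.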
In the case of equality between two penalty shape values, only the model with the
smallest contrast is considered.
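For a given penalty coefficient, the selection rule and the tie rule above can be sketched in a few lines of base R. select_penalized is a hypothetical helper written for illustration, and kappa is supplied by hand here, whereas capushe calibrates it with DDSE and Djump.

```r
# Hypothetical helper: selects the model minimising
# contrast + scoef * kappa * pen for a given penalty coefficient kappa.
select_penalized <- function(data, kappa, scoef = 2) {
  # Among models sharing a penalty shape value, keep the smallest contrast.
  data <- data[order(data$pen, data$contrast), ]
  data <- data[!duplicated(data$pen), ]
  crit <- data$contrast + scoef * kappa * data$pen
  data$model[which.min(crit)]
}

toy <- data.frame(model = c("m1", "m2", "m2b", "m3"),
                  pen = c(0.002, 0.005, 0.005, 0.008),
                  complexity = c(2, 5, 5, 8),
                  contrast = c(3.10, 2.95, 2.99, 2.93),
                  stringsAsFactors = FALSE)
chosen <- select_penalized(toy, kappa = 5)
```

With kappa = 0 the rule returns the model with the smallest contrast, and a large kappa pushes the selection towards the simplest model. The model m2b is never a candidate because m2 has the same penalty shape value and a smaller contrast.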
Vincent Brault
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
Djump, DDSE, AICcapushe or BICcapushe to use only one of these model selection functions.
plot for graphical displays of DDSE and Djump.
data(datacapushe)
capushe(datacapushe)
capushe(datacapushe, 1000)
Class of object returned by the capushe function.
object |
an object with class Capushe. |
DDSE: A list returned by the DDSE function.
Djump: A list returned by the Djump function.
AIC_capushe: A list returned by the AICcapushe function.
BIC_capushe: A list returned by the BICcapushe function.
n: The number of observations given by the user.
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
See also plot,Capushe-method and capushe.
A dataframe example for the capushe package based on a simulated Gaussian
mixture dataset in .
data(datacapushe)
A data frame with 50 rows (models) and the following 4 variables:
model: a character vector of model names.
pen: a numeric vector of model penalty shape values.
complexity: a numeric vector of model complexity values.
contrast: a numeric vector of model contrast values.
The simulated dataset is composed of observations in . It
consists of an equiprobable mixture of three large "bubble" groups centered at
, and respectively. Each
bubble group is simulated from a mixture of seven components according
to the following density distribution:
with , , , ,
, and . Thus the
distribution of the dataset is actually a 21-component Gaussian mixture (three bubble groups of seven components each).
A model collection of spherical Gaussian mixtures is considered and the dataframe
datacapushe contains the maximum likelihood estimations for each of these models.
The number of free parameters of each model is used for the complexity values and
is defined by this complexity divided by .
datapartialcapushe and datavalidcapushe can be used to run the
validation function. datapartialcapushe only
contains the models with less than components. datavalidcapushe
contains three models with , and components respectively.
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
data(datacapushe)
capushe(datacapushe, n = 1000)
## BIC, DDSE and Djump all three select the true model
plot(capushe(datacapushe), newwindow = FALSE)
## Validation:
data(datapartialcapushe)
capushepartial <- capushe(datapartialcapushe)
data(datavalidcapushe)
validation(capushepartial, datavalidcapushe, newwindow = FALSE)
## The slope heuristics should not
## be applied for datapartialcapushe.
A dataframe example for the capushe package based on a simulated Gaussian
mixture dataset in .
data(datapartialcapushe)
A data frame with 21 rows (models) and the following 4 variables:
model: a character vector of model names.
pen: a numeric vector of model penalty shape values.
complexity: a numeric vector of model complexity values.
contrast: a numeric vector of model contrast values.
A dataframe example for the capushe package based on a simulated Gaussian
mixture dataset in .
data(datavalidcapushe)
A data frame with 3 rows (models) and the following 4 variables:
model: a character vector of model names.
pen: a numeric vector of model penalty shape values.
complexity: a numeric vector of model complexity values.
contrast: a numeric vector of model contrast values.
DDSE is a model selection function based on the slope heuristics.
DDSE(data, pct = 0.15, point = 0, psi.rlm = psi.bisquare, scoef = 2)
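data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
pct |
Minimum percentage of points for the plateau selection. It must be between 0 and 1. Default value is 0.15. |
point |
Minimum number of points for the plateau selection. If point is different from 0, it is used instead of pct. Default value is 0. |
psi.rlm |
Weight function used by rlm (package MASS) for the robust linear regression. Default value is psi.bisquare. |
scoef |
Ratio parameter. Default value is 2. |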
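Let \(\mathcal{M}\) be the model collection and \(P = \{pen_{shape}(m),\ m \in \mathcal{M}\}\).
The DDSE algorithm proceeds in four steps:
1. If several models in the collection have the same penalty shape value (column 2),
only the model having the smallest contrast value \(\gamma_n(\hat{s}_m)\) (column 4)
is considered.
2. For any \(p \in P\), the slope \(\hat{\kappa}(p)\) (output @kappa) of the linear regression
(argument psi.rlm) on the couples of points \(\{(pen_{shape}(m), -\gamma_n(\hat{s}_m));\ pen_{shape}(m) \geq p\}\)
is computed.
3. For any \(p \in P\), the model fulfilling the following condition is selected:
\(\hat{m}(p) = \operatorname{argmin}_{m} \; \gamma_n(\hat{s}_m) + scoef \times \hat{\kappa}(p) \times pen_{shape}(m)\).
This gives an increasing sequence of change-points \((p_i)_{1 \leq i \leq I+1}\) (output
@ModelHat$point_breaking). Let \((N_i)_{1 \leq i \leq I}\) (output @ModelHat$number_plateau)
be the lengths of each "plateau".
4. If point is different from 0, let \(\hat{i} = \max\{1 \leq i \leq I;\ N_i \geq point\}\),
else let \(\hat{i} = \max\{1 \leq i \leq I;\ N_i \geq pct \sum_{l=1}^{I} N_l\}\)
(output @ModelHat$imax).
The model \(\hat{m}(p_{\hat{i}})\) (output @model) is finally returned.
The "slope interval" is the interval \([a, b]\) where \(a = \inf\{\hat{\kappa}(p),\ p \in [p_{\hat{i}}, p_{\hat{i}+1}[ \, \cap P\}\)
and \(b = \sup\{\hat{\kappa}(p),\ p \in [p_{\hat{i}}, p_{\hat{i}+1}[ \, \cap P\}\).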
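The four steps can be illustrated with a simplified base R sketch. It is not the package implementation: ordinary least squares (lm) replaces the robust regression selected by psi.rlm, only the pct rule of the last step is handled, and each regression is assumed to keep at least min_pts points. The names ddse_sketch and min_pts are hypothetical.

```r
# Simplified sketch of the DDSE steps (not the package implementation).
# Assumptions: lm() replaces the robust fit, only the pct rule is used,
# and every regression keeps at least `min_pts` points.
ddse_sketch <- function(data, pct = 0.15, scoef = 2, min_pts = 2) {
  # Step 1: one model per penalty shape value (smallest contrast wins).
  data <- data[order(data$pen, data$contrast), ]
  data <- data[!duplicated(data$pen), ]
  n_fit <- nrow(data) - min_pts + 1
  # Step 2: slope of -contrast on pen, using the points with pen >= p.
  kappa <- vapply(seq_len(n_fit), function(i) {
    fit <- lm(I(-contrast) ~ pen, data = data[i:nrow(data), ])
    unname(coef(fit)[2])
  }, numeric(1))
  # Step 3: model selected for each successive slope value.
  model_hat <- vapply(kappa, function(k) {
    data$model[which.min(data$contrast + scoef * k * data$pen)]
  }, character(1))
  # Step 4: plateaus of identical selections; keep the last long enough one.
  runs <- rle(model_hat)
  long_enough <- which(runs$lengths >= pct * sum(runs$lengths))
  if (length(long_enough) == 0) stop("no plateau is long enough")
  imax <- max(long_enough)
  list(model = runs$values[imax], kappa = kappa, model_hat = model_hat)
}

# Toy collection: the contrast is exactly linear in pen from m4 onwards.
toy <- data.frame(model = paste0("m", 1:10),
                  pen = (1:10) / 100,
                  complexity = 1:10,
                  contrast = 3 - 2 * (1:10) / 100 +
                    c(0.5, 0.2, 0.05, rep(0, 7)),
                  stringsAsFactors = FALSE)
res <- ddse_sketch(toy)
```

In this toy collection the slope stabilises at 2 once the three biased models are excluded from the regression, and the corresponding plateau selects m4, the first model on the linear part.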
@model |
The model selected by the DDSE algorithm. |
@kappa |
The vector of the successive slope values. |
@ModelHat |
A list describing the algorithm. |
@ModelHat$model_hat |
The vector of preselected models. |
@ModelHat$point_breaking |
The vector of the breaking points (change-points). |
@ModelHat$number_plateau |
The vector of the plateau lengths. |
@ModelHat$imax |
The rank of the selected plateau. |
@interval |
A list about the "slope interval". |
@interval$interval |
The slope interval. |
@interval$percent_of_points |
The proportion of points belonging to the selected plateau. |
@graph |
A list computed for the plot method. |
Vincent Brault
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
capushe for a model selection function including AIC,
BIC, the DDSE algorithm and the Djump algorithm.
plot for graphical displays of the DDSE algorithm
and the Djump algorithm.
Djump is a model selection function based on the slope heuristics.
Djump(data, scoef = 2, Careajump = 0, Ctresh = 0)
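data |
A matrix or data frame with four columns, one row per model: model names (column 1), penalty shape values (column 2), complexity values (column 3) and contrast values (column 4). |
scoef |
Ratio parameter. Default value is 2. |
Careajump |
Constant of jump area (see the details). Default value is 0. |
Ctresh |
Maximal threshold for the complexity associated to the penalty coefficient (see the details). Default value is 0. |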
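The Djump algorithm proceeds in three steps:
1. For all \(\kappa > 0\), compute
\(m(\kappa) \in \operatorname{argmin}_{m \in \mathcal{M}} \{\gamma_n(\hat{s}_m) + \kappa \times pen_{shape}(m)\}\).
This gives a decreasing step function \(\kappa \mapsto C_{m(\kappa)}\), where \(C_m\) is the complexity of the model \(m\).
2. Find \(\hat{\kappa}\) such that \(C_{m(\hat{\kappa})}\) corresponds to the
greatest jump of complexity if Ctresh = 0, else such that
\(\hat{\kappa} = \inf\{\kappa > 0 : C_{m(\kappa)} \leq Ctresh\}\).
3. Select \(\hat{m} = m(scoef \times \hat{\kappa})\) (output @model).
Arlot has proposed a jump area containing the maximal jump, whose extent is set by the constant Careajump.
If Careajump = 0, Djump returns the area with the greatest jump. In practice,
it is advisable to take Careajump = \(\sqrt{\log(n)/n}\), where \(n\) is the number of observations.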
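The three steps can be sketched in base R as follows. This is a simplified illustration, not the package implementation: it covers only the default case Ctresh = 0 and Careajump = 0, and it assumes that the complexity increases with the penalty shape. The name djump_sketch is hypothetical.

```r
# Simplified sketch of the dimension jump (not the package implementation).
# Assumes Ctresh = 0 and Careajump = 0 (plain greatest jump).
djump_sketch <- function(data, scoef = 2) {
  data <- data[order(data$pen, data$contrast), ]
  data <- data[!duplicated(data$pen), ]
  # Start at kappa = 0: the model with the smallest contrast.
  cur <- which.min(data$contrast)
  kappa <- numeric(0)
  path <- cur
  # Walk along the lower envelope of kappa -> contrast + kappa * pen.
  while (any(data$pen < data$pen[cur])) {
    cand <- which(data$pen < data$pen[cur])
    # Value of kappa at which each simpler model overtakes the current one.
    k <- (data$contrast[cand] - data$contrast[cur]) /
      (data$pen[cur] - data$pen[cand])
    kappa <- c(kappa, min(k))
    # Among ties, move to the model with the smallest penalty shape.
    cur <- cand[which(k == min(k))[1]]
    path <- c(path, cur)
  }
  if (length(path) < 2) stop("no jump found")
  jump <- -diff(data$complexity[path])   # complexity drop at each kappa
  kappa_hat <- kappa[which.max(jump)]    # location of the greatest jump
  crit <- data$contrast + scoef * kappa_hat * data$pen
  list(model = data$model[which.min(crit)], kappa_hat = kappa_hat,
       kappa = kappa, jump = jump)
}

toy <- data.frame(model = paste0("m", 1:6),
                  pen = c(1, 2, 3, 10, 20, 30) / 1000,
                  complexity = c(1, 2, 3, 10, 20, 30),
                  contrast = c(2.5, 2.2, 2.075, 2.07, 2.04, 2.00),
                  stringsAsFactors = FALSE)
res <- djump_sketch(toy)
```

On this toy collection the selected complexity drops from 30 to 3 in a single jump, so kappa_hat sits at that jump and the final model is the one minimising the criterion with the coefficient scoef times kappa_hat.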
@model |
The model selected by the dimension jump method. |
@ModelHat |
A list describing the algorithm. |
@ModelHat$jump |
The vector of jump heights. |
@ModelHat$kappa |
The vector of the values of \(\kappa\) at each jump. |
@ModelHat$model_hat |
The vector of the models selected by the jump. |
@ModelHat$JumpMax |
The location of the greatest jump. |
@ModelHat$Kopt |
The optimal penalty coefficient \(\kappa_{opt}\). |
@graph |
A list computed for the plot method. |
model: character. The model selected by the dimension jump method.
ModelHat: list. A list describing the algorithm.
jump: The vector of jump heights.
kappa: The vector of the values of \(\kappa\) at each jump.
model_hat: The vector of the models selected by the jump.
JumpMax: The location of the greatest jump.
Kopt: The optimal penalty coefficient \(\kappa_{opt}\).
graph: list.
Area: list.
Vincent Brault
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
capushe for a model selection function including AIC,
BIC, the DDSE algorithm and the Djump algorithm.
plot for a graphical display of the DDSE
algorithm and the Djump algorithm.
data(datacapushe)
Djump(datacapushe)
res <- Djump(datacapushe)
plot(res, newwindow = FALSE)
res <- Djump(datacapushe, Careajump = sqrt(log(1000)/1000))
plot(res, newwindow = FALSE)
res <- Djump(datacapushe, Ctresh = 1000/log(1000))
plot(res, newwindow = FALSE)
The plot methods allow the user to check that the slope heuristics can be applied confidently.
signature(x = "Capushe") This graphical function displays the DDSE plot and the Djump plot.
signature(x = "DDSE") This graphical function displays the DDSE plot.
signature(x = "Djump") This graphical function displays the Djump plot.
plot(x, y, ...)
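x |
An object of class Capushe, DDSE or Djump. |
... |
Other arguments, such as newwindow and, for an object of class Capushe, ask. |
y |
is unused. |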
The graphical window of DDSE is composed of three graphics (see DDSE for more details):
The left plot shows the opposite of the contrast values, \(-\gamma_n(\hat{s}_m)\), with respect to the
penalty shape values.
The topright plot shows the successive slope values \(\hat{\kappa}(p)\).
The bottomright plot shows the selected models \(\hat{m}(p)\) with respect
to the successive slope values. The plateau in blue is selected.
The graphical window of Djump shows the complexity \(C_{m(\kappa)}\) of
the selected model with respect to the penalty coefficient \(\kappa\). \(\hat{\kappa}^{dj}\) corresponds
to the greatest jump. \(\kappa_{opt}\) is defined by \(\kappa_{opt} = scoef \times \hat{\kappa}^{dj}\).
The red line represents the slope interval computed by the DDSE algorithm
(only for capushe). See Djump for more details.
Use newwindow=FALSE to produce PDF files (for an object of class Capushe, also use ask=FALSE).
validation checks that the slope heuristics can be applied confidently.
validation(x, data2, ...)
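x |
An object returned by the capushe function, computed on the initial model collection. |
data2 |
A matrix or data frame, in the same four-column format as data, containing the additional and more complex models. |
... |
Other arguments, such as newwindow. |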
The validation function plots the additional and more complex models data2
to check that the linear relation between the penalty shape values and the contrast
values (which is recorded in x) is valid for the more complex models.
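The validation function is graphical. As a numerical counterpart of the same idea, the sketch below fits the linear relation on the most complex models of the initial collection and reports how far the additional models fall from the fitted line. validation_sketch and n_tail are hypothetical names, and ordinary least squares is used as a simplifying assumption.

```r
# Hypothetical check (not the capushe validation function): fit the linear
# relation on the most complex models of `data`, then measure how far the
# additional models in `data2` fall from the fitted line.
validation_sketch <- function(data, data2, n_tail = 5) {
  data <- data[order(data$pen), ]
  tail_data <- utils::tail(data, n_tail)
  fit <- lm(I(-contrast) ~ pen, data = tail_data)
  predicted <- unname(predict(fit, newdata = data2))
  data.frame(model = data2$model,
             residual = -data2$contrast - predicted,
             stringsAsFactors = FALSE)
}

# Initial collection: -contrast is exactly linear in pen from m4 onwards.
toy <- data.frame(model = paste0("m", 1:10),
                  pen = (1:10) / 100,
                  complexity = 1:10,
                  contrast = 3 - 2 * (1:10) / 100 +
                    c(0.5, 0.2, 0.05, rep(0, 7)),
                  stringsAsFactors = FALSE)
# Two more complex models: the first on the line, the second off it.
extra <- data.frame(model = c("m15", "m20"),
                    pen = c(0.15, 0.20),
                    complexity = c(15, 20),
                    contrast = c(2.7, 2.9),
                    stringsAsFactors = FALSE)
res <- validation_sketch(toy, extra)
```

A residual close to zero means the additional model extends the linear trend. A large residual, as for the second model here, suggests that the linear behaviour was not yet reached and that the slope heuristics should not be applied to the initial collection.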
Vincent Brault
Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1
capushe for a more general model selection function including
AIC, BIC, the DDSE
algorithm and the Djump algorithm.
data(datapartialcapushe)
capushepartial <- capushe(datapartialcapushe)
data(datavalidcapushe)
validation(capushepartial, datavalidcapushe, newwindow = FALSE)
## The slope heuristics should not
## be applied for datapartialcapushe.
data(datacapushe)
plot(capushe(datacapushe), newwindow = FALSE)