Title: | Group Regression Models for Risk Protein Complex Identification |
---|---|
Description: | Two protein complex-based group regression models (PCLasso and PCLasso2) for risk protein complex identification. PCLasso is a prognostic model that identifies risk protein complexes associated with survival. PCLasso2 is a classification model that identifies risk protein complexes associated with classes. For more information, see Wang and Liu (2021) <doi:10.1093/bib/bbab212>. |
Authors: | Wei Liu [aut, cre] |
Maintainer: | Wei Liu <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2025-02-16 05:01:30 UTC |
Source: | https://github.com/weiliu123/pclassoreg |
A dataset for classification
classData
A list containing a protein expression matrix and a response vector
Exp: a protein expression matrix
Label: a response vector
PCLasso
Perform k-fold cross-validation for the PCLasso model with grouped covariates over a grid of values for the regularization parameter lambda.
cv.PCLasso( x, y, group, penalty = c("grLasso", "grMCP", "grSCAD"), nfolds = 5, standardize = TRUE, ... )
x |
A n x p design matrix of gene/protein expression measurements with n
samples and p genes/proteins, as in |
y |
The time-to-event outcome, as a two-column matrix or |
group |
A list of groups as in |
penalty |
The penalty to be applied to the model. For group selection,
one of grLasso, grMCP, or grSCAD. For bi-level selection, one of gel or
cMCP. See |
nfolds |
The number of cross-validation folds. Default is 5. |
standardize |
Logical flag for |
... |
Arguments to be passed to |
The function calls PCLasso nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the deviance. The numbers of censored samples are balanced across the folds. cv.PCLasso uses the approach of calculating the full Cox partial likelihood using the cross-validated set of linear predictors. See cv.grpsurv in the R package grpreg for details.
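As a rough illustration of the deviance-based error described above, the following base R sketch computes a Cox partial log-likelihood from a vector of linear predictors. The helper name cox_partial_loglik and the toy data are assumptions made for illustration; the sketch assumes no tied event times and is not the grpreg implementation.

```r
# Hypothetical helper: Cox partial log-likelihood (assumes no tied event times).
cox_partial_loglik <- function(time, status, eta) {
  ord <- order(time)
  status <- status[ord]
  eta <- eta[ord]
  # The risk set of the i-th ordered subject is subjects i..n,
  # so a reverse cumulative sum gives the risk-set totals.
  log_risk <- log(rev(cumsum(rev(exp(eta)))))
  sum(status * (eta - log_risk))
}

# Toy data: three subjects, all events, all linear predictors equal to zero.
time <- c(2, 1, 3)
status <- c(1, 1, 1)
eta <- c(0, 0, 0)

loglik <- cox_partial_loglik(time, status, eta)
dev <- -2 * loglik  # deviance-style error: smaller is better
```

In a cross-validation setting, eta would hold the linear predictors obtained for each sample while its fold was left out.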
An object with S3 class "cv.PCLasso" containing:
cv.fit |
An object of class "cv.grpsurv". |
complexes.dt |
Complexes with
features (genes/proteins) not included in |
Wei Liu
PCLasso: a protein complex-based, group lasso-Cox model for accurate prognosis and risk protein complex discovery. Brief Bioinform, 2021.
Park, H., Niida, A., Miyano, S. and Imoto, S. (2015) Sparse overlapping group lasso for integrative multi-omics analysis. Journal of computational biology: a journal of computational molecular cell biology, 22, 73-84.
# load data
data(survivalData)
data(PCGroups)

x <- survivalData$Exp
y <- survivalData$survData

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "EntrezID")

# fit model
cv.fit1 <- cv.PCLasso(x, y, group = PC.Human, penalty = "grLasso",
                      nfolds = 10)
PCLasso2
Perform k-fold cross-validation for the PCLasso2 model with grouped covariates over a grid of values for the regularization parameter lambda.
cv.PCLasso2( x, y, group, penalty = c("grLasso", "grMCP", "grSCAD"), family = c("binomial", "gaussian", "poisson"), nfolds = 5, gamma = 8, standardize = TRUE, ... )
x |
A n x p design matrix of gene/protein expression measurements with n
samples and p genes/proteins, as in |
y |
The response vector. |
group |
A list of groups as in |
penalty |
The penalty to be applied to the model. For group selection,
one of grLasso, grMCP, or grSCAD. See |
family |
Either "binomial" or "gaussian", depending on the response. |
nfolds |
The number of cross-validation folds. Default is 5. |
gamma |
Tuning parameter of the |
standardize |
Logical flag for |
... |
Arguments to be passed to |
The function calls PCLasso2 nfolds times, each time leaving out 1/nfolds of the data. The cross-validation error is based on the deviance. The numbers for each class are balanced across the folds; i.e., the number of outcomes in which y is equal to 1 is the same for each fold, or possibly off by 1 if the numbers do not divide evenly. See cv.grpreg in the R package grpreg for details.
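The class balancing described above can be sketched in base R as follows. The helper balanced_folds and the toy response are hypothetical; this shows one simple way to obtain per-fold class counts that differ by at most 1, and grpreg may assign folds differently.

```r
# Hypothetical helper: assign folds so each class is spread evenly across folds.
balanced_folds <- function(y, nfolds) {
  fold <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle within the class, then deal fold labels 1..nfolds in turn.
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(nfolds), length(idx))
  }
  fold
}

set.seed(1)
y <- c(rep(1, 12), rep(0, 23))  # toy binary response
fold <- balanced_folds(y, nfolds = 5)

# Number of y == 1 outcomes in each fold
ones_per_fold <- table(factor(fold[y == 1], levels = 1:5))
```

With 12 positive outcomes and 5 folds, each fold receives either 2 or 3 of them.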
An object with S3 class "cv.PCLasso2" containing:
cv.fit |
An object of class "cv.grpreg". |
complexes.dt |
Complexes with features
(genes/proteins) not included in |
Wei Liu
PCLasso2: a protein complex-based, group Lasso-logistic model for risk protein complex discovery. To be published.
Park, H., Niida, A., Miyano, S. and Imoto, S. (2015) Sparse overlapping group lasso for integrative multi-omics analysis. Journal of computational biology: a journal of computational molecular cell biology, 22, 73-84.
# load data
data(classData)
data(PCGroups)

x <- classData$Exp
y <- classData$Label

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "GeneSymbol")

# fit model
cv.fit1 <- cv.PCLasso2(x, y, group = PC.Human, penalty = "grLasso",
                       family = "binomial", nfolds = 5)
cv.fit1 <- cv.PCLasso2(x, y, group = PC.Human, penalty = "grSCAD",
                       family = "binomial", nfolds = 5, gamma = 10)
cv.fit1 <- cv.PCLasso2(x, y, group = PC.Human, penalty = "grMCP",
                       family = "binomial", nfolds = 5, gamma = 15)
Get protein complexes
getPCGroups( Groups, Organism = c("Human", "Mouse", "Rat", "Mammalia", "Bovine", "Dog", "Rabbit"), Type = c("GeneSymbol", "EntrezID", "UniprotID") )
Groups |
A data frame containing the protein complexes |
Organism |
Organism. one of |
Type |
The name type of the proteins in the protein complexes. One of
|
A list of protein complexes
data(PCGroups)
PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "GeneSymbol")
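To make the shape of the returned list concrete, the following base R sketch turns a small table of complexes into a named list of member vectors. The column names (ComplexID, Organism, GeneSymbol), the ";" delimiter, and the toy rows are all assumptions for illustration; they are not taken from the PCGroups dataset or from the getPCGroups source.

```r
# Hypothetical toy table; column names and the ";" delimiter are assumptions.
toy_groups <- data.frame(
  ComplexID = c("C1", "C2", "C3"),
  Organism = c("Human", "Human", "Mouse"),
  GeneSymbol = c("A;B;C", "B;D", "E;F"),
  stringsAsFactors = FALSE
)

# Keep one organism, then split each member string into a character vector.
human <- toy_groups[toy_groups$Organism == "Human", ]
pc_list <- strsplit(human$GeneSymbol, ";", fixed = TRUE)
names(pc_list) <- human$ComplexID
```

Each list element is one protein complex, which is the form the group argument of the model functions expects.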
A dataset containing the protein complexes
PCGroups
A data frame with 3512 rows and 6 variables:
ID of the protein complex
name of the protein complex
organism
Uniprot IDs of the proteins in the protein complex
Entrez IDs of the proteins in the protein complex
gene symbols of the proteins in the protein complex
https://mips.helmholtz-muenchen.de/corum/
Construct a PCLasso model based on a gene/protein expression matrix, survival data, and protein complexes.
PCLasso( x, y, group, penalty = c("grLasso", "grMCP", "grSCAD"), standardize = TRUE, ... )
x |
A n x p matrix of gene/protein expression measurements with n samples and p genes/proteins. |
y |
The time-to-event outcome, as a two-column matrix or |
group |
A list of groups. The feature (gene/protein) names in
|
penalty |
The penalty to be applied to the model. For group selection,
one of grLasso, grMCP, or grSCAD. See |
standardize |
Logical flag for |
... |
Arguments to be passed to |
The function PCLasso implements the PCLasso model when the parameter penalty is set to "grLasso". The PCLasso model is a prognostic model which selects important predictors at the protein complex level to achieve accurate prognosis and identify risk protein complexes. The PCLasso model has three inputs: a gene expression matrix, survival data, and protein complexes. It estimates the correlation between gene expression in protein complexes and survival data at the level of protein complexes. Similar to the traditional Lasso-Cox model, PCLasso is based on the Cox PH model and estimates the Cox regression coefficients by maximizing the partial likelihood with a regularization penalty. The difference is that PCLasso selects features at the level of protein complexes rather than individual genes. Considering that genes usually function by forming protein complexes, PCLasso regards genes belonging to the same protein complex as a group, and constructs an l1/l2 penalty based on the sum (i.e., l1 norm) of the l2 norms of the regression coefficients of the group members to perform the selection of features at the group level.

Since a gene may belong to multiple protein complexes, that is, there is overlap between protein complexes, the classical group Lasso-Cox model for non-overlapping groups may lead to false sparse solutions. The PCLasso model deals with the overlapping problem of protein complexes by constructing a latent group Lasso-Cox model. By reconstructing the gene expression matrix of the protein complexes, the latent group Lasso-Cox model is transformed into a non-overlapping group Lasso-Cox model in an expanded space, which can be directly solved using the classical group Lasso method. Through the final sparse solution, we can predict the patient's risk score based on a small set of protein complexes and identify risk protein complexes that are frequently selected to construct prognostic models.

The penalties grSCAD and grMCP can also be used to identify survival-related risk protein complexes. Their penalty for large coefficients is smaller than that of grLasso, so they tend to choose fewer risk protein complexes.
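The expansion of overlapping complexes into non-overlapping groups can be sketched in base R as below. The toy matrix, gene names, and complex names are assumptions; the sketch only illustrates the column-duplication idea and is not the package's internal code.

```r
# Toy expression matrix: 4 samples x 3 genes.
x <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("g1", "g2", "g3")))

# Two overlapping complexes: g2 belongs to both.
complexes <- list(PC1 = c("g1", "g2"), PC2 = c("g2", "g3"))

# Duplicate the columns of each complex so that groups no longer overlap.
blocks <- lapply(complexes, function(genes) x[, genes, drop = FALSE])
x_expanded <- do.call(cbind, unname(blocks))

# One group label per expanded column.
group_labels <- rep(names(complexes), lengths(complexes))
```

The shared gene g2 now appears once per complex, so a standard group Lasso solver for non-overlapping groups could be applied to x_expanded with group_labels.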
An object with S3 class PCLasso containing:
fit |
An object of class |
complexes.dt |
Complexes with features (genes/proteins) not included
in |
PCLasso: a protein complex-based, group lasso-Cox model for accurate prognosis and risk protein complex discovery. Brief Bioinform, 2021.
Park, H., Niida, A., Miyano, S. and Imoto, S. (2015) Sparse overlapping group lasso for integrative multi-omics analysis. Journal of computational biology: a journal of computational molecular cell biology, 22, 73-84.
# load data
data(survivalData)
data(PCGroups)

x <- survivalData$Exp
y <- survivalData$survData

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "EntrezID")

# fit PCLasso model
fit.PCLasso <- PCLasso(x, y, group = PC.Human, penalty = "grLasso")

# fit PCSCAD model
fit.PCSCAD <- PCLasso(x, y, group = PC.Human, penalty = "grSCAD")

# fit PCMCP model
fit.PCMCP <- PCLasso(x, y, group = PC.Human, penalty = "grMCP")
Protein complex-based group Lasso-logistic model
PCLasso2( x, y, group, penalty = c("grLasso", "grMCP", "grSCAD"), family = c("binomial", "gaussian", "poisson"), gamma = 8, standardize = TRUE, ... )
x |
A n x p matrix of gene/protein expression measurements with n samples and p genes/proteins. |
y |
The response vector. |
group |
A list of groups. The feature (gene/protein) names in |
penalty |
The penalty to be applied to the model. For group selection,
one of grLasso, grMCP, or grSCAD. See |
family |
Either "binomial" or "gaussian", depending on the response. |
gamma |
Tuning parameter of the |
standardize |
Logical flag for |
... |
Arguments to be passed to |
The PCLasso2 model is a classification model that selects important predictors at the protein complex level to achieve accurate classification and identify risk protein complexes. The PCLasso2 model has three inputs: a protein expression matrix, a vector of binary response variables, and a number of known protein complexes. It estimates the correlation between protein expression and the response variable at the level of protein complexes. Similar to the traditional Lasso-logistic model, PCLasso2 is based on the logistic regression model and estimates the logistic regression coefficients by maximizing the likelihood function with a regularization penalty. The difference is that PCLasso2 selects features at the level of protein complexes rather than individual proteins. Considering that proteins usually function by forming protein complexes, PCLasso2 regards proteins belonging to the same protein complex as a group and constructs a group Lasso penalty (l1/l2 penalty) based on the sum (i.e., l1 norm) of the l2 norms of the regression coefficients of the group members to perform the selection of features at the group level. With the group Lasso penalty, PCLasso2 trains the logistic regression model and obtains a sparse solution at the protein complex level, that is, the proteins belonging to a protein complex are either wholly included or wholly excluded from the model. PCLasso2 outputs a prediction model and a small set of protein complexes included in the model, which are referred to as risk protein complexes. The PCSCAD and PCMCP models are fitted by setting the parameter penalty to grSCAD and grMCP, respectively.
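The l1/l2 penalty described above can be computed for a toy coefficient vector in base R. The helper group_lasso_penalty and the toy values are assumptions for illustration; group-size weights and the tuning parameter lambda, which a real solver would include, are omitted.

```r
# Hypothetical helper: sum over groups of the L2 norm of each group's coefficients.
group_lasso_penalty <- function(beta, group) {
  sum(tapply(beta, group, function(b) sqrt(sum(b^2))))
}

# Toy coefficients: complex "PC1" is selected, complex "PC2" is excluded entirely.
beta <- c(3, 4, 0, 0)
group <- c("PC1", "PC1", "PC2", "PC2")

penalty <- group_lasso_penalty(beta, group)

# A complex counts as selected when its coefficient norm is nonzero.
group_norms <- tapply(beta, group, function(b) sqrt(sum(b^2)))
selected <- names(group_norms)[group_norms > 0]
```

Because the penalty acts on whole-group norms, a complex is either kept with all of its members or dropped with all of them.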
An object with S3 class PCLasso2 containing:
fit |
An object of class |
complexes.dt |
Complexes with features (genes/proteins) not included
in |
PCLasso2: a protein complex-based, group Lasso-logistic model for risk protein complex discovery. To be published.
PCLasso: a protein complex-based, group lasso-Cox model for accurate prognosis and risk protein complex discovery. Brief Bioinform, 2021.
Park, H., Niida, A., Miyano, S. and Imoto, S. (2015) Sparse overlapping group lasso for integrative multi-omics analysis. Journal of computational biology: a journal of computational molecular cell biology, 22, 73-84.
# load data
data(classData)
data(PCGroups)

x <- classData$Exp
y <- classData$Label

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "GeneSymbol")

# fit PCLasso2 model
fit.PCLasso2 <- PCLasso2(x, y, group = PC.Human, penalty = "grLasso",
                         family = "binomial")

# fit PCSCAD model
fit.PCSCAD <- PCLasso2(x, y, group = PC.Human, penalty = "grSCAD",
                       family = "binomial", gamma = 10)

# fit PCMCP model
fit.PCMCP <- PCLasso2(x, y, group = PC.Human, penalty = "grMCP",
                      family = "binomial", gamma = 9)
Two protein complex-based group regression models (PCLasso and PCLasso2) for risk protein complex identification. PCLasso is a prognostic model that identifies risk protein complexes associated with survival. PCLasso2 is a classification model that identifies risk protein complexes associated with classes. For more information, see Wang and Liu (2021) <doi:10.1093/bib/bbab212>.
The PCLasso model accepts a protein expression matrix, survival data, and protein complexes for training the prognostic model. It then makes predictions for new samples and identifies risk protein complexes associated with survival.
The PCLasso2 model accepts a protein expression matrix, a response vector, and protein complexes for training the classification model. It then makes predictions for new samples and identifies risk protein complexes associated with classes.
Both PCLasso and PCLasso2 use grLasso as the penalty function. The other two penalties, grSCAD and grMCP, can also be used for model construction and risk protein complex identification. The package also provides methods for plotting coefficient paths and cross-validation curves.
PCLasso2: a protein complex-based, group Lasso-logistic model for risk protein complex discovery. To be published.
PCLasso: a protein complex-based group lasso-Cox model for accurate prognosis and risk protein complex discovery. Brief Bioinform, 2021.
Park, H., Niida, A., Miyano, S. and Imoto, S. (2015) Sparse overlapping group lasso for integrative multi-omics analysis. Journal of computational biology: a journal of computational molecular cell biology, 22, 73-84.
Plot the cross-validation curve from a cv.PCLasso object, along with standard error bars.
## S3 method for class 'cv.PCLasso'
plot(x, type = c("cve", "rsq", "snr", "all"), norm = NULL, ...)
x |
Fitted |
type |
What to plot on the vertical axis. "cve" plots the cross-validation error (deviance); "rsq" plots an estimate of the fraction of the deviance explained by the model (R-squared); "snr" plots an estimate of the signal-to-noise ratio; "all" produces all of the above. |
norm |
If TRUE, plot the norm of each group, rather than the individual coefficients. |
... |
Other graphical parameters to |
Error bars representing approximate +/- 1 SE (68% confidence intervals) are plotted along with the estimates at each value of lambda. See plot.cv.grpreg in the R package grpreg for details.
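The +/- 1 SE bars can be illustrated with a small base R calculation. The per-fold error matrix below is made up, and computing the standard error across folds is an assumption of this sketch; grpreg may derive its standard errors differently (for example, from per-observation losses).

```r
# Toy per-fold cross-validation errors for three lambda values (rows = folds).
fold_err <- rbind(
  c(1.0, 0.8, 0.9),
  c(1.2, 0.7, 1.1),
  c(1.1, 0.9, 1.0),
  c(0.9, 0.8, 1.2),
  c(1.3, 0.8, 0.8)
)

cve <- colMeans(fold_err)                              # mean error per lambda
cvse <- apply(fold_err, 2, sd) / sqrt(nrow(fold_err))  # standard error per lambda

lower <- cve - cvse  # bottom of each error bar
upper <- cve + cvse  # top of each error bar

best <- which.min(cve)  # index of the lambda with the smallest mean error
```

The value of lambda at index best corresponds to what the examples in this manual access as lambda.min.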
No return value, called for plotting of cv.PCLasso objects.
# load data
data(survivalData)
data(PCGroups)

x <- survivalData$Exp
y <- survivalData$survData

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "EntrezID")

# fit model
cv.fit1 <- cv.PCLasso(x, y, group = PC.Human, penalty = "grLasso",
                      nfolds = 10)

# plot the norm of each group
plot(cv.fit1, norm = TRUE)

# plot the individual coefficients
plot(cv.fit1, norm = FALSE)

# plot the cross-validation error (deviance)
plot(cv.fit1, type = "cve")
Plot the cross-validation curve from a cv.PCLasso2 object, along with standard error bars.
## S3 method for class 'cv.PCLasso2'
plot(x, type = c("cve", "rsq", "snr", "all"), norm = NULL, ...)
x |
Fitted |
type |
What to plot on the vertical axis. "cve" plots the cross-validation error (deviance); "rsq" plots an estimate of the fraction of the deviance explained by the model (R-squared); "snr" plots an estimate of the signal-to-noise ratio; "all" produces all of the above. |
norm |
If TRUE, plot the norm of each group, rather than the individual coefficients. |
... |
Other graphical parameters to |
Error bars representing approximate +/- 1 SE (68% confidence intervals) are plotted along with the estimates at each value of lambda. See plot.cv.grpreg in the R package grpreg for details.
No return value, called for plotting of cv.PCLasso2 objects.
# load data
data(classData)
data(PCGroups)

x <- classData$Exp
y <- classData$Label

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "GeneSymbol")

# fit model
cv.fit1 <- cv.PCLasso2(x, y, group = PC.Human, penalty = "grLasso",
                       family = "binomial", nfolds = 10)

# plot the norm of each group
plot(cv.fit1, norm = TRUE)

# plot the individual coefficients
plot(cv.fit1, norm = FALSE)

# plot the cross-validation error (deviance)
plot(cv.fit1, type = "cve")
Produces a plot of the coefficient paths for a fitted PCLasso object.
## S3 method for class 'PCLasso'
plot(x, norm = TRUE, ...)
x |
Fitted |
norm |
If TRUE, plot the norm of each group, rather than the individual coefficients. |
... |
Other graphical parameters to |
No return value, called for plotting of PCLasso objects.
# load data
data(survivalData)
data(PCGroups)

x <- survivalData$Exp
y <- survivalData$survData

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "EntrezID")

# fit PCLasso model
fit.PCLasso <- PCLasso(x, y, group = PC.Human, penalty = "grLasso")

# plot the norm of each group
plot(fit.PCLasso, norm = TRUE)

# plot the individual coefficients
plot(fit.PCLasso, norm = FALSE)
Produces a plot of the coefficient paths for a fitted PCLasso2 object.
## S3 method for class 'PCLasso2'
plot(x, norm = TRUE, ...)
x |
Fitted |
norm |
If TRUE, plot the norm of each group, rather than the individual coefficients. |
... |
Other graphical parameters to |
No return value, called for plotting of PCLasso2 objects.
# load data
data(classData)
data(PCGroups)

x <- classData$Exp
y <- classData$Label

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "GeneSymbol")

# fit PCLasso2 model
fit.PCLasso2 <- PCLasso2(x, y, group = PC.Human, penalty = "grLasso")

# plot the norm of each group
plot(fit.PCLasso2, norm = TRUE)

# plot the individual coefficients
plot(fit.PCLasso2, norm = FALSE)
Similar to other predict methods, this function returns predictions from a fitted cv.PCLasso object, using the optimal value chosen for lambda.
## S3 method for class 'cv.PCLasso'
predict(
  object,
  x = NULL,
  type = c("link", "response", "survival", "median", "norm", "coefficients",
           "vars", "nvars", "vars.unique", "nvars.unique", "groups", "ngroups"),
  lambda,
  ...
)
object |
Fitted |
x |
Matrix of values at which predictions are to be made. The features
(genes/proteins) contained in |
type |
Type of prediction: "link" returns the linear predictors; "response" gives the risk (i.e., exp(link)); "vars" returns the indices for the nonzero coefficients; "vars.unique" returns unique features (genes/proteins) with nonzero coefficients (if a feature belongs to multiple groups and more than one of those groups is selected, the feature is selected repeatedly; compared with "vars", "vars.unique" filters out the repeated features); "groups" returns the groups with at least one nonzero coefficient; "nvars" returns the number of nonzero coefficients; "nvars.unique" returns the number of unique features (genes/proteins) with nonzero coefficients; "ngroups" returns the number of groups with at least one nonzero coefficient; "norm" returns the L2 norm of the coefficients in each group; "survival" returns the estimated survival function; "median" estimates median survival times. |
lambda |
Values of the regularization parameter |
... |
Arguments to be passed to |
The object returned depends on type.
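The difference between "vars" and "vars.unique" can be illustrated with a toy selection in base R. The complex and gene names below are made up, and the sketch uses feature names rather than the coefficient indices that "vars" returns; it only shows why repeats arise when selected complexes overlap.

```r
# Toy selection: two selected complexes share the gene "TP53".
selected_groups <- list(PC1 = c("TP53", "MDM2"), PC2 = c("TP53", "EP300"))

# Listing the members of every selected complex repeats shared features.
vars <- unlist(selected_groups, use.names = FALSE)

# Filtering out the repeats keeps each feature once.
vars_unique <- unique(vars)

nvars <- length(vars)
nvars_unique <- length(vars_unique)
```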
# load data
data(survivalData)
data(PCGroups)

x <- survivalData$Exp
y <- survivalData$survData

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "EntrezID")

set.seed(20150122)
idx.train <- sample(nrow(x), round(nrow(x)*2/3))
x.train <- x[idx.train,]
y.train <- y[idx.train,]
x.test <- x[-idx.train,]
y.test <- y[-idx.train,]

# fit cv.PCLasso model
cv.fit1 <- cv.PCLasso(x = x.train, y = y.train, group = PC.Human,
                      nfolds = 5)

# predict risk scores of samples in x.test
s <- predict(object = cv.fit1, x = x.test, type = "link",
             lambda = cv.fit1$cv.fit$lambda.min)

# Nonzero coefficients
sel.groups <- predict(object = cv.fit1, type = "groups",
                      lambda = cv.fit1$cv.fit$lambda.min)
sel.ngroups <- predict(object = cv.fit1, type = "ngroups",
                       lambda = cv.fit1$cv.fit$lambda.min)
sel.vars.unique <- predict(object = cv.fit1, type = "vars.unique",
                           lambda = cv.fit1$cv.fit$lambda.min)
sel.nvars.unique <- predict(object = cv.fit1, type = "nvars.unique",
                            lambda = cv.fit1$cv.fit$lambda.min)
sel.vars <- predict(object = cv.fit1, type = "vars",
                    lambda = cv.fit1$cv.fit$lambda.min)
sel.nvars <- predict(object = cv.fit1, type = "nvars",
                     lambda = cv.fit1$cv.fit$lambda.min)
Similar to other predict methods, this function returns predictions from a fitted cv.PCLasso2 object, using the optimal value chosen for lambda.
## S3 method for class 'cv.PCLasso2'
predict(
  object,
  x = NULL,
  type = c("link", "response", "class", "norm", "coefficients", "vars",
           "nvars", "vars.unique", "nvars.unique", "groups", "ngroups"),
  lambda,
  ...
)
object |
Fitted |
x |
Matrix of values at which predictions are to be made. The features
(genes/proteins) contained in |
type |
Type of prediction: "link" returns the linear predictors; "response" gives the risk (i.e., exp(link)); "class" returns the binomial outcome with the highest probability; "vars" returns the indices for the nonzero coefficients; "vars.unique" returns unique features (genes/proteins) with nonzero coefficients (if a feature belongs to multiple groups and more than one of those groups is selected, the feature is selected repeatedly; compared with "vars", "vars.unique" filters out the repeated features); "groups" returns the groups with at least one nonzero coefficient; "nvars" returns the number of nonzero coefficients; "nvars.unique" returns the number of unique features (genes/proteins) with nonzero coefficients; "ngroups" returns the number of groups with at least one nonzero coefficient; "norm" returns the L2 norm of the coefficients in each group. |
lambda |
Values of the regularization parameter |
... |
Arguments to be passed to |
The object returned depends on type.
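For a binomial model, the relationship between the linear predictor and the predicted class can be sketched in base R as follows. The values are made up, and the 0.5 threshold reflects the standard logistic-regression convention, assumed here to match what type = "class" does for family = "binomial".

```r
eta <- c(-2, 0.5, 3)  # toy linear predictors, as returned by type = "link"

# Inverse logit: probability that the outcome equals 1.
prob <- plogis(eta)

# The class with the highest probability is 1 when prob exceeds 0.5.
pred_class <- as.integer(prob > 0.5)
```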
# load data
data(classData)
data(PCGroups)

x <- classData$Exp
y <- classData$Label

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "GeneSymbol")

set.seed(20150122)
idx.train <- sample(nrow(x), round(nrow(x)*2/3))
x.train <- x[idx.train,]
y.train <- y[idx.train]
x.test <- x[-idx.train,]
y.test <- y[-idx.train]

# fit model
cv.fit1 <- cv.PCLasso2(x = x.train, y = y.train, group = PC.Human,
                       penalty = "grLasso", family = "binomial",
                       nfolds = 10)

# predict risk scores of samples in x.test
s <- predict(object = cv.fit1, x = x.test, type = "link",
             lambda = cv.fit1$cv.fit$lambda.min)

# predict classes of samples in x.test
s <- predict(object = cv.fit1, x = x.test, type = "class",
             lambda = cv.fit1$cv.fit$lambda.min)

# Nonzero coefficients
sel.groups <- predict(object = cv.fit1, type = "groups",
                      lambda = cv.fit1$cv.fit$lambda.min)
sel.ngroups <- predict(object = cv.fit1, type = "ngroups",
                       lambda = cv.fit1$cv.fit$lambda.min)
sel.vars.unique <- predict(object = cv.fit1, type = "vars.unique",
                           lambda = cv.fit1$cv.fit$lambda.min)
sel.nvars.unique <- predict(object = cv.fit1, type = "nvars.unique",
                            lambda = cv.fit1$cv.fit$lambda.min)
sel.vars <- predict(object = cv.fit1, type = "vars",
                    lambda = cv.fit1$cv.fit$lambda.min)
sel.nvars <- predict(object = cv.fit1, type = "nvars",
                     lambda = cv.fit1$cv.fit$lambda.min)
Similar to other predict methods, this function returns predictions from a fitted PCLasso object.
## S3 method for class 'PCLasso'
predict(
  object,
  x = NULL,
  type = c("link", "response", "survival", "median", "norm", "coefficients",
           "vars", "nvars", "vars.unique", "nvars.unique", "groups", "ngroups"),
  lambda,
  ...
)
object |
Fitted |
x |
Matrix of values at which predictions are to be made. The features
(genes/proteins) contained in |
type |
Type of prediction: "link" returns the linear predictors; "response" gives the risk (i.e., exp(link)); "vars" returns the indices for the nonzero coefficients; "vars.unique" returns the unique features (genes/proteins) with nonzero coefficients (a feature that belongs to several selected groups is listed once per group by "vars", whereas "vars.unique" filters out the repeats); "groups" returns the groups with at least one nonzero coefficient; "nvars" returns the number of nonzero coefficients; "nvars.unique" returns the number of unique features (genes/proteins) with nonzero coefficients; "ngroups" returns the number of groups with at least one nonzero coefficient; "norm" returns the L2 norm of the coefficients in each group; "survival" returns the estimated survival function; "median" estimates median survival times. |
lambda |
Values of the regularization parameter |
... |
Arguments to be passed to |
See predict.grpsurv in the R package grpreg for details.
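The difference between "vars", "vars.unique", "groups", and "norm" is easiest to see on a toy case with overlapping complexes. The sketch below is base R only and does not call the package; the complex names, protein names, and coefficient values are hypothetical, and it assumes that a protein belonging to several complexes gets one coefficient per membership.

```r
# Hypothetical complexes; protein P2 belongs to both ComplexA and ComplexB
groups <- list(ComplexA = c("P1", "P2"),
               ComplexB = c("P2", "P3"),
               ComplexC = c("P4", "P5"))

# One (made-up) coefficient per group membership
beta <- c(0.4, -0.2, 0.1, 0.3, 0, 0)
names(beta) <- unlist(groups, use.names = FALSE)
grp.id <- rep(names(groups), lengths(groups))

nz <- beta != 0
vars        <- names(beta)[nz]      # P2 appears twice
vars.unique <- unique(vars)         # P2 appears once
sel.groups  <- unique(grp.id[nz])   # groups with a nonzero coefficient

# L2 norm of the coefficients in each group
grp.norm <- tapply(beta, grp.id, function(b) sqrt(sum(b^2)))

print(vars)
print(vars.unique)
print(sel.groups)
print(grp.norm)
```

Here "nvars" would correspond to `length(vars)`, "nvars.unique" to `length(vars.unique)`, and "ngroups" to `length(sel.groups)`.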
The object returned depends on type.
# load data
data(survivalData)
data(PCGroups)

x <- survivalData$Exp
y <- survivalData$survData

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "EntrezID")

set.seed(20150122)
idx.train <- sample(nrow(x), round(nrow(x)*2/3))
x.train <- x[idx.train,]
y.train <- y[idx.train,]
x.test <- x[-idx.train,]
y.test <- y[-idx.train,]

# fit PCLasso model
fit.PCLasso <- PCLasso(x = x.train, y = y.train, group = PC.Human,
                       penalty = "grLasso")

# predict risk scores of samples in x.test
s <- predict(object = fit.PCLasso, x = x.test, type="link",
             lambda=fit.PCLasso$fit$lambda)

s <- predict(object = fit.PCLasso, x = x.test, type="link",
             lambda=fit.PCLasso$fit$lambda[10])

# Nonzero coefficients
sel.groups <- predict(object = fit.PCLasso, type="groups",
                      lambda = fit.PCLasso$fit$lambda)
sel.ngroups <- predict(object = fit.PCLasso, type="ngroups",
                       lambda = fit.PCLasso$fit$lambda)
sel.vars.unique <- predict(object = fit.PCLasso, type="vars.unique",
                           lambda = fit.PCLasso$fit$lambda)
sel.nvars.unique <- predict(object = fit.PCLasso, type="nvars.unique",
                            lambda = fit.PCLasso$fit$lambda)
sel.vars <- predict(object = fit.PCLasso, type="vars",
                    lambda=fit.PCLasso$fit$lambda)
sel.nvars <- predict(object = fit.PCLasso, type="nvars",
                     lambda=fit.PCLasso$fit$lambda)

# For values of lambda not in the sequence of fitted models,
# linear interpolation is used.
sel.groups <- predict(object = fit.PCLasso, type="groups",
                      lambda = c(0.1, 0.05))
sel.ngroups <- predict(object = fit.PCLasso, type="ngroups",
                       lambda = c(0.1, 0.05))
sel.vars.unique <- predict(object = fit.PCLasso, type="vars.unique",
                           lambda = c(0.1, 0.05))
sel.nvars.unique <- predict(object = fit.PCLasso, type="nvars.unique",
                            lambda = c(0.1, 0.05))
sel.vars <- predict(object = fit.PCLasso, type="vars",
                    lambda=c(0.1, 0.05))
sel.nvars <- predict(object = fit.PCLasso, type="nvars",
                     lambda=c(0.1, 0.05))
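Once risk scores are available, a common next step is to turn them into relative risks and risk groups. The sketch below shows the relationship between "link" and "response" (risk = exp(link)) on made-up scores; the sample names and values are hypothetical, and the median split is only one possible illustration, not something the package prescribes.

```r
# Hypothetical linear predictors for five test samples
link <- c(S1 = -0.8, S2 = 0.3, S3 = 1.1, S4 = -0.1, S5 = 0.6)

# "response" corresponds to the risk, exp(link); the ranking is unchanged
risk <- exp(link)

# Illustrative split into high- and low-risk groups at the median score
risk.group <- ifelse(link > median(link), "high", "low")

print(round(risk, 3))
print(table(risk.group))
```

Because exp() is monotone, splitting on `link` or on `risk` gives the same groups.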
Similar to other predict methods, this function returns predictions from a fitted PCLasso2 object.
## S3 method for class 'PCLasso2'
predict(
  object,
  x = NULL,
  type = c("link", "response", "class", "norm", "coefficients", "vars",
    "nvars", "vars.unique", "nvars.unique", "groups", "ngroups"),
  lambda,
  ...
)
object |
Fitted |
x |
Matrix of values at which predictions are to be made. The features
(genes/proteins) contained in |
type |
Type of prediction: "link" returns the linear predictors; "response" gives the risk (i.e., exp(link)); "class" returns the binomial outcome with the highest probability; "vars" returns the indices for the nonzero coefficients; "vars.unique" returns the unique features (genes/proteins) with nonzero coefficients (a feature that belongs to several selected groups is listed once per group by "vars", whereas "vars.unique" filters out the repeats); "groups" returns the groups with at least one nonzero coefficient; "nvars" returns the number of nonzero coefficients; "nvars.unique" returns the number of unique features (genes/proteins) with nonzero coefficients; "ngroups" returns the number of groups with at least one nonzero coefficient; "norm" returns the L2 norm of the coefficients in each group. |
lambda |
Values of the regularization parameter |
... |
Arguments to be passed to |
See predict.grpreg in the R package grpreg for details.
The object returned depends on type.
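To make the link between the linear predictor and the "class" output concrete, the following base R sketch applies the usual logistic transformation to made-up linear predictors. It assumes a standard logistic link for family = "binomial" and a 0/1 class coding with a 0.5 cutoff; these are assumptions for illustration, and the exact tie handling and return coding of the package may differ.

```r
# Hypothetical linear predictors for five samples
link <- c(-2.0, -0.4, 0.0, 0.7, 3.1)

# Logistic transformation: probability of the class coded as 1
prob <- plogis(link)

# The class with the highest probability; equivalent to link > 0
pred.class <- as.integer(prob > 0.5)

print(round(prob, 3))
print(pred.class)
```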
# load data
data(classData)
data(PCGroups)

x <- classData$Exp
y <- classData$Label

PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "GeneSymbol")

set.seed(20150122)
idx.train <- sample(nrow(x), round(nrow(x)*2/3))
x.train <- x[idx.train,]
y.train <- y[idx.train]
x.test <- x[-idx.train,]
y.test <- y[-idx.train]

# fit PCLasso2 model
fit.PCLasso2 <- PCLasso2(x = x.train, y = y.train, group = PC.Human,
                         penalty = "grLasso", family = "binomial")

# predict risk scores of samples in x.test
s <- predict(object = fit.PCLasso2, x = x.test, type="link",
             lambda=fit.PCLasso2$fit$lambda)

# predict classes of samples in x.test
s <- predict(object = fit.PCLasso2, x = x.test, type="class",
             lambda=fit.PCLasso2$fit$lambda[10])

# Nonzero coefficients
sel.groups <- predict(object = fit.PCLasso2, type="groups",
                      lambda = fit.PCLasso2$fit$lambda)
sel.ngroups <- predict(object = fit.PCLasso2, type="ngroups",
                       lambda = fit.PCLasso2$fit$lambda)
sel.vars.unique <- predict(object = fit.PCLasso2, type="vars.unique",
                           lambda = fit.PCLasso2$fit$lambda)
sel.nvars.unique <- predict(object = fit.PCLasso2, type="nvars.unique",
                            lambda = fit.PCLasso2$fit$lambda)
sel.vars <- predict(object = fit.PCLasso2, type="vars",
                    lambda=fit.PCLasso2$fit$lambda)
sel.nvars <- predict(object = fit.PCLasso2, type="nvars",
                     lambda=fit.PCLasso2$fit$lambda)

# For values of lambda not in the sequence of fitted models,
# linear interpolation is used.
sel.groups <- predict(object = fit.PCLasso2, type="groups",
                      lambda = c(0.1, 0.05))
sel.ngroups <- predict(object = fit.PCLasso2, type="ngroups",
                       lambda = c(0.1, 0.05))
sel.vars.unique <- predict(object = fit.PCLasso2, type="vars.unique",
                           lambda = c(0.1, 0.05))
sel.nvars.unique <- predict(object = fit.PCLasso2, type="nvars.unique",
                            lambda = c(0.1, 0.05))
sel.vars <- predict(object = fit.PCLasso2, type="vars",
                    lambda=c(0.1, 0.05))
sel.nvars <- predict(object = fit.PCLasso2, type="nvars",
                     lambda=c(0.1, 0.05))
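The example keeps `y.test` aside but does not use it. One plausible use is to compare it with the predicted classes. The sketch below does this in base R on made-up label vectors (`y.test.toy` and `pred.toy` are hypothetical stand-ins for `y.test` and the output of `predict(..., type="class")` at a single lambda).

```r
# Hypothetical observed and predicted labels for eight test samples
y.test.toy <- c(1, 0, 1, 1, 0, 0, 1, 0)
pred.toy   <- c(1, 0, 0, 1, 0, 1, 1, 0)

# Confusion matrix: rows are observed classes, columns are predicted classes
conf.mat <- table(observed = y.test.toy, predicted = pred.toy)

# Proportion of correctly classified samples
accuracy <- mean(pred.toy == y.test.toy)

print(conf.mat)
print(accuracy)
```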
A dataset for prognostic modeling
survivalData
A list containing a protein expression matrix and survival data
a protein expression matrix
Survival data. The first column is the time on study (follow up time); the second column is a binary variable with 1 indicating that the event has occurred and 0 indicating right censoring.
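As a small illustration of the survival data layout described above, the following base R sketch builds a toy two-column matrix and counts events and censored samples. The column names `time` and `status` and all values are assumptions for illustration; the columns of the real `survivalData$survData` may be named differently.

```r
# Toy survival data: column 1 is follow-up time, column 2 is the event
# indicator (1 = event occurred, 0 = right censored)
survData.toy <- cbind(time   = c(12.5, 30.0, 7.2, 45.1),
                      status = c(1, 0, 1, 0))

n.events   <- sum(survData.toy[, "status"] == 1)
n.censored <- sum(survData.toy[, "status"] == 0)

print(survData.toy)
print(c(events = n.events, censored = n.censored))
```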