| Title: | Exploratory Analysis with the Singular Value Decomposition |
|---|---|
| Description: | A variety of descriptive multivariate analyses with the singular value decomposition, such as principal components analysis, correspondence analysis, and multidimensional scaling. See An ExPosition of the Singular Value Decomposition in R (Beaton et al 2014) <doi:10.1016/j.csda.2013.11.006>. |
| Authors: | Derek Beaton [aut, cre], Cherise R. Chin Fatt [aut], Herve Abdi [aut] |
| Maintainer: | Derek Beaton <[email protected]> |
| License: | GPL-2 |
| Version: | 2.11.0 |
| Built: | 2026-06-02 10:27:49 UTC |
| Source: | https://github.com/derekbeaton/exposition1 |
Exposition is defined as a comprehensive explanation of an idea. With
ExPosition for R, a comprehensive explanation of your data will be provided
with minimal effort.
The core of ExPosition is the singular value
decomposition (SVD; see: svd). The point of ExPosition is
simple: to provide the user with an overview of their data that only the SVD
can provide. ExPosition includes several techniques that depend on the SVD
(see below for examples and functions).
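As a minimal sketch of the idea, using only base R's svd rather than ExPosition itself, the decomposition of a centered and scaled matrix gives singular values whose squares describe how much variance each component explains. The variable names below are illustrative and are not part of the package.

```r
# Center and scale the four numeric iris measurements (base R data).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = TRUE)

# Plain SVD: X = U diag(d) V'
res <- svd(X)

# Squared singular values act as eigenvalues; their share of the total
# is the percentage of variance explained by each component.
eigs <- res$d^2
tau <- 100 * eigs / sum(eigs)

# The three pieces rebuild X up to numerical error.
X.rebuilt <- res$u %*% diag(res$d) %*% t(res$v)
max.error <- max(abs(X - X.rebuilt))
```

The ExPosition functions documented below wrap this kind of decomposition with preprocessing, masses, weights, and plotting.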
Questions, comments, compliments, and complaints go to Derek Beaton
[email protected].
The following people are authors or contributors to ExPosition code, data,
or examples:
Derek Beaton, Hervé Abdi, Cherise Chin-Fatt, Joseph Dunlop,
Jenny Rieck, Rachel Williams, Anjali Krishnan, and Francesca M. Filbey.
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
Abdi, H., and Williams, L.J. (2010). Correspondence analysis. In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 267-278.
Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.
Abdi, H. (2007). Metric multidimensional scaling. In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 598-605.
Greenacre, M. J. (2007). Correspondence Analysis in Practice. Chapman and Hall.
Benzécri, J. P. (1979). Sur le calcul des taux d'inertie dans l'analyse d'un questionnaire. Cahiers de l'Analyse des Données, 4, 377-378.
acknowledgements returns a list of people who have contributed to
ExPosition.
acknowledgements()
A list of people who have contributed something beyond code to the ExPosition family of packages.
Derek Beaton
How six authors use three different types of punctuation throughout their writing.
data(authors)
authors$ca$data: Six authors (rows) and the frequency of three
punctuation marks (columns). For use with epCA.
authors$mca$data: A Burt table reformatting of the $ca$data. For use with
epMCA.
Brunet, E. (1989). Faut-il ponderer les donnees linguistiques. CUMFID, 16, 39-50.
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
Abdi, H., and Williams, L.J. (2010). Correspondence analysis. In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 267-278.
This data should be used for discriminant analyses or analyses where the group information is important.
data(bada.wine)
bada.wine$data: Data matrix with twelve wines (rows) from 3 regions
with 18 attributes (columns).
bada.wine$design: Design matrix with twelve
wines (rows) with 3 regions (columns) to indicate group relationship of the
data matrix.
Abdi, H. and Williams, L.J. (2010). Barycentric discriminant analysis (BADIA). In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 64-75.
Tasting notes, preferences, breweries and styles of 38 different craft beers from various breweries, across various styles.
data(beer.tasting.notes)
beer.tasting.notes$data: Data matrix. Tasting notes (ratings) of 38
different beers (rows) described by 16 different flavor profiles
(columns).
beer.tasting.notes$brewery.design: Design matrix. Source
brewery of 38 different beers (rows) across 26 breweries (columns).
beer.tasting.notes$style.design: Design matrix. Style of 38 different beers
(rows) across 20 styles (columns) (styles as listed from Beer Advocate
website).
beer.tasting.notes$sup.data: Supplementary data matrix. ABV
and overall preference ratings of 38 beers described by two features (ABV &
overall) in original value and rounded value.
Jenny Rieck and Derek Beaton laboriously “collected” these data for “experimental purposes”.
http://www.beeradvocate.com
Ten assessors perform a free-sorting task to sort eight beers into groups.
data(beers2007)
beers2007$data: A data matrix with 8 rows (beers) described by 10 assessors (columns).
Abdi, H., Valentin, D., Chollet, S., & Chrea, C. (2007). Analyzing assessors and products in sorting tasks: DISTATIS, theory and applications. Food Quality and Preference, 627-640.
Calculates axis constraints for plotting data.
calculateConstraints(results, x_axis = 1, y_axis = 2, constraints = NULL)
results |
results from ExPosition (i.e., $ExPosition.Data) |
x_axis |
which component should be on the x axis? |
y_axis |
which component should be on the y axis? |
constraints |
if available, axis constraints for the plots (determines end points of the plots). |
Returns a list with the following items:
$constraints |
axis constraints for the plots (determines end points of the plots). |
Derek Beaton
Performs all steps required for CA processing (row profile approach).
caNorm(X, X_dimensions, colTotal, rowTotal, grandTotal, weights = NULL, masses = NULL)
X |
Data matrix |
X_dimensions |
The dimensions of X in a vector of length 2 (rows,
columns). See |
colTotal |
Vector of column sums. |
rowTotal |
Vector of row sums. |
grandTotal |
Grand total of X |
weights |
Optional weights to include for the columns. |
masses |
Optional masses to include for the rows. |
rowCenter |
The barycenter of X. |
masses |
Masses to be used for the GSVD. |
weights |
Weights to be used for the GSVD. |
rowProfiles |
The row profiles of X. |
deviations |
Deviations of
row profiles from |
Derek Beaton
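The quantities listed above can be illustrated with a short base R sketch of the row profile approach. This is not caNorm's source code: the counts are made up, and taking the weights as the inverse of the barycenter is the usual textbook convention for CA, assumed here rather than confirmed for this function.

```r
# A small contingency table (hypothetical counts).
X <- matrix(c(10, 20, 30,
               5, 15, 10,
              25,  5, 20),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c")))

rowTotal <- rowSums(X)
colTotal <- colSums(X)
grandTotal <- sum(X)

# Row profiles: each row divided by its own total, so every row sums to 1.
rowProfiles <- X / rowTotal

# Barycenter (average row profile): column totals over the grand total.
rowCenter <- colTotal / grandTotal

# Textbook CA masses (rows) and weights (columns); assumed convention.
masses <- rowTotal / grandTotal
weights <- 1 / rowCenter

# Deviations of each row profile from the barycenter.
deviations <- sweep(rowProfiles, 2, rowCenter, "-")
```

A useful check is that the mass-weighted average of the row profiles equals the barycenter, so the mass-weighted deviations sum to zero in every column.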
CA preprocessing for data. Can be performed on rows or columns of your data. This is a row-profile normalization.
caSupplementalElementsPreProcessing(SUP.DATA)
SUP.DATA |
Data that will be supplemental. Row profile normalization is
used. For supplemental rows use |
Returns a matrix that is preprocessed for supplemental projections.
Derek Beaton
mdsSupplementalElementsPreProcessing,
pcaSupplementaryColsPreProcessing,
pcaSupplementaryRowsPreProcessing,
hellingerSupplementaryColsPreProcessing,
hellingerSupplementaryRowsPreProcessing,
supplementaryCols, supplementaryRows,
supplementalProjection, rowNorms
Computes chi-square distances. Primarily used for epMDS.
chi2Dist(X)
X |
Compute chi-square distances between row items. |
D |
Distance matrix for |
MW |
a list of masses and weights. Weights not used in MDS. |
Hervé Abdi
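For readers unfamiliar with the chi-square distance, the following base R sketch shows the textbook definition between row profiles. It is an illustration with made-up counts, not the body of chi2Dist; in particular, whether chi2Dist returns distances or squared distances is not stated above, so both are computed.

```r
# Hypothetical contingency table: 3 rows, 3 columns.
X <- matrix(c(10, 20, 30,
               5, 15, 10,
              25,  5, 20), nrow = 3, byrow = TRUE)

rowProfiles <- X / rowSums(X)       # each row sums to 1
colMasses <- colSums(X) / sum(X)    # average row profile

# Rescale columns by 1/sqrt(column mass); ordinary Euclidean distance on
# the rescaled profiles is then the chi-square distance.
scaled <- sweep(rowProfiles, 2, sqrt(colMasses), "/")
D <- as.matrix(dist(scaled))        # chi-square distances
D2 <- D^2                           # squared chi-square distances
```

The rescaling trick is convenient because it lets dist do the pairwise work; the result matches the direct formula, the sum over columns of squared profile differences divided by the column mass.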
One coffee from Oak Cliff roasters (Dallas, TX) was used in this experiment: a Honduran source with a medium roast. The coffee was brewed in two ways and served in two ways (i.e., a 2x2 design). Two batches each of coffee were brewed at 180 degrees Fahrenheit (Hot) or at room temperature (Cold). One of each was served cold or heated back up to 180 degrees (Hot).
data(coffee.data)
coffee.data$preferences: Ten participants indicated if they liked a
particular serving or not.
coffee.data$ratings: Ten participants
indicated on a scale of 0-2 the presence of particular flavors. In an array
format.
Flavor profiles measured: Salty, Spice Cabinet, Sweet, Bittery, and Nutty.
Computes masses and weights for use.
computeMW(DATA, masses = NULL, weights = NULL)
DATA |
original data; will be used to compute masses and weights if none are provided. |
masses |
a vector or (diagonal) matrix of masses for the row items. If NULL (default), masses are computed as 1/# of rows |
weights |
a vector or (diagonal) matrix of weights for the column items. If NULL (default), weights are computed as 1/# of columns |
Returns a list with the following items:
M |
a diagonal matrix of masses (if too large, a vector is returned). |
W |
a diagonal matrix of weights (if too large, a vector is returned). |
Derek Beaton
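The NULL defaults described above amount to equal masses and equal weights. A minimal base R sketch of that default, with illustrative variable names and without calling computeMW itself, might look like this:

```r
DATA <- as.matrix(iris[, 1:4])

# Equal masses for rows (1 / number of rows) and equal weights for
# columns (1 / number of columns), as described for the NULL defaults.
masses <- rep(1 / nrow(DATA), nrow(DATA))
weights <- rep(1 / ncol(DATA), ncol(DATA))

# Diagonal-matrix form; for large data a plain vector is cheaper to store.
M <- diag(masses)
W <- diag(weights)
```

Keeping the vector form around is often enough, since multiplying by a diagonal matrix is the same as rescaling rows or columns.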
coreCA performs the core of correspondence analysis (CA), multiple correspondence analysis (MCA) and related techniques.
coreCA(DATA, masses = NULL, weights = NULL, hellinger = FALSE, symmetric = TRUE, decomp.approach = 'svd', k = 0)
DATA |
original data to decompose and analyze via the singular value decomposition. |
masses |
a vector or diagonal matrix with masses for the rows (observations). If NULL, one is created or the plain SVD is used. |
weights |
a vector or diagonal matrix with weights for the columns (measures). If NULL, one is created or the plain SVD is used. |
hellinger |
a boolean. If FALSE (default), Chi-square distance will be used. If TRUE, Hellinger distance will be used. |
symmetric |
a boolean. If TRUE (default) symmetric factor scores for rows and columns are computed. If FALSE, the simplex (column-based) will be returned. |
decomp.approach |
string. A switch for different decompositions
(typically for speed). See |
k |
number of components to return (this is not a rotation, just an a priori selection of how much data should be returned). |
This function should not be used directly. Please use epCA or
epMCA unless you plan on writing extensions to ExPosition. Any
extensions wherein CA is the primary analysis should use coreCA.
Returns a large list of items which are also returned in
epCA and epMCA (the help files for those
functions will refer to this as well).
All items with a letter followed by an i are for the I rows of a DATA matrix. All items with a letter followed by a j are for the J columns of a DATA matrix.
fi |
factor scores for the row items. |
di |
square distances of the row items. |
ci |
contributions (to the variance) of the row items. |
ri |
cosines of the row items. |
fj |
factor scores for the column items. |
dj |
square distances of the column items. |
cj |
contributions (to the variance) of the column items. |
rj |
cosines of the column items. |
t |
the percent of explained variance per component (tau). |
eigs |
the eigenvalues from the decomposition. |
pdq |
the set of left singular vectors (pdq$p) for the rows, singular values (pdq$Dv and pdq$Dd), and the set of right singular vectors (pdq$q) for the columns. |
M |
a column-vector or diagonal matrix of masses (for the rows) |
W |
a column-vector or diagonal matrix of weights (for the columns) |
c |
a centering vector (for the columns). |
X |
the final matrix that was decomposed (includes scaling, centering, masses, etc...). |
hellinger |
a boolean. TRUE if Hellinger distance was used. |
symmetric |
a boolean. TRUE if symmetric factor scores were computed, FALSE if asymmetric factor scores were computed. |
Derek Beaton and Hervé Abdi.
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
Abdi, H., and Williams, L.J. (2010). Correspondence analysis. In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 267-278.
Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.
Greenacre, M. J. (2007). Correspondence Analysis in Practice. Chapman and Hall.
coreMDS performs metric multidimensional scaling (MDS).
coreMDS(DATA, masses = NULL, decomp.approach = 'svd', k = 0)
DATA |
original data to decompose and analyze via the singular value decomposition. |
masses |
a vector or diagonal matrix with masses for the rows (observations). If NULL, one is created. |
decomp.approach |
string. A switch for different decompositions
(typically for speed). See |
k |
number of components to return (this is not a rotation, just an a priori selection of how much data should be returned). |
coreMDS should not be used directly unless you plan on writing
extensions to ExPosition. See epMDS.
Returns a large list of items which are also returned in
epMDS.
All items with a letter followed by an i are for the I rows of a DATA matrix. All items with a letter followed by a j are for the J columns of a DATA matrix.
fi |
factor scores for the row items. |
di |
square distances of the row items. |
ci |
contributions (to the variance) of the row items. |
ri |
cosines of the row items. |
masses |
a column-vector or diagonal matrix of masses (for the rows) |
t |
the percent of explained variance per component (tau). |
eigs |
the eigenvalues from the decomposition. |
pdq |
the set of left singular vectors (pdq$p) for the rows, singular values (pdq$Dv and pdq$Dd), and the set of right singular vectors (pdq$q) for the columns. |
X |
the final matrix that was decomposed (includes scaling, centering, masses, etc...). |
Derek Beaton and Hervé Abdi.
Abdi, H. (2007). Metric multidimensional scaling. In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 598-605.
O'Toole, A. J., Jiang, F., Abdi, H., and Haxby, J. V. (2005). Partially distributed representations of objects and faces in ventral temporal cortex. Journal of Cognitive Neuroscience, 17(4), 580-590.
corePCA performs the core of principal components analysis (PCA), and related techniques.
corePCA(DATA, M = NULL, W = NULL, decomp.approach = 'svd', k = 0)
DATA |
original data to decompose and analyze via the singular value decomposition. |
M |
a vector or diagonal matrix with masses for the rows (observations). If NULL, one is created or the plain SVD is used. |
W |
a vector or diagonal matrix with weights for the columns (measures). If NULL, one is created or the plain SVD is used. |
decomp.approach |
string. A switch for different decompositions
(typically for speed). See |
k |
number of components to return (this is not a rotation, just an a priori selection of how much data should be returned). |
This function should not be used directly. Please use epPCA
unless you plan on writing extensions to ExPosition.
Returns a large list of items which are also returned in
epPCA (the help files for those functions will refer to this
as well).
All items with a letter followed by an i are for the I rows of a DATA matrix. All items with a letter followed by a j are for the J columns of a DATA matrix.
fi |
factor scores for the row items. |
di |
square distances of the row items. |
ci |
contributions (to the variance) of the row items. |
ri |
cosines of the row items. |
fj |
factor scores for the column items. |
dj |
square distances of the column items. |
cj |
contributions (to the variance) of the column items. |
rj |
cosines of the column items. |
t |
the percent of explained variance per component (tau). |
eigs |
the eigenvalues from the decomposition. |
pdq |
the set of left singular vectors (pdq$p) for the rows, singular values (pdq$Dv and pdq$Dd), and the set of right singular vectors (pdq$q) for the columns. |
X |
the final matrix that was decomposed (includes scaling, centering, masses, etc...). |
Derek Beaton and Hervé Abdi.
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.
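To make the returned items for corePCA more concrete, here is a hedged base R sketch of how factor scores, contributions, and cosines relate to a plain SVD. It assumes no masses or weights (the plain SVD case), uses illustrative names that mirror the list above, and computes squared cosines; the exact scaling inside corePCA may differ.

```r
X <- scale(as.matrix(iris[, 1:4]))
res <- svd(X)

# Row and column factor scores: singular vectors scaled by singular values.
fi <- res$u %*% diag(res$d)
fj <- res$v %*% diag(res$d)

# Eigenvalues and percentage of explained variance per component.
eigs <- res$d^2
tau <- 100 * eigs / sum(eigs)

# Contributions: share of each item in a component's variance
# (each column sums to 1).
ci <- sweep(fi^2, 2, eigs, "/")
cj <- sweep(fj^2, 2, eigs, "/")

# Squared distances to the origin and squared cosines (each row sums to 1).
di <- rowSums(fi^2)
ri <- fi^2 / di
```

Contributions answer "which items build this component", while squared cosines answer "which components describe this item", which is why one sums to 1 over items and the other over components.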
Creates a default design matrix, wherein all observations (i.e., row items) are in the same group.
createDefaultDesign(DATA)
DATA |
original data that requires a design matrix |
DESIGN |
a column-vector matrix to indicate that all observations are in the same group. |
Derek Beaton
Checks and/or creates a dummy-coded design matrix.
designCheck(DATA, DESIGN = NULL, make_design_nominal = TRUE)
DATA |
original data that should be matched to a design matrix |
DESIGN |
a column vector with levels for observations or a dummy-coded matrix |
make_design_nominal |
a boolean. Will make DESIGN nominal if TRUE (default). |
Returns a properly formatted, dummy-coded (or disjunctive coding) design matrix.
DESIGN |
dummy-coded design matrix |
Derek Beaton
data <- iris[,c(1:4)]
design <- as.matrix(iris[,c('Species')])
iris.design <- designCheck(data,DESIGN=design,make_design_nominal=TRUE)
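For comparison, a dummy-coded (disjunctive) design matrix can also be built in base R with model.matrix. This sketch only shows what such a matrix looks like; it is not how designCheck is implemented, and the variable names are illustrative.

```r
design <- iris$Species

# One indicator column per level; "- 1" drops the intercept so every
# level gets its own 0/1 column.
dummy <- model.matrix(~ design - 1)
colnames(dummy) <- levels(design)

# Each row belongs to exactly one group.
group.sizes <- colSums(dummy)
```

Each row of the result contains a single 1, marking the group that observation belongs to.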
Conversational data from Alzheimer's Patient-Spouse Dyads.
data(dica.ad)
dica.ad$data: Seventeen dyads described by 58 variables.
dica.ad$design: Seventeen dyads that belong to three groups.
Williams, L.J., Abdi, H., French, R., & Orange, J.B. (2010). A tutorial on Multi-Block Discriminant Correspondence Analysis (MUDICA): A new method for analyzing discourse data from clinical populations. Journal of Speech Language and Hearing Research, 53, 1372-1393.
This data should be used for discriminant analyses or analyses where the group information is important.
data(dica.wine)
dica.wine$data: Data matrix with twelve wines (rows) from 3 regions
with 16 attributes (columns) in disjunctive (0/1) coding.
dica.wine$design: Design matrix with twelve wines (rows) with 3 regions
(columns) to indicate group relationship of the data matrix.
Abdi, H. (2007). Discriminant correspondence analysis. In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 270-275.
The world famous Fisher's iris set: 150 flowers from 3 species with 4 attributes.
data(ep.iris)
ep.iris$data: Data matrix with 150 flowers (rows) from 3 species
with 4 attributes (columns) describing sepal and petal features.
ep.iris$design: Design matrix with 150 flowers (rows) with 3 species
(columns) indicating which flower belongs to which species.
http://en.wikipedia.org/wiki/Iris_flower_data_set
Correspondence Analysis (CA) via ExPosition.
epCA(DATA, DESIGN = NULL, make_design_nominal = TRUE, masses = NULL, weights = NULL, hellinger = FALSE, symmetric = TRUE, graphs = TRUE, k = 0)
DATA |
original data to perform a CA on. |
DESIGN |
a design matrix to indicate if rows belong to groups. |
make_design_nominal |
a boolean. If TRUE (default), DESIGN is a vector that indicates groups (and will be dummy-coded). If FALSE, DESIGN is a dummy-coded matrix. |
masses |
a diagonal matrix or column-vector of masses for the row items. |
weights |
a diagonal matrix or column-vector of weights for the column items. |
hellinger |
a boolean. If FALSE (default), Chi-square distance will be used. If TRUE, Hellinger distance will be used. |
symmetric |
a boolean. If TRUE (default) symmetric factor scores for rows and columns are computed. If FALSE, the simplex (column-based) will be returned. |
graphs |
a boolean. If TRUE (default), graphs and plots are provided
(via |
k |
number of components to return. |
epCA performs correspondence analysis: essentially, a PCA for
qualitative data (frequencies, proportions). If you decide to use Hellinger
distance, it is best to set symmetric to FALSE.
See coreCA for details on what is returned.
Derek Beaton
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
Abdi, H., and Williams, L.J. (2010). Correspondence analysis. In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 267-278.
Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.
Greenacre, M. J. (2007). Correspondence Analysis in Practice. Chapman and Hall.
data(authors)
ca.authors.res <- epCA(authors$ca$data)
ExPosition plotting function which is an interface to
prettyGraphs.
epGraphs(res, x_axis = 1, y_axis = 2, epPlotInfo = NULL, DESIGN = NULL, fi.col = NULL, fi.pch = NULL, fj.col = NULL, fj.pch = NULL, col.offset = NULL, constraints = NULL, xlab = NULL, ylab = NULL, main = NULL, contributionPlots = TRUE, correlationPlotter = TRUE, graphs = TRUE)
res |
results from ExPosition |
x_axis |
which component should be on the x axis? |
y_axis |
which component should be on the y axis? |
epPlotInfo |
A list ( |
DESIGN |
A design matrix to apply colors (by palette selection) to row items |
fi.col |
A matrix of colors for the row items. If NULL, colors will be selected. |
fi.pch |
A matrix of pch values for the row items. If NULL, pch values are all 21. |
fj.col |
A matrix of colors for the column items. If NULL, colors will be selected. |
fj.pch |
A matrix of pch values for the column items. If NULL, pch values are all 21. |
col.offset |
A numeric offset value. Is passed to
|
constraints |
Plot constraints as returned from
|
xlab |
x axis label |
ylab |
y axis label |
main |
main label for the graph window |
contributionPlots |
a boolean. If TRUE (default), contribution bar plots will be created. |
correlationPlotter |
a boolean. If TRUE (default), a correlation circle plot will be created. Applies to PCA family of methods (CA is excluded for now). |
graphs |
a boolean. If TRUE, graphs are created. If FALSE, only data associated to plotting (e.g., constraints, colors) are returned. |
epGraphs is an interface between ExPosition and
prettyGraphs.
The following items are bundled inside of $Plotting.Data:
$fi.col |
the colors that are associated to the row items ($fi). |
$fi.pch |
the pch values associated to the row items ($fi). |
$fj.col |
the colors that are associated to the column items ($fj). |
$fj.pch |
the pch values associated to the column items ($fj). |
$constraints |
axis constraints for the plots (determines end points of the plots). |
Derek Beaton
#this is for ExPosition's iris data
data(ep.iris)
pca.iris.res <- epPCA(ep.iris$data)
#this will put plotting data into a new variable.
epGraphs.2.and.3 <- epGraphs(pca.iris.res,x_axis=2,y_axis=3)
Multiple Correspondence Analysis (MCA) via ExPosition.
epMCA(DATA, make_data_nominal = TRUE, DESIGN = NULL, make_design_nominal = TRUE, masses = NULL, weights = NULL, hellinger = FALSE, symmetric = TRUE, correction = c("b"), graphs = TRUE, k = 0)
DATA |
original data to perform a MCA on. This data can be in original formatting (qualitative levels) or in dummy-coded variables. |
make_data_nominal |
a boolean. If TRUE (default), DATA is recoded as a dummy-coded matrix. If FALSE, DATA is a dummy-coded matrix. |
DESIGN |
a design matrix to indicate if rows belong to groups. |
make_design_nominal |
a boolean. If TRUE (default), DESIGN is a vector that indicates groups (and will be dummy-coded). If FALSE, DESIGN is a dummy-coded matrix. |
masses |
a diagonal matrix or column-vector of masses for the row items. |
weights |
a diagonal matrix or column-vector of weights for the column items. |
hellinger |
a boolean. If FALSE (default), Chi-square distance will be used. If TRUE, Hellinger distance will be used. |
symmetric |
a boolean. If TRUE (default), symmetric factor scores for rows and columns are computed. If FALSE, the simplex (column-based) will be returned. |
correction |
which corrections should be applied? "b" = Benzécri correction, "bg" = Greenacre adjustment to Benzécri correction. |
graphs |
a boolean. If TRUE (default), graphs and plots are provided
(via |
k |
number of components to return. |
epMCA performs multiple correspondence analysis: essentially, a CA
for categorical data.
Note that when hellinger is TRUE, no correction is performed. Additionally, if you
decide to use Hellinger distance, it is best to set symmetric to FALSE.
See coreCA for details on what is returned. In
addition to the values returned:
$pdq |
this is the corrected SVD data, if a correction was selected. If no correction was selected, it is uncorrected. |
$pdq.uncor |
uncorrected SVD data. |
Derek Beaton
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
Abdi, H., and Williams, L.J. (2010). Correspondence analysis. In N.J. Salkind, D.M. Dougherty, & B. Frey (Eds.): Encyclopedia of Research Design. Thousand Oaks (CA): Sage. pp. 267-278.
Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.
Benzécri, J. P. (1979). Sur le calcul des taux d'inertie dans l'analyse d'un questionnaire. Cahiers de l'Analyse des Données, 4, 377-378.
Greenacre, M. J. (2007). Correspondence Analysis in Practice. Chapman and Hall.
data(mca.wine)
mca.wine.res <- epMCA(mca.wine$data)
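The "b" correction mentioned in the arguments refers to Benzécri's rescaling of MCA eigenvalues. The sketch below implements the textbook form of that formula in base R. The function name benzecri.correct and the example eigenvalues are hypothetical, and whether epMCA applies the correction in exactly this way is an assumption. The Greenacre adjustment is not shown.

```r
# Textbook Benzécri correction for MCA eigenvalues.
# eigs: eigenvalues from the MCA of an indicator (disjunctive) matrix.
# K:    number of original nominal variables.
benzecri.correct <- function(eigs, K) {
  keep <- eigs > 1 / K                        # only eigenvalues above 1/K are kept
  corrected <- ((K / (K - 1)) * (eigs[keep] - 1 / K))^2
  list(eigs = corrected,
       tau = 100 * corrected / sum(corrected))
}

# Hypothetical eigenvalues for an analysis with K = 4 variables.
res <- benzecri.correct(c(0.60, 0.40, 0.25, 0.10), K = 4)
```

The correction exists because coding nominal variables as indicator columns inflates the dimensionality, so raw percentages of explained inertia look artificially low.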
Multidimensional Scaling (MDS) via ExPosition.
epMDS(DATA, DATA_is_dist = TRUE, method = "euclidean", DESIGN = NULL, make_design_nominal = TRUE, masses = NULL, graphs = TRUE, k = 0)
DATA |
original data to perform a MDS on. |
DATA_is_dist |
a boolean. If TRUE (default) the DATA matrix should be a symmetric distance matrix. If FALSE, a Euclidean distance of row items will be computed and used. |
method |
which distance metric should be used. |
DESIGN |
a design matrix to indicate if rows belong to groups. |
make_design_nominal |
a boolean. If TRUE (default), DESIGN is a vector that indicates groups (and will be dummy-coded). If FALSE, DESIGN is a dummy-coded matrix. |
masses |
a diagonal matrix (or vector) that contains the masses (for the row items). |
graphs |
a boolean. If TRUE (default), graphs and plots are provided
(via |
k |
number of components to return. |
epMDS performs metric multidimensional scaling: essentially, a PCA
for a symmetric distance matrix.
See coreMDS for details on what is returned. epMDS
only returns values related to row items (e.g., fi, ci); no column data is
returned.
D |
the distance matrix that was decomposed. In most cases, it is returned as a squared distance. |
With respect to input of DATA, epMDS differs slightly
from other versions of multi-dimensional scaling.
If you provide a
rectangular matrix (e.g., observations x measures), epMDS will
compute a distance matrix and square it.
If you provide a distance
(dissimilarity) matrix, epMDS does not square it.
Derek Beaton
Abdi, H. (2007). Metric multidimensional scaling. In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 598-605.
O'Toole, A. J., Jiang, F., Abdi, H., and Haxby, J. V. (2005). Partially distributed representations of objects and faces in ventral temporal cortex. Journal of Cognitive Neuroscience, 17(4), 580-590.
data(jocn.2005.fmri)
#by default, components 1 and 2 will be plotted.
mds.res.images <- epMDS(jocn.2005.fmri$images$data)
##iris example
data(ep.iris)
iris.rectangular <- epMDS(ep.iris$data,DATA_is_dist=FALSE)
iris.euc.dist <- dist(ep.iris$data,upper=TRUE,diag=TRUE)
iris.sq.euc.dist <- as.matrix(iris.euc.dist^2)
iris.sq <- epMDS(iris.sq.euc.dist)
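The role of squared distances in metric MDS can be seen in a base R sketch of the classical double-centering approach. This is a generic illustration assuming equal masses and base R's iris data, not the internals of epMDS; variable names are illustrative.

```r
X <- as.matrix(iris[, 1:4])
D2 <- as.matrix(dist(X))^2            # squared Euclidean distances
n <- nrow(D2)

# Double centering turns squared distances into a cross-product matrix.
C <- diag(n) - matrix(1 / n, n, n)    # centering matrix (equal masses)
S <- -0.5 * C %*% D2 %*% C

# Eigen-decomposition of the symmetric cross-product matrix.
e <- eigen(S, symmetric = TRUE)
keep <- e$values > 1e-8               # drop numerically zero components
eigs <- e$values[keep]
fi <- e$vectors[, keep] %*% diag(sqrt(eigs))

# Distances between factor scores reproduce the original distances.
max.error <- max(abs(as.matrix(dist(fi)) - as.matrix(dist(X))))
```

Because the double centering expects squared distances, passing unsquared distances changes the solution, which is why the note above about when epMDS squares its input matters.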
Principal Component Analysis (PCA) via ExPosition.
epPCA(DATA, scale = TRUE, center = TRUE, DESIGN = NULL, make_design_nominal = TRUE, graphs = TRUE, k = 0)
DATA |
original data to perform a PCA on. |
scale |
a boolean, vector, or string. See |
center |
a boolean, vector, or string. See |
DESIGN |
a design matrix to indicate if rows belong to groups. |
make_design_nominal |
a boolean. If TRUE (default), DESIGN is a vector that indicates groups (and will be dummy-coded). If FALSE, DESIGN is a dummy-coded matrix. |
graphs |
a boolean. If TRUE (default), graphs and plots are provided
(via |
k |
number of components to return. |
epPCA performs principal components analysis on a data matrix.
See corePCA for details on what is returned.
Derek Beaton
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
Abdi, H. (2007). Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD). In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 907-912.
data(words)
pca.words.res <- epPCA(words$data)
expo.scale is a more elaborate, and complete, version of
scale. Several text options are available, but more
importantly, the center and scale factors are always returned.
expo.scale(DATA, center = TRUE, scale = TRUE)
DATA |
Data to center, scale, or both. |
center |
boolean, or (numeric) vector. If boolean or vector, it works
just as |
scale |
boolean, text, or (numeric) vector. If boolean or vector, it
works just as |
A data matrix that is scaled with the following attributes
(see scale):
$`scaled:center` |
The center of the data. If no center is provided, all 0s will be returned. |
$`scaled:scale` |
The scale factor of the data. If no scale is provided, all 1s will be returned. |
Derek Beaton
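The attributes named above follow base R's scale, which attaches them only when centering or scaling is actually applied. The sketch below uses base scale only and then fills in zeros by hand to mimic the "all 0s" behaviour described for expo.scale; it does not call expo.scale, and the variable names are illustrative.

```r
X <- as.matrix(iris[, 1:4])
scaled <- scale(X, center = TRUE, scale = TRUE)

center.used <- attr(scaled, "scaled:center")   # column means
scale.used <- attr(scaled, "scaled:scale")     # column standard deviations

# With center = FALSE, base scale() attaches no "scaled:center" attribute;
# the zeros below mimic the "all 0s" behaviour described for expo.scale.
unscaled <- scale(X, center = FALSE, scale = FALSE)
center.default <- attr(unscaled, "scaled:center")
if (is.null(center.default)) center.default <- rep(0, ncol(X))
```

Always having the center and scale available makes it straightforward to apply the same preprocessing to supplementary rows later.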
Four algorithms compared using a distance matrix between six faces.
data(faces2005)
faces2005$data: A data structure representing a distance matrix (6 x 6) for four algorithms.
Abdi, H., & Valentin, D. (2007). DISTATIS: the analysis of multiple distance matrices. Encyclopedia of Measurement and Statistics. 284-290.
This data should be used with epPCA.
data(french.social)
french.social$data: Data matrix with twelve families (rows) with 7 attributes (columns) describing what they spend their income on.
Lebart, L., and Fénelon, J.P. (1975). Statistique et informatique appliquées. Paris: Dunod.
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
genPDQ performs the SVD and GSVD for all methods in
ExPosition.
genPDQ(datain, M = NULL, W = NULL, is.mds = FALSE, decomp.approach = "svd", k = 0)
datain |
fully preprocessed data to be decomposed. |
M |
vector of masses (for the rows) |
W |
vector of weights (for the columns) |
is.mds |
a boolean. If the method is of MDS (e.g.,
|
decomp.approach |
a string. Lets the user choose which decomposition method to perform. Current options are "svd" or "eigen". |
k |
number of components to return (this is not a rotation, just an a priori selection of how much data should be returned). |
This function should only be used to create new methods based on the SVD or GSVD.
Data of class epSVD which is a list of matrices and
vectors:
P |
The left singular vectors (rows). |
Q |
The right singular vectors (columns). |
Dv |
Vector of the singular values. |
Dd |
Diagonal matrix of the singular values. |
ng |
Number of singular values/vectors |
rank |
Rank of the decomposed matrix. If it is 1, 0s are padded to the above items for plotting purposes. |
tau |
Explained variance per component |
Derek Beaton
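As a rough illustration of how a generalized SVD with masses and weights can be obtained from a plain SVD, the following base R sketch folds the constraints into the data before decomposing. It is not the code of genPDQ; the data, the values of M and W, and the percent scaling of tau are assumptions made for the example.

```r
# Made-up, already preprocessed data: 5 rows by 3 columns.
set.seed(1)
X <- matrix(rnorm(15), nrow = 5)

# Hypothetical constraints: equal masses for rows, unequal weights for columns.
M <- rep(1 / nrow(X), nrow(X))
W <- c(0.5, 1, 2)

# Fold the constraints into the data, then take a plain SVD.
X.tilde <- diag(sqrt(M)) %*% X %*% diag(sqrt(W))
res <- svd(X.tilde)

# Undo the scaling so the generalized singular vectors satisfy
# t(P) %*% diag(M) %*% P = I and t(Q) %*% diag(W) %*% Q = I.
P  <- diag(1 / sqrt(M)) %*% res$u
Q  <- diag(1 / sqrt(W)) %*% res$v
Dv <- res$d
Dd <- diag(Dv)

# Explained variance per component, in percent (assumed definition of tau).
tau <- 100 * Dv^2 / sum(Dv^2)
```

With these definitions, P %*% Dd %*% t(Q) reconstructs X, which is the property the generalized decomposition is meant to keep.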
A collection of beer tasting notes of 9 beers, across 16 descriptors, from 4 untrained assessors.
data(great.beer.tasting.1)
great.beer.tasting.1$data: Data matrix (cube). Tasting notes
(ratings) of 9 different beers (rows) described by 16 different flavor
profiles (columns) by 4 untrained assessors. These data contain NAs and must
be imputed or adjusted before an analysis is performed.
great.beer.tasting.1$brewery.design: Design matrix. Source brewery of 9
different beers (rows) across 5 breweries (columns).
great.beer.tasting.1$flavor: Design matrix. Intended prominent flavor of 9
different beers (rows) across 3 flavor profiles (columns).
Rachel Williams, Jenny Rieck and Derek Beaton recoded, collected data and/or “ran the experiment”.
A collection of beer tasting notes of 13 beers, across 15 descriptors, from 9 untrained assessors.
data(great.beer.tasting.2)
great.beer.tasting.2$data: Data matrix (cube). Tasting notes
(ratings) of 13 different beers (rows) described by 15 different flavor
profiles (columns) by 9 untrained assessors. All original values were on an
interval scale of 0-5. Any decimal values are imputed from alternate data
sources or additional assessors.
great.beer.tasting.2$brewery.design:
Design matrix. Source brewery of 13 different beers (rows) across 13
breweries (columns).
great.beer.tasting.2$style.design: Design matrix.
Style of 13 different beers (rows) across 8 styles (columns). Some complex
styles were truncated.
Rachel Williams, Jenny Rieck and Derek Beaton recoded, collected data and/or “ran the experiment”.
Performs all steps required for Hellinger form of CA processing (row profile approach).
hellingerNorm(X, X_dimensions, colTotal, rowTotal, grandTotal, weights = NULL, masses = NULL)
X |
Data matrix |
X_dimensions |
The dimensions of X in a vector of length 2 (rows,
columns). See |
colTotal |
Vector of column sums. |
rowTotal |
Vector of row sums. |
grandTotal |
Grand total of X |
weights |
Optional weights to include for the columns. |
masses |
Optional masses to include for the rows. |
rowCenter |
The barycenter of X. |
masses |
Masses to be used for the GSVD. |
weights |
Weights to be used for the GSVD. |
rowProfiles |
The row profiles of X. |
deviations |
Deviations of
row profiles from |
Derek Beaton and Hervé Abdi
Preprocessing for supplementary columns in Hellinger analyses.
hellingerSupplementaryColsPreProcessing(SUP.DATA, W = NULL, M = NULL)
SUP.DATA |
A supplemental matrix that has the same number of rows as an active data set. |
W |
A vector or matrix of Weights. If none are provided, a default is computed. |
M |
A vector or matrix of Masses. If none are provided, a default is computed. |
a matrix that has been preprocessed to project supplementary columns for Hellinger methods.
Derek Beaton
Preprocessing for supplementary rows in Hellinger analyses.
hellingerSupplementaryRowsPreProcessing(SUP.DATA, center = NULL)
SUP.DATA |
A supplemental matrix that has the same number of columns as an active data set. |
center |
The center from the active data. NULL will center
|
a matrix that has been preprocessed to project supplementary rows for Hellinger methods.
Derek Beaton
Seventeen Alzheimer's patient-spouse dyads had conversations recorded and 58 attributes were recoded for this data. Each attribute is a frequency of occurrence of the item.
data(jlsr.2010.ad)
jlsr.2010.ad$ca$data: Seventeen patient-spouse dyads (rows)
described by 58 conversation items. For use with epCA and
discriminant analyses.
jlsr.2010.ad$mca$design: A design matrix that
indicates which group the dyad belongs to: control (CTRL), early stage
Alzheimer's (EDAT) or middle stage Alzheimer's (MDAT).
Williams, L.J., Abdi, H., French, R., and Orange, J.B. (2010). A tutorial on Multi-Block Discriminant Correspondence Analysis (MUDICA): A new method for analyzing discourse data from clinical populations. Journal of Speech Language and Hearing Research, 53, 1372-1393.
Contains 2 data sets: distance matrix of fMRI scans of participants viewing categories of items and distance matrix of the actual pixels from the images in each category.
data(jocn.2005.fmri)
jocn.2005.fmri$images$data: A distance matrix of 6 categories of
images based on a pixel analysis.
jocn.2005.fmri$scans$data: A distance
matrix of 6 categories of images based on fMRI scans.
O'Toole, A. J., Jiang, F., Abdi, H., and Haxby, J. V. (2005).
Partially distributed representations of objects and faces in ventral
temporal cortex. Journal of Cognitive Neuroscience, 17(4),
580-590.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A.,
Schouten, J. L., and Pietrini, P. (2001). Distributed and overlapping
representation of faces and objects in ventral temporal cortex.
Science, 293, 2425-2430.
http://openfmri.org/dataset/ds000105
Makes distances and weights for MDS analyses (see epMDS).
makeDistancesAndWeights(DATA, method = "euclidean", masses = NULL)
DATA |
A data matrix to compute distances between row items. |
method |
which distance metric should be used. |
masses |
a diagonal matrix (or vector) that contains the masses (for the row items). |
D |
Distance matrix for analysis |
MW |
a list item with
masses and weights. Weights are not used in |
Derek Beaton
computeMW, epMDS, coreMDS
Transforms each column into measure-response columns with disjunctive (0/1) coding. If an NA is found anywhere in the matrix, barycentric recoding is performed for the missing value(s).
makeNominalData(datain)
datain |
a data matrix where the columns will be recoded. |
dataout |
a transformed version of datain. |
Derek Beaton
data(mca.wine)
nominal.wine <- makeNominalData(mca.wine$data)
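To show what disjunctive (0/1) coding looks like, here is a small base R sketch on made-up categorical data. It does not call makeNominalData and does not handle NAs (so no barycentric recoding); the column naming scheme is only an assumption for the example.

```r
# Made-up categorical data: 4 observations, 2 variables.
dat <- data.frame(oak   = c("yes", "no", "no", "yes"),
                  fruit = c("low", "mid", "high", "mid"),
                  stringsAsFactors = TRUE)

# One 0/1 indicator block per variable, one column per level.
blocks <- lapply(names(dat), function(v) {
  ind <- sapply(levels(dat[[v]]), function(l) as.numeric(dat[[v]] == l))
  colnames(ind) <- paste(v, levels(dat[[v]]), sep = ".")
  ind
})
disjunctive <- do.call(cbind, blocks)

# Every row has exactly one 1 per original variable.
row.totals <- rowSums(disjunctive)
```

Each original variable becomes a block of columns, and each row sums to the number of original variables.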
This function performs all preprocessing steps required for Correspondence Analysis-based preprocessing.
makeRowProfiles(X, weights = NULL, masses = NULL, hellinger = FALSE)
X |
Data matrix. |
weights |
optional. Weights to include in preprocessing. |
masses |
optional. Masses to include in preprocessing. |
hellinger |
a boolean. If TRUE, Hellinger preprocessing is used. Else, CA row profile is computed. |
Returns from hellingerNorm or caNorm.
Derek Beaton
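The standard correspondence analysis quantities (row profiles, masses, weights, barycenter, deviations) can be sketched in base R as follows. This is an illustration of the usual CA preprocessing on a made-up table, not the package's own code, and the variable names are chosen for the example.

```r
# Made-up contingency table: 3 rows by 4 columns.
X <- matrix(c(10, 5, 0, 5,
               2, 8, 6, 4,
               4, 4, 4, 8), nrow = 3, byrow = TRUE)

grand.total <- sum(X)
row.total   <- rowSums(X)
col.total   <- colSums(X)

# Row profiles: each row rescaled to sum to 1.
row.profiles <- X / row.total

# Masses describe the rows; weights are the inverse column proportions.
masses  <- row.total / grand.total
weights <- 1 / (col.total / grand.total)

# Barycenter (average row profile) and deviations from it.
row.center <- col.total / grand.total
deviations <- sweep(row.profiles, 2, row.center, "-")
```

The mass-weighted average of the deviations is zero in every column, which is why the barycenter acts as the origin of the analysis.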
A function for correcting the eigenvalues and output from multiple
correspondence analysis (MCA, epMCA)
mca.eigen.fix(DATA, mca.results, make_data_nominal = TRUE, numVariables = NULL, correction = c("b"), symmetric = FALSE)
DATA |
original data (i.e., not transformed into disjunctive coding) |
mca.results |
output from |
make_data_nominal |
a boolean. Should DATA be transformed into disjunctive coding? Default is TRUE. |
numVariables |
the number of actual measures/variables in the data (typically the number of columns in DATA) |
correction |
which corrections should be applied? "b" = Benzécri correction, "bg" = Greenacre adjustment to Benzécri correction. |
symmetric |
a boolean. If the results from MCA are symmetric or asymmetric factor scores. Default is FALSE. |
mca.results |
a modified version of mca.results. Factor scores (e.g., $fi, $fj), and $pdq are updated based on corrections chosen. |
Derek Beaton
Benzécri, J. P. (1979). Sur le calcul des taux d'inertie dans
l'analyse d'un questionnaire. Cahiers de l'Analyse des Données,
4, 377-378.
Greenacre, M. J. (2007). Correspondence Analysis in
Practice. Chapman and Hall.
data(mca.wine)
# No corrections used in MCA
mca.wine.res.uncor <- epMCA(mca.wine$data, correction = NULL)
data <- mca.wine$data
expo.output <- mca.wine.res.uncor$ExPosition.Data
# mca.eigen.fix with just the Benzécri correction
mca.wine.res.b <- mca.eigen.fix(data, expo.output, correction = c('b'))
# mca.eigen.fix with Benzécri + Greenacre adjustment
mca.wine.res.bg <- mca.eigen.fix(data, expo.output, correction = c('b','g'))
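The two corrections can be written out directly from their textbook formulas. The sketch below uses hypothetical eigenvalues and does not call mca.eigen.fix; K (number of variables) and J (number of disjunctive columns) are assumed values, and the formulas follow the usual presentation of the Benzécri correction and the Greenacre adjustment.

```r
# Hypothetical inputs: eigenvalues from an MCA of K = 3 variables
# coded into J = 6 disjunctive columns.
K <- 3
J <- 6
eigs <- c(0.6, 0.3, 0.1)

# Benzécri: keep only eigenvalues above 1/K and rescale them.
keep   <- eigs > 1 / K
eigs.b <- ((K / (K - 1)) * (eigs[keep] - 1 / K))^2

# Percent of inertia after the Benzécri correction.
tau.b <- 100 * eigs.b / sum(eigs.b)

# Greenacre: divide by the average off-diagonal inertia instead
# of the sum of the corrected eigenvalues.
avg.inertia <- (K / (K - 1)) * (sum(eigs^2) - (J - K) / K^2)
tau.bg <- 100 * eigs.b / avg.inertia
```

The Greenacre adjustment gives a less optimistic percentage than the Benzécri correction alone, because the denominator is no longer limited to the retained components.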
Six wines described by several assessors with qualitative attributes.
data(mca.wine)
mca.wine$data: A (categorical) data matrix with 6 wines (rows) from
several assessors described by 10 attributes (columns). For use with
epMCA.
Abdi, H., & Valentin, D. (2007). Multiple correspondence analysis. In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 651-657.
Preprocessing of supplemental data for MDS analyses.
mdsSupplementalElementsPreProcessing(SUP.DATA = NULL, D = NULL, M = NULL)
SUP.DATA |
A supplementary data matrix. |
D |
The original (active) distance matrix that |
M |
masses from the original (active) analysis for |
a matrix that is preprocessed for supplementary projection in MDS.
Derek Beaton
Transform data for MDS analysis.
mdsTransform(D, masses)
D |
A distance matrix |
masses |
A vector or matrix of masses (see |
S |
a preprocessed matrix that can be decomposed. |
Derek Beaton
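The classical transformation of a distance matrix into a decomposable cross-product matrix is double centering. The following base R sketch shows that step under stated assumptions: D is taken to hold squared Euclidean distances, the masses are equal and sum to 1, and any further mass-based rescaling the package may apply is left out.

```r
# Made-up configuration: 5 points in 2 dimensions.
set.seed(2)
coords <- matrix(rnorm(10), nrow = 5)

# Assumption: D holds squared Euclidean distances.
D <- as.matrix(dist(coords))^2

# Equal masses that sum to 1 (hypothetical default).
n <- nrow(D)
masses <- rep(1 / n, n)

# Centering matrix built from the masses, then double centering.
Xi <- diag(n) - matrix(1, n, 1) %*% t(masses)
S  <- -0.5 * Xi %*% D %*% t(Xi)

# S is a cross-product matrix: its eigendecomposition gives coordinates
# whose squared distances reproduce D.
eig   <- eigen(S, symmetric = TRUE)
fi    <- eig$vectors[, 1:2] %*% diag(sqrt(eig$values[1:2]))
D.hat <- as.matrix(dist(fi))^2
```

Because the points were generated in two dimensions, two components are enough to recover the distances exactly.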
Checks if data is in disjunctive (sometimes called complete binary) format.
To be used with MCA (e.g., epMCA).
nominalCheck(DATA)
DATA |
A data matrix to check. This should be 0/1 disjunctive coded.
|
If DATA are nominal, DATA is returned. If not,
stop is called and execution halts.
Derek Beaton
A replication of the MATLAB pause function.
pause(x = 0)
x |
optional. If x>0 a call is made to |
Derek Beaton (but the base of which is provided by Philippe Grosjean from the R mailing list).
Copied from:
https://stat.ethz.ch/pipermail/r-help/2001-November/
Six wines described by several assessors with rank attributes.
data(pca.wine)
pca.wine$data: A data matrix with 6 wines (rows) from several
assessors described by 11 attributes (columns). For use with
epPCA.
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.
Preprocessing for supplementary columns in PCA.
pcaSupplementaryColsPreProcessing(SUP.DATA = NULL, center = TRUE, scale = TRUE, M = NULL)
SUP.DATA |
A supplemental matrix that has the same number of rows as an active data set. |
center |
The center from the active data. NULL will center
|
scale |
The scale factor from the active data. NULL will scale
(z-score) |
M |
Masses from the active data. |
a matrix that has been preprocessed to project supplementary columns for PCA methods.
Derek Beaton
Preprocessing for supplemental rows in PCA.
pcaSupplementaryRowsPreProcessing(SUP.DATA = NULL, center = TRUE, scale = TRUE, W = NULL)
SUP.DATA |
A supplemental matrix that has the same number of columns as an active data set. |
center |
The center from the active data. NULL will center
|
scale |
The scale factor from the active data. NULL will scale
(z-score) |
W |
Weights from the active data. |
a matrix that has been preprocessed to project supplementary rows for PCA methods.
Derek Beaton
This function is an interface for the user to a general SVD or related
decomposition. It provides direct access to svd and
eigen. Future decompositions will be available.
pickSVD(datain, is.mds = FALSE, decomp.approach = "svd", k = 0)
datain |
a data matrix to decompose. |
is.mds |
a boolean. TRUE for a MDS decomposition. |
decomp.approach |
a string. 'svd' for singular value decomposition, 'eigen' for an eigendecomposition. All approaches provide identical output. Some approaches are (in some cases) faster than others. |
k |
numeric. The number of components to return. |
A list with the following items:
u |
Left singular vectors (rows) |
v |
Right singular vectors (columns) |
d |
Singular values |
tau |
Explained variance per component |
Derek Beaton
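The claim that the SVD and eigendecomposition routes give the same output can be checked with base R alone. This sketch does not call pickSVD; it compares svd on made-up data with eigen on the cross-product matrix.

```r
set.seed(3)
X <- matrix(rnorm(24), nrow = 6)   # made-up 6 x 4 data

# Route 1: singular value decomposition of X.
sv <- svd(X)

# Route 2: eigendecomposition of the cross-product t(X) %*% X.
eg <- eigen(crossprod(X), symmetric = TRUE)

d.svd   <- sv$d
d.eigen <- sqrt(eg$values)         # eigenvalues are squared singular values

# Right singular vectors agree up to the sign of each column.
v.agree <- abs(colSums(sv$v * eg$vectors))

# Left singular vectors recovered from the eigen route.
u.eigen <- X %*% eg$vectors %*% diag(1 / d.eigen)
```

The eigen route works on the smaller cross-product matrix, which is one reason it can be faster when one dimension of the data is much larger than the other.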
Print Correspondence Analysis (CA) results
## S3 method for class 'epCA'
print(x,...)
x |
a list that contains items to make into the epCA class. |
... |
inherited/passed arguments for S3 print method(s). |
Derek Beaton and Cherise Chin-Fatt
Print epGraphs results
## S3 method for class 'epGraphs'
print(x,...)
x |
a list that contains items to make into the epGraphs class. |
... |
inherited/passed arguments for S3 print method(s). |
Derek Beaton and Cherise Chin-Fatt
Print Multiple Correspondence Analysis (MCA) results
## S3 method for class 'epMCA'
print(x,...)
x |
a list that contains items to make into the epMCA class. |
... |
inherited/passed arguments for S3 print method(s). |
Derek Beaton and Cherise Chin-Fatt
Print Multidimensional Scaling (MDS) results
## S3 method for class 'epMDS'
print(x,...)
x |
a list that contains items to make into the epMDS class. |
... |
inherited/passed arguments for S3 print method(s). |
Derek Beaton and Cherise Chin-Fatt
Print Principal Components Analysis (PCA) results
## S3 method for class 'epPCA'
print(x,...)
x |
a list that contains items to make into the epPCA class. |
... |
inherited/passed arguments for S3 print method(s). |
Derek Beaton and Cherise Chin-Fatt
Print results from the singular value decomposition (SVD) in ExPosition
## S3 method for class 'epSVD'
print(x,...)
x |
a list that contains items to make into the epSVD class. |
... |
inherited/passed arguments for S3 print method(s). |
Derek Beaton and Cherise Chin-Fatt
Print results from ExPosition
## S3 method for class 'expoOutput'
print(x,...)
x |
a list that contains items to make into the expoOutput class. |
... |
inherited/passed arguments for S3 print method(s). |
Derek Beaton and Cherise Chin-Fatt
This function will normalize the rows of a matrix.
rowNorms(X, type = NULL, center = FALSE, scale = FALSE)
X |
Data matrix |
type |
a string. Type of normalization to perform. Options are
|
center |
optional. A vector to center the columns of X. |
scale |
optional. A vector to scale the values of X. |
rowNorms works like expo.scale, but for rows. hellinger gives the
Hellinger row norm, ca gives the correspondence analysis row norm (row
profiles), and z gives the z-score row norm. other passes
center and scale to expo.scale and allows for
optional centering and scaling parameters.
Returns a row normalized version of X.
Derek Beaton
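The three named row norms can be sketched in base R as follows. These are the usual definitions applied to made-up data, written without calling rowNorms, so any extra handling the package performs is not reproduced here.

```r
# Made-up non-negative data: 3 rows by 3 columns.
X <- matrix(c(1, 4, 4,
              9, 0, 16,
              2, 2, 4), nrow = 3, byrow = TRUE)

# 'ca'-style row norm: row profiles (each row sums to 1).
ca.norm <- X / rowSums(X)

# 'hellinger'-style row norm: square root of the row profiles,
# so each row has a sum of squares equal to 1.
hellinger.norm <- sqrt(X / rowSums(X))

# 'z'-style row norm: z-score each row (scale() works on columns,
# hence the double transpose).
z.norm <- t(scale(t(X)))
```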
Perform Rv coefficient computation.
rvCoeff(Smat, Tmat, type)
Smat |
A square covariance matrix |
Tmat |
A square covariance matrix |
type |
DEPRECATED. Any value here will be ignored |
A single value that is the Rv coefficient.
Derek Beaton
Robert, P., & Escoufier, Y. (1976). A Unifying Tool for Linear Multivariate Statistical Methods: The RV-Coefficient. Journal of the Royal Statistical Society. Series C (Applied Statistics), 25(3), 257–265.
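The Rv coefficient of Robert and Escoufier is a normalized trace of the product of two matrices. The sketch below implements that definition in base R; rv_sketch is a hypothetical helper name for illustration and is not the package function.

```r
# Rv coefficient between two square, symmetric matrices of equal size.
# rv_sketch is a hypothetical helper name, not part of the package.
rv_sketch <- function(Smat, Tmat) {
  num <- sum(diag(Smat %*% Tmat))
  den <- sqrt(sum(diag(Smat %*% Smat)) * sum(diag(Tmat %*% Tmat)))
  num / den
}

# Made-up data: two sets of measures on the same 10 observations.
set.seed(4)
X <- scale(matrix(rnorm(30), nrow = 10), scale = FALSE)
Y <- scale(matrix(rnorm(20), nrow = 10), scale = FALSE)

# Observation-by-observation cross-product matrices (10 x 10 each).
S     <- tcrossprod(X)
T.mat <- tcrossprod(Y)

rv.same <- rv_sketch(S, S)      # a matrix compared with itself
rv.diff <- rv_sketch(S, T.mat)  # two different configurations
```

For positive semi-definite inputs the coefficient lies between 0 and 1, and it equals 1 when a matrix is compared with itself.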
The data come from a larger study on marijuana dependent individuals (see
Filbey et al., 2009) and are illustrated in Beaton et al., 2013.
The data contain 2 genetic markers and 3 additional drug use questions from 50
marijuana dependent individuals.
data(snps.druguse)
snps.druguse$DATA1: Fifty marijuana dependent participants indicated
which, if any, other drugs they have ever used.
snps.druguse$DATA2: Fifty
marijuana dependent participants were genotyped for the COMT and FAAH genes.
In snps.druguse$DATA1:
e - Stands for ecstasy use. Responses are yes or no.
cc - Stands for crack/cocaine use. Responses are yes or no.
cm - Stands for crystal meth use. Responses are yes or no.
In snps.druguse$DATA2:
COMT - Stands for the COMT gene. Alleles are AA, AG, or GG. Some values are NA.
FAAH - Stands for the FAAH gene. Alleles are AA, CA, or CC. Some values are NA.
Filbey, F. M., Schacht, J. P., Myers, U. S., Chavez, R. S., & Hutchison, K. E. (2009). Marijuana craving in the brain. Proceedings of the National Academy of Sciences, 106(31), 13016 – 13021.
Beaton D., Filbey F. M., Abdi H. (2013, in press). Integrating Partial Least Squares Correlation and Correspondence Analysis for Nominal Data. In Abdi H, Chin W, Esposito-Vinzi V, Russolillo G, Trinchera L. Proceedings in Mathematics and Statistics (Vol. 56): New Perspectives in Partial Least Squares and Related Methods. New York, NY: Springer-Verlag.
sqrt_mat computes the square root of a matrix, for square symmetric matrices only. This function should not be used directly.
sqrt_mat(X)
X |
a matrix that is square and symmetric |
Derek Beaton
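One common way to take the square root of a square symmetric matrix is through its eigendecomposition. The sketch below shows that approach in base R; sqrt_sketch is a hypothetical helper name, and whether sqrt_mat uses the same route is an assumption.

```r
# sqrt_sketch is a hypothetical helper name, not the package function.
sqrt_sketch <- function(X) {
  eg <- eigen(X, symmetric = TRUE)
  # Guard against tiny negative eigenvalues caused by rounding.
  vals <- pmax(eg$values, 0)
  eg$vectors %*% diag(sqrt(vals), nrow = length(vals)) %*% t(eg$vectors)
}

# Made-up symmetric, positive semi-definite matrix.
set.seed(5)
A <- crossprod(matrix(rnorm(12), nrow = 4))   # 3 x 3

R <- sqrt_sketch(A)
```

The result R is itself symmetric, and R %*% R gives back A.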
Performs a supplementary projection across ExPosition (and related) techniques.
supplementalProjection(sup.transform = NULL, f.scores = NULL, Dv = NULL, scale.factor = NULL, symmetric = TRUE)
sup.transform |
Data already transformed for supplementary projection.
That is, the output from: |
f.scores |
Active factor scores, e.g., res$ExPosition.Data$fi |
Dv |
Active singular values, e.g., res$ExPosition.Data$pdq$Dv |
scale.factor |
allows for a scaling factor of supplementary projections. Primarily used in MCA to apply a correction (e.g., Benzécri) to supplemental projections. |
symmetric |
a boolean. Default is TRUE. If FALSE, factor scores are computed with asymmetric properties (for rows only). |
A list with:
f.out |
Supplementary factor scores. |
d.out |
Supplementary square distances. |
r.out |
Supplementary cosines. |
Derek Beaton
It is preferred for users to compute supplemental projections via
supplementaryRows and supplementaryCols. These
handle some of the nuances and subtleties due to the different methods.
Computes factor scores for supplementary measures (columns).
supplementaryCols(SUP.DATA, res, center = TRUE, scale = TRUE)
SUP.DATA |
a data matrix of supplementary measures (must have the same observations [rows] as active data) |
res |
ExPosition or TExPosition results |
center |
a boolean, string, or numeric. See |
scale |
a boolean, string, or numeric. See |
This function recognizes the class types of: epPCA,
epMDS, epCA, epMCA, and
TExPosition methods. Further, the function recognizes if Hellinger
(as opposed to row profiles; in CA, MCA and DICA) were used.
A list of values containing:
fjj |
factor scores computed for supplemental columns |
djj |
squared distances for supplemental columns |
rjj |
cosines for supplemental columns |
Derek Beaton
Computes factor scores for supplementary observations (rows).
supplementaryRows(SUP.DATA, res)
SUP.DATA |
a data matrix of supplementary observations (must have the same measures [columns] as active data) |
res |
ExPosition or TExPosition results |
This function recognizes the class types of: epPCA,
epMDS, epCA, epMCA and
TExPosition methods. Further, the function recognizes if Hellinger
(as opposed to row profiles; in CA, MCA and DICA) were used.
A list of values containing:
fii |
factor scores computed for supplemental observations |
dii |
squared distances for supplemental observations |
rii |
cosines for supplemental observations |
Derek Beaton
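For the PCA case, projecting supplementary rows amounts to preprocessing them with the active center and multiplying by the right singular vectors. The base R sketch below shows this on made-up data without calling supplementaryRows; treating the cosines as squared cosines is an assumption, and the other methods (CA, MCA, MDS) need their own preprocessing.

```r
# Made-up active data (6 x 3) and two supplementary rows.
set.seed(6)
active <- matrix(rnorm(18), nrow = 6)
sup    <- matrix(rnorm(6),  nrow = 2)

# Active analysis: center the columns, then decompose.
center <- colMeans(active)
Xc  <- sweep(active, 2, center, "-")
res <- svd(Xc)
fi  <- res$u %*% diag(res$d)          # active row factor scores

# Supplementary rows: reuse the ACTIVE center, then project
# onto the right singular vectors.
sup.c <- sweep(sup, 2, center, "-")
fii   <- sup.c %*% res$v

# Squared distances to the origin and squared cosines per component
# (squared cosines are an assumption about what "cosines" means here).
dii <- rowSums(fii^2)
rii <- fii^2 / dii
```

Projecting the active rows through the same steps returns the active factor scores, which is a quick sanity check on the procedure.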
How six wines are described by 3 assessors across various flavor profiles, totaling 10 columns.
data(wines2007)
wines2007$data: A data set with 3 experts (studies) describing 6
wines (rows) using several variables using a scale from 1 to 7 with a total
of 10 measures (columns).
wines2007$table: A data matrix which identifies
the 3 experts (studies).
Abdi, H., & Valentin, D. (2007). STATIS. In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 955-962.
10 experts who describe 12 wines using four variables (cat-pee, passion fruit, green pepper, and mineral) considered as standard, and up to two additional variables if the experts chose.
data(wines2012)
wines2012$data: A data set with 10 experts (studies) describing 12
wines (rows) using four to six variables using a scale from 1 to 9 with a
total of 53 measures (columns).
wines2012$table: A data matrix which
identifies the 10 experts (studies).
wines2012$supplementary: A data
matrix with 12 wines (rows) describing 4 Chemical Properties (columns).
Abdi, H., Williams, L.J., Valentin, D., & Bennani-Dosse, M. (2012). STATIS and DISTATIS: Optimum multi-table principal component analysis and three way metric multidimensional scaling. Wiley Interdisciplinary Reviews: Computational Statistics, 4, 124-167.
Twenty words “randomly” selected from a dictionary and described by two features: length of word and number of definitions.
data(words)
words$data: A data matrix with 20 words (rows) described by 2
attributes (columns). For use with epPCA.
Abdi, H., and Williams, L.J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 433-459.