Package 'MoEClust' reference manual

Title:	Gaussian Parsimonious Clustering Models with Covariates and a Noise Component
Description:	Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2020) <doi:10.1007/s11634-019-00373-8>. This package fits finite Gaussian mixture models with a formula interface for supplying gating and/or expert network covariates using a range of parsimonious covariance parameterisations from the GPCM family via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots and the inclusion of an additional noise component is also facilitated. A greedy forward stepwise search algorithm is provided for identifying the optimal model in terms of the number of components, the GPCM covariance parameterisation, and the subsets of gating/expert network covariates.
Authors:	Keefe Murphy [aut, cre] , Thomas Brendan Murphy [ctb]
Maintainer:	Keefe Murphy <[email protected]>
License:	GPL (>= 3)
Version:	1.6.0
Built:	2025-03-05 19:13:11 UTC
Source:	https://github.com/keefe-murphy/moeclust

MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Description

Clustering via parsimonious Gaussian Mixtures of Experts using the MoEClust models introduced by Murphy and Murphy (2020) <doi:10.1007/s11634-019-00373-8>. This package fits finite Gaussian mixture models with gating and/or expert network covariates using a range of parsimonious covariance parameterisations from the GPCM family via the EM/CEM algorithm. Visualisation of the results of such models using generalised pairs plots and the inclusion of an additional noise component is also facilitated.

Details

Type:: Package
Package:: MoEClust
Version:: 1.6.0
Date:: 2025-03-05 (this version), 2017-11-28 (original release)
Licence:: GPL (>= 3)

Usage

The most important function in the MoEClust package is: MoE_clust, for fitting the model via EM/CEM with gating and/or expert network covariates, supplied via formula interfaces.

MoE_compare is provided for conducting model selection between different results from MoE_clust using different covariate combinations &/or initialisation strategies, etc.

MoE_stepwise is provided for conducting a greedy forward stepwise search to identify the optimal model in terms of the number of components, GPCM covariance type, and the subsets of gating/expert network covariates.

MoE_control allows supplying additional arguments to MoE_clust and MoE_stepwise which govern, among other things, controls on the inclusion of an additional noise component and controls on the initialisation of the allocations for the EM/CEM algorithm.

A dedicated plotting function (plot.MoEClust) exists for visualising the results using generalised pairs plots, for examining the gating network, and/or log-likelihood, and/or clustering uncertainties, and/or similarity matrix, and/or graphing model selection criteria values. The generalised pairs plots (MoE_gpairs) visualise all pairwise relationships between clustered response variables and associated continuous, categorical, and/or ordinal covariates in the gating &/or expert networks, coloured according to the MAP classification, and also give the marginal distributions of each variable (incl. the covariates) along the diagonal.

An as.Mclust method is provided to coerce the output of class "MoEClust" from MoE_clust to the "Mclust" class, to facilitate use of plotting and other functions for the "Mclust" class within the mclust package. As per mclust, MoEClust also facilitates modelling with an additional noise component (with or without the mixing proportion for the noise component depending on covariates).

Finally, a predict method is provided for predicting the fitted response and probability of cluster membership (and by extension the MAP classification) for new data, in the form of new covariates and new response data, or new covariates only.

Other functions also exist, e.g. MoE_crit, MoE_dens, MoE_estep, MoE_compare, and aitken, which are all used within MoE_clust but are nonetheless made available for standalone use.

The package also contains two data sets: ais and CO2data.

Author(s)

Keefe Murphy [aut, cre], Thomas Brendan Murphy [ctb]

Maintainer: Keefe Murphy - <[email protected]>

References

Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.

Examples

data(ais)

# Fit two sets of models
res1  <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   modelNames=c("VEE", "EVE", "VVE"), network.data=ais)
res2  <- MoE_clust(ais[,3:7], G=2, equalPro=TRUE, expert= ~ sex,
                   modelNames=c("VEE", "EVE", "VVE"), network.data=ais) 
        
# Compare the best model from each set of results
(comp <- MoE_compare(res1, res2, optimal.only=TRUE))

# Produce a plot for the optimal model                                                   
plot(comp$optimal, what="gpairs", show.dens=TRUE)

# Summarise its classification table, component parameters, and gating/expert networks
summary(comp$optimal, classification=TRUE, parameters=TRUE, networks=TRUE)

data(CO2data)
CO2   <- CO2data$CO2
GNP   <- CO2data$GNP

# Fit a range of models 
m1    <- MoE_clust(CO2, G=1:3)
m2    <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3    <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4    <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5    <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6    <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)

# Extract the model with highest BIC
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, criterion="bic"))
 
# See if a better model can be found using greedy forward stepwise selection
# Conduct a stepwise search on the same data
(mod1 <- MoE_stepwise(CO2, CO2data[,"GNP", drop=FALSE]))

# Conduct another stepwise search considering models with a noise component
(mod2 <- MoE_stepwise(CO2, CO2data[,"GNP", drop=FALSE], noise=TRUE))

# Compare all sets of results to choose the optimal model
(best <- MoE_compare(mod1, mod2, comp, pick=1)$optimal)
data(ais)

# Fit two sets of models
res1  <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   modelNames=c("VEE", "EVE", "VVE"), network.data=ais)
res2  <- MoE_clust(ais[,3:7], G=2, equalPro=TRUE, expert= ~ sex,
                   modelNames=c("VEE", "EVE", "VVE"), network.data=ais) 
        
# Compare the best model from each set of results
(comp <- MoE_compare(res1, res2, optimal.only=TRUE))

# Produce a plot for the optimal model                                                   
plot(comp$optimal, what="gpairs", show.dens=TRUE)

# Summarise its classification table, component parameters, and gating/expert networks
summary(comp$optimal, classification=TRUE, parameters=TRUE, networks=TRUE)

data(CO2data)
CO2   <- CO2data$CO2
GNP   <- CO2data$GNP

# Fit a range of models 
m1    <- MoE_clust(CO2, G=1:3)
m2    <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3    <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4    <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5    <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6    <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)

# Extract the model with highest BIC
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, criterion="bic"))
 
# See if a better model can be found using greedy forward stepwise selection
# Conduct a stepwise search on the same data
(mod1 <- MoE_stepwise(CO2, CO2data[,"GNP", drop=FALSE]))

# Conduct another stepwise search considering models with a noise component
(mod2 <- MoE_stepwise(CO2, CO2data[,"GNP", drop=FALSE], noise=TRUE))

# Compare all sets of results to choose the optimal model
(best <- MoE_compare(mod1, mod2, comp, pick=1)$optimal)

Australian Institute of Sport data

Description

Data on 102 male and 100 female athletes collected at the Australian Institute of Sport, courtesy of Richard Telford and Ross Cunningham.

Usage

data(ais)
data(ais)

Format

A data frame with 202 observations on the following 13 variables:

sex: categorical, levels = female, male
sport: categorical, levels = B_Ball, Field, Gym, Netball, Row, Swim, T_400m, Tennis, T_Sprnt, W_Polo
RCC: red cell count (numeric)
WCC: white cell count (numeric)
Hc: Hematocrit (numeric)
Hg: Hemoglobin (numeric)
Fe: plasma ferritin concentration (numeric)
BMI: body mass index: Wt/(Ht)^2 (numeric)
SSF: sum of skin folds (numeric)
Bfat: body fat percentage (numeric)
LBM: lean body mass (numeric)
Ht: height, cm (numeric)
Wt: weight, kg (numeric)

Details

The data have been made publicly available in connection with the book by Cook and Weisberg (1994).

References

Cook, R. D. and Weisberg, S. (1994), An Introduction to Regression Graphics. Volume 405 of Wiley Series in Probability and Statistics, New York, NY, USA: John Wiley & Sons.

Examples

data(ais, package="MoEClust")
pairs(ais[,c(3:7)], col=as.numeric(ais$sex), main = "AIS data")
apply(ais[,c(3:7)], 2, summary)
data(ais, package="MoEClust")
pairs(ais[,c(3:7)], col=as.numeric(ais$sex), main = "AIS data")
apply(ais[,c(3:7)], 2, summary)

Aitken Acceleration

Description

Calculates the Aitken acceleration estimate of the final converged maximised log-likelihood under the EM/CEM framework.

Usage

aitken(loglik)
aitken(loglik)

Arguments

loglik

A vector of three consecutive log-likelihood values. These three values should be in ascending order, though this is not checked.

Details

The final converged maximised log-likelihood can be used to determine convergence of the EM/CEM algorithm within MoE_clust, i.e. by checking whether the absolute difference between the previous log-likelihood estimate and the final converged maximised log-likelihood estimate is less than some tolerance.

Value

A list with the following named components:

`ll`	The most current estimate of the log-likelihood, i.e. `loglik[3]`.
`linf`	The most current estimate of the final converged maximised log-likelihood.
`a`	The Aitken acceleration value where typically `0 <= a <= 1`. When `a < 0`, a numerical issue or bug has occurred; when `a > 1`, the algorithm is accelerating and should not be stopped.
`ldiff`	The difference between `linf` and the previous estimate of the log-likelihood, i.e. `loglik[2]`, in accordance with McNicholas et al. (2010).

When the "aitken" method is employed within MoE_clust (via MoE_control), ll at convergence gives the log-likelihood achieved by the estimated parameters, while linf at convergence estimates the log-likelihood that would be achieved after an infinite number of EM/CEM iterations.

Note

Within MoE_clust, as specified by the stopping argument of MoE_control, "aitken" is the default method used to assess convergence. The other option monitors the "relative" change in log-likelihood against some tolerance. See MoE_control.

Author(s)

Keefe Murphy - <[email protected]>

References

Boehning, D., Dietz, E., Schaub, R., Schlattmann, P. and Lindsay, B. G. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46(2): 373-388.

McNicholas, P. D., Murphy, T. B., McDaid, A. F. and Frost, D. (2010). Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics & Data Analysis, 54(3): 711-723.

Examples

(a1 <- aitken(-c(449.61534, 442.84221, 436.58999)))
a1$ldiff < 1e-05 # FALSE
(a2 <- aitken(-c(442.84221, 436.58999, 436.58998)))
a2$ldiff < 1e-05 # FALSE
(a3 <- aitken(-c(436.58999, 436.58998, 436.58998)))
a3$ldiff < 1e-05 # TRUE
(a1 <- aitken(-c(449.61534, 442.84221, 436.58999)))
a1$ldiff < 1e-05 # FALSE
(a2 <- aitken(-c(442.84221, 436.58999, 436.58998)))
a2$ldiff < 1e-05 # FALSE
(a3 <- aitken(-c(436.58999, 436.58998, 436.58998)))
a3$ldiff < 1e-05 # TRUE

Convert MoEClust objects to the Mclust class

Description

Converts an object of class "MoEClust" generated by MoE_clust and converts it to an object of class "Mclust" as generated by fitting Mclust, to facilitate use of plotting and other functions for the "Mclust" class within the mclust package. Some caution is advised when converting models with gating &/or expert covariates (see Note below) and users are always encouraged to use the dedicated plot.MoEClust function for objects of the "MoEClust" class instead.

Usage

## S3 method for class 'MoEClust'
as.Mclust(x,
         expert.covar = TRUE,
         signif = 0L,
         ...)
## S3 method for class 'MoEClust'
as.Mclust(x,
         expert.covar = TRUE,
         signif = 0L,
         ...)

Arguments

`x`	An object of class `"MoEClust"` generated by `MoE_clust` or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are facilitated here too.
`expert.covar`	Logical (defaults to `TRUE`) governing whether the extra variability in the component means is added to the MVN ellipses corresponding to the component covariance matrices in the presence of expert network covariates. See the function `expert_covar`.
`signif`	Significance level for outlier removal. Must be a single number in the interval [0, 1). Corresponds to the percentage of data to be considered extreme and therefore removed (half of `signif` at each endpoint, on a column-wise basis). The default, `0`, corresponds to no outlier removal. Only invoke this argument as an aid to visualisation via `plot.Mclust`.
`...`	Further arguments to be passed to other methods.

Details

Mixing proportions are averaged over observations in components in the presence of gating network covariates during the coercion. For models with expert network covariates, the means are given by the posterior means of the fitted values of the expert network.

In the presence of expert network covariates, the component-specific covariance matrices are (by default, via the argument expert.covar) modified for plotting purposes via the function expert_covar, in order to account for the extra variability of the means, usually resulting in bigger shapes & sizes for the MVN ellipses.

The signif argument is intended only to aid visualisation via plot.Mclust, as plots therein can be sensitive to outliers, particularly with regard to axis limits. However, users are always encouraged to use the dedicated plot.MoEClust function for objects of the "MoEClust" class instead (see Note below).

Value

An object of class "Mclust". See methods(class="Mclust") for a (non-exhaustive) list of functions which can be applied to this class.

Note

Plots may be quite misleading in the presence of gating &/or (especially) expert network covariates when the what argument is "density" within plot.Mclust; users are strongly encouraged to use MoE_gpairs with response.type="density" &/or show.dens=TRUE instead.

Predictions (via predict.Mclust) will also be misleading in the presence of covariates of any kind when newdata is supplied; thus, users are strongly encouraged to use predict.MoEClust instead.

The functions clustCombi and clustCombiOptim can be safely used (provided as.Mclust(x) is supplied as the object argument to clustCombi), as they only rely on x$z and x$G only. See the examples below.

Users may expect MoEClust models with no covariates of any kind to be identical to models fitted via mclust, but this is not necessarily true: see the MoE_control argument asMclust.

Author(s)

Keefe Murphy - <[email protected]>

References

Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458): 611-631.

Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1): 289-317.

Examples

library(mclust)

# Fit a gating network mixture of experts model to the ais data
data(ais)
mod   <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)

# Convert to the "Mclust" class and examine the classification
mod2  <- as.Mclust(mod)
plot(mod2, what="classification")

# Examine the uncertainty
plot(mod2, what="uncertainty")

# Return the optimal number of clusters according to entropy
combi <- mclust::clustCombi(object=mod2)
optim <- mclust::clustCombiOptim(object=combi)
table(mod2$classification, ais$sex)
table(optim$cluster.combi, ais$sex)

# Compare plot.MoEClust and plot.Mclust for univariate mixtures
data(CO2data)
res <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, modelNames="V", network.data=CO2data)
plot(as.Mclust(res), what="classification")
plot(as.Mclust(res), what="density")
plot(as.Mclust(res), what="uncertainty") 

# Proper version of what="density" plot:
MoE_gpairs(res, show.map=FALSE, cov.ind=0, show.dens=TRUE)

# Equivalent what="uncertainty" plot:
MoE_Uncertainty(res)
library(mclust)

# Fit a gating network mixture of experts model to the ais data
data(ais)
mod   <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)

# Convert to the "Mclust" class and examine the classification
mod2  <- as.Mclust(mod)
plot(mod2, what="classification")

# Examine the uncertainty
plot(mod2, what="uncertainty")

# Return the optimal number of clusters according to entropy
combi <- mclust::clustCombi(object=mod2)
optim <- mclust::clustCombiOptim(object=combi)
table(mod2$classification, ais$sex)
table(optim$cluster.combi, ais$sex)

# Compare plot.MoEClust and plot.Mclust for univariate mixtures
data(CO2data)
res <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, modelNames="V", network.data=CO2data)
plot(as.Mclust(res), what="classification")
plot(as.Mclust(res), what="density")
plot(as.Mclust(res), what="uncertainty") 

# Proper version of what="density" plot:
MoE_gpairs(res, show.map=FALSE, cov.ind=0, show.dens=TRUE)

# Equivalent what="uncertainty" plot:
MoE_Uncertainty(res)

GNP and CO2 Data Set

Description

This data set gives the gross national product (GNP) per capita in 1996 for various countries as well as their estimated carbon dioxide (CO2) emission per capita for the same year.

Usage

data(CO2data)
data(CO2data)

Format

This data frame consists of 28 countries and the following variables:

GNP: The gross product per capita in 1996.
CO2: The estimated carbon dioxide emission per capita in 1996.
country: An abbreviation pertaining to the country measures (e.g. "GRC" = Greece and "CH" = Switzerland).

References

Hurn, M., Justel, A. and Robert, C. P. (2003) Estimating mixtures of regressions, Journal of Computational and Graphical Statistics, 12(1): 55-79.

Examples

data(CO2data, package="MoEClust")
plot(CO2data$GNP, CO2data$CO2, type="n", ylab=expression('CO'[2]))
text(CO2data$GNP, CO2data$CO2, CO2data$country)
data(CO2data, package="MoEClust")
plot(CO2data$GNP, CO2data$CO2, type="n", ylab=expression('CO'[2]))
text(CO2data$GNP, CO2data$CO2, CO2data$country)

Drop constant variables from a formula

Description

Drops constant variables from the RHS of a formula taking the data set (dat), the formula (formula), and an optional subset vector (sub) as arguments.

Usage

drop_constants(dat,
               formula,
               sub = NULL)
drop_constants(dat,
               formula,
               sub = NULL)

Arguments

`dat`	A `data.frame` where rows correspond to observations and columns correspond to variables. Ideally column names should be present.
`formula`	An object of class `"formula"`: a symbolic description of the model to be fitted. Variables in the `formula` not present in the columns of `dat` will automatically be discarded. The `formula` may include interactions, transformations, or higher order terms: the latter must be specified explicitly using the `AsIs` operator (`I`).
`sub`	An optional vector specifying a subset of observations to be used in the fitting process.

Value

The updated formula with constant variables removed.

Note

Formulas with and without intercepts are accommodated.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
hema  <- as.matrix(ais[,3:7])
sex   <- ais$sex
BMI   <- ais$BMI

# Set up a no-intercept regression formula with constant column 'sex'
form1 <- as.formula(hema ~ sex + BMI + I(BMI^2) - 1)
sub   <- ais$sex == "male"

# Try fitting a linear model
mod1  <- try(lm(form1, data=ais, subset=sub), silent=TRUE)
inherits(mod1, "try-error") # TRUE

# Remove redundant variables from formula & try again
form2 <- drop_constants(ais, form1, sub)
mod2  <- try(lm(form2, data=ais, subset=sub), silent=TRUE)
inherits(mod2, "try-error") # FALSE
data(ais)
hema  <- as.matrix(ais[,3:7])
sex   <- ais$sex
BMI   <- ais$BMI

# Set up a no-intercept regression formula with constant column 'sex'
form1 <- as.formula(hema ~ sex + BMI + I(BMI^2) - 1)
sub   <- ais$sex == "male"

# Try fitting a linear model
mod1  <- try(lm(form1, data=ais, subset=sub), silent=TRUE)
inherits(mod1, "try-error") # TRUE

# Remove redundant variables from formula & try again
form2 <- drop_constants(ais, form1, sub)
mod2  <- try(lm(form2, data=ais, subset=sub), silent=TRUE)
inherits(mod2, "try-error") # FALSE

Drop unused factor levels to predict from unseen data

Description

Drops unseen factor levels in newdata for which predictions are required from a lm or multinom model fit.

Usage

drop_levels(fit,
            newdata)
drop_levels(fit,
            newdata)

Arguments

`fit`	A fitted `lm` or `multinom` model.
`newdata`	A `data.frame` containing variables with which to predict.

Value

A data.frame like newdata with unseen factor levels replaced by NA.

Note

This function is so far untested for models other than lm or multinom, though it may still work for other classes.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
hema  <- as.matrix(ais[,3:7])
BMI   <- ais$BMI
sport <- ais$sport
sub   <- ais$sport != "Row"

# Fit a linear model
mod   <- lm(hema ~ BMI + sport, data=ais, subset=sub)

# Make predictions
pred1 <- try(predict(mod, newdata=ais), silent=TRUE)
inherits(pred1, "try-error") #TRUE

# Remove unused levels and try again
pred2 <- try(predict(mod, newdata=drop_levels(mod, ais)), silent=TRUE)
inherits(pred2, "try-error") #FALSE
anyNA(pred2)                 #TRUE
data(ais)
hema  <- as.matrix(ais[,3:7])
BMI   <- ais$BMI
sport <- ais$sport
sub   <- ais$sport != "Row"

# Fit a linear model
mod   <- lm(hema ~ BMI + sport, data=ais, subset=sub)

# Make predictions
pred1 <- try(predict(mod, newdata=ais), silent=TRUE)
inherits(pred1, "try-error") #TRUE

# Remove unused levels and try again
pred2 <- try(predict(mod, newdata=drop_levels(mod, ais)), silent=TRUE)
inherits(pred2, "try-error") #FALSE
anyNA(pred2)                 #TRUE

Account for extra variability in covariance matrices with expert covariates

Description

In the presence of expert network covariates, this helper function modifies the component-specific covariance matrices of a "MoEClust" object, in order to account for the extra variability due to the component means, usually resulting in bigger shapes & sizes for the MVN ellipses in MoE_gpairs plots. The function also works for univariate response data.

Usage

expert_covar(x,
             weighted = TRUE,
             ...)
expert_covar(x,
             weighted = TRUE,
             ...)

Arguments

`x`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are facilitated here too.
`weighted`	A logical indicating whether the estimated cluster membership probabilities should be used to provide a weighted estimate of the variability due to the component means. Defaults to `TRUE`. The option `weighted=FALSE` is provided only so that previous behaviour under earlier versions of MoEClust can be recovered but is otherwise not recommended.
`...`	Catches unused arguments.

Details

This function is used internally by MoE_gpairs, plot.MoEClust(x, what="gpairs"), and as.Mclust, for visualisation purposes.

Value

The variance component only from the parameters list from the output of a call to MoE_clust, modified accordingly.

Note

The modelName of the resulting variance object may not correspond to the model name of the "MoEClust" object, in particular scale, shape, &/or orientation may no longer be constrained across clusters, and cholsigma, if it was in the input, will be discarded from the output. Usually, the modelName of the transformed variance object will be "VVV" for multivariate data and "V" for univariate data, but not always. Furthermore, the output will drop certain row and column names from the result.

Author(s)

Keefe Murphy - <[email protected]>

References

Examples

data(ais)
res   <- MoE_clust(ais[,3:7], G=2, gating= ~ 1, expert= ~ sex,
                   network.data=ais, modelNames="EEE", equalPro=TRUE)

# Extract the variance object
res$parameters$variance

# Modify the variance object
expert_covar(res)
data(ais)
res   <- MoE_clust(ais[,3:7], G=2, gating= ~ 1, expert= ~ sex,
                   network.data=ais, modelNames="EEE", equalPro=TRUE)

# Extract the variance object
res$parameters$variance

# Modify the variance object
expert_covar(res)

Compute the Frobenius (adjusted) Rand index

Description

This function efficiently computes fuzzy generalisations of the Rand and adjusted Rand indices for comparing two partitions, allowing either or both partitions to be “soft” or “hard”.

Usage

FARI(z1,
     z2)
FARI(z1,
     z2)

Arguments

z1, z2

A $n * G$ matrix representing a hard partition (all entries 0 or 1) or soft cluster-membership probabilities.

Details

If z1 &/or z2 is supplied as a vector of cluster labels, they will be coerced to an appropriate matrix via unmap.

Value

A list with the following named components:

FRI: Measure of Frobenius Rand index between z1 and z2.
FARI: Measure of Frobenius adjusted Rand index between z1 and z2.

Note

The number of columns of the matrices z1 and z2 need not be equal.

Author(s)

Keefe Murphy - <[email protected]>

References

Andrew, J. L., Browne, R., and Hvingelby, C. D. (2022). On assessments of agreement between fuzzy partitions. Journal of Classification, 39(2): 326-342.

Examples

m1 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE",
                gating=~BMI, expert=~sex, network.data=ais)
m2 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", 
                equalPro=TRUE, expert=~sex, network.data=ais)
m3 <- MoE_clust(ais[,3:7], G=2, modelNames="VEE", algo="CEM", tau0=0.1)

# FARI between two soft partitions
FARI(m1$z, m2$z)
# FARI between soft and hard partitions
FARI(m1$z, m3$z)
# FARI between soft partition and hard classification
FARI(m1$z, m2$classification)
# FARI between hard partition and hard classification
FARI(m3$z, m3$classification)
# FARI between hard classification and hard classification
FARI(m1$classification, m2$classification)
m1 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE",
                gating=~BMI, expert=~sex, network.data=ais)
m2 <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", 
                equalPro=TRUE, expert=~sex, network.data=ais)
m3 <- MoE_clust(ais[,3:7], G=2, modelNames="VEE", algo="CEM", tau0=0.1)

# FARI between two soft partitions
FARI(m1$z, m2$z)
# FARI between soft and hard partitions
FARI(m1$z, m3$z)
# FARI between soft partition and hard classification
FARI(m1$z, m2$classification)
# FARI between hard partition and hard classification
FARI(m3$z, m3$classification)
# FARI between hard classification and hard classification
FARI(m1$classification, m2$classification)

Force diagonal elements of a triangular matrix to be positive

Description

This function ensures that the triangular matrix in a QR (or other) decomposition has positive values along its diagonal.

Usage

force_posiDiag(x)
force_posiDiag(x)

Arguments

`x`	A matrix, which must be either upper-triangular or lower-triangular.

Value

An upper or lower triangular matrix with positive diagonal entries such that the matrix is still a valid decomposition of the matrix the input x is a decomposition of.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
res <- MoE_clust(ais[,3:7], G=3, modelNames="EEE")
sig <- res$parameters$variance
a   <- force_posiDiag(sig$cholSigma)
b   <- chol(sig$Sigma)
all.equal(a, b)                    #TRUE
all.equal(crossprod(a), sig$Sigma) #TRUE
all.equal(crossprod(b), sig$Sigma) #TRUE
data(ais)
res <- MoE_clust(ais[,3:7], G=3, modelNames="EEE")
sig <- res$parameters$variance
a   <- force_posiDiag(sig$cholSigma)
b   <- chol(sig$Sigma)
all.equal(a, b)                    #TRUE
all.equal(crossprod(a), sig$Sigma) #TRUE
all.equal(crossprod(b), sig$Sigma) #TRUE

Average posterior probabilities of a fitted MoEClust model

Description

Calculates the per-component average posterior probabilities of a fitted MoEClust model.

Usage

MoE_AvePP(x,
          group = TRUE)
MoE_AvePP(x,
          group = TRUE)

Arguments

`x`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with gating and/or expert covariates and/or a noise component are facilitated here too.
`group`	A logical indicating whether the average posterior probabilities should be computed per component. Defaults to `TRUE`.

Details

When group=TRUE, this function calculates AvePP, the average posterior probabilities of membership for each component for the observations assigned to that component via MAP probabilities. Otherwise, an overall measure of clustering certainty is returned.

Value

When group=TRUE, a named vector of numbers, of length equal to the number of components (G), in the range [1/G,1], such that larger values indicate clearer separation of the clusters. Note that G=x$G for models without a noise component and G=x$G + 1 for models with a noise component. When group=FALSE, a single number in the same range is returned.

Note

This function will always return values of 1 for all components for models fitted using the "CEM" algorithm (see MoE_control), or models with only one component.

Author(s)

Keefe Murphy - <[email protected]>

References

Examples

data(ais)
res <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex, 
                 modelNames="EEE", network.data=ais)

# Calculate the AvePP per component
MoE_AvePP(res)

# Calculate an overall measure of clustering certainty
MoE_AvePP(res, group=FALSE)
data(ais)
res <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex, 
                 modelNames="EEE", network.data=ais)

# Calculate the AvePP per component
MoE_AvePP(res)

# Calculate an overall measure of clustering certainty
MoE_AvePP(res, group=FALSE)

MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Description

Fits MoEClust models: Gaussian Mixture of Experts models with GPCM/mclust-family covariance structures. In other words, performs model-based clustering via the EM/CEM algorithm where covariates are allowed to enter neither, either, or both the mixing proportions (gating network) and/or component densities (expert network) of a Gaussian Parsimonious Clustering Model, with or without an additional noise component. Additional arguments are available via the function MoE_control, including the specification of a noise component, controls on the initialisation of the algorithm, and more.

Usage

MoE_clust(data,
          G = 1:9,
          modelNames = NULL,
          gating = ~1,
          expert = ~1,
          control = MoE_control(...),
          network.data = NULL,
          ...)

## S3 method for class 'MoEClust'
print(x,
      digits = 3L,
      ...)

## S3 method for class 'MoEClust'
summary(object,
        classification = TRUE,
        parameters = FALSE,
        networks = FALSE,
        ...)
MoE_clust(data,
          G = 1:9,
          modelNames = NULL,
          gating = ~1,
          expert = ~1,
          control = MoE_control(...),
          network.data = NULL,
          ...)

## S3 method for class 'MoEClust'
print(x,
      digits = 3L,
      ...)

## S3 method for class 'MoEClust'
summary(object,
        classification = TRUE,
        parameters = FALSE,
        networks = FALSE,
        ...)

Arguments

data

A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.

G

An integer vector specifying the number(s) of mixture components (clusters) to fit. Defaults to G=1:9. Must be a strictly positive integer, unless a noise component is included in the estimation, in which case G=0 is allowed and also included by default. (see MoE_control).

modelNames

A vector of character strings indicating the models to be fitted in the EM/CEM phase of clustering. With n observations and d variables, the defaults are:

for univariate data	`c("E", "V")`
for multivariate data $n > d$	`mclust.options("emModelNames")`
for high-dimensional multivariate data $n \leq d$	`c("EII", "VII", "EEI", "EVI", "VEI", "VVI")`

For single-component models these options reduce to:

for univariate data	`"E"`
for multivariate data $n > d$	`c("EII", "EEI", "EEE")`
for high-dimensional multivariate data $n \leq d$	`c("EII", "EEI")`

For zero-component models with a noise component only the "E" and "EII" models will be fitted for univariate and multivariate data, respectively, although this is clearly for naming consistency only. The help file for mclustModelNames further describes the available models (though the "X" in the single-component models will be coerced to "E" if supplied that way). For single-component models, other model names equivalent to those above can be supplied, but will be coerced to those above.

gating

A formula for determining the model matrix for the multinomial logistic regression in the gating network when fixed covariates enter the mixing proportions. Defaults to ~1, i.e. no covariates. This will be ignored where G=1. Continuous, categorical, and/or ordinal covariates are allowed. Logical covariates will be coerced to factors. Interactions, transformations, and higher order terms are permitted: the latter must be specified explicitly using the AsIs operator (I). The specification of the LHS of the formula is ignored. Intercept terms are included by default.

expert

A formula for determining the model matrix for the (multivariate) WLS in the expert network when fixed covariates are included in the component densities. Defaults to ~1, i.e. no covariates. Continuous, categorical, and/or ordinal covariates are allowed. Logical covariates will be coerced to factors. Interactions, transformations, and higher order terms are permitted: the latter must be specified explicitly using the AsIs operator (I). The specification of the LHS of the formula is ignored. Intercept terms are included by default.

control

A list of control parameters for the EM/CEM and other aspects of the algorithm. The defaults are set by a call to MoE_control. In particular, arguments pertaining to the inclusion of an additional noise component are documented here.

network.data

An optional data frame (or a matrix with named columns) in which to look for the covariates in the gating &/or expert network formulas, if any. If not found in network.data, any supplied gating &/or expert covariates are taken from the environment from which MoE_clust is called. Try to ensure the names of variables in network.data do not match any of those in data.

...

An alternative means of passing control parameters directly via the named arguments of MoE_control. Do not pass the output from a call to MoE_control here! This argument is only relevant for the MoE_clust function and will be ignored for the associated print and summary functions.

x, object, digits, classification, parameters, networks

Arguments required for the print and summary functions: x and object are objects of class "MoEClust" resulting from a call to MoE_clust, while digits gives the number of decimal places to round to for printing purposes (defaults to 3). classification, parameters, and networks are logicals which govern whether a table of the MAP classification of observations, the mixture component parameters, and the gating/expert network coefficients are printed, respectively.

Details

The function effectively allows 6 different types of Gaussian Mixture of Experts model (as well as the different models in the GPCM/mclust family, for each): i) the standard finite Gaussian mixture with no covariates, ii) fixed covariates only in the gating network, iii) fixed covariates only in the expert network, iv) the full Mixture of Experts model with fixed covariates entering both the mixing proportions and component densities. By constraining the mixing proportions to be equal (see equalPro in MoE_control) two extra special cases are facilitated when gating covariates are excluded.

Note that having the same covariates in both networks is allowed. So too are interactions, transformations, and higher order terms (see formula): the latter must be specified explicitly using the AsIs operator (I). Covariates can be continuous, categorical, logical, or ordinal, but the response must always be continuous.

While model selection in terms of choosing the optimal number of components and the GPCM/mclust model type is performed within MoE_clust, using one of the criterion options within MoE_control, choosing between multiple fits with different combinations of covariates or different initialisation settings can be done by supplying objects of class "MoEClust" to MoE_compare.

Value

A list (of class "MoEClust") with the following named entries, mostly corresponding to the chosen optimal model (as determined by the criterion within MoE_control):

`call`	The matched call.
`data`	The input data, as a `data.frame`.
`modelName`	A character string denoting the GPCM/mclust model type at which the optimal `criterion` occurs.
`n`	The number of observations in the `data`.
`d`	The dimension of the `data`.
`G`	The optimal number of mixture components according to `criterion`.
`BIC`	A matrix of all BIC values with `length{G}` rows and `length(modelNames)` columns. May include missing entries: `NA` represents models which were not visited, `-Inf` represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes `"MoECriterion"` and `"mclustBIC"`, for which dedicated `print`, `summary`, and `plot` methods exist.
`ICL`	A matrix of all ICL values with `length{G}` rows and `length(modelNames)` columns. May include missing entries: `NA` represents models which were not visited, `-Inf` represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes `"MoECriterion"` and `"mclustICL"`, for which dedicated `print`, `summary`, and `plot` methods exist.
`AIC`	A matrix of all AIC values with `length{G}` rows and `length(modelNames)` columns. May include missing entries: `NA` represents models which were not visited, `-Inf` represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes `"MoECriterion"` and `"mclustAIC"`, for which dedicated `print`, `summary`, and `plot` methods exist.
`bic`	The BIC value corresponding to the optimal model. May not necessarily be the optimal BIC.
`icl`	The ICL value corresponding to the optimal model. May not necessarily be the optimal ICL.
`aic`	The AIC value corresponding to the optimal model. May not necessarily be the optimal AIC.
`gating`	An object of class `"MoE_gating"` (for which dedicated `print`, `summary`, and `predict` methods exist) and either `"multinom"` or `"glm"` (only for single-component models or noise-only models) giving the `multinom` regression coefficients of the `gating` network. If `gating` covariates were NOT supplied (or the best model has just one component), this corresponds to a RHS of `~1`, otherwise the supplied `gating` formula. As such, a fitted `gating` network is always returned even in the absence of supplied covariates or clusters. The number of parameters to penalise by for `MoE_crit` is given by `length(coef(gating))`, and the `gating` formula used is stored here as an attribute. If there is a noise component (and the option `noise.gate=TRUE` is invoked), its coefficients are those for the last component. Users are cautioned against making inferences about statistical significance from summaries of the coefficients in the gating network.
`expert`	An object of class `"MoE_expert"` (for which dedicated `print`, `summary`, and `predict` methods exist) and `"lm"` giving the (multivariate) WLS regression coefficients of the `expert` network. If `expert` covariates were NOT supplied, this corresponds to a RHS of `~1`, otherwise the supplied `expert` formula. As such, a fitted `expert` network is always returned even in the absence of supplied covariates. The number of parameters to penalise by for `MoE_crit` is given by `G * length(coef(expert[[1]]))`, and the `expert` formula used is stored here is an attribute. Users are cautioned against making inferences about statistical significance from summaries of the coefficients in the expert network.
`LOGLIK`	A matrix of all maximal log-likelihood values with `length{G}` rows and `length(modelNames)` columns. May include missing entries: `NA` represents models which were not visited, `-Inf` represents models which were terminated due to error, for which a log-likelihood could not be estimated. Inherits the classes `"MoECriterion"` and `"mclustLOGLIK"`, for which dedicated `print`, `summary`, and `plot` methods exist.
`loglik`	The vector of increasing log-likelihood values for every EM/CEM iteration under the optimal model. The last element of this vector is the maximum log-likelihood achieved by the parameters returned at convergence.
`linf`	An asymptotic estimate of the final converged maximised log-likelihood. Returned when `stopping="aitken"` and `G > 1` (see `MoE_control` and `aitken`), otherwise the last element of `loglik` is returned instead.
`df`	The number of estimated parameters in the optimal model (i.e. the number of ‘used’ degrees of freedom). Subtract this number from `n` to get the degrees of freedom. The number of parameters due to the gating network, expert network, and covariance matrices are also stored here as attributes of `df`.
`iters`	The total number of EM/CEM iterations for the optimal model.
`hypvol`	The hypervolume parameter for the noise component if required, otherwise set to `NA` (see `MoE_control`).
`parameters`	A list with the following named components: `pro` The mixing proportions: either a vector of length `G` or, if `gating` covariates were supplied, a matrix with an entry for each observation (rows) and component (columns). `mean` The means of each component. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model. For models with expert network covariates, this is given by the posterior mean of the fitted values, otherwise the posterior mean of the response is reported. For models with expert network covariates, the observation-specific component means can be accessed by calling `predict` on the `expert` object above. `variance` A list of variance parameters of each component of the model. The components of this list depend on the model type specification. See the help file for `mclustVariance` for details. Also see `expert_covar` for an alternative approach to summarising the variance parameters in the presence of expert network covariates. `Vinv` The inverse of the hypervolume parameter for the noise component if required, otherwise set to `NULL` (see `MoE_control`).
`z`	The final responsibility matrix whose `[i,k]`-th entry is the probability that observation i belonds to the k-th component. If there is a noise component, its values are found in the last column.
`classification`	The vector of cluster labels for the chosen model corresponding to `z`, i.e. `max.col(z)`. Observations belonging to the noise component, if any, will belong to component `0`.
`uncertainty`	The uncertainty associated with the `classification`.
`net.covs`	A data frame gathering the unique set of covariates used in the `gating` and `expert` networks, if any. Will contain zero columns in the absence of gating or expert network covariates. Supplied gating covariates will be excluded if the optimal model has only one component. May have fewer columns than covariates supplied via the `network.data` argument also, as only the included covariates are gathered here.
`resid.data`	In the presence of expert network covariates, this is the augmented data actually used in the clustering at convergence, as a list of `G` matrices of WLS residuals of dimension `n * d`. Will contain zero columns in the absence of expert network covariates.
`DF`	A matrix giving the numbers of estimated parameters (i.e. the number of ‘used’ degrees of freedom) for all visited models, with `length{G}` rows and `length(modelNames)` columns. Subtract these numbers from `n` to get the degrees of freedom. May include missing entries: `NA` represents models which were not visited, `-Inf` represents models which were terminated due to error, for which parameters could not be estimated. Inherits the classes `"MoECriterion"` and `"mclustDF"`, for which dedicated `print`, `summary`, and `plot` methods exist.
`ITERS`	A matrix giving the total number of EM/CEM iterations for all visited models, with `length{G}` rows and `length(modelNames)` columns. May include missing entries: `NA` represents models which were not visited, `Inf` represents models which were terminated due to singularity/error and thus would never have converged. Inherits the classes `"MoECriterion"` and `"mclustITERS"`, for which dedicated `print`, `summary`, and `plot` methods exist.

Dedicated plot, predict, print, and summary functions exist for objects of class "MoEClust". The results can be coerced to the "Mclust" class to access other functions from the mclust package via as.Mclust.

Note

Where BIC, ICL, AIC, LOGLIK, DF, and ITERS contain NA entries, this corresponds to a model which was not run; for instance a VVV model is never run for single-component models as it is equivalent to EEE. As such, one can consider the value as not really missing, but equivalent to the EEE value. BIC, ICL, AIC, LOGLIK, DF, and ITERS all inherit the classes "MoECriterion" and "mclustBIC", "mclustICL", etc., for which dedicated print, summary, and plot methods exist.

Author(s)

Keefe Murphy - <[email protected]>

References

Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458): 611-631.

Examples

data(ais)
hema  <- ais[,3:7]
sex   <- ais$sex
BMI   <- ais$BMI

# Fit a standard finite mixture model
m1    <- MoE_clust(hema, G=2:3)

# Allow covariates to enter the mixing proportions
m2    <- MoE_clust(hema, G=2:3, gating= ~ sex + BMI)

# Allow covariates to enter the component densities
m3    <- MoE_clust(hema, G=2:3, expert= ~ sex)

# Allow covariates to enter both the gating & expert network
m4    <- MoE_clust(hema, G=2:3, gating= ~ BMI, expert= ~ sex)

# Fit an equal mixing proportion model with an expert network covariate
m5    <- MoE_clust(hema, G=2:3, expert= ~ sex + BMI, equalPro=TRUE)

# Fit models with gating covariates & an additional noise component
m6    <- MoE_clust(hema, G=2:3, tau0=0.1, gating= ~ BMI, network.data=ais)

# Extract the model with highest BIC
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, criterion="bic"))
 
# See if a better model can be found using greedy forward stepwise selection
(step <- MoE_stepwise(ais[,3:7], ais))
(comp <- MoE_compare(comp, step, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))

# Examine the expert network in greater detail
# (but refrain from inferring statistical significance!)
summary(best$expert)

# Visualise the results, incl. the gating network and log-likelihood
plot(best, what="gpairs", show.dens=TRUE)
plot(best, what="gating") # equal mixing proportions!
plot(best, what="loglik")

# Visualise the results using the 'lattice' package
z   <- factor(best$classification, labels=paste0("Cluster", seq_len(best$G)))
lattice::splom(~ hema | sex, groups=z)
lattice::splom(~ hema | z, groups=sex)
data(ais)
hema  <- ais[,3:7]
sex   <- ais$sex
BMI   <- ais$BMI

# Fit a standard finite mixture model
m1    <- MoE_clust(hema, G=2:3)

# Allow covariates to enter the mixing proportions
m2    <- MoE_clust(hema, G=2:3, gating= ~ sex + BMI)

# Allow covariates to enter the component densities
m3    <- MoE_clust(hema, G=2:3, expert= ~ sex)

# Allow covariates to enter both the gating & expert network
m4    <- MoE_clust(hema, G=2:3, gating= ~ BMI, expert= ~ sex)

# Fit an equal mixing proportion model with an expert network covariate
m5    <- MoE_clust(hema, G=2:3, expert= ~ sex + BMI, equalPro=TRUE)

# Fit models with gating covariates & an additional noise component
m6    <- MoE_clust(hema, G=2:3, tau0=0.1, gating= ~ BMI, network.data=ais)

# Extract the model with highest BIC
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, criterion="bic"))
 
# See if a better model can be found using greedy forward stepwise selection
(step <- MoE_stepwise(ais[,3:7], ais))
(comp <- MoE_compare(comp, step, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))

# Examine the expert network in greater detail
# (but refrain from inferring statistical significance!)
summary(best$expert)

# Visualise the results, incl. the gating network and log-likelihood
plot(best, what="gpairs", show.dens=TRUE)
plot(best, what="gating") # equal mixing proportions!
plot(best, what="loglik")

# Visualise the results using the 'lattice' package
z   <- factor(best$classification, labels=paste0("Cluster", seq_len(best$G)))
lattice::splom(~ hema | sex, groups=z)
lattice::splom(~ hema | z, groups=sex)

Choose the best MoEClust model

Description

Takes one or more sets of MoEClust models fitted by MoE_clust (or MoE_stepwise) and ranks them according to the BIC, ICL, or AIC. It's possible to respect the internal ranking within each set of models, or to discard models within each set which were already deemed sub-optimal. This function can help with model selection via exhaustive or stepwise searches.

Usage

MoE_compare(...,
            criterion = c("bic", "icl", "aic"),
            pick = 10L,
            optimal.only = FALSE)

## S3 method for class 'MoECompare'
print(x,
      index = seq_len(x$pick),
      posidens = TRUE,
      rerank = FALSE,
      digits = 3L,
      details = TRUE, 
      maxi = length(index),
      ...)
MoE_compare(...,
            criterion = c("bic", "icl", "aic"),
            pick = 10L,
            optimal.only = FALSE)

## S3 method for class 'MoECompare'
print(x,
      index = seq_len(x$pick),
      posidens = TRUE,
      rerank = FALSE,
      digits = 3L,
      details = TRUE, 
      maxi = length(index),
      ...)

Arguments

`...`	One or more objects of class `"MoEClust"` outputted by `MoE_clust`. All models must have been fit to the same data set. A single named list of such objects can also be supplied. Additionally, objects of class `"MoECompare"` outputted by this very function can also be supplied here. This argument is only relevant for the `MoE_compare` function and will be ignored for the associated `print` function.
`criterion`	The criterion used to determine the ranking. Defaults to `"bic"`.
`pick`	The (integer) number of models to be ranked and compared. Defaults to `10L`. Will be constrained by the number of models within the `"MoEClust"` objects supplied via `...` if `optimal.only` is `FALSE`, otherwise constrained simply by the number of `"MoEClust"` objects supplied. Setting `pick=Inf` is a valid way to select all models.
`optimal.only`	Logical indicating whether to only rank models already deemed optimal within each `"MoEClust"` object (`TRUE`), or to allow models which were deemed suboptimal enter the final ranking (`FALSE`, the default). See `details`.
`x`, `index`, `posidens`, `rerank`, `digits`, `details`, `maxi`	Arguments required for the associated `print` function: `x` An object of class `"MoECompare"` resulting from a call to `MoE_compare`. `index` A logical or numeric vector giving the indices of the rows of the table of ranked models to print. This defaults to the full set of ranked models. It can be useful when the table of ranked models is large to examine a subset via this `index` argument, for display purposes. See `rerank`. `posidens` A logical indicating whether models which have been flagged for having positive log-densities should be included in the comparison (defaults to `TRUE`). Such models may correspond to spurious solutions and can be discarded by specifying `posidens=FALSE`. Only relevant if any of the `"MoEClust"` objects being compared were themselves run with `posidens=TRUE`. `rerank` A logical indicating whether the ranks should be recomputed when subsetting using `index`. Defaults to `FALSE`. Only relevant when `details=TRUE`. `digits` The number of decimal places to round model selection criteria to (defaults to `3`). `details` Logical indicating whether some additional details should be printed, defaults to `TRUE`. Exists to facilitate `MoE_stepwise` printing. `maxi` A number specifying the maximum number of rows/models to print. Defaults to `length(index)`.

Details

The purpose of this function is to conduct model selection on "MoEClust" objects, fit to the same data set, with different combinations of gating/expert network covariates or different initialisation settings.

Model selection will have already been performed in terms of choosing the optimal number of components and GPCM/mclust model type within each supplied set of results, but MoE_compare will respect the internal ranking of models when producing the final ranking if optimal.only is FALSE: otherwise only those models already deemed optimal within each "MoEClust" object will be ranked.

As such if two sets of results are supplied when optimal.only is FALSE, the 1st, 2nd, and 3rd best models could all belong to the first set of results, meaning a model deemed suboptimal according to one set of covariates could be superior to one deemed optimal under another set of covariates.

Value

A list of class "MoECompare", for which a dedicated print function exists, containing the following elements (each of length pick, and ranked according to criterion, where appropriate):

`data`	The name of the data set to which the models were fitted.
`optimal`	The single optimal model (an object of class `"MoEClust"`) among those supplied, according to the chosen `criterion`.
`pick`	The final number of ranked models. May be different (i.e. less than) the supplied `pick` value.
`MoENames`	The names of the supplied `"MoEClust"` objects.
`modelNames`	The `mclustModelNames`.
`G`	The optimal numbers of components.
`df`	The numbers of estimated parameters.
`iters`	The numbers of EM/CEM iterations.
`bic`	BIC values, ranked according to `criterion` (not necessarily `"bic"`).
`icl`	ICL values, ranked according to `criterion` (not necessarily `"icl"`).
`aic`	AIC values, ranked according to `criterion` (not necessarily `"aic"`).
`loglik`	Maximal log-likelihood values.
`gating`	The gating formulas.
`expert`	The expert formulas.
`algo`	The algorithm used for fitting the model - either `"EM"`, `"CEM"`, `"cemEM"`.
`equalPro`	Logical indicating whether mixing proportions were constrained to be equal across components.
`hypvol`	Hypervolume parameters for the noise component if relevant, otherwise set to `NA` (see `MoE_control`).
`noise`	The type of noise component fitted (if any). Only displayed if at least one of the compared models has a noise component.
`noise.gate`	Logical indicating whether gating covariates were allowed to influence the noise component's mixing proportion. Only printed for models with a noise component, when at least one of the compared models has gating covariates.
`equalNoise`	Logical indicating whether the mixing proportion of the noise component for `equalPro` models is also equal (`TRUE`) or estimated (`FALSE`).

Note

The criterion argument here need not comply with the criterion used for model selection within each "MoEClust" object, but be aware that a mismatch in terms of criterion may require the optimal model to be re-fit in order to be extracted, thereby slowing down MoE_compare.

If random starts had been used via init.z="random.hard" or init.z="soft.random", the optimal model may not necessarily correspond to the highest-ranking model in the presence of a criterion mismatch, due to the randomness of the initialisation.

A dedicated print function exists for objects of class "MoECompare".

plot.MoEClust and as.Mclust can both also be called on objects of class "MoECompare".

Author(s)

Keefe Murphy - <[email protected]>

References

Examples

data(CO2data)
CO2   <- CO2data$CO2
GNP   <- CO2data$GNP

# Fit a range of models 
m1    <- MoE_clust(CO2, G=1:3)
m2    <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3    <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4    <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5    <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6    <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)
m7    <- MoE_clust(CO2, G=2:3, expert= ~ GNP, tau0=0.1)

# Rank only the optimal models and examine the best model
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, m7, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))

# Examine all models visited, including those already deemed suboptimal
# Only print models with expert covariates & more than one component
comp2 <- MoE_compare(m1, m2, m3, m4, m5, m6, m7, pick=Inf)
print(comp2, index=comp2$expert != "None" & comp2$G > 1)

# Conduct a stepwise search on the same data
(mod1 <- MoE_stepwise(CO2, GNP))

# Conduct another stepwise search considering models with a noise component
(mod2 <- MoE_stepwise(CO2, GNP, noise=TRUE))

# Compare both sets of results to choose the optimal model
(best <- MoE_compare(mod1, mod2, optimal.only=TRUE)$optimal)
data(CO2data)
CO2   <- CO2data$CO2
GNP   <- CO2data$GNP

# Fit a range of models 
m1    <- MoE_clust(CO2, G=1:3)
m2    <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3    <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4    <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5    <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6    <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)
m7    <- MoE_clust(CO2, G=2:3, expert= ~ GNP, tau0=0.1)

# Rank only the optimal models and examine the best model
(comp <- MoE_compare(m1, m2, m3, m4, m5, m6, m7, optimal.only=TRUE))
(best <- comp$optimal)
(summ <- summary(best, classification=TRUE, parameters=TRUE, networks=TRUE))

# Examine all models visited, including those already deemed suboptimal
# Only print models with expert covariates & more than one component
comp2 <- MoE_compare(m1, m2, m3, m4, m5, m6, m7, pick=Inf)
print(comp2, index=comp2$expert != "None" & comp2$G > 1)

# Conduct a stepwise search on the same data
(mod1 <- MoE_stepwise(CO2, GNP))

# Conduct another stepwise search considering models with a noise component
(mod2 <- MoE_stepwise(CO2, GNP, noise=TRUE))

# Compare both sets of results to choose the optimal model
(best <- MoE_compare(mod1, mod2, optimal.only=TRUE)$optimal)

Set control values for use with MoEClust

Description

Supplies a list of arguments (with defaults) for use with MoE_clust.

Usage

MoE_control(init.z = c("hc", "quantile", "kmeans", "mclust", 
                       "random.hard", "soft.random", "list"),
            noise.args = list(...),
            asMclust = FALSE,
            equalPro = FALSE,
            exp.init = list(...),
            algo = c("EM", "CEM", "cemEM"),
            criterion = c("bic", "icl", "aic"),
            stopping = c("aitken", "relative"),
            z.list = NULL, 
            nstarts = 1L,
            eps = .Machine$double.eps,
            tol = c(1e-05, sqrt(.Machine$double.eps), 1e-08),
            itmax = c(.Machine$integer.max, .Machine$integer.max, 1000L),
            hc.args = list(...),
            km.args = list(...),
            posidens = TRUE,
            init.crit = c("bic", "icl"),
            warn.it = 0L,
            MaxNWts = 1000L,
            verbose = interactive(),
            ...)
MoE_control(init.z = c("hc", "quantile", "kmeans", "mclust", 
                       "random.hard", "soft.random", "list"),
            noise.args = list(...),
            asMclust = FALSE,
            equalPro = FALSE,
            exp.init = list(...),
            algo = c("EM", "CEM", "cemEM"),
            criterion = c("bic", "icl", "aic"),
            stopping = c("aitken", "relative"),
            z.list = NULL, 
            nstarts = 1L,
            eps = .Machine$double.eps,
            tol = c(1e-05, sqrt(.Machine$double.eps), 1e-08),
            itmax = c(.Machine$integer.max, .Machine$integer.max, 1000L),
            hc.args = list(...),
            km.args = list(...),
            posidens = TRUE,
            init.crit = c("bic", "icl"),
            warn.it = 0L,
            MaxNWts = 1000L,
            verbose = interactive(),
            ...)

Arguments

`init.z`	The method used to initialise the cluster labels for the non-noise components. Defaults to `"hc"`, i.e. model-based agglomerative hierarchical clustering tree as per `hc`, for multivariate data (see `hc.args`), or `"quantile"`-based clustering as per `quant_clust` for univariate data (unless there are expert network covariates incorporated via `exp.init$joint` &/or `exp.init$clustMD`, in which case the default is again `"hc"`). The `"quantile"` option is thus only available for univariate data when expert network covariates are not incorporated via `exp.init$joint` &/or `exp.init$clustMD`, or when expert network covariates are not supplied. Other options include `"kmeans"` (see `km.args`), `"random.hard"` or `"soft.random"` initialisation (see `nstarts` below), where `init.z="soft.random"` is only available when `algo != "EM"`, a user-supplied `"list"` (see `z.list` below), and a full run of `Mclust` (itself initialised via a model-based agglomerative hierarchical clustering tree, again see `hc.args`), although this last option `"mclust"` will be coerced to `"hc"` if there are no `gating` &/or `expert` covariates within `MoE_clust` (in order to better reproduce `Mclust` output). When `init.z="list"`, `exp.init$clustMD` is forced to `FALSE`; otherwise, when `exp.init$clustMD=TRUE` and the `clustMD` package is loaded, the `init.z` argument instead governs the method by which a call to `clustMD` is initialised. In this instance, `"quantile"` will instead default to `"hc"`, and the arguments to `hc.args` and `km.args` will be ignored (unless all `clustMD` model types fail for a given number of components). When `init.z="mclust"` or `clustMD` is successfully invoked (via `exp.init$clustMD`), the argument `init.crit` (see below) specifies the model-selection criterion (`"bic"` or `"icl"`) by which the optimal `Mclust` or `clustMD` model type to initialise with is determined, and `criterion` remains unaffected. Finally, when the model includes expert network covariates and `exp.init$mahalanobis=TRUE`, the argument `exp.init$estart` (see below) can be used to modify the behaviour of `init.z="random.hard"` or `init.z="soft.random"` when `nstarts > 1`, toggling between a full run of the EM algorithm for each random initialisation (i.e. `exp.init$estart=FALSE`, the default), or a single run of the EM algorithm starting from the best initial partition obtained among the random starts according to the iterative reallocation initialisation routine (i.e. `exp.init$estart=TRUE`).
`noise.args`	A list supplying select named parameters to control inclusion of a noise component in the estimation of the mixture. If either or both of the arguments `tau0` &/or `noise.init` are supplied, a noise component is added to the the model in the estimation. `tau0` Prior mixing proportion for the noise component. If supplied, a noise component will be added to the model in the estimation, with `tau0` giving the prior probability of belonging to the noise component. Typically supplied as a scalar in the interval (0, 1), e.g. `0.1`, such that the same prior probability of belonging to the noise component applies to all observations. However, `tau0` can also be supplied as a vector (with length equal to the number of observations), which may be particularly useful when gating covariates are present and `noise.args$noise.gate` is `TRUE`. Finally, note that this argument can be supplied instead of or in conjunction with the argument `noise.init` below. `noise.init` A logical or numeric vector indicating an initial guess as to which observations are noise in the data. If numeric, the entries should correspond to row indices of the data. If supplied, a noise component will be added to the model in the estimation. This argument can be used in conjunction with `tau0` above, or can be replaced by that argument also. `noise.gate` A logical indicating whether gating network covariates influence the mixing proportion for the noise component, if any. Defaults to `TRUE`, but leads to greater parsimony if `FALSE`. Only relevant in the presence of a noise component; only effects estimation in the presence of gating covariates. `noise.meth` The method used to estimate the volume when a noise component is invoked. Defaults to `hypvol`. For univariate data, this argument is ignored and the range of the data is used instead (unless `noise.vol` below is specified). The options `"convexhull"` and `"ellipsoidhull"` require loading the geometry and cluster packages, respectively. This argument is only relevant if `noise.vol` below is not supplied. `noise.vol` This argument can be used to override the argument `noise.meth` by specifying the (hyper)volume directly, i.e. specifying an improper uniform density. This will override the use of the range of the response data for univariate data if supplied. Note that the (hyper)volume, rather than its inverse, is supplied here. This can affect prediction and the location of the MVN ellipses for `MoE_gpairs` plots (see `noise_vol`). `equalNoise` Logical which is only invoked when `equalPro=TRUE` and gating covariates are not supplied. Under the default setting (`FALSE`), the mixing proportion for the noise component is estimated, and remaining mixing proportions are equal; when `TRUE` all components, including the noise component, have equal mixing proportions. `discard.noise` A logical governing how the means are summarised in `parameters$mean` and by extension the location of the MVN ellipses in `MoE_gpairs` plots for models with both expert network covariates and a noise component (otherwise this argument is irrelevant). The means for models with expert network covariates are summarised by the posterior mean of the fitted values. By default (`FALSE`), the mean of the noise component is accounted for in the posterior mean. Otherwise, or when the mean of the noise component is unavailable (due to having been manually supplied via `noise.args$noise.vol`), the `z` matrix is renormalised after discarding the column corresponding to the noise component prior to computation of the posterior mean. The renormalisation approach can be forced by specifying `noise.args$discard.noise=TRUE`, even when the mean of the noise component is available. For models with a noise component fitted with `algo="CEM"`, a small extra E-step is conducted for observations assigned to the non-noise components in this case. In particular, the argument `noise.meth` will be ignored for high-dimensional `n <= d` data, in which case the argument `noise.vol` must be specified. Note that this forces `noise.args$discard.noise` to `TRUE`. See `noise_vol` for more details. The arguments `tau0` and `noise.init` can be used separately, to provide alternative means to invoke a noise component. However, they can also be supplied together, in which case observations corresponding to `noise.init` have probability `tau0` (rather than 1) of belonging to the noise component. This strategy also works when `tau0` is supplied as a vector.
`asMclust`	The default values of `stopping` and `hc.args$hcUse` (see below) are such that results for models with no covariates in either network are liable to differ from results for equivalent models obtained via `Mclust`. MoEClust uses `stopping="aitken"` and `hcUse="VARS"` by default, while mclust always implicitly uses `stopping="relative"` and defaults to `hcUse="SVD"`. `asMclust` is a logical variable (`FALSE`, by default) which functions as a simple convenience tool for overriding only these two arguments (even if explicitly supplied!) such that they behave like the function `Mclust`. Other user-specified arguments which differ from mclust are not affected by `asMclust`, as their defaults already correspond to mclust. Results may still differ slightly as MoEClust calculates log-likelihood values with greater precision and may also differ slightly in other numerical aspects which affect parameter estimation or convergence in some cases. Finally, note that `asMclust=TRUE` can be invoked even for models with covariates which are not accommodated by mclust.
`equalPro`	Logical variable indicating whether or not the mixing proportions are to be constrained to be equal in the model. Default: `equalPro = FALSE`. Only relevant when `gating` covariates are not supplied within `MoE_clust`, otherwise ignored. In the presence of a noise component (see `noise.args`), only the mixing proportions for the non-noise components are constrained to be equal (by default, see `equalNoise`), after accounting for the noise component.
`exp.init`	A list supplying select named parameters to control the initialisation routine in the presence of expert network covariates (otherwise ignored): `joint` A logical indicating whether the initial partition is obtained on the joint distribution of the responses & expert network covariates (defaults to `TRUE`) or just the responses (`FALSE`). By default, only continuous expert covariates are considered (see `exp.init$clustMD` below). Only relevant when `init.z` is neither `"random.hard"` nor `"soft.random"` (unless `exp.init$clustMD=TRUE`, in which case `init.z` specifies the initialisation routine for a call to `clustMD`, where `init.z="random.hard"` & `init.z="soft.random"` both map to the `clustMD` argument `startCL="random"`). This will render the `"quantile"` option of `init.z` for univariate data unusable if continuous expert covariates are supplied &/or categorical/ordinal expert covariates are supplied when `exp.init$clustMD=TRUE` and the `clustMD` package is loaded. `mahalanobis` A logical indicating whether to iteratively reallocate observations during the initialisation phase to the component corresponding to the expert network regression to which it's closest to the fitted values of in terms of Mahalanobis distance (defaults to `TRUE`). This will ensure that each component can be well modelled by a single expert prior to running the EM/CEM algorithm. `estart` A logical governing the behaviour of `init.z="random.hard"` or `init.z="soft.random"` when `nstarts > 1` in the presence of expert network covariates. Only relevant when `exp.init$mahalanobis=TRUE`. Defaults to `FALSE`; i.e. all random starts are put through full runs of the EM algorithm. When `TRUE`, all random starts are put through the initial iterative reallocation routine prior to a full run of EM for only the single best random initial partition obtained. See the last set of Examples below. `clustMD` A logical indicating whether categorical/ordinal covariates should be incorporated when using the joint distribution of the response and expert network covariates for initialisation (defaults to `FALSE`). Only relevant when `exp.init$joint=TRUE`. Requires the use of the `clustMD` package. Note that initialising in this manner involves fitting all `clustMD` model types in parallel for all numbers of components considered, and may fail (especially) in the presence of nominal expert network covariates. Unless `init.z="list"`, supplying this argument as `TRUE` when the `clustMD` package is loaded has the effect of superseding the `init.z` argument: this argument now governs instead how the call to `clustMD` is initialised (unless all `clustMD` model types fail for a given number of components, in which case `init.z` is invoked instead to initialise for `G` values for which all `clustMD` model types failed). Similarly, the arguments `hc.args` and `km.args` will be ignored (again, unless all `clustMD` model types fail for a given number of components). `max.init` The maximum number of iterations for the Mahalanobis distance-based reallocation procedure when `exp.init$mahalanobis` is `TRUE`. Defaults to `.Machine$integer.max`. `identity` A logical indicating whether the identity matrix (corresponding to the use of the Euclidean distance) is used in place of the covariance matrix of the residuals (corresponding to the use of the Mahalanobis distance). Defaults to `FALSE` for multivariate response data but defaults to `TRUE` for univariate response data. Setting `identity=TRUE` with multivariate data may be advisable when the dimensions of the data are such that the covariance matrix cannot be inverted (otherwise, the pseudo-inverse is used under the `FALSE` default). `drop.break` When `exp.init$mahalanobis=TRUE` observations will be completely in or out of a component during the initialisation phase. As such, it may occur that constant columns will be present when building a given component's expert regression (particularly for categorical covariates). It may also occur, due to this partitioning, that "unseen" data, when calculating the residuals, will have new factor levels. When `exp.init$drop.break=TRUE`, the Mahalanobis distance based initialisation phase will explicitly fail in either of these scenarios. Otherwise, `drop_constants` and `drop_levels` will be invoked when `exp.init$drop.break` is `FALSE` (the default) to try to remedy the situation. In any case, only a warning that the initialisation step failed will be printed, regardless of the value of `exp.init$drop.break`.
`algo`	Switch controlling whether models are fit using the `"EM"` (the default) or `"CEM"` algorithm. The option `"cemEM"` allows running the EM algorithm starting from convergence of the CEM algorithm.
`criterion`	When either `G` or `modelNames` is a vector, `criterion` determines whether the `"bic"` (Bayesian Information Criterion), `"icl"` (Integrated Complete Likelihood), `"aic"` (Akaike Information Criterion) is used to determine the ‘best’ model when gathering output. Note that all criteria will be returned in any case.
`stopping`	The criterion used to assess convergence of the EM/CEM algorithm. The default (`"aitken"`) uses Aitken's acceleration method via `aitken`, otherwise the `"relative"` change in log-likelihood is monitored (which may be less strict). The `"relative"` option corresponds to the stopping criterion used by `Mclust`: see `asMclust` above. Both stopping rules are ultimately governed by `tol[1]`. When the `"aitken"` method is employed, the asymptotic estimate of the final converged maximised log-likelihood is also returned as `linf` for models with 2 or more components, though the largest element of the returned vector `loglik` still gives the log-likelihood value achieved by the parameters returned at convergence, under both `stopping` methods (see `MoE_clust`).
`z.list`	A user supplied list of initial cluster allocation matrices, with number of rows given by the number of observations, and numbers of columns given by the range of component numbers being considered. In particular, `z.list` must only include columns corresponding to non-noise components when using this method to initialise in the presence of a noise component. Only relevant if `init.z="z.list"`. These matrices are allowed correspond to both soft or hard clusterings, and will be internally normalised so that the rows sum to 1 (or coerced always to a 'hard' matrix if `algo != "EM"`). See `noise.init` and `tau0` above for details on initialisation in the presence of a noise component.
`nstarts`	The number of random initialisations to use when `init.z="random.hard"` or `init.z="soft.random"`. Defaults to `1`. When there are no expert covariates (or when `exp.init$mahalanobis=FALSE` or `exp.init$estart=FALSE`), the results will be based on the random start yielding the highest estimated log-likelihood after each initial partition is subjected to a full run of the EM algorithm. Note, in this case, that all `nstarts` random initialisations are affected by `exp.init$mahalanobis`, if invoked in the presence of expert network covariates, which may remove some of the randomness. Conversely, if `exp.init$mahalanobis=TRUE` and `exp.init$estart=TRUE`, all `nstarts` random starts are put through the initial iterative reallocation routine and only the single best initial partition uncovered is put through the full run of the EM algorithm. See `init.z` and `exp.init$estart` above for more details, though note that `exp.init$mahalanobis=TRUE` and `exp.init$estart=FALSE`, by default.
`eps`	A scalar tolerance associated with deciding when to terminate computations due to computational singularity in covariances. Smaller values of `eps` allow computations to proceed nearer to singularity. The default is the relative machine precision `.Machine$double.eps`, which is approximately 2e-16 on IEEE-compliant machines.
`tol`	A vector of length three giving relative convergence tolerances for 1) the log-likelihood of the EM/CEM algorithm, 2) parameter convergence in the inner loop for models with iterative M-step (`"VEI", "VEE", "EVE", "VVE", "VEV"`), and 3) optimisation in the multinomial logistic regression in the gating network, respectively. The default is `c(1e-05, sqrt(.Machine$double.eps), 1e-08)`. If only one number is supplied, it is used as the tolerance for all three cases given.
`itmax`	A vector of length three giving integer limits on the number of iterations for 1) the EM/CEM algorithm, 2) the inner loop for models with iterative M-step (`"VEI", "VEE", "EVE", "VVE", "VEV"`), and 3) the multinomial logistic regression in the gating network, respectively. The default is `c(.Machine$integer.max, .Machine$integer.max, 1000L)`, allowing termination to be completely governed by `tol[1]` & `tol[2]` for the inner and outer loops of the EM/CEM algorithm. If only one number is supplied, it is used as the iteration limit for the outer loop only and the other elements of `itmax` retain their usual defaults. If, for any model with gating covariates, the multinomial logistic regression in the gating network fails to converge in `itmax[3]` iterations at any stage of the EM/CEM algorithm, an appropriate warning will be printed, prompting the user to modify this argument.
`hc.args`	A list supplying select named parameters to control the initialisation of the cluster allocations when `init.z="hc"` (or when `init.z="mclust"`, which itself relies on `hc`), unless `exp.init$clustMD=TRUE`, the `clustMD` package is loaded, and none of the `clustMD` model types fail (otherwise irrelevant): `hcUse` A string specifying the type of input variables to be used. This defaults to `"VARS"` here, unlike mclust which defaults to `"SVD"`. Other allowable values are documented in `mclust.options`. See `asMclust` above. `hc.meth` A character string indicating the model to be used when hierarchical clustering (see `hc`) is employed for initialisation (either when `init.z="hc"` or `init.z="mclust"`). Defaults to `"EII"` for high-dimensional data, or `"VVV"` otherwise.
`km.args`	A list supplying select named parameters to control the initialisation of the cluster allocations when `init.z="kmeans"`, unless `exp.init$clustMD=TRUE`, the `clustMD` package is loaded, and none of the `clustMD` model types fail (otherwise irrelevant): `kstarts` The number of random initialisations to use. Defaults to `10`. `kiters` The maximum number of K-Means iterations allowed. Defaults to `10`.
`posidens`	A logical governing whether to continue running the algorithm even in the presence of positive log-densities. Defaults to `TRUE`, but setting `posidens=FALSE` can help to safeguard against spurious solutions, which will be instantly terminated if positive log-densities are encountered. Note that versions of MoEClust prior to and including version `1.3.1` always implicitly assumed `posidens=FALSE`.
`init.crit`	The criterion to be used to determine the optimal model type to initialise with, when `init.z="mclust"` or when `exp.init$clustMD=TRUE` and the `clustMD` package is loaded (one of `"bic"` or `"icl"`). Defaults to `"icl"` when `criterion="icl"`, otherwise defaults to `"bic"`. The `criterion` argument remains unaffected.
`warn.it`	A single number giving the iteration count at which a warning will be printed if the EM/CEM algorithm has failed to converge. Defaults to `0`, i.e. no warning (which is true for any `warn.it` value less than `3`), otherwise the message is printed regardless of the value of `verbose`. If non-zero, `warn.it` should be moderately large, but obviously less than `itmax[1]`. A warning will always be printed if one of more models fail to converge in `itmax[1]` iterations.
`MaxNWts`	The maximum allowable number of weights in the call to `multinom` for the multinomial logistic regression in the gating network. There is no intrinsic limit in the code, but increasing `MaxNWts` will probably allow fits that are very slow and time-consuming. It may be necessary to increase `MaxNWts` when categorical concomitant variables with many levels are included or the number of components is high.
`verbose`	Logical indicating whether to print messages pertaining to progress to the screen during fitting. By default is `TRUE` if the session is interactive, and `FALSE` otherwise. If `FALSE`, warnings and error messages will still be printed to the screen, but everything else will be suppressed.
`...`	Catches unused arguments.

Details

MoE_control is provided for assigning values and defaults within MoE_clust and MoE_stepwise.

While the criterion argument controls the choice of the optimal number of components and GPCM/mclust model type, MoE_compare is provided for choosing between fits with different combinations of covariates or different initialisation settings.

Value

A named list in which the names are the names of the arguments and the values are the values supplied to the arguments.

Note

Note that successfully invoking exp.init$clustMD (though it defaults to FALSE) affects the role of the arguments init.z, hc.args, and km.args. Please read the documentation above carefully in this instance.

The initial allocation matrices before and after the invocation of the exp.init related arguments are both stored as attributes in the object returned by MoE_clust (named "Z.init" and "Exp.init", respectively). If init.z="random.hard" or init.z="soft.random" and nstarts > 1, the allocations corresponding to the best random start are stored (regardless of whether exp.init$estart is invoked or not). This can be useful for supplying z.list for future fits.

Author(s)

Keefe Murphy - <[email protected]>

Examples

ctrl1 <- MoE_control(criterion="icl", itmax=100, warn.it=15, init.z="random.hard", nstarts=5)

data(CO2data)
GNP   <- CO2data$GNP
res   <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, control=ctrl1)

# Alternatively, specify control arguments directly
res2  <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, stopping="relative")

# Supplying ctrl1 without naming it as 'control' can throw an error
## Not run: 
res3  <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, ctrl1)
## End(Not run)

# Similarly, supplying control arguments via a mix of the ... construct
# and the named argument 'control' also throws an error
## Not run: 
res4  <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, control=ctrl1, init.z="kmeans")
## End(Not run)

# Initialise via the mixed-type joint distribution of response & covariates
# Let the ICL criterion determine the optimal clustMD model type
# Constrain the mixing proportions to be equal
ctrl2 <- MoE_control(exp.init=list(clustMD=TRUE), init.crit="icl", equalPro=TRUE)
data(ais)
library(clustMD)
res4  <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
                   network.data=ais, control=ctrl2)

# Include a noise component by specifying its prior mixing proportion
res5  <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
                   network.data=ais, tau0=0.1)
                   
# Include a noise component via an initial guess of which observations are noise
mdist <- mahalanobis(ais[,3:7], colMeans(ais[,3:7]), cov(ais[,3:7]))
cutp  <- qchisq(p=0.95, df=ncol(ais[,3:7]))
res6  <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
                   network.data=ais, noise.init=mdist > cutp)
                   
# Include a noise component by specifying tau0 as a vector
res7  <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
                   network.data=ais, tau0=pchisq(mdist, df=ncol(ais[,3:7])))                                    
                   
# Investigate the use of random starts
sex  <- ais$sex
# resA uses deterministic starting values (by default) for each G value
 system.time(resA <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE))
# resB passes each random start through the entire EM algorithm for each G value
 system.time(resB <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE,
                               init.z="random.hard", nstarts=10))
# resC passes only the "best" random start through the EM algorithm for each G value
# this time, we also use soft rather than hard random starts
 system.time(resC <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE,
                               init.z="soft.random", nstarts=10, estart=TRUE))
# Here, all three settings (given here in order of speed) identify & converge to the same model
 MoE_compare(resA, resC, resB)
ctrl1 <- MoE_control(criterion="icl", itmax=100, warn.it=15, init.z="random.hard", nstarts=5)

data(CO2data)
GNP   <- CO2data$GNP
res   <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, control=ctrl1)

# Alternatively, specify control arguments directly
res2  <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, stopping="relative")

# Supplying ctrl1 without naming it as 'control' can throw an error
## Not run: 
res3  <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, ctrl1)
## End(Not run)

# Similarly, supplying control arguments via a mix of the ... construct
# and the named argument 'control' also throws an error
## Not run: 
res4  <- MoE_clust(CO2data$CO2, G=2, expert = ~ GNP, control=ctrl1, init.z="kmeans")
## End(Not run)

# Initialise via the mixed-type joint distribution of response & covariates
# Let the ICL criterion determine the optimal clustMD model type
# Constrain the mixing proportions to be equal
ctrl2 <- MoE_control(exp.init=list(clustMD=TRUE), init.crit="icl", equalPro=TRUE)
data(ais)
library(clustMD)
res4  <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
                   network.data=ais, control=ctrl2)

# Include a noise component by specifying its prior mixing proportion
res5  <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
                   network.data=ais, tau0=0.1)
                   
# Include a noise component via an initial guess of which observations are noise
mdist <- mahalanobis(ais[,3:7], colMeans(ais[,3:7]), cov(ais[,3:7]))
cutp  <- qchisq(p=0.95, df=ncol(ais[,3:7]))
res6  <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
                   network.data=ais, noise.init=mdist > cutp)
                   
# Include a noise component by specifying tau0 as a vector
res7  <- MoE_clust(ais[,3:7], G=2, modelNames="EVE", expert= ~ sex,
                   network.data=ais, tau0=pchisq(mdist, df=ncol(ais[,3:7])))                                    
                   
# Investigate the use of random starts
sex  <- ais$sex
# resA uses deterministic starting values (by default) for each G value
 system.time(resA <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE))
# resB passes each random start through the entire EM algorithm for each G value
 system.time(resB <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE,
                               init.z="random.hard", nstarts=10))
# resC passes only the "best" random start through the EM algorithm for each G value
# this time, we also use soft rather than hard random starts
 system.time(resC <- MoE_clust(ais[,3:7], G=2, expert=~sex, equalPro=TRUE,
                               init.z="soft.random", nstarts=10, estart=TRUE))
# Here, all three settings (given here in order of speed) identify & converge to the same model
 MoE_compare(resA, resC, resB)

MoEClust BIC, ICL, and AIC Model-Selection Criteria

Description

Computes the BIC (Bayesian Information Criterion), ICL (Integrated Complete Likelihood), and AIC (Akaike Information Criterion) for parsimonious mixture of experts models given the log-likelihood, the dimension of the data, the number of mixture components in the model, the numbers of parameters in the gating and expert networks respectively, and, for the ICL, the numbers of observations in each component.

Usage

MoE_crit(modelName,
         loglik,
         n,
         d,
         G,
         gating.pen = G - 1L,
         expert.pen = G * d,
         z = NULL,
         df = NULL)
MoE_crit(modelName,
         loglik,
         n,
         d,
         G,
         gating.pen = G - 1L,
         expert.pen = G * d,
         z = NULL,
         df = NULL)

Arguments

`modelName`	A character string indicating the model. The help file for `mclustModelNames` describes the available models. Not necessary if `df` is supplied.
`loglik`	The log-likelihood for a data set with respect to the Gaussian mixture model specified in the `modelName` argument.
`n`, `d`, `G`	The number of observations in the data, dimension of the data, and number of components in the Gaussian mixture model, respectively, used to compute `loglik`. `d` & `G` are not necessary if `df` is supplied.
`gating.pen`	The number of parameters of the gating network of the MoEClust model. Defaults to `G - 1`, which corresponds to no gating covariates. If covariates are included, this should be the number of regression coefficients in the fitted `gating` object. If there are no covariates and mixing proportions are further assumed to be present in equal proportion, `gating.pen` should be `0`. The number of parameters used in the estimation of the noise component, if any, should also be included. Not necessary if `df` is supplied.
`expert.pen`	The number of parameters of the expert network of the MoEClust model. Defaults to `G * d`, which corresponds to no expert covariates. If covariates are included, this should be the number of regression coefficients in the fitted `expert` object. Not necessary if `df` is supplied.
`z`	The `n` times `G` responsibility matrix whose `[i,k]`-th entry is the probability that observation i belongs to the k-th component.. If supplied the ICL is also computed and returned, otherwise only the BIC and AIC.
`df`	An alternative way to specify the number of estimated parameters (or ‘used’ degrees of freedom) exactly. If supplied, the arguments `modelName`, `d`, `G`, `gating.pen`, and `expert.pen`, which are used to calculate the number of parameters, will be ignored. The number of parameters used in the estimation of the noise component, if any, should also be included.

Details

The function is vectorised with respect to the arguments modelName and loglik.

If model is an object of class "MoEClust" with G components, the number of parameters for the gating.pen and expert.pen are length(coef(model$gating)) and G * length(coef(model$expert[[1]])), respectively.

Models with a noise component are facilitated here too, provided the extra number of parameters are accounted for by the user.

Value

A simplified array containing the BIC, AIC, number of estimated parameters (df) and, if z is supplied, also the ICL, for each of the given input arguments.

Note

In order to speed up repeated calls to the function inside MoE_clust, no checks take place.

Author(s)

Keefe Murphy - <[email protected]>

References

Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7): 719-725.

Examples

MoE_crit(modelName=c("VVI", "VVE", "VVV"), n=120, d=8,
         G=3, loglik=c(-4036.99, -3987.12, -3992.45))

data(CO2data)
GNP   <- CO2data$GNP
model <- MoE_clust(CO2data$CO2, G=1:2, expert= ~ GNP)
G     <- model$G
name  <- model$modelName
ll    <- max(model$loglik)
n     <- length(CO2data$CO2)
z     <- model$z

# Compare BIC from MoE_crit to the BIC of the model
(bic2 <- MoE_crit(modelName=name, loglik=ll, n=n, d=1, G=G, z=z,
                  expert.pen=G * length(coef(model$expert[[1]])))["bic",])
identical(unname(bic2), model$bic) #TRUE

# Make the same comparison with the known number of estimated parameters
(bic3 <- MoE_crit(loglik=ll, n=n, df=model$df, z=z)["bic",])
identical(unname(bic3), bic2)      #TRUE
MoE_crit(modelName=c("VVI", "VVE", "VVV"), n=120, d=8,
         G=3, loglik=c(-4036.99, -3987.12, -3992.45))

data(CO2data)
GNP   <- CO2data$GNP
model <- MoE_clust(CO2data$CO2, G=1:2, expert= ~ GNP)
G     <- model$G
name  <- model$modelName
ll    <- max(model$loglik)
n     <- length(CO2data$CO2)
z     <- model$z

# Compare BIC from MoE_crit to the BIC of the model
(bic2 <- MoE_crit(modelName=name, loglik=ll, n=n, d=1, G=G, z=z,
                  expert.pen=G * length(coef(model$expert[[1]])))["bic",])
identical(unname(bic2), model$bic) #TRUE

# Make the same comparison with the known number of estimated parameters
(bic3 <- MoE_crit(loglik=ll, n=n, df=model$df, z=z)["bic",])
identical(unname(bic3), bic2)      #TRUE

C-step for MoEClust Models

Description

Function to compute the assignment matrix z and the conditional log-likelihood for MoEClust models, with the aid of MoE_dens.

Usage

MoE_cstep(data,
          mus,
          sigs,
          log.tau = 0L,
          Vinv = NULL,
          Dens = NULL)
MoE_cstep(data,
          mus,
          sigs,
          log.tau = 0L,
          Vinv = NULL,
          Dens = NULL)

Arguments

`data`	If there are no expert network covariates, `data` should be a numeric matrix or data frame, wherein rows correspond to observations (n) and columns correspond to variables (d). If there are expert network covariates, this should be a list of length G containing matrices/data.frames of (multivariate) WLS residuals for each component.
`mus`	The mean for each of G components. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model. For the univariate models, this is a G-vector of means. In the presence of expert network covariates, all values should be equal to `0`.
`sigs`	The `variance` component in the parameters list from the output to e.g. `MoE_clust`. The components of this list depend on the specification of `modelName` (see `mclustVariance` for details). The number of components `G`, the number of variables `d`, and the `modelName` are inferred from `sigs`.
`log.tau`	If covariates enter the gating network, an n times G matrix of mixing proportions, otherwise a G-vector of mixing proportions for the components of the mixture. Must be on the log-scale in both cases. The default of `0` effectively means densities (or log-densities) aren't scaled by the mixing proportions.
`Vinv`	An estimate of the reciprocal hypervolume of the data region. See the function `noise_vol`. Used only if an initial guess as to which observations are noise is supplied. Mixing proportion(s) must be included for the noise component also.
`Dens`	(Optional) A numeric matrix whose `[i,k]`-th entry is the log-density of observation i in component k, scaled by the mixing proportions, to which the function is to be applied, typically obtained by `MoE_dens` but this is not necessary. If this is supplied, all other arguments are ignored, otherwise `MoE_dens` is called according to the other supplied arguments. If a vector is supplied, it will be coerced to a matrix with one row.

Value

A list containing two elements:

`z`	A matrix with `n` rows and `G` columns containing 1 where the observation belongs to the cluster indicated by the column number, and 0 otherwise.
`loglik`	The estimated conditional log-likelihood.

Note

This function is intended for joint use with MoE_dens, using the log-densities. Caution is advised using this function without explicitly naming the arguments. Models with a noise component are facilitated here too.

The C-step can be replaced by an E-step, see MoE_estep and the algo argument to MoE_control.

Author(s)

Keefe Murphy - <[email protected]>

Examples

# MoE_cstep can be invoked for fitting MoEClust models via the CEM algorithm
# via the 'algo' argument to MoE_control:
data(ais)
hema   <- ais[,3:7]
model  <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais, algo="CEM")
Dens   <- MoE_dens(data=hema, mus=model$parameters$mean,
                   sigs=model$parameters$variance, log.tau=log(model$parameters$pro))

# Construct the z matrix and compute the conditional log-likelihood
Cstep  <- MoE_cstep(Dens=Dens)
(ll    <- Cstep$loglik)

# Check that the z matrix & classification are the same as those from the model
identical(max.col(Cstep$z), as.integer(unname(model$classification))) #TRUE
identical(Cstep$z, model$z)                                           #TRUE

# Call MoE_cstep directly
Cstep2 <- MoE_cstep(data=hema, sigs=model$parameters$variance,
                    mus=model$parameters$mean, log.tau=log(model$parameters$pro))
identical(Cstep2$loglik, ll)                                          #TRUE
# MoE_cstep can be invoked for fitting MoEClust models via the CEM algorithm
# via the 'algo' argument to MoE_control:
data(ais)
hema   <- ais[,3:7]
model  <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais, algo="CEM")
Dens   <- MoE_dens(data=hema, mus=model$parameters$mean,
                   sigs=model$parameters$variance, log.tau=log(model$parameters$pro))

# Construct the z matrix and compute the conditional log-likelihood
Cstep  <- MoE_cstep(Dens=Dens)
(ll    <- Cstep$loglik)

# Check that the z matrix & classification are the same as those from the model
identical(max.col(Cstep$z), as.integer(unname(model$classification))) #TRUE
identical(Cstep$z, model$z)                                           #TRUE

# Call MoE_cstep directly
Cstep2 <- MoE_cstep(data=hema, sigs=model$parameters$variance,
                    mus=model$parameters$mean, log.tau=log(model$parameters$pro))
identical(Cstep2$loglik, ll)                                          #TRUE

Density for MoEClust Mixture Models

Description

Computes densities (or log-densities) of observations in MoEClust mixture models.

Usage

MoE_dens(data,
         mus,
         sigs,
         log.tau = 0L,
         Vinv = NULL,
         logarithm = TRUE)
MoE_dens(data,
         mus,
         sigs,
         log.tau = 0L,
         Vinv = NULL,
         logarithm = TRUE)

Arguments

`data`	If there are no expert network covariates, `data` should be a numeric matrix or data frame, wherein rows correspond to observations (n) and columns correspond to variables (d). If there are expert network covariates, this should be a list of length G containing matrices/data.frames of (multivariate) WLS residuals for each component.
`mus`	The mean for each of G components. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model. For the univariate models, this is a G-vector of means. In the presence of expert network covariates, all values should be equal to `0`.
`sigs`	The `variance` component in the parameters list from the output to e.g. `MoE_clust`. The components of this list depend on the specification of `modelName` (see `mclustVariance` for details). The number of components `G`, the number of variables `d`, and the `modelName` are inferred from `sigs`.
`log.tau`	If covariates enter the gating network, an n times G matrix of mixing proportions, otherwise a G-vector of mixing proportions for the components of the mixture. Must be on the log-scale in both cases. The default of `0` effectively means densities (or log-densities) aren't scaled by the mixing proportions.
`Vinv`	An estimate of the reciprocal hypervolume of the data region. See the function `noise_vol`. Used only if an initial guess as to which observations are noise is supplied. Mixing proportion(s) must be included for the noise component also.
`logarithm`	A logical value indicating whether or not the logarithm of the component densities should be returned. This defaults to `TRUE`, otherwise component densities are returned, obtained from the component log-densities by exponentiation. The log-densities can be passed to `MoE_estep` or `MoE_cstep`.

Value

A numeric matrix whose [i,k]-th entry is the density or log-density of observation i in component k, scaled by the mixing proportions. These densities are unnormalised.

Note

This function is intended for joint use with MoE_estep or MoE_cstep, using the log-densities. Note that models with a noise component are facilitated here too.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
hema  <- ais[,3:7]
model <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)
Dens  <- MoE_dens(data=hema, mus=model$parameters$mean,
                  sigs=model$parameters$variance, log.tau=log(model$parameters$pro))

# Construct the z matrix and compute the log-likelihood
Estep <- MoE_estep(Dens=Dens)
(ll   <- Estep$loglik)

# Check that the z matrix & classification are the same as those from the model
identical(max.col(Estep$z), as.integer(unname(model$classification))) #TRUE
identical(Estep$z, model$z)                                           #TRUE

# The same can be done for models with expert covariates &/or a noise component
# Note for models with expert covariates that the mean has to be supplied as 0,
# and the data has to be supplied as "resid.data"
m2    <- MoE_clust(hema, G=2, expert= ~ sex, modelNames="EVE", network.data=ais, tau0=0.1)
Dens2 <- MoE_dens(data=m2$resid.data, sigs=m2$parameters$variance, mus=0, 
                  log.tau=log(m2$parameters$pro), Vinv=m2$parameters$Vinv)
data(ais)
hema  <- ais[,3:7]
model <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)
Dens  <- MoE_dens(data=hema, mus=model$parameters$mean,
                  sigs=model$parameters$variance, log.tau=log(model$parameters$pro))

# Construct the z matrix and compute the log-likelihood
Estep <- MoE_estep(Dens=Dens)
(ll   <- Estep$loglik)

# Check that the z matrix & classification are the same as those from the model
identical(max.col(Estep$z), as.integer(unname(model$classification))) #TRUE
identical(Estep$z, model$z)                                           #TRUE

# The same can be done for models with expert covariates &/or a noise component
# Note for models with expert covariates that the mean has to be supplied as 0,
# and the data has to be supplied as "resid.data"
m2    <- MoE_clust(hema, G=2, expert= ~ sex, modelNames="EVE", network.data=ais, tau0=0.1)
Dens2 <- MoE_dens(data=m2$resid.data, sigs=m2$parameters$variance, mus=0, 
                  log.tau=log(m2$parameters$pro), Vinv=m2$parameters$Vinv)

Entropy of a fitted MoEClust model

Description

Calculates the normalised entropy of a fitted MoEClust model.

Usage

MoE_entropy(x,
            group = FALSE)
MoE_entropy(x,
            group = FALSE)

Arguments

`x`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with gating and/or expert covariates and/or a noise component are facilitated here too.
`group`	A logical (defaults to `FALSE`) indicating whether component-specific average entropies should be returned instead.

Details

When group is FALSE, this function calculates the normalised entropy via

$H=-\frac{1}{n\log(G)}\sum_{i=1}^n\sum_{g=1}^G\hat{z}_{ig}\log(\hat{z}_{ig})$

, where $n$ and $G$ are the sample size and number of components, respectively, and $\hat{z}_{ig}$ is the estimated posterior probability at convergence that observation $i$ belongs to component $g$ . Note that G=x$G for models without a noise component and G=x$G + 1 for models with a noise component.

When group is TRUE,

$H_i=-\frac{1}{\log(G)}\sum_{g=1}^G\hat{z}_{ig}\log(\hat{z}_{ig})$

is computed for each observation and averaged according to the MAP classification.

Value

When group is FALSE, a single number, given by $1-H$ , in the range [0,1], such that larger values indicate clearer separation of the clusters. Otherwise, a vector of length G containing the per-component averages of the observation-specific entries is returned.

Note

This function will always return a normalised entropy of 1 for models fitted using the "CEM" algorithm (see MoE_control), or models with only one component.

Author(s)

Keefe Murphy - <[email protected]>

References

Examples

data(ais)
res <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex, 
                 modelNames="EEE", network.data=ais)

# Calculate the normalised entropy
MoE_entropy(res)

# Calculate the normalised entropy per cluster
MoE_entropy(res, group=TRUE)
data(ais)
res <- MoE_clust(ais[,3:7], G=3, gating= ~ BMI + sex, 
                 modelNames="EEE", network.data=ais)

# Calculate the normalised entropy
MoE_entropy(res)

# Calculate the normalised entropy per cluster
MoE_entropy(res, group=TRUE)

E-step for MoEClust Models

Description

Softmax function to compute the responsibility matrix z and the log-likelihood for MoEClust models, with the aid of MoE_dens.

Usage

MoE_estep(data,
          mus,
          sigs,
          log.tau = 0L,
          Vinv = NULL,
          Dens = NULL)
MoE_estep(data,
          mus,
          sigs,
          log.tau = 0L,
          Vinv = NULL,
          Dens = NULL)

Arguments

`data`	If there are no expert network covariates, `data` should be a numeric matrix or data frame, wherein rows correspond to observations (n) and columns correspond to variables (d). If there are expert network covariates, this should be a list of length G containing matrices/data.frames of (multivariate) WLS residuals for each component.
`mus`	The mean for each of G components. If there is more than one component, this is a matrix whose k-th column is the mean of the k-th component of the mixture model. For the univariate models, this is a G-vector of means. In the presence of expert network covariates, all values should be equal to `0`.
`sigs`	The `variance` component in the parameters list from the output to e.g. `MoE_clust`. The components of this list depend on the specification of `modelName` (see `mclustVariance` for details). The number of components `G`, the number of variables `d`, and the `modelName` are inferred from `sigs`.
`log.tau`	If covariates enter the gating network, an n times G matrix of mixing proportions, otherwise a G-vector of mixing proportions for the components of the mixture. Must be on the log-scale in both cases. The default of `0` effectively means densities (or log-densities) aren't scaled by the mixing proportions.
`Vinv`	An estimate of the reciprocal hypervolume of the data region. See the function `noise_vol`. Used only if an initial guess as to which observations are noise is supplied. Mixing proportion(s) must be included for the noise component also.
`Dens`	(Optional) A numeric matrix whose `[i,k]`-th entry is the log-density of observation i in component k, scaled by the mixing proportions, to which the `softmax` function is to be applied, typically obtained by `MoE_dens` but this is not necessary. If this is supplied, all other arguments are ignored, otherwise `MoE_dens` is called according to the other supplied arguments. If a vector is supplied, it will be coerced to a matrix with one row.

Value

A list containing two elements:

`z`	A matrix with `n` rows and `G` columns containing the probability of cluster membership for each of `n` observations and `G` clusters.
`loglik`	The estimated log-likelihood, computed efficiently via `logsumexp`.

Note

This softmax function is intended for joint use with MoE_dens, using the log-densities. Caution is advised using this function without explicitly naming the arguments. Models with a noise component are facilitated here too.

The E-step can be replaced by a C-step, see MoE_cstep and the algo argument to MoE_control.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
hema   <- ais[,3:7]
model  <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)
Dens   <- MoE_dens(data=hema, mus=model$parameters$mean,
                   sigs=model$parameters$variance, log.tau=log(model$parameters$pro))

# Construct the z matrix and compute the log-likelihood
Estep  <- MoE_estep(Dens=Dens)
(ll    <- Estep$loglik)

# Check that the z matrix & classification are the same as those from the model
identical(max.col(Estep$z), as.integer(unname(model$classification))) #TRUE
identical(Estep$z, model$z)                                           #TRUE

# Call MoE_estep directly
Estep2 <- MoE_estep(data=hema, sigs=model$parameters$variance,
                    mus=model$parameters$mean, log.tau=log(model$parameters$pro))
identical(Estep2$loglik, ll)                                          #TRUE

# The same can be done for models with expert covariates &/or a noise component
# Note for models with expert covariates that the mean has to be supplied as 0,
# and the data has to be supplied as "resid.data"
m2     <- MoE_clust(hema, G=2, expert= ~ sex, modelNames="EVE", network.data=ais, tau0=0.1)
Estep3 <- MoE_estep(data=m2$resid.data, sigs=m2$parameters$variance, mus=0, 
                    log.tau=log(m2$parameters$pro), Vinv=m2$parameters$Vinv)
data(ais)
hema   <- ais[,3:7]
model  <- MoE_clust(hema, G=3, gating= ~ BMI + sex, modelNames="EEE", network.data=ais)
Dens   <- MoE_dens(data=hema, mus=model$parameters$mean,
                   sigs=model$parameters$variance, log.tau=log(model$parameters$pro))

# Construct the z matrix and compute the log-likelihood
Estep  <- MoE_estep(Dens=Dens)
(ll    <- Estep$loglik)

# Check that the z matrix & classification are the same as those from the model
identical(max.col(Estep$z), as.integer(unname(model$classification))) #TRUE
identical(Estep$z, model$z)                                           #TRUE

# Call MoE_estep directly
Estep2 <- MoE_estep(data=hema, sigs=model$parameters$variance,
                    mus=model$parameters$mean, log.tau=log(model$parameters$pro))
identical(Estep2$loglik, ll)                                          #TRUE

# The same can be done for models with expert covariates &/or a noise component
# Note for models with expert covariates that the mean has to be supplied as 0,
# and the data has to be supplied as "resid.data"
m2     <- MoE_clust(hema, G=2, expert= ~ sex, modelNames="EVE", network.data=ais, tau0=0.1)
Estep3 <- MoE_estep(data=m2$resid.data, sigs=m2$parameters$variance, mus=0, 
                    log.tau=log(m2$parameters$pro), Vinv=m2$parameters$Vinv)

Generalised Pairs Plots for MoEClust Mixture Models

Description

Produces a matrix of plots showing pairwise relationships between continuous response variables and continuous/categorical/logical/ordinal associated covariates, as well as the clustering achieved, according to fitted MoEClust mixture models.

Usage

MoE_gpairs(res,
           response.type = c("points", "uncertainty", "density"),
           subset = list(...),
           scatter.type = c("lm", "points"),
           conditional = c("stripplot", "boxplot"),
           addEllipses = c("outer", "yes", "no", "inner", "both"),
           expert.covar = TRUE,
           border.col = c("purple", "black", "brown", "brown", "navy"),
           bg.col = c("cornsilk", "white", "palegoldenrod", "palegoldenrod", "cornsilk"),
           outer.margins = list(bottom = grid::unit(2, "lines"),
                                left = grid::unit(2, "lines"),
                                top = grid::unit(2, "lines"),
                                right = grid::unit(2, "lines")),
           outer.labels = NULL,
           outer.rot = c(0, 90),
           gap = 0.05,
           buffer = 0.025,
           uncert.cov = FALSE,
           scatter.pars = list(...),
           density.pars = list(...),
           diag.pars = list(...),
           stripplot.pars = list(...),
           boxplot.pars = list(...),
           barcode.pars = list(...),
           mosaic.pars = list(...),
           axis.pars = list(...),
           ...)
MoE_gpairs(res,
           response.type = c("points", "uncertainty", "density"),
           subset = list(...),
           scatter.type = c("lm", "points"),
           conditional = c("stripplot", "boxplot"),
           addEllipses = c("outer", "yes", "no", "inner", "both"),
           expert.covar = TRUE,
           border.col = c("purple", "black", "brown", "brown", "navy"),
           bg.col = c("cornsilk", "white", "palegoldenrod", "palegoldenrod", "cornsilk"),
           outer.margins = list(bottom = grid::unit(2, "lines"),
                                left = grid::unit(2, "lines"),
                                top = grid::unit(2, "lines"),
                                right = grid::unit(2, "lines")),
           outer.labels = NULL,
           outer.rot = c(0, 90),
           gap = 0.05,
           buffer = 0.025,
           uncert.cov = FALSE,
           scatter.pars = list(...),
           density.pars = list(...),
           diag.pars = list(...),
           stripplot.pars = list(...),
           boxplot.pars = list(...),
           barcode.pars = list(...),
           mosaic.pars = list(...),
           axis.pars = list(...),
           ...)

Arguments

`res`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are facilitated here too.
`response.type`	The type of plot desired for the scatterplots comparing continuous response variables. Defaults to `"points"`. See `scatter.pars` below. Points can also be sized according to their associated clustering uncertainty with the option `"uncertainty"`. In doing so, the transparency of the points will also be proportional to their clustering uncertainty, provided the device supports transparency. See also `MoE_Uncertainty` for an alternative means of visualising observation-specific cluster uncertainties (especially for univariate data). See `scatter.pars` below, and note that models fitted via the `"CEM"` algorithm will have no associated clustering uncertainty. Alternatively, the bivariate parametric `"density"` contours can be displayed (see `density.pars`), provided there is at least one Gaussian component in the model. Caution is advised when producing density plots for models with covariates in the expert network; the required number of evaluations of the (multivariate) Gaussian density for each panel (`res$G * prod(density.pars$grid.size)`) increases by a factor of `res$n`, thus plotting may be slow (particularly for large data sets). However, this is offset somewhat by using pre-calculated densities from the corresponding upper-triangular panels when producing the lower-triangular panels. See `density.pars` below.
`subset`	A list giving named arguments for producing only a subset of panels: `show.map` Logical indicating whether to show panels involving the MAP classification (defaults to `TRUE`, unless there is only one component, in which case the MAP classification is never plotted.). `data.ind` For subsetting response variables: a vector of column indices corresponding to the variables in the columns of `res$data` which should be shown. Defaults to all. Can be `0`, in order to suppress plotting the response variables. Alternatively, character strings matching the column names of `res$data` can be supplied here. `cov.ind` For subsetting covariates: a vector of column indices corresponding to the covariates in the columns `res$net.covs` which should be shown. Defaults to all. Can be `0`, in order to suppress plotting the covariates. Alternatively, character strings matching the column names of `res$net.covs` can be supplied here. `submat` Can take the values `"all"` (default), `"upper"`, `"lower"`, or `"diagonal"`, for displaying all panels or only the upper/lower triangular panels or diagonal (marginal) panels of the plot matrix. The results of the subsetting must ensure that at least one panel of some sort can be plotted. The arguments `data.ind` and `cov.ind` can also be used to simply reorder the panels, without actually subsetting. Diagonal panels are always drawn, regardless of the value of `submat` (but can be somewhat suppressed using `diag.pars$show.hist=FALSE` and `diag.pars$show.dens=FALSE`; see `diag.pars` below). When `diag.pars$diagonal=TRUE` (the default), the triangular portions are the `"upper"`-right and `"lower"`-left, whereas they are the `"upper"`-left and `"lower"`-right when `diag.pars$diagonal=FALSE`. Generally, `submat="upper"` should be preferable to `submat="lower"`, as it ensures that response variables and covariates are displayed as appropriate on the y-axes and x-axes, respectively.
`scatter.type`	A vector of length 2 (or 1) giving the plot type for the upper and lower triangular portions of the plot, respectively, pertaining to the associated covariates. Defaults to `"lm"` for covariate vs. response panels and `"points"` otherwise. Only relevant for models with continuous covariates in the gating &/or expert network. `"ci"` and `"lm"` type plots are only produced for plots pairing covariates with response, and never response vs. response or covariate vs. covariate. Note that lines &/or confidence intervals will only be drawn for continuous covariates included in the expert network; to include covariates included only in the gating network also, the options `"lm2"` or `"ci2"` can be used but this is not generally advisable. See `scatter.pars` below.
`conditional`	A vector of length 2 (or 1) giving the plot type for the upper and lower triangular portions of the plot, respectively, for plots involving a mix of categorical and continuous variables. Defaults to `"stripplot"` in the upper triangle and `"boxplot"` in the lower triangle (see `panel.stripplot` and `panel.bwplot`). `"violin"` and `"barcode"` plots can also be produced. Only relevant for models with categorical covariates in the gating &/or expert network, unless `show.MAP` is `TRUE`. Comparisons of two categorical variables (which can only ever be covariates or the MAP classification) are always displayed via mosaic plots (see `strucplot`). All `conditional` panel types can be customised further; see `stripplot.pars`, `boxplot.pars` (for both `"boxplot"` and `"violin"` plots), `barcode.pars`, and `mosaic.pars` below. Note that when `conditional` is of length 1, that plot type will be used in both the upper and lower triangular portions of the plot, where relevant.
`addEllipses`	Controls whether to add MVN ellipses with axes corresponding to the within-cluster covariances for the response data. The options `"inner"` and `"outer"` (the default) will colour the axes or the perimeter of those ellipses, respectively, according to the cluster they represent (according to `scatter.pars$eci.col`). The option `"both"` will obviously colour both the axes and the perimeter. The `"yes"` or `"no"` options merely govern whether the ellipses are drawn, i.e. `"yes"` draws ellipses without any colouring. Ellipses are only ever drawn for multivariate data, and only when `response.type` is `"points"` or `"uncertainty"`. Ellipses are centered on the posterior mean of the fitted values when there are expert network covariates, otherwise on the posterior mean of the response variables. In the presence of expert network covariates, the component-specific covariance matrices are also (by default, via the argument `expert.covar` below) modified for plotting purposes via the function `expert_covar`, in order to account for the extra variability of the means, usually resulting in bigger shapes & sizes for the MVN ellipses.
`expert.covar`	Logical (defaults to `TRUE`) governing whether the extra variability in the component means is added to the MVN ellipses corresponding to the component covariance matrices in the presence of expert network covariates. See the function `expert_covar`. Only relevant when `response.type` is `"points"` or `"uncertainty"` when `addEllipses` is invoked accordingly, and only relevant for models with expert network covariates and multivariate responses.
`border.col`	A vector of length 5 (or 1) containing border colours for plots against the MAP classification, response vs. response, covariate vs. response, response vs. covariate, and covariate vs. covariate panels, respectively. Defaults to `c("purple", "black", "brown", "brown", "navy")`.
`bg.col`	A vector of length 5 (or 1) containing background colours for plots against the MAP classification, response vs. response, covariate vs. response, response vs. covariate, and covariate vs. covariate panels, respectively. Defaults to `c("cornsilk", "white", "palegoldenrod", "palegoldenrod", "cornsilk")`.
`outer.margins`	A list of length 4 with units as components named `bottom`, `left`, `top`, and `right`, giving the outer margins; the defaults uses two lines of text. A vector of length 4 with units (ordered properly) will work, as will a vector of length 4 with numeric variables (interpreted as lines). May need to be increased to accommodate outer labels in some cases.
`outer.labels`	The default is typically `NULL`, for alternating labels around the perimeter. If `"all"`, all labels are printed, and if `"none"`, no labels are printed. If `subset$submat="upper"` or `subset$submat="lower"`, `outer.labels` instead defaults to `"all"`. Note that axis labels always correspond to the range of the depicted variable, and thus should not be interpreted as indicating counts or densities for the diagonal panels when `diag.pars$show.hist=TRUE` &/or `diag.pars$show.dens=TRUE`.
`outer.rot`	A 2-vector (`x`, `y`) rotating the top/bottom outer labels `x` degrees and the left/right outer labels `y` degrees. Only works for categorical labels of boxplot, mosaic, strip plot, and violin plot panels. Defaults to `c(0, 90)`. Reordering via `data.ind` or `cov.ind` may improve appearance of outer labels in some cases.
`gap`	The gap between the tiles; defaulting to `0.05` of the width of a tile.
`buffer`	The fraction by which to expand the range of quantitative variables to provide plots that will not truncate plotting symbols. Defaults to `0.025`, i.e. 2.5 percent of the range. Particularly useful when ellipses are drawn (see `addEllipses`) to ensure ellipses are visible in full.
`uncert.cov`	A logical indicating whether the expansion factor for points on plots involving covariates should also be modified when `response.type="uncertainty"`. Defaults to `FALSE`, and only relevant for scatterplot and strip plot panels. When `TRUE`, `scatter.pars$uncert.pch` is invoked as the plotting symbols for covariate-related scatterplot and strip plot panels, otherwise `scatter.pars$scat.pch` and `stripplot.pars$strip.pch` are invoked for such panels.
`scatter.pars`	A list supplying select parameters for the continuous vs. continuous scatterplots. `NULL` is equivalent to: list(scat.pch=res$classification, uncert.pch=19, scat.col=res$classification, scat.size=unit(0.25, "char"), eci.col=1:res$G, noise.size=unit(0.2, "char")), where `scat.pch`, `scat.col`, and `scat.size` give the plotting symbols, colours, and sizes of the points in scatterplot panels, respectively. Note that `eci.col` gives both a) the colour of the fitted lines &/or confidence intervals for expert-related panels when `scatter.type` is one of `"ci"` or `"lm"` and b) the colour of the ellipses (if any) when `addEllipses` is one of `"outer"`, `"inner"`, or `"both"` and the response data is multivariate. Note that `eci.col` will inherit a suitable default from `scat.col` instead if the latter is supplied but the former is not. Note also that `scat.size` will be modified on an observation-by-observation level when `response.type` is `"uncertainty"`. Furthermore, note that the behaviour for plotting symbols when `response.type="uncertainty"` changes compared to `response.type="points"` depending on the value of the `uncert.cov` argument above. `uncert.pch` gives the plotting symbol used for all scatterplot (and strip plot) panels when `response.type="uncertainty"` and `uncert.cov` is `TRUE`. However, when `uncert.cov` is `FALSE`, `scat.pch` is invoked for scatterplots involving covariates and `uncert.pch` is used for panels involving only response variables. Finally, `noise.size` can be used to modify `scat.size` for observations assigned to the noise component (if any), but only when `response.type="points"`.
`density.pars`	A list supplying select parameters for visualising the bivariate parametric density contours, only when `response.type` is `"density"`. `NULL` is equivalent to: list(grid.size=c(100, 100), dcol="grey50", dens.points=FALSE, nlevels=11, show.labels=!dens.points, label.style="mixed"), where `grid.size` is a vector of length two giving the number of points in the x & y directions of the grid over which the density is evaluated, respectively (though `density.pars$grid.size` can also be supplied as a scalar, which will be automatically recycled to a vector of length 2), and `dcol` is either a single colour or a vector of length `nlevels` colours. `dens.points` governs whether points should be overlaid when `response.type="density"` (in other words, `dens.points=TRUE` is akin to specifying `response.type="points"` and `response.type="density"` simultaneously) and `show.labels` governs whether the density contours should be labelled. Note that contours are not labelled when `dens.points=TRUE` by default. Finally, `label.style` can take the values `"mixed"`, `"flat"`, or `"align"`.
`diag.pars`	A list supplying select parameters for panels along the diagonal. `NULL` is equivalent to: list(diag.fontsize=9, diagonal=TRUE, hist.color=hist.color, show.hist=TRUE, show.counts=TRUE, show.dens=FALSE, dens.grid=100), where `hist.color` is a vector of length 4, giving the colours for the response variables, gating covariates, expert covariates, and covariates entering both networks, respectively. By default, diagonal panels for response variables are `ifelse(diag.pars$show.dens, "white", "black")` and covariates of any kind are `"dimgrey"`. `hist.color` also governs the outer colour for mosaic panels and the fill colour for boxplot, and violin panels (except for those involving the MAP classification; see `boxplot.pars` below). However, in the case of response vs. (categorical) covariates boxplots and violin plots, the fill colour is always `"white"`. The MAP classification is always coloured by cluster membership, by default. The argument `show.counts` is only relevant for categorical variables. The argument `show.dens` toggles whether parametric density estimates are drawn over the diagonal panels for each response variable. When `show.dens=TRUE`, the component densities are shown via thin lines, with colours given by `scatter.pars$scat.col`, while a thick `"black"` line is used for the overall mixture density. This argument can be used with or without `show.hist` also being `TRUE`. Finally, the grid size when `show.dens=TRUE` is given by `diag.grid=100` by default. As per `response.type="density"`, plotting is liable to be a little slower when `show.dens=TRUE` for models with expert network covariates. This is why `show.dens=FALSE` by default; otherwise it is recommended to be set to `TRUE`. When `diagonal=TRUE` (the default), the diagonal from the top left to the bottom right is used for displaying the marginal distributions of variables (via histograms, with or without overlaid density estimates, or barplots, as appropriate). Specifying `diagonal=FALSE` will place the diagonal running from the top right down to the bottom left (with `subset$submat` accounted for accordingly).
`stripplot.pars`	A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of `conditional` is `"stripplot"`. `NULL` is equivalent to: list(strip.pch=res$classification, strip.size=unit(0.5, "char"), strip.col=res$classification, jitter=TRUE, size.noise=unit(0.4, "char")), where `strip.size` and `size.noise` retain the definitions for the similar arguments under `scatter.pars` above. However, `stripplot.pars$size.noise` is invoked regardless of the `response.type` (in contrast to `scatter.pars$noise.size`). Notably, `strip.col` will inherit a suitable default from `scatter.pars$scat.col` if the latter is supplied but the former is not. Note also that the `strip.pch` default is modified to `scatter.pars$uncert.pch` if `uncert.cov` is `TRUE`.
`boxplot.pars`	A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of `conditional` is `"boxplot"` or `"violin"`. `NULL` is equivalent to: list(box.pch="\|", box.col="black", varwidth=FALSE, notch=FALSE, notch.frac=0.5, box.fill=1:res$G). All of the above are relevant for `"boxplot"` panels, are passed to `panel.bwplot` when producing boxplots, and retain the same definitions as the similarly named arguments therein. However, only `box.col`, `varwidth`, and `box.fill` are relevant for `"violin"` panels, and in both cases `box.fill` is only invoked for panels where the categorical variable is the MAP classification (i.e. when `subset$show.map=TRUE`). See `diag.pars$hist.color` for controlling the colours of non-MAP-related boxplot/violin panels. Notably, `box.fill` will inherit a suitable default from `scatter.pars$scat.col` if the latter is supplied but the former is not.
`barcode.pars`	A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of `conditional` is `"barcode"`. `NULL` is equivalent to: list(bar.col=res$G:1, nint=0, ptsize=scatter.pars$scat.size, ptpch=scatter.pars$scat.pch, bcspace=NULL, use.points=FALSE), where `bar.col` will inherit a suitable default from `scatter.pars$scat.col` if the latter is supplied but the former is not. See the help file for `barcode::barcode` for details on the remaining arguments. Note that the arguments `ptsize` and `ptpch`, which are only relevant when `use.points=TRUE` are given by the corresponding `scatter.pars$scat.size`/`scatter.pars$noise.size` and `scatter.pars$scat.pch` arguments, by default.
`mosaic.pars`	A list supplying select parameters for categorical vs. categorical panels (if any). `NULL` is equivalent to: list(shade=NULL, gp_labels=grid::gpar(fontsize=9), gp_args=list(), gp=list(), mfill=TRUE, mcol=1:res$G). The current default arguments and values thereof are passed through to `strucplot` for producing mosaic tiles. When `shade` is not `FALSE`, `mfill` is a logical which governs the colouring scheme for panels (if any) involving the MAP classification. When `mfill` is `TRUE` (the default), `gp` is invoked here in such a way that tiles will inherit appropriate interior colours via `gp$fill` from `mcol` and a `"black"` outer colour via `gp$col`. When `mfill` is `FALSE`, or the panel involves two categorical covariates, the outer colours are inherited from `mcol` and the interior fill colour is inherited from `bg.col`. See `diag.pars$hist.color` for controlling the interior fill colour of non-MAP-related mosaic panels. Notably, `mcol` will inherit a suitable default from `scatter.pars$scat.col` if the latter is supplied but the former is not.
`axis.pars`	A list supplying select parameters for controlling the axes. `NULL` is equivalent to: list(n.ticks=5, axis.fontsize=9). The argument `n.ticks` will be overwritten for categorical variables with fewer than 5 levels.
`...`	Catches unused arguments. Alternatively, named arguments can be passed directly here to any/all of `scatter.pars`, `density.pars`, `diag.pars`, `stripplot.pars`, `boxplot.pars`, `barcode.pars`, `mosaic.pars`, and `axis.pars`.

Value

A generalised pairs plot showing all pairwise relationships between clustered response variables and associated gating &/or expert network continuous &/or categorical variables, coloured according to the MAP classification, with the marginal distributions of each variable along the diagonal.

Note

plot.MoEClust is a wrapper to MoE_gpairs which accepts the default arguments, and also produces other types of plots. Caution is advised producing generalised pairs plots when the dimension of the data is large.

Note that all colour-related defaults in scatter.pars, stripplot.pars, barcode.pars, and mosaic.pars above assume a specific colour-palette (see mclust.options("classPlotColors")). Thus, for instance, specifying scatter.pars$scat.col=res$classification will produce different results compared to leaving this argument unspecified. This is especially true for models with a noise component, for which the default is handled quite differently (for one thing, res$G is the number of non-noise components). Similarly, all pch-related defaults in scatter.pars and stripplot.pars above assume a specific set of plotting symbols also (see mclust.options("classPlotSymbols")). Generally, all colour and symbol related arguments are strongly recommended to be left at their default values, unless being supplied as a single character string, e.g. "black" for colours. To help in this regard, colour-related arguments sensibly inherent their defaults from scatter.pars$scat.col if that is supplied and the argument in question is not.

Warning

For MoEClust models with more than one expert network covariate, fitted lines produced in continuous covariate vs. continuous response scatterplots via scatter.type="lm" or scatter.type="ci" will NOT correspond to the coefficients in the expert network (res$expert).

Caution is advised when producing "barcode" plots for the conditional panels. In some cases, resizing the graphics device after the production of the plot will result in distortion because of the way the rotation of non-horizontal barcodes is performed. Thus, when any(conditional == "barcode"), it is advisable to ensure the dimensions of the overall plot are square. Furthermore, such plots may not display correctly anyway in RStudio's “Plots” pane and so a different graphics device may need to be used (but not subsequently resized).

Caution is also advised when producing generalised pairs plots when the dimension of the data is large.

Author(s)

Keefe Murphy - <[email protected]>

References

Emerson, J. W., Green, W. A., Schloerke, B., Crowley, J., Cook, D., Hofmann, H. and Wickham, H. (2013). The generalized pairs plot. Journal of Computational and Graphical Statistics, 22(1): 79-91.

Examples

data(ais)
res   <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   network.data=ais, modelNames="EVE")
MoE_gpairs(res)

# Produce the same plot, but with a violin plot in the lower triangle.
# Colour the outline of the mosaic tiles rather than the interior using mfill.
# Size points in the response vs. response panels by their clustering uncertainty.

MoE_gpairs(res, conditional=c("stripplot", "violin"),
           mfill=FALSE, response.type="uncertainty")

# Instead show the bivariate density contours of the response variables (without labels).
# (Plotting may be slow when response.type="density" for models with expert covariates.)
# Use different colours for histograms of covariates in the gating/expert/both networks.
# Also use different colours for response vs. covariate & covariate vs. response panels.

MoE_gpairs(res, response.type="density", show.labels=FALSE, dens.points=TRUE,
           hist.color=c("black", "cyan", "hotpink", "chartreuse"),
           bg.col=c("whitesmoke", "white", "mintcream", "mintcream", "floralwhite"))
           
# Examine effect of expert.covar & diag.grid in conjunction with show.dens & show.hist
MoE_gpairs(res, show.dens=TRUE, expert.covar=FALSE, show.hist=FALSE, diag.grid=20)
MoE_gpairs(res, show.dens=TRUE, expert.covar=TRUE, show.hist=TRUE, diag.grid=200)
           
# Explore various options to subset and rearrange the panels
MoE_gpairs(res, data.ind=5:1, cov.ind=0, 
           show.map=FALSE, show.hist=FALSE, 
           submat="upper", diagonal=FALSE)          
           
# Produce a generalised pairs plot for a model with a noise component.
# Reorder the covariates and omit the variables "Hc" and "Hg".
# Use barcode plots for the categorical/continuous pairs.
# Magnify the size of scatter points assigned to the noise component.

resN  <- MoE_clust(ais[,3:7], G=2, gating= ~ SSF + Ht, expert= ~ sex,
                   network.data=ais, modelNames="EEE", tau0=0.1, noise.gate=FALSE)
                   
# Note that non-horizontal barcode panels may not display correctly in RStudio's "Plots" pane 
# it may be necessary to first open a new device:
# dev.new()
MoE_gpairs(resN, data.ind=c(1,2,5), cov.ind=c(3,1,2), use.points=TRUE,
           conditional="barcode", noise.size=grid::unit(0.5, "char"))
           
# Plots can be modified to show only a single (diagonal) panel of interest
MoE_gpairs(resN, data.ind=0, cov.ind=0)
MoE_gpairs(resN, data.ind=0, cov.ind="sex", show.map=FALSE)
MoE_gpairs(resN, data.ind="RCC", cov.ind=0, show.map=FALSE, show.dens=TRUE)
data(ais)
res   <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   network.data=ais, modelNames="EVE")
MoE_gpairs(res)

# Produce the same plot, but with a violin plot in the lower triangle.
# Colour the outline of the mosaic tiles rather than the interior using mfill.
# Size points in the response vs. response panels by their clustering uncertainty.

MoE_gpairs(res, conditional=c("stripplot", "violin"),
           mfill=FALSE, response.type="uncertainty")

# Instead show the bivariate density contours of the response variables (without labels).
# (Plotting may be slow when response.type="density" for models with expert covariates.)
# Use different colours for histograms of covariates in the gating/expert/both networks.
# Also use different colours for response vs. covariate & covariate vs. response panels.

MoE_gpairs(res, response.type="density", show.labels=FALSE, dens.points=TRUE,
           hist.color=c("black", "cyan", "hotpink", "chartreuse"),
           bg.col=c("whitesmoke", "white", "mintcream", "mintcream", "floralwhite"))
           
# Examine effect of expert.covar & diag.grid in conjunction with show.dens & show.hist
MoE_gpairs(res, show.dens=TRUE, expert.covar=FALSE, show.hist=FALSE, diag.grid=20)
MoE_gpairs(res, show.dens=TRUE, expert.covar=TRUE, show.hist=TRUE, diag.grid=200)
           
# Explore various options to subset and rearrange the panels
MoE_gpairs(res, data.ind=5:1, cov.ind=0, 
           show.map=FALSE, show.hist=FALSE, 
           submat="upper", diagonal=FALSE)          
           
# Produce a generalised pairs plot for a model with a noise component.
# Reorder the covariates and omit the variables "Hc" and "Hg".
# Use barcode plots for the categorical/continuous pairs.
# Magnify the size of scatter points assigned to the noise component.

resN  <- MoE_clust(ais[,3:7], G=2, gating= ~ SSF + Ht, expert= ~ sex,
                   network.data=ais, modelNames="EEE", tau0=0.1, noise.gate=FALSE)
                   
# Note that non-horizontal barcode panels may not display correctly in RStudio's "Plots" pane 
# it may be necessary to first open a new device:
# dev.new()
MoE_gpairs(resN, data.ind=c(1,2,5), cov.ind=c(3,1,2), use.points=TRUE,
           conditional="barcode", noise.size=grid::unit(0.5, "char"))
           
# Plots can be modified to show only a single (diagonal) panel of interest
MoE_gpairs(resN, data.ind=0, cov.ind=0)
MoE_gpairs(resN, data.ind=0, cov.ind="sex", show.map=FALSE)
MoE_gpairs(resN, data.ind="RCC", cov.ind=0, show.map=FALSE, show.dens=TRUE)

Mahalanobis Distance Outlier Detection for Multivariate Response

Description

Computes the Mahalanobis distance between the response variable(s) and the fitted values of linear regression models with multivariate or univariate responses.

Usage

MoE_mahala(fit,
           resids,
           squared = FALSE,
           identity = NULL)
MoE_mahala(fit,
           resids,
           squared = FALSE,
           identity = NULL)

Arguments

`fit`	A fitted `lm` model, inheriting either the `"mlm"` or `"lm"` class.
`resids`	The residuals. Can be residuals for observations included in the model, or residuals arising from predictions on unseen data. Must be coercible to a matrix with the number of columns being the number of response variables. Missing values are not allowed.
`squared`	A logical. By default (`FALSE`), the generalized interpoint distance is computed. Set this flag to `TRUE` for the squared value.
`identity`	A logical indicating whether the identity matrix is used in place of the precision matrix in the Mahalanobis distance calculation. Defaults to `FALSE` for multivariate response data but defaults to `TRUE` for univariate response data, where `TRUE` corresponds to the use of the Euclidean distance. Setting `identity=TRUE` with multivariate data may be advisable when the dimensions of the data are such that the covariance matrix cannot be inverted (otherwise, the pseudo-inverse is used under the `FALSE` default).

Value

A vector giving the Mahalanobis distance (or squared Mahalanobis distance) between response(s) and fitted values for each observation.

Author(s)

Keefe Murphy - <[email protected]>

References

Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences, India, 2(1): 49-55.

Examples


data(ais)
hema <- as.matrix(ais[,3:7])
mod  <- lm(hema ~ sex + BMI, data=ais)
res  <- hema - predict(mod)
MoE_mahala(mod, res, squared=TRUE)

data(CO2data)
CO2  <- CO2data$CO2
GNP  <- CO2data$GNP
mod2 <- lm(CO2 ~ GNP, data=CO2data)
pred <- predict(mod2)
res2 <- CO2 - pred
maha <- MoE_mahala(mod2, res2)

# Highlight outlying observations
plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
lines(GNP, pred, col="red")
points(GNP, CO2, cex=maha, lwd=2)
text(GNP, CO2, col="blue", 
     labels=replace(as.character(CO2data$country), maha < 1, ""))
     
# Replicate initialisation strategy using 2 randomly chosen components
# Repeat the random initialisation if necessary
# (until 'crit' at convergence is minimised)
G       <- 3L
z       <- sample(seq_len(G), nrow(CO2data), replace=TRUE)
old     <- Inf
crit    <- .Machine$double.xmax
while(crit < old)   {
  Sys.sleep(1)
  old   <- crit
  maha  <- NULL
  plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
  for(g in seq_len(G)) { 
   ind  <- which(z == g)
   mod  <- lm(CO2 ~ GNP, data=CO2data, sub=ind)
   pred <- predict(mod, newdata=CO2data[,"CO2", drop=FALSE])
   maha <- cbind(maha, MoE_mahala(mod, CO2 - pred))
   lines(GNP, pred, col=g + 1L)
  }
  min.M <- rowMins(maha)
  crit  <- sum(min.M)
  z     <- max.col(maha == min.M)
  points(GNP, CO2, cex=min.M, lwd=2, col=z + 1L)
  text(GNP, CO2, col=z + 1L, 
       labels=replace(as.character(CO2data$country), which(min.M <= 1), ""))
}
crit
data(ais)
hema <- as.matrix(ais[,3:7])
mod  <- lm(hema ~ sex + BMI, data=ais)
res  <- hema - predict(mod)
MoE_mahala(mod, res, squared=TRUE)

data(CO2data)
CO2  <- CO2data$CO2
GNP  <- CO2data$GNP
mod2 <- lm(CO2 ~ GNP, data=CO2data)
pred <- predict(mod2)
res2 <- CO2 - pred
maha <- MoE_mahala(mod2, res2)

# Highlight outlying observations
plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
lines(GNP, pred, col="red")
points(GNP, CO2, cex=maha, lwd=2)
text(GNP, CO2, col="blue", 
     labels=replace(as.character(CO2data$country), maha < 1, ""))
     
# Replicate initialisation strategy using 2 randomly chosen components
# Repeat the random initialisation if necessary
# (until 'crit' at convergence is minimised)
G       <- 3L
z       <- sample(seq_len(G), nrow(CO2data), replace=TRUE)
old     <- Inf
crit    <- .Machine$double.xmax
while(crit < old)   {
  Sys.sleep(1)
  old   <- crit
  maha  <- NULL
  plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
  for(g in seq_len(G)) { 
   ind  <- which(z == g)
   mod  <- lm(CO2 ~ GNP, data=CO2data, sub=ind)
   pred <- predict(mod, newdata=CO2data[,"CO2", drop=FALSE])
   maha <- cbind(maha, MoE_mahala(mod, CO2 - pred))
   lines(GNP, pred, col=g + 1L)
  }
  min.M <- rowMins(maha)
  crit  <- sum(min.M)
  z     <- max.col(maha == min.M)
  points(GNP, CO2, cex=min.M, lwd=2, col=z + 1L)
  text(GNP, CO2, col=z + 1L, 
       labels=replace(as.character(CO2data$country), which(min.M <= 1), ""))
}
crit

Show the NEWS file

Description

Show the NEWS file of the MoEClust package.

Usage

MoE_news()
MoE_news()

Value

The MoEClust NEWS file, provided the session is interactive.

Examples

MoE_news()
MoE_news()

Model Selection Criteria Plot for MoEClust Mixture Models

Description

Plots the BIC, ICL, AIC, or log-likelihood values of a fitted MoEClust object.

Usage

MoE_plotCrit(res,
             criterion = c("bic", "icl", "aic", "loglik", "df", "iters"),
             ...)
MoE_plotCrit(res,
             criterion = c("bic", "icl", "aic", "loglik", "df", "iters"),
             ...)

Arguments

`res`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are facilitated here too.
`criterion`	The criterion to be plotted. Defaults to `"bic"`. Recall that `MoE_control` only allows `"bic"`, `"icl"`, and `"aic"` to be used as model selection criteria within `MoE_clust`. The same applies to `MoE_control`. Uppercase `crit` will be coerced to lowercase.
`...`	Catches other arguments, or additional arguments to be passed to `plot.mclustBIC` (or equivalent functions for the other `criterion` arguments). In particular, the argument `legendArgs` to `plot.mclustBIC` can be passed.

Value

A plot of the values of the chosen criterion. The values themselves can also be returned invisibly.

Note

plot.MoEClust is a wrapper to MoE_plotCrit which accepts the default arguments, and also produces other types of plots.

Author(s)

Keefe Murphy - <[email protected]>

Examples

# data(ais)
# res   <- MoE_clust(ais[,3:7], expert= ~ sex, network.data=ais)
# (crit <- MoE_plotCrit(res))

# Plots can also be produced directly
# plot(res$ICL)
# data(ais)
# res   <- MoE_clust(ais[,3:7], expert= ~ sex, network.data=ais)
# (crit <- MoE_plotCrit(res))

# Plots can also be produced directly
# plot(res$ICL)

Plot MoEClust Gating Network

Description

Plots the gating network for fitted MoEClust models, i.e. the observation index against the mixing proportions for that observation, coloured by cluster.

Usage

MoE_plotGate(res,
             x.axis = NULL,
             type = "b",
             pch = 1,
             xlab = "Observation",
             ylab = expression(widehat(tau)[g]),
             ylim = c(0, 1),
             col = NULL,
             ...)
MoE_plotGate(res,
             x.axis = NULL,
             type = "b",
             pch = 1,
             xlab = "Observation",
             ylab = expression(widehat(tau)[g]),
             ylim = c(0, 1),
             col = NULL,
             ...)

Arguments

`res`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are facilitated here too.
`x.axis`	Optional argument for the x-axis against which the mixing proportions are plotted. Defaults to `1:res$n` if missing. Supplying `x.axis` changes the defaults for the `type` and `xlab` arguments. Users are advised to only use quantities related to the gating network of the fitted model here. Furthermore, use of the `x.axis` argument is not recommended for models with more than one gating network covariate.
`type`, `pch`, `xlab`, `ylab`, `ylim`, `col`	These graphical parameters retain their definitions from `matplot`. `col` defaults to the settings in `mclust.options`. Note that the default value of `type` changes depending on whether `x.axis` is supplied and whether the gating network contains multiple covariates &/or categorical covariates.
`...`	Catches unused arguments, or additional arguments to be passed to `matplot`.

Value

A plot of the gating network of the fitted MoEClust model. The parameters of the gating network can also be returned invisibly.

Note

plot.MoEClust is a wrapper to MoE_plotGate which accepts the default arguments, and also produces other types of plots.

By default, the noise component (if any) will be coloured "darkgrey".

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
res   <- MoE_clust(ais[,3:7], gating= ~ BMI, G=3, modelNames="EEE",
                   network.data=ais, noise.gate=FALSE, tau0=0.1)

# Plot against the observation index and examine the gating network coefficients
(gate <- MoE_plotGate(res))

# Plot against BMI
MoE_plotGate(res, x.axis=ais$BMI, xlab="BMI")

# Plot against a categorical covariate
res2  <- MoE_clust(ais[,3:7], gating= ~ sex, G=3, modelNames="EVE", network.data=ais)
MoE_plotGate(res2, x.axis=ais$sex, xlab="sex")
data(ais)
res   <- MoE_clust(ais[,3:7], gating= ~ BMI, G=3, modelNames="EEE",
                   network.data=ais, noise.gate=FALSE, tau0=0.1)

# Plot against the observation index and examine the gating network coefficients
(gate <- MoE_plotGate(res))

# Plot against BMI
MoE_plotGate(res, x.axis=ais$BMI, xlab="BMI")

# Plot against a categorical covariate
res2  <- MoE_clust(ais[,3:7], gating= ~ sex, G=3, modelNames="EVE", network.data=ais)
MoE_plotGate(res2, x.axis=ais$sex, xlab="sex")

Plot the Log-Likelihood of a MoEClust Mixture Model

Description

Plots the log-likelihood at every iteration of the EM/CEM algorithm used to fit a MoEClust mixture model.

Usage

MoE_plotLogLik(res,
               type = "l",
               xlab = "Iteration",
               ylab = "Log-Likelihood",
               xaxt = "n",
               ...)
MoE_plotLogLik(res,
               type = "l",
               xlab = "Iteration",
               ylab = "Log-Likelihood",
               xaxt = "n",
               ...)

Arguments

`res`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are facilitated here too.
`type`, `xlab`, `ylab`, `xaxt`	These graphical parameters retain their usual definitions from `plot`.
`...`	Catches unused arguments, or additional arguments to be passed to `plot`.

Value

A plot of the log-likelihood versus the number EM iterations. A list with the vector of log-likelihood values and the final value at convergence can also be returned invisibly.

Note

plot.MoEClust is a wrapper to MoE_plotLogLik which accepts the default arguments, and also produces other types of plots.

res$LOGLIK can also be plotted, to compare maximal log-likelihood values for all fitted models.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ BMI, expert= ~ sex, tau0=0.1,
                 G=2, modelNames="EVE", network.data=ais)
(ll <- MoE_plotLogLik(res))
data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ BMI, expert= ~ sex, tau0=0.1,
                 G=2, modelNames="EVE", network.data=ais)
(ll <- MoE_plotLogLik(res))

Plot the Similarity Matrix of a MoEClust Mixture Model

Description

Produces a heatmap of the similarity matrix constructed from the res$z matrix at convergence of a MoEClust mixture model.

Usage

MoE_Similarity(res,
               col = grDevices::heat.colors(30L, rev=TRUE), 
               reorder = TRUE, 
               legend = TRUE,
               ...)
MoE_Similarity(res,
               col = grDevices::heat.colors(30L, rev=TRUE), 
               reorder = TRUE, 
               legend = TRUE,
               ...)

Arguments

`res`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are facilitated here too.
`col`	A vector of colours as per `image`. Will be checked for validity.
`reorder`	A logical (defaults to `TRUE`) indicating whether observations should be reordered for visual clarity.
`legend`	A logical (defaults to `TRUE`) indicating whether to append a colour key legend.
`...`	Catches unused arguments, or arguments to be passed to `hclust` when `reorder=TRUE`.

Value

The similarity matrix in the form of a heatmap is plotted; the matrix itself can also be returned invisibly. The invisibly returned matrix will also be reordered if reordered=TRUE.

Note

plot.MoEClust is a wrapper to MoE_Similarity which accepts the default arguments, and also produces other types of plots.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
mod <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", gating= ~ SSF + Ht,
                 expert= ~ sex, network.data=ais, tau0=0.1, noise.gate=FALSE)
sim <- MoE_Similarity(mod)
data(ais)
mod <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", gating= ~ SSF + Ht,
                 expert= ~ sex, network.data=ais, tau0=0.1, noise.gate=FALSE)
sim <- MoE_Similarity(mod)

Stepwise model/variable selection for MoEClust models

Description

Conducts a greedy forward stepwise search to identify the optimal MoEClust model according to some criterion. Components and/or gating covariates and/or expert covariates are added to new MoE_clust fits at each step, while each step is evaluated for all valid modelNames.

Usage

MoE_stepwise(data,
             network.data = NULL,
             gating = NULL,
             expert = NULL,
             modelNames = NULL,
             fullMoE = FALSE,
             noise = FALSE,
             initialModel = NULL,
             initialG = NULL,
             stepG = TRUE,
             criterion = c("bic", "icl", "aic"),
             equalPro = c("all", "both", "yes", "no"),
             noise.gate = c("all", "both", "yes", "no"),
             verbose = interactive(),
             ...)
MoE_stepwise(data,
             network.data = NULL,
             gating = NULL,
             expert = NULL,
             modelNames = NULL,
             fullMoE = FALSE,
             noise = FALSE,
             initialModel = NULL,
             initialG = NULL,
             stepG = TRUE,
             criterion = c("bic", "icl", "aic"),
             equalPro = c("all", "both", "yes", "no"),
             noise.gate = c("all", "both", "yes", "no"),
             verbose = interactive(),
             ...)

Arguments

`data`	A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
`network.data`	An optional matrix or data frame in which to look for the covariates specified in the `gating` &/or `expert` networks, if any. Must include column names. Columns in `network.data` corresponding to columns in `data` will be automatically removed. While a single covariate can be supplied as a vector (provided the '`$`' operator or '`[]`' subset operator are not used), it is safer to supply a named 1-column matrix or data frame in this instance.
`gating`	A vector giving the names of columns in `network.data` used to define the scope of the gating network. By default, the initial model will contain no covariates (unless `initialModel` is supplied with gating covariates), thereafter all variables in `gating` (save for those in `initialModel`, if any) will be considered for inclusion where appropriate. If `gating` is not supplied (or set to `NULL`), all variables in `network.data` will be considered for the gating network. `gating` can also be supplied as `NA`, in which case no gating network covariates will ever be considered (save for those in `initialModel`, if any). Supplying `gating` and `expert` can be used to ensure different subsets of covariates enter different parts of the model.
`expert`	A vector giving the names of columns in `network.data` used to define the scope of the expert network. By default, the initial model will contain no covariates (unless `initialModel` is supplied with expert covariates), thereafter all variables in `expert` (save for those in `initialModel`, if any) will be considered for inclusion where appropriate. If `expert` is not supplied (or set to `NULL`), all variables in `network.data` will be considered for the expert network. `expert` can also be supplied as `NA`, in which case no expert network covariates will ever be considered (save for those in `initialModel`, if any). Supplying `expert` and `gating` can be used to ensure different subsets of covariates enter different parts of the model.
`modelNames`	A character string of valid model names, to be used to restrict the size of the search space, if desired. By default, all valid model types are explored. Rather than considering the changing of the model type as an additional step, every step is evaluated over all entries in `modelNames`. See `MoE_clust` for more details. Note that if `initialModel` is supplied (see below), `modelNames` will be augmented with `initialModel$modelName` if needs be.
`fullMoE`	A logical which, when `TRUE`, ensures that only models where the same covariates enter both parts of the model (the gating and expert networks) are considered. This restricts the search space to exclude models where covariates differ across networks. Thus, the search is likely to be faster, at the expense of potentially missing out on optimal models. Defaults to `FALSE`. Furthermore, when `TRUE`, the set of candidate covariates is automatically taken to be the union of the named covariates in `gating` and `expert`, for convenience. In other words, `gating=NA` will only work if `expert=NA` also, and both should be set to `NULL` in order to consider all potential covariates. In addition, caution is advised using this argument in conjunction with `initialModel`, which must satisfy the constraint that the same set of covariates be used in both parts of the model, for initial models where gating covariates are allowable. Finally, note that this argument does not preclude a model with only expert covariates included if the number of components is such that the inclusion of gating covariates is infeasible.
`noise`	A logical indicating whether to assume all models contain an additional noise component (`TRUE`) or not (`FALSE`, the default). If `initialModel` or `initialG` is not specified, the search starts from a `G=0` noise-only model when `noise` is `TRUE`, otherwise the search starts from a `G=1` model with no covariates when `noise` is `FALSE`. See `MoE_control` for more details. Note, however, that if the model specified in `initialModel` contains a noise component, the value of the `noise` argument will be overridden to `TRUE`; similarly, if the `initialModel` model does not contain a noise component, `noise` will be overridden to `FALSE`.
`initialModel`	An object of class `"MoEClust"` generated by `MoE_clust` or an object of class `"MoECompare"` generated by `MoE_compare`. This gives the initial model to use at the first step of the selection algorithm, to which components and/or covariates etc. can be added. Especially useful if the model is expected to have more than one component a priori (see `initialG` below as an alternative). The `initialModel` model must have been fitted to the same data in `data`. If `initialModel` is not specified, the search starts from a `G=0` noise-only model when `noise` is `TRUE`, otherwise the search starts from a `G=1` model with no covariates when `noise` is `FALSE`. If `initialModel` is supplied and it contains a noise component, only models with a noise component will be considered thereafter (i.e. the `noise` argument can be overridden by the `initialModel` argument). If `initialModel` contains gating &/or expert covariates, these covariates will be included in all subsequent searches, with covariates in `expert` and `gating` still considered as candidates for additional inclusion, as normal. However, while `initialModel` can include covariates not specified in `gating` &/or `expert`, the `initialModel$modelName` should be included in the specified `modelNames`; if it is not, `modelNames` will be forcibly augmented with `initialModel$modelName` (as stated above). Furthermore, it is assumed that `initialModel` is already optimal with respect to the model type. If it is not, the algorithm may be liable to converge to a sub-optimal model, and so a warning will be printed if the function suspects that this might be the case.
`initialG`	A single (positive) integer giving the number of mixture components (clusters) to initialise the stepwise search algorithm with. This is a simpler alternative to the `initialModel` argument, to be used when the only prior knowledge relates to the number of components, and not other features of the model (e.g. the covariates which should be included). Consequently, `initialG` is only relevant when `initialModel` is not supplied. When neither `initialG` nor `initialModel` is specified, the search starts from a `G=0` noise-only model when `noise` is `TRUE`, otherwise the search starts from a `G=1` model with no covariates when `noise` is `FALSE`. See `stepG` below for fixing the number of components at this `initialG` value.
`stepG`	A logical indicating whether the algorithm should consider incrementing the number of components at each step. Defaults to `TRUE`; use `FALSE` when searching only over configurations with the same number of components is of interest. Setting `stepG` to `FALSE` is possible with or without specifying `initialModel` or `initialG`, but is primarily intended for use when one of these arguments is supplied, otherwise the algorithm will be stuck forever with only one component.
`criterion`	The model selection criterion used to determine the optimal action at each step. Defaults to `"bic"`.
`equalPro`	A character string indicating whether models with equal mixing proportions should be considered. `"both"` means models with both equal and unequal mixing proportions will be considered, `"yes"` means only models with equal mixing proportions will be considered, and `"no"` means only models with unequal mixing proportions will be considered. Notably, no setting for `equalPro` is enough to rule out models with `gating` covariates from consideration. The default (`"all"`) is equivalent to `"both"` with the addition that all possible mixing proportion constraints will be tried for the `initialModel` (if any, provided it doesn't contain gating covariate(s)) or `initialG` before adding a component or additional covariates; otherwise, this `equalPro` argument only governs whether mixing proportion constraints are considered as components are added. Considering `"all"` (or `"both"`) equal and unequal mixing proportion models increases the search space and the computational burden, but this argument becomes irrelevant after a model, if any, with gating network covariate(s) is considered optimal for a given step. The `"all"` default is strongly recommended so that viable candidate models are not missed out on, particularly when `initialModel` or `initialG` are given. However, this does not guarantee that an optimal model will not be skipped; if `equalPro` is restricted via `"yes"` or `"no"`, a suboptimal model at one step may ultimately lead to a better final model, in some edge cases. See `MoE_control` for more details.
`noise.gate`	A character string indicating whether models where the gating network for the noise component depends on covariates are considered. `"yes"` means only models where this is the case will be considered, `"no"` means only models for which the noise component's mixing proportion is constant will be considered and `"both"` means both of these scenarios will be considered. The default (`"all"`) is equivalent to `"both"` with the addition that all possible gating network noise settings will be tried for the `initialModel` (if any, provided it contains gating covariates and a noise component) before adding a component or additional covariates; otherwise, this `noise.gate` argument only governs the inclusion/exclusion of this constraint as components or covariates are added. Considering `"all"` (or `"both"`) settings increases the search space and the computational burden, but this argument is only relevant when `noise=TRUE` and `gating` covariates are being considered. The `"all"` default is strongly recommended so that viable candidate models are not missed out on, particularly when `initialModel` or `initialG` are given. However, this does not guarantee that an optimal model will not be skipped; if `noise.gate` is restricted via `"yes"` or `"no"`, a suboptimal model at one step may ultimately lead to a better final model, in some edge cases. See `MoE_control` for more details.
`verbose`	Logical indicating whether to print messages pertaining to progress to the screen during fitting. By default is `TRUE` if the session is interactive, and `FALSE` otherwise. If `FALSE`, warnings and error messages will still be printed to the screen, but everything else will be suppressed.
`...`	Additional arguments to `MoE_control`, except for those arguments of the same name which are already listed here, e.g. `equalPro` and `noise.gate`. Note that these arguments will be supplied to all candidate models for every step. For arguments specific to `MoE_control` (e.g. `stopping`, `algo`, etc.), it is recommended to run `MoE_stepwise` multiple times while toggling these arguments, if desired.

Details

The arguments modelNames, equalPro, and noise.gate are provided for computational convenience. They can be used to reduce the number of models under consideration at each stage.

The same is true of the arguments gating and expert, which can each separately (or jointly, if fullMoE is TRUE) be made to consider all variables in network.data, or a subset, or none at all.

Finally, initialModel or initialG can be used to kick-start the search algorithm by incorporating prior information in a more direct way; in the latter case, only in the form of the number of components; in the former case, a full model with a given number of components, certain included gating and expert network covariates, and a certain model type can give the model an even more informed head start. In either case, the stepG argument can be used to fix the number of components and only search over different configurations of covariates.

Without any prior information, it is best to accept the defaults at the expense of a longer run-time.

Value

An object of class "MoECompare" containing information on all visited models and the optimal model (accessible via x$optimal).

Note

It is advised to run this function once with noise=FALSE and once with noise=TRUE and then choose the optimal model across both sets of results.

At present, only additions (of components and covariates) are considered. In future updates, it may be possible to allow both additions and removals.

The function will attempt to remove duplicate variables found in both data and network.data; in particular, they will be removed from network.data. Users are however advised to carefully specify data and network.data such that there are no duplicates, especially if the desired variable(s) should belong to network.data.

Finally, if the user intends to search for the best model according to the "icl" criterion, then specifying either initialModel or initialG is advisable. This is because the algorithm otherwise starts with a single component and thus there is no entropy term, meaning the stepwise search can quickly and easily get stuck at G=1. See the examples below.

Author(s)

Keefe Murphy - <[email protected]>

References

Examples

# data(CO2data)
# Search over all models where the single covariate can enter either network
# (mod1  <- MoE_stepwise(CO2data$CO2, CO2data[,"GNP", drop=FALSE]))
#
# data(ais)
# Only look for EVE & EEE models with at most one expert network covariate
# Do not consider any gating covariates and only consider models with equal mixing proportions
# (mod2  <- MoE_stepwise(ais[,3:7], ais, gating=NA, expert="sex",
#                        equalPro="yes", modelNames=c("EVE", "EEE")))
#
# Look for models with noise & only those where the noise component's mixing proportion is constant
# Speed up the search with an initialModel, fix G, and restrict the covariates & model type
# init   <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", 
#                     expert= ~ sex, network.data=ais, tau0=0.1)
# (mod3  <- MoE_stepwise(ais[,3:7], ais, noise=TRUE, expert="sex",
#                        gating=c("SSF", "Ht"), noise.gate="no", 
#                        initialModel=init, stepG=FALSE, modelNames="EEE"))
#
# Compare both sets of results (with & without a noise component) for the ais data
# (comp1 <- MoE_compare(mod2, mod3, optimal.only=TRUE))
# comp1$optimal
#
# Target a model for the AIS data which is optimal in terms of ICL, without any restrictions
# mod4   <- MoE_stepwise(ais[,3:7], ais, criterion="icl")
# 
# This gets stuck at a G=1 model, so specify an initial G value as a head start
# mod5   <- MoE_stepwise(ais[,3:7], ais, criterion="icl", initialG=2)
#
# Check that specifying an initial G value enables a better model to be found
# (comp2 <- MoE_compare(mod4, mod5, optimal.only=TRUE, criterion="icl"))

# Finally, restrict the search to full MoE models only
# Notice that the candidate covariates are the union of gating and expert
# Notice also that the algorithm initially traverses models with only
#   expert covariates when the inclusion of gating covariates is infeasible
# mod6   <- MoE_stepwise(ais[,3:7], ais, fullMoE=TRUE, gating="BMI", expert="Bfat")
# data(CO2data)
# Search over all models where the single covariate can enter either network
# (mod1  <- MoE_stepwise(CO2data$CO2, CO2data[,"GNP", drop=FALSE]))
#
# data(ais)
# Only look for EVE & EEE models with at most one expert network covariate
# Do not consider any gating covariates and only consider models with equal mixing proportions
# (mod2  <- MoE_stepwise(ais[,3:7], ais, gating=NA, expert="sex",
#                        equalPro="yes", modelNames=c("EVE", "EEE")))
#
# Look for models with noise & only those where the noise component's mixing proportion is constant
# Speed up the search with an initialModel, fix G, and restrict the covariates & model type
# init   <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", 
#                     expert= ~ sex, network.data=ais, tau0=0.1)
# (mod3  <- MoE_stepwise(ais[,3:7], ais, noise=TRUE, expert="sex",
#                        gating=c("SSF", "Ht"), noise.gate="no", 
#                        initialModel=init, stepG=FALSE, modelNames="EEE"))
#
# Compare both sets of results (with & without a noise component) for the ais data
# (comp1 <- MoE_compare(mod2, mod3, optimal.only=TRUE))
# comp1$optimal
#
# Target a model for the AIS data which is optimal in terms of ICL, without any restrictions
# mod4   <- MoE_stepwise(ais[,3:7], ais, criterion="icl")
# 
# This gets stuck at a G=1 model, so specify an initial G value as a head start
# mod5   <- MoE_stepwise(ais[,3:7], ais, criterion="icl", initialG=2)
#
# Check that specifying an initial G value enables a better model to be found
# (comp2 <- MoE_compare(mod4, mod5, optimal.only=TRUE, criterion="icl"))

# Finally, restrict the search to full MoE models only
# Notice that the candidate covariates are the union of gating and expert
# Notice also that the algorithm initially traverses models with only
#   expert covariates when the inclusion of gating covariates is infeasible
# mod6   <- MoE_stepwise(ais[,3:7], ais, fullMoE=TRUE, gating="BMI", expert="Bfat")

Plot Clustering Uncertainties

Description

Plots the clustering uncertainty for every observation from a fitted "MoEClust" model, including models with a noise component.

Usage

MoE_Uncertainty(res,
                type = c("barplot", "profile"),
                truth = NULL,
                decreasing = FALSE,
                col = c("cluster", "uncertain", "none"),
                rug1d = TRUE,
                ...)
MoE_Uncertainty(res,
                type = c("barplot", "profile"),
                truth = NULL,
                decreasing = FALSE,
                col = c("cluster", "uncertain", "none"),
                rug1d = TRUE,
                ...)

Arguments

`res`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are facilitated here too.
`type`	The type of plot to be produced (defaults to `"barplot"`). The `"profile"` option instead displays uncertainties in increasing/decreasing order of magnitude (see `decreasing`).
`truth`	An optional argument giving the true classification of the data. When `truth` is supplied and `type="barplot"`, misclassified observations are highlighted in a different colour, otherwise observations with uncertainty greater than `1/res$G` are given in a different colour. When `truth` is supplied and `type="profile"`, the uncertainty of misclassified observations is marked by vertical lines on the plot.
`decreasing`	A logical indicating whether uncertainties should be ordered in decreasing order (defaults to `FALSE`). Only relevant when `type="profile"`.
`col`	Only relevant when `truth=NULL`. A character string indicating whether points should be coloured according to their `"cluster"`-memberships (the default) in `res$classification`, according to whether they exceed the threshold to be deemed `"uncertain"`, or not at all (`"none"`). `"cluster"` is inadvisable when the number of clusters `res$G` is large.
`rug1d`	A logical which is relevant only when `type="barplot"` and `res$d == 1`, i.e. the model was fitted to univariate data. When `rug1d=TRUE`, the x-axis is given by the actual observations (indicated with a rug), otherwise the x-axis is given by the indices of the observations as per similar plots when `res$d > 1`.
`...`	Catches unused arguments.

Details

The y-axis of this plot runs from 0 to 1 - 1/res$G, with a horizontal line also drawn at 1/res$G. When type="barplot", uncertainties greater than this value are given a different colour when truth is not supplied, otherwise misclassified observations are given a different colour. Note, however, that $G^{(0)}$ = res$G + 1 is used in place of res$G for models with a noise component.

Value

A plot showing the clustering uncertainty of each observation (sorted in increasing/decreasing order when type="profile"). The (unsorted) vector of uncertainties can also be returned invisibly. When truth is supplied, the indices of the misclassified observations are also invisibly returned.

Note

plot.MoEClust is a wrapper to MoE_Uncertainty which accepts the default arguments, and also produces other types of plots.

An alternative means of visualising clustering uncertainties (at least for multivariate data) is provided by the functions MoE_gpairs and plot.MoEClust, specifically when their argument response.type is given as "uncertainty".

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ sex, G=3, modelNames="EEE", network.data=ais)

# Produce an uncertainty barplot
MoE_Uncertainty(res)

# Change the colour scheme
MoE_Uncertainty(res, col="uncertain")

# Produce an uncertainty profile plot
MoE_Uncertainty(res, type="profile")

# Let's assume the true clusters correspond to sex
(ub <- MoE_Uncertainty(res, truth=ais$sex))
(up <- MoE_Uncertainty(res, type="profile", truth=ais$sex))

# Examine the effect of rug1d for univariate models
mod <- MoE_clust(CO2data$CO2, expert=~GNP, G=2, modelNames="V", network.data=CO2data)
MoE_Uncertainty(mod, rug1d=FALSE)
MoE_Uncertainty(mod)
data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ sex, G=3, modelNames="EEE", network.data=ais)

# Produce an uncertainty barplot
MoE_Uncertainty(res)

# Change the colour scheme
MoE_Uncertainty(res, col="uncertain")

# Produce an uncertainty profile plot
MoE_Uncertainty(res, type="profile")

# Let's assume the true clusters correspond to sex
(ub <- MoE_Uncertainty(res, truth=ais$sex))
(up <- MoE_Uncertainty(res, type="profile", truth=ais$sex))

# Examine the effect of rug1d for univariate models
mod <- MoE_clust(CO2data$CO2, expert=~GNP, G=2, modelNames="V", network.data=CO2data)
MoE_Uncertainty(mod, rug1d=FALSE)
MoE_Uncertainty(mod)

Approximate Hypervolume Estimate

Description

Computes simple approximations to the hypervolume of univariate and multivariate data sets. Also returns the location of the centre of mass.

Usage

noise_vol(data,
          method = c("hypvol", "convexhull", "ellipsoidhull"),
          reciprocal = FALSE)
noise_vol(data,
          method = c("hypvol", "convexhull", "ellipsoidhull"),
          reciprocal = FALSE)

Arguments

`data`	A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed, and covariates should not be included. If a matrix or data frame, rows correspond to observations and columns correspond to variables. There must be more observations than variables.
`method`	The method used to estimate the hypervolume. The default method uses the function `hypvol`. The `"convexhull"` and `"ellipsoidhull"` options require loading the geometry and cluster packages, respectively. This argument is only relevant for multivariate data; for univariate data, the range of the data is used. Note that the `"convexhull"` `method` is liable to be slow when `data` has many columns.
`reciprocal`	A logical variable indicating whether or not the reciprocal hypervolume is desired rather than the hypervolume itself. The default is to return the hypervolume.

Value

A list with the following two elements:

vol

A hypervolume estimate (or its inverse).

This can be used as the hypervolume parameter for the noise component when observations are designated as noise in MoE_clust.

loc

A vector of length ncol(data) giving the location of the centre of mass.

This can help in predicting the fitted values of models fitted with noise components via MoE_clust.

Note

This function is called when adding a noise component to MoEClust models via the function MoE_control, specifically using its arguments noise.meth &/or tau0. The function internally only uses the response variables, and not the covariates. However, one can bypass the invocation of this function by specifying the noise.vol argument of MoE_control directly. This is explicitly necessary for models for high-dimensional data which include a noise component for which this function cannot estimate a (hyper)volume.

Note that supplying the volume manually to MoE_clust can affect the summary of the means in parameters$mean and by extension the location of the MVN ellipses in MoE_gpairs plots for models with both expert network covariates and a noise component. The location cannot be estimated when the volume is supplied manually; in this case, prediction is made on the basis of renormalising the z matrix after discarding the column corresponding to the noise component. Otherwise, the mean of the noise component is accounted for. The renormalisation approach can be forced by specifying noise.args$discard.noise=TRUE, even when the mean of the noise component is available.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(ais)
noise_vol(ais[,3:7], reciprocal=TRUE)

noise_vol(ais[,3:7], reciprocal=FALSE, method="convexhull")
data(ais)
noise_vol(ais[,3:7], reciprocal=TRUE)

noise_vol(ais[,3:7], reciprocal=FALSE, method="convexhull")

Plot MoEClust Results

Description

Plot results for fitted MoE_clust mixture models with gating &/or expert network covariates: generalised pairs plots, model selection criteria, the log-likelihood vs. the EM iterations, and the gating network are all currently visualisable.

Usage

## S3 method for class 'MoEClust'
plot(x,
     what = c("gpairs", "gating", "criterion", "loglik", "similarity", "uncertainty"),
     ...)
## S3 method for class 'MoEClust'
plot(x,
     what = c("gpairs", "gating", "criterion", "loglik", "similarity", "uncertainty"),
     ...)

Arguments

x

An object of class "MoEClust" generated by MoE_clust, or an object of class "MoECompare" generated by MoE_compare. Models with a noise component are facilitated here too.

what

The type of graph requested:

gpairs: A generalised pairs plot. To further customise this plot, arguments to MoE_gpairs can be supplied.
gating: The gating network. To further customise this plot, arguments to MoE_plotGate and matplot can be supplied.
criterion: The model selection criteria. To further customise this plot, arguments to MoE_plotCrit and plot.mclustBIC can be supplied.
loglik: The log-likelihood vs. the iterations of the EM algorithm. To further customise this plot, arguments to MoE_plotLogLik and plot can be supplied.
similarity: The similarity matrix constructed from x$z at convergence, in the form of a heatmap. To further customise this plot, arguments to MoE_Similarity can be supplied.
uncertainty: The clustering uncertainty for every observation. To further customise this plot, arguments to MoE_Uncertainty can be supplied.

By default, all of the above graphs are produced.

...

Optional arguments to be passed to MoE_gpairs, MoE_plotGate, MoE_plotCrit, MoE_plotLogLik, MoE_Similarity, MoE_Uncertainty, matplot, plot.mclustBIC and plot. In particular, the argument legendArgs to plot.mclustBIC can be passed to MoE_plotCrit.

Details

For more flexibility in plotting, use MoE_gpairs, MoE_plotGate, MoE_plotCrit, MoE_plotLogLik, MoE_Similarity, and MoE_Uncertainty directly.

Value

The visualisation according to what of the results of a fitted MoEClust model.

Note

Other plotting options are available by first calling as.Mclust on the fitted object, and then calling plot.Mclust on the results. However, caution is advised for models with gating &/or expert covariates (see the Note in as.Mclust).

Author(s)

Keefe Murphy - <[email protected]>

References

Examples

data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ BMI, expert= ~ sex,
                 G=2, modelNames="EVE", network.data=ais)

# Plot the gating network
plot(res, what="gating", x.axis=ais$BMI, xlab="BMI")

# Plot the log-likelihood
plot(res, what="loglik", col="blue")

# Plot the uncertainty profile
plot(res, what="uncertainty", type="profile")

# Produce a generalised pairs plot
plot(res, what="gpairs")

# Produce a heatmap of the similarity matrix
plot(res, what="similarity")

# Modify the gpairs plot by passing arguments to MoE_gpairs()
plot(res, what="gpairs", response.type="density", varwidth=TRUE, show.dens=TRUE,
     data.ind=c(5,3,4,1,2), jitter=FALSE, show.counts=FALSE)
data(ais)
res <- MoE_clust(ais[,3:7], gating= ~ BMI, expert= ~ sex,
                 G=2, modelNames="EVE", network.data=ais)

# Plot the gating network
plot(res, what="gating", x.axis=ais$BMI, xlab="BMI")

# Plot the log-likelihood
plot(res, what="loglik", col="blue")

# Plot the uncertainty profile
plot(res, what="uncertainty", type="profile")

# Produce a generalised pairs plot
plot(res, what="gpairs")

# Produce a heatmap of the similarity matrix
plot(res, what="similarity")

# Modify the gpairs plot by passing arguments to MoE_gpairs()
plot(res, what="gpairs", response.type="density", varwidth=TRUE, show.dens=TRUE,
     data.ind=c(5,3,4,1,2), jitter=FALSE, show.counts=FALSE)

Predictions from MoEClust expert networks

Description

Predictions (point estimates) of observation-specific component means from each (non-noise) component's expert network linear regression.

Usage

## S3 method for class 'MoE_expert'
predict(object,
        newdata = NULL,
        simplify = FALSE,
        droplevels = FALSE,
        ...)

## S3 method for class 'MoE_expert'
fitted(object,
       ...)

## S3 method for class 'MoE_expert'
residuals(object,
          ...)
## S3 method for class 'MoE_expert'
predict(object,
        newdata = NULL,
        simplify = FALSE,
        droplevels = FALSE,
        ...)

## S3 method for class 'MoE_expert'
fitted(object,
       ...)

## S3 method for class 'MoE_expert'
residuals(object,
          ...)

Arguments

`object`	An object of class `"MoE_expert"` (typically `x$expert`, where `x` is of class `"MoEClust"`).
`newdata`	A matrix or data frame of test examples. If omitted, the fitted values are used.
`simplify`	Logical indicating whether to simplify the output (in the form of a list) to a 3-dimensional array with dimensions given by the number of new observations, the number of variables, and the number of clusters. The first dimension of such an array is of length `1` when there are no expert network covariates, in which case the entries correspond to `object$parameters$mean`. Defaults to `FALSE`.
`droplevels`	A logical indicating whether unseen factor levels in categorical variables within `newdata` should be dropped (with `NA` predicted in their place). Defaults to `FALSE`. See `drop_levels`.
`...`	Catches unused arguments or allows the `simplify` argument to be passed through `fitted` and `residuals`.

Details

This function is effectively just a shortcut to lapply(x$expert, predict.lm, newdata=...). It can also be thought of as a wrapper to predict.MoEClust(x, ...)$mean, although it returns a list (by default) rather than a 3-dimensional array and also always preserves the dimensions of newdata, even for models without expert network covariates.

Value

For simplify=FALSE, either a list of vectors or predictions (for univariate data) or a list of matrices of predictions (for multivariate data). These lists are of the same length as number of non-noise components in the fitted model. When simplify=TRUE, a 3-dimensional array of predictions is returned, with respective dimensions given by the number of observations, variables, and non-noise components.

Author(s)

Keefe Murphy - <[email protected]>

Examples

data(CO2data)
res <- MoE_clust(CO2data$CO2, G=3, equalPro=TRUE, expert= ~ GNP, network.data=CO2data)
predict(res$expert)

# Try with newdata and simplify=TRUE
predict(res$expert, newdata=CO2data[1:5,"GNP", drop=FALSE], simplify=TRUE)
data(CO2data)
res <- MoE_clust(CO2data$CO2, G=3, equalPro=TRUE, expert= ~ GNP, network.data=CO2data)
predict(res$expert)

# Try with newdata and simplify=TRUE
predict(res$expert, newdata=CO2data[1:5,"GNP", drop=FALSE], simplify=TRUE)

Predictions from MoEClust gating networks

Description

Predicts mixing proportions from MoEClust gating networks. Effectively akin to predicting from a multinomial logistic regression via multinom, although here the noise component (if any) is properly accounted for. So too are models with no gating covariates at all, or models with the equal mixing proportion constraint. Prior probabilities are returned by default.

Usage

## S3 method for class 'MoE_gating'
predict(object,
        newdata = NULL,
        type = c("probs", "class"),
        keep.noise = TRUE,
        droplevels = FALSE,
        ...)

## S3 method for class 'MoE_gating'
fitted(object,
       ...)

## S3 method for class 'MoE_gating'
residuals(object,
          ...)
## S3 method for class 'MoE_gating'
predict(object,
        newdata = NULL,
        type = c("probs", "class"),
        keep.noise = TRUE,
        droplevels = FALSE,
        ...)

## S3 method for class 'MoE_gating'
fitted(object,
       ...)

## S3 method for class 'MoE_gating'
residuals(object,
          ...)

Arguments

`object`	An object of class `"MoE_gating"` (typically `x$gating`, where `x` is of class `"MoEClust"`).
`newdata`	A matrix or data frame of test examples. If omitted, the fitted values are used.
`type`	The type of output desired. The default (`"probs"`) returns prior probabilities, while `"class"` returns labels indicating the most likely group a priori. Note that observations classified assigned the noise component (if any) are given a label of `0`.
`keep.noise`	A logical indicating whether the output should acknowledge the noise component (if any). Defaults to `TRUE`; when `FALSE`, this column is discarded and the matrix of probabilities is renormalised accordingly.
`droplevels`	A logical indicating whether unseen factor levels in categorical variables within `newdata` should be dropped (with `NA` predicted in their place). Defaults to `FALSE`. See `drop_levels`.
`...`	Catches unused arguments or allows the `type` and `keep.noise` arguments to be passed through `fitted` and the `keep.noise` argument to be passed through `residuals`.

Details

This function is effectively a shortcut to predict.MoEClust(x, ...)$pro, which (unlike the predict method for multinom on which predict.MoE_gating is based) accounts for the various ways of treating gating covariates, equal mixing proportion constraints, and noise components, although its type argument defaults to "probs" rather than "class".

Value

The return value depends on whether newdata is supplied or not and whether the model includes gating covariates to begin with. When newdata is not supplied, the fitted values are returned (as a matrix if the model contained gating covariates, otherwise as a vector as per x$parameters$pro). If newdata is supplied, the output is always a matrix with the same number of rows as the newdata.

Note

Note that the keep.noise argument does not correspond in any way to the discard.noise argument to predict.MoEClust; there, the noise component is respected in the computation of the mixing proportions and only discarded (if at all) in the prediction of the responses.

Author(s)

Keefe Murphy - <[email protected]>

References

Examples

data(ais)
mod    <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", gating= ~ SSF + Ht,
                 expert= ~ sex, network.data=ais, tau0=0.1, noise.gate=FALSE)
(preds <- predict(mod$gating, newdata=ais[1:5,]))

all.equal(preds, predict(mod, newdata=ais[1:5,])$pro) #TRUE

# Note that the predictions are not the same as the multinom predict method
# in this instance, owing to the invocation of noise.gate=FALSE above
mod2   <- mod 
class(mod2$gating) <- c("multinom", "nnet")
predict(mod2$gating, newdata=ais[1:5,], type="probs")

# We can make this function behave in the same way by invoking keep.noise=FALSE
predict(mod$gating, keep.noise=FALSE, newdata=ais[1:5,])

# ... although keep.noise=FALSE in predict.MoE_gating does not
# yield the same output as discard.noise=TRUE in predict.MoEClust
predict(mod, discard.noise=TRUE, newdata=ais[1:5,])$pro
data(ais)
mod    <- MoE_clust(ais[,3:7], G=2, modelNames="EEE", gating= ~ SSF + Ht,
                 expert= ~ sex, network.data=ais, tau0=0.1, noise.gate=FALSE)
(preds <- predict(mod$gating, newdata=ais[1:5,]))

all.equal(preds, predict(mod, newdata=ais[1:5,])$pro) #TRUE

# Note that the predictions are not the same as the multinom predict method
# in this instance, owing to the invocation of noise.gate=FALSE above
mod2   <- mod 
class(mod2$gating) <- c("multinom", "nnet")
predict(mod2$gating, newdata=ais[1:5,], type="probs")

# We can make this function behave in the same way by invoking keep.noise=FALSE
predict(mod$gating, keep.noise=FALSE, newdata=ais[1:5,])

# ... although keep.noise=FALSE in predict.MoE_gating does not
# yield the same output as discard.noise=TRUE in predict.MoEClust
predict(mod, discard.noise=TRUE, newdata=ais[1:5,])$pro

Predictions for MoEClust models

Description

Predicts both cluster membership probabilities and fitted response values from a MoEClust model, using covariates and response data, or covariates only. The predicted MAP classification, mixing proportions, and component means are all also reported in both cases, as well as the predictions of the expert network corresponding to the most probable component.

Usage

## S3 method for class 'MoEClust'
predict(object,
        newdata,
        resid = FALSE,
        discard.noise = FALSE,
        MAPresids = FALSE,
        use.y = TRUE,
        ...)

## S3 method for class 'MoEClust'
fitted(object,
       ...)

## S3 method for class 'MoEClust'
residuals(object,
          newdata,
          ...)
## S3 method for class 'MoEClust'
predict(object,
        newdata,
        resid = FALSE,
        discard.noise = FALSE,
        MAPresids = FALSE,
        use.y = TRUE,
        ...)

## S3 method for class 'MoEClust'
fitted(object,
       ...)

## S3 method for class 'MoEClust'
residuals(object,
          newdata,
          ...)

Arguments

`object`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Predictions for models with a noise component are facilitated here too (see `discard.noise`).
`newdata`	A list with two named components, each of which must be a `data.frame` or `matrix` with named columns, giving the data for which predictions are desired. `new.x` The new covariates for the `gating` &/or `expert` networks. Must be supplied when `newdata$new.y` is supplied. `new.y` (Optional) response data (see `use.y` below). When supplied, cluster and response prediction is based on both `newdata$new.x` and `newdata$new.y`, otherwise only on the covariates in `newdata$new.x`. If supplied as a list with elements `new.x` and `new.y`, both must have the same number of rows. Alternatively, a single `data.frame` or `matrix` can be supplied and an attempt will be made to extract & separate covariate and response columns (if any) into `newdata$new.x` and `newdata$new.y` based on the variable names in `object$data` and `object$net.covs`. When `newdata` is not supplied in any way, the covariates and response variables used in the fitting of the model are used here. It is possible to not supply `new.y` and to supply an empty `data.frame` or `matrix` for `new.x` (or to equivalently supply an empty `data.frame` or `matrix` for `newdata` itself) for models with no covariates of any kind, which effectively predicts the weighted mean of the component means.
`resid`	A logical indicating whether to return the residuals also. Defaults to `FALSE`. Only allowed when response variables are supplied in some form. The function `residuals` is a wrapper to `predict` with the argument `resid` set to `TRUE`, with only the residuals returned.
`discard.noise`	A logical governing how predictions of the responses are made for models with a noise component (otherwise this argument is irrelevant). By default (`FALSE`), the mean of the noise component is accounted for. Otherwise, or when the mean of the noise component is unavailable (due to having been manually supplied through `MoE_control` via `noise.args$noise.vol`), prediction of the responses is performed using a `z` matrix which is renormalised after discarding the column corresponding to the noise component. The renormalisation approach can be forced by specifying `TRUE`, even when the mean of the noise component is available. For models with a noise component fitted with `algo="CEM"`, a small extra E-step is conducted for observations assigned to the non-noise components in this case.
`MAPresids`	A logical indicating whether residuals are computed against `y` (`TRUE`, the default) or `MAPy` when `FALSE`. Not relevant for models with equal mixing proportions when only `new.x` is available. See Value below for more details.
`use.y`	A logical indicating whether the response variables (if any are supplied either via `new.y` or via `newdata` itself) are actually used in the prediction. Defaults to `TRUE`, but useful when `FALSE` for computing residuals as though only the covariates in `new.x` were supplied. For out-of-sample prediction, typically `new.y` would not be supplied anyway and so the `use.y=TRUE` default becomes irrelevant.
`...`	Catches unused arguments (and allows the `predict` arguments `discard.noise` &/or `use.y` to be passed through `fitted` or the `discard.noise`, `MAPresids`, and/or `use.y` arguments to be passed through `residuals`).

Details

Predictions can also be made for models with a noise component, in which case z will include the probability of belonging to "Cluster0" & classification will include labels with the value 0 for observations classified as noise (if any). The argument discard.noise governs how the responses are predicted in the presence of a noise component (see noise_vol for more details).

Note that the argument discard.noise is invoked for any models with a noise component, while the similar MoE_control argument noise.args$discard.noise is only invoked for models with both a noise component and expert network covariates.

Please be aware that a model considered optimal from a clustering point of view may not necessarily be optimal from a prediction point of view. In particular, full MoE models with covariates in both networks (for which both the cluster membership probabilities and component means are observation-specific) are recommended for out-of-sample prediction when only new covariates are observed (see new.x and new.y above, as well as use.y).

Value

A list with the following named components, regardless of whether newdata$new.x and newdata$new.y were used, or newdata$new.x only.

`y`	Aggregated fitted values of the response variables.
`z`	A matrix whose `[i,k]`-th entry is the probability that observation i of the `newdata` belongs to the k-th component. For models with a noise component, the final column gives the probability of belonging to the so-called Cluster0.
`classification`	The vector of predicted cluster labels for the `newdata`. `0` is returned for observations assigned to the noise component.
`pro`	The predicted mixing proportions for the `newdata`, i.e. predicted values of the gating network. `object$parameters$pro` is returned for models without gating network covariates. See `predict.MoE_gating`.
`mean`	The predicted component means for the `newdata`, i.e. predicted values of the expert network. Given as a 3-dimensional array with dimensions given by the number of new observations, the number of variables, and the number of clusters. The first dimension is of length `1` when there are no expert network covariates, in which case the entries correspond to `object$parameters$mean`. See `predict.MoE_expert`.
`MAPy`	Fitted values of the single expert network to which each observation is most probably assigned. Not returned for models with equal mixing proportions when only `new.x` is available. Likely to only be of use for models with gating and expert covariates when only `new.x` is supplied. Note that `MAPy` and `y` will coincide for models fitted via the CEM algorithm (see `MoE_control` and its argument `algo`).

When residuals is called, only the residuals (governed by MAPresids) are returned; when predict is called with resid=TRUE, the list above will also contain the element resids, containing the residuals.

The returned values of pro and mean are always the same, regardless of whether newdata$new.x and newdata$new.y were used, or newdata$new.x only.

Finally, fitted is simply a wrapper to predict.MoEClust(object)$y without any newdata, and with the resid and MAPresids arguments also ignored.

Note

Note that a dedicated predict function is also provided for objects of class "MoE_gating" (typically object$gating, where object is of class "MoEClust"). This function is effectively a shortcut to predict(object, ...)$pro, which (unlike the predict method for multinom on which it is based) accounts for the various ways of treating gating covariates and noise components, although its type argument defaults to "probs" rather than "class". Notably, its keep.noise argument behaves differently from the discard.noise argument here; here, the noise component is only discarded in the computation of the predicted responses. See predict.MoE_gating for further details.

Similarly, a dedicated predict function is also provided for objects of class "MoE_expert" (typically object$expert, where object is of class "MoE_expert"). This function is effectively a wrapper to predict(object, ...)$mean, albeit it returns a list (by default) rather than a 3-dimensional array and also always preserves the dimensions of newdata, even for models without expert network covariates. See predict.MoE_expert for further details.

Author(s)

Keefe Murphy - <[email protected]>

References

Examples

data(ais)
# Fit a MoEClust model and predict the same data
res     <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                     modelNames="EVE", network.data=ais)
pred1   <- predict(res)

# Get only the fitted responses
fits    <- fitted(res)
all.equal(pred1$y, fits) #TRUE

# Remove some rows of the data for prediction purposes
ind     <- sample(1:nrow(ais), 5)
dat     <- ais[-ind,]

# Fit another MoEClust model to the retained data
res2    <- MoE_clust(dat[,3:7], G=3, gating= ~ BMI + sex,
                     modelNames="EEE", network.data=dat)

# Predict held back data using the covariates & response variables
(pred2  <- predict(res2, newdata=ais[ind,]))
# pred2 <- predict(res2, newdata=list(new.y=ais[ind,3:7],
#                                     new.x=ais[ind,c("BMI", "sex")]))

# Get the residuals
residuals(res2, newdata=ais[ind,])

# Predict held back data using only the covariates
(pred3  <- predict(res2, newdata=ais[ind,], use.y=FALSE))
# pred3 <- predict(res2, newdata=list(new.x=ais[ind,c("BMI", "sex")]))
# pred3 <- predict(res2, newdata=ais[ind,c("BMI", "sex")])
data(ais)
# Fit a MoEClust model and predict the same data
res     <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                     modelNames="EVE", network.data=ais)
pred1   <- predict(res)

# Get only the fitted responses
fits    <- fitted(res)
all.equal(pred1$y, fits) #TRUE

# Remove some rows of the data for prediction purposes
ind     <- sample(1:nrow(ais), 5)
dat     <- ais[-ind,]

# Fit another MoEClust model to the retained data
res2    <- MoE_clust(dat[,3:7], G=3, gating= ~ BMI + sex,
                     modelNames="EEE", network.data=dat)

# Predict held back data using the covariates & response variables
(pred2  <- predict(res2, newdata=ais[ind,]))
# pred2 <- predict(res2, newdata=list(new.y=ais[ind,3:7],
#                                     new.x=ais[ind,c("BMI", "sex")]))

# Get the residuals
residuals(res2, newdata=ais[ind,])

# Predict held back data using only the covariates
(pred3  <- predict(res2, newdata=ais[ind,], use.y=FALSE))
# pred3 <- predict(res2, newdata=list(new.x=ais[ind,c("BMI", "sex")]))
# pred3 <- predict(res2, newdata=ais[ind,c("BMI", "sex")])

Quantile-Based Clustering for Univariate Data

Description

Returns a quantile-based clustering for univariate data.

Usage

quant_clust(x,
            G)
quant_clust(x,
            G)

Arguments

`x`	A vector of numeric data.
`G`	The desired number of clusters.

Value

The vector of cluster labels.

Examples

data(CO2data)
quant_clust(CO2data$CO2, G=2)
data(CO2data)
quant_clust(CO2data$CO2, G=2)

Package 'MoEClust'

Help Index

MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Description

Details

Usage

Author(s)

References

See Also

Examples

Australian Institute of Sport data

Description

Usage

Format

Details

References

Examples

Aitken Acceleration

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Convert MoEClust objects to the Mclust class

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

GNP and CO2 Data Set

Description

Usage

Format

References

Examples

Drop constant variables from a formula

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Examples

Drop unused factor levels to predict from unseen data

Description

Usage

Arguments

Value

Note

Author(s)

See Also

Examples

Account for extra variability in covariance matrices with expert covariates

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Compute the Frobenius (adjusted) Rand index

Description

Usage

Arguments

Details

Value