2 title: Use morpheus package
10 ```{r setup, results="hide", include=FALSE}
11 knitr::opts_chunk$set(echo = TRUE, include = TRUE,
12 cache = TRUE, comment="", cache.lazy = FALSE,
13 out.width = "100%", fig.align = "center")
17 <!--Tell that we try to learn classification parameters in a non-EM way, using algebric manipulations.-->
19 *morpheus* is a contributed R package which attempts to find the parameters of a
20 mixture of logistic classifiers.
21 When the data under study come from several groups that have different characteristics,
22 using mixture models is a very popular way to handle heterogeneity.
23 Thus, many algorithms were developed to deal with various mixtures models.
24 Most of them use likelihood methods or Bayesian methods that are likelihood dependent.
25 *flexmix* is an R package which implements these kinds of algorithms.
27 However, one problem of such methods is that they can converge to local maxima,
28 so several starting points must be explored.
29 Recently, spectral methods were developed to bypass EM algorithms and they were proved
30 able to recover the directions of the regression parameter
31 in models with known link function and random covariates (see [XX]).
32 Our package extends such moment methods using least squares to get estimators of the
33 whole parameters (with theoretical garantees, see [XX]).
34 Currently it can handle only binary output $-$ which is a common case.
38 Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
39 A binary regression model assumes that for some link function $g$, the probability that
40 $Y=1$ conditionally to $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
41 $\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
42 Popular examples of link functions are the logit link function where for any real $z$,
43 $g(z)=e^z/(1+e^z)$ and the probit link function where $g(z)=\Phi(z),$ with $\Phi$
44 the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
45 Both are implemented in the package.
47 If now we want to modelise heterogeneous populations, let $K$ be the number of
48 populations and $\omega=(\omega_1,\cdots,\omega_K)$ their weights such that
49 $\omega_{j}\geq 0$, $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega{j}=1$.
50 Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
51 by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
52 $b_{j}\in\R$. Let $\omega =(\omega_{1},\ldots,\omega_{K})$,
53 $b=(b_1,\cdots,b_K)$, $\beta=[\beta_{1} \vert \cdots,\vert \beta_K]$ the $d\times K$
54 matrix of regression coefficients and denote $\theta=(\omega,\beta,b)$.
55 The model of population mixture of binary regressions is given by:
59 \PP_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(<\beta_k,x>+b_k).
62 ## Algorithm, theoretical garantees
64 The algorithm uses spectral properties of some tensor matrices to estimate the model
65 parameters $\Theta = (\omega, \beta, b)$. Under rather mild conditions it can be
66 proved that the algorithm converges to the correct values (its speed is known too).
67 For more informations on that subject, however, please refer to our article [XX].
68 In this vignette let's rather focus on package usage.
71 <!--We assume that the random variable $X$ has a Gaussian distribution.
72 We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
73 identity $d\times d$ matrix. All results may be easily extended to the situation
74 where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
75 symetric $d\times d$ matrix. ***** TODO: take this into account? -->
77 The two main functions are:
78 * computeMu(), which estimates the parameters directions, and
79 * optimParams(), which builds an object \code{o} to estimate all other parameters
80 when calling \code{o$run()}, starting from the directions obtained by the
82 A third function is useful to run Monte-Carlo or bootstrap estimations using
83 different models in various contexts: multiRun(). We'll show example for all of them.
85 ### Estimation of directions
87 In a real situation you would have (maybe after some pre-processing) the matrices
88 X and Y which contain vector inputs and binary output.
89 However, a function is provided in the package to generate such data following a
92 ```{r, results="show", include=TRUE, echo=TRUE}
94 io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2), b=c(0,0), link="probit")
95 # io$X and io$Y contain the sample data
98 $n$ is the total number of samples (lines in X, number of elements in Y)
99 $p$ is a vector of proportions, of size $d-1$ (because the last proportion is deduced
100 from the others: $p$ elements sums to 1) [TODO: omega or p?]
101 $\beta$ is the matrix of linear coefficients, as written above in the model.
102 $b$ is the vector of intercepts (as in linear regression, and as in the model above)
103 link can be either "logit" or "probit", as mentioned earlier.
105 This function outputs a list containing in particular the matrices X and Y, allowing to
106 use the other functions (which all require either these, or the moments).
108 ```{r, results="show", include=TRUE, echo=TRUE}
109 mu <- computeMu(io$X, io$Y, optargs=list(K=2))
112 The optional argument, "optargs", is a list which can provide
113 * the number of clusters $K$,
114 * the moments matrix $M$ (computed with the "computeMoments()" function),
115 * the joint-diagonalisation method ("uwedge" or "jedi"),
116 * the number of random vectors for joint-diagonalization.
117 See ?computeMu and the code for more details.
119 ### Estimation of the other parameters
121 The other parameters are estimated by solving an optimization problem.
122 The following function builds and return an optimization algorithm object:
124 ```{r, results="show", include=TRUE, echo=TRUE}
125 M <- computeMoments(io$X, io$Y)
126 # X and Y must be provided if the moments matrix is not given
127 algopt <- optimParams(K=2, link="probit", optargs=list(M=M))
128 # Optimization starts at beta = mu, b = 0 and p = uniform distribution
129 x0 <- list(beta = mu)
130 theta <- algopt$run(x0)
133 Now theta is a list with three slots:
134 * $p$: estimated proportions,
135 * $\beta$: estimated regression matrix,
136 * $b$: estimated bias.
138 ### Monte-Carlo and bootstrap
140 The package provides a function to compare methods on several computations on random data.
141 It takes in input a list of parameters, then a list of functions which output some quantities
142 (on the first example, our "computeMu()" method versus flexmix way of estimating directions),
143 and finally a method to prepare the arguments which will be given to the functions in the
144 list just mentioned; this allows to run Monte-Carlo estimations with the exact same samples
145 for each compared method. The two last arguments to "multiRun()" control the number of runs,
146 and the number of cores (using the package parallel).
148 ```{r, results="show", include=TRUE, echo=TRUE}
149 beta <- matrix(c(1,-2,3,1), ncol=2)
150 io <- generateSampleIO(n=1000, p=1/2, beta=beta, b=c(0,0), "logit")
151 mu <- normalize(beta)
153 # Example 1: bootstrap + computeMu, morpheus VS flexmix; assumes fargs first 3 elts X,Y,K
154 mr1 <- multiRun(list(X=io$X,Y=io$Y,optargs=list(K=2,jd_nvects=0)), list(
159 computeMu(fargs$X[ind,],fargs$Y[ind],fargs$optargs)
164 source("../patch_Bettina/FLXMRglm.R")
167 dat = as.data.frame( cbind(fargs$Y[ind],fargs$X[ind,]) )
168 out = refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=K,
169 model=FLXMRglm(family="binomial") ) )
170 normalize( matrix(out@coef[1:(ncol(fargs$X)*K)], ncol=K) )
172 prepareArgs = function(fargs,index) {
173 # Always include the non-shuffled dataset
175 fargs$ind <- 1:nrow(fargs$X)
177 fargs$ind <- sample(1:nrow(fargs$X),replace=TRUE)
180 # The result is correct up to matrices columns permutations; align them:
182 mr1[[i]] <- alignMatrices(mr1[[i]], ref=mu, ls_mode="exact")
185 Several plots are available: histograms, boxplots, or curves of coefficients.
186 We illustrate boxplots and curves here (histograms function uses the same arguments,
190 # Second row, first column; morpheus on the left, flexmix on the right
191 plotBox(mr1, 2, 1, "Target value: -1")
194 ```{r, results="show", include=TRUE, echo=TRUE}
195 # Example 2: Monte-Carlo + optimParams from X,Y, morpheus VS flexmix; first args n,p,beta,b
196 mr2 <- multiRun(list(n=1000,p=1/2,beta=beta,b=c(0,0),optargs=list(link="logit")), list(
200 mu <- computeMu(fargs$X, fargs$Y, fargs$optargs)
201 optimParams(fargs$K,fargs$link,fargs$optargs)$run(list(beta=mu))$beta
206 source("../patch_Bettina/FLXMRglm.R")
207 dat <- as.data.frame( cbind(fargs$Y,fargs$X) )
208 out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=fargs$K,
209 model=FLXMRglm(family="binomial") ) )
210 sapply( seq_len(fargs$K), function(i) as.double( out@components[[1]][[i]][,1] ) )
212 prepareArgs = function(fargs,index) {
214 io = generateSampleIO(fargs$n, fargs$p, fargs$beta, fargs$b, fargs$optargs$link)
217 fargs$K = ncol(fargs$beta)
218 fargs$link = fargs$optargs$link
219 fargs$optargs$M = computeMoments(io$X,io$Y)
222 # As in example 1, align results:
224 mr2[[i]] <- alignMatrices(mr2[[i]], ref=beta, ls_mode="exact")
228 # Second argument = true parameters matrix; third arg = index of method (here "morpheus")
229 plotCoefs(mr2, beta, 1)
230 # Real params are on the continous line; estimations = dotted line