Update vignette + optim code
[morpheus.git] / vignettes / report.Rmd
CommitLineData
3d5b5060 1---
c83df166 2title: Use morpheus package
3d5b5060
BA
3
4output:
5 pdf_document:
6 number_sections: true
7 toc_depth: 1
8---
9
10```{r setup, results="hide", include=FALSE}
11knitr::opts_chunk$set(echo = TRUE, include = TRUE,
12 cache = TRUE, comment="", cache.lazy = FALSE,
13 out.width = "100%", fig.align = "center")
14```
15
c83df166
BA
16## Introduction
17<!--Tell that we try to learn classification parameters in a non-EM way, using algebric manipulations.-->
3d5b5060 18
dad25cd2
BA
19*morpheus* is a contributed R package which attempts to find the parameters of a
20mixture of logistic classifiers.
21When the data under study come from several groups that have different characteristics,
22using mixture models is a very popular way to handle heterogeneity.
23Thus, many algorithms were developed to deal with various mixtures models.
24Most of them use likelihood methods or Bayesian methods that are likelihood dependent.
cff1083b 25*flexmix* is an R package which implements these kinds of algorithms.
3d5b5060 26
dad25cd2
BA
27However, one problem of such methods is that they can converge to local maxima,
28so several starting points must be explored.
29Recently, spectral methods were developed to bypass EM algorithms and they were proved
30able to recover the directions of the regression parameter
c83df166 31in models with known link function and random covariates (see [XX]).
dad25cd2
BA
32Our package extends such moment methods using least squares to get estimators of the
33whole parameters (with theoretical garantees, see [XX]).
cff1083b 34Currently it can handle only binary output $-$ which is a common case.
3d5b5060 35
c83df166
BA
36## Model
37
dad25cd2
BA
38Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
39A binary regression model assumes that for some link function $g$, the probability that
40$Y=1$ conditionally to $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
41$\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
42Popular examples of link functions are the logit link function where for any real $z$,
43$g(z)=e^z/(1+e^z)$ and the probit link function where $g(z)=\Phi(z),$ with $\Phi$
44the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
45Both are implemented in the package.
46
47If now we want to modelise heterogeneous populations, let $K$ be the number of
48populations and $\omega=(\omega_1,\cdots,\omega_K)$ their weights such that
49$\omega_{j}\geq 0$, $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega{j}=1$.
50Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
51by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
52$b_{j}\in\R$. Let $\omega =(\omega_{1},\ldots,\omega_{K})$,
53$b=(b_1,\cdots,b_K)$, $\beta=[\beta_{1} \vert \cdots,\vert \beta_K]$ the $d\times K$
54matrix of regression coefficients and denote $\theta=(\omega,\beta,b)$.
e36b1046 55The model of population mixture of binary regressions is given by:
dad25cd2 56
e36b1046
BA
57\begin{equation}
58\label{mixturemodel1}
59\PP_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(<\beta_k,x>+b_k).
60\end{equation}
61
dad25cd2 62## Algorithm, theoretical garantees
e36b1046 63
dad25cd2
BA
64The algorithm uses spectral properties of some tensor matrices to estimate the model
65parameters $\Theta = (\omega, \beta, b)$. Under rather mild conditions it can be
66proved that the algorithm converges to the correct values (its speed is known too).
67For more informations on that subject, however, please refer to our article [XX].
68In this vignette let's rather focus on package usage.
3d5b5060 69
dad25cd2 70## Usage
85e0343a
BA
71<!--We assume that the random variable $X$ has a Gaussian distribution.
72We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
73identity $d\times d$ matrix. All results may be easily extended to the situation
74where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
75symetric $d\times d$ matrix. ***** TODO: take this into account? -->
e36b1046 76
5859426b
BA
77The two main functions are:
78 * computeMu(), which estimates the parameters directions, and
79 * optimParams(), which builds an object \code{o} to estimate all other parameters
80 when calling \code{o$run()}, starting from the directions obtained by the
81 previous function.
82A third function is useful to run Monte-Carlo or bootstrap estimations using
83different models in various contexts: multiRun(). We'll show example for all of them.
e36b1046 84
5859426b 85### Estimation of directions
e36b1046 86
5859426b
BA
87In a real situation you would have (maybe after some pre-processing) the matrices
88X and Y which contain vector inputs and binary output.
89However, a function is provided in the package to generate such data following a
90pre-defined law:
e36b1046 91
d294ece1
BA
92```{r, results="show", include=TRUE, echo=TRUE}
93library(morpheus)
5859426b 94io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2), b=c(0,0), link="probit")
d294ece1
BA
95# io$X and io$Y contain the sample data
96```
5859426b 97
d294ece1
BA
98$n$ is the total number of samples (lines in X, number of elements in Y)
99$p$ is a vector of proportions, of size $d-1$ (because the last proportion is deduced
100from the others: $p$ elements sums to 1) [TODO: omega or p?]
101$\beta$ is the matrix of linear coefficients, as written above in the model.
102$b$ is the vector of intercepts (as in linear regression, and as in the model above)
5859426b
BA
103link can be either "logit" or "probit", as mentioned earlier.
104
105This function outputs a list containing in particular the matrices X and Y, allowing to
106use the other functions (which all require either these, or the moments).
107
d294ece1
BA
108```{r, results="show", include=TRUE, echo=TRUE}
109mu <- computeMu(io$X, io$Y, optargs=list(K=2))
110```
5859426b 111
d294ece1
BA
112The optional argument, "optargs", is a list which can provide
113 * the number of clusters $K$,
114 * the moments matrix $M$ (computed with the "computeMoments()" function),
115 * the joint-diagonalisation method ("uwedge" or "jedi"),
116 * the number of random vectors for joint-diagonalization.
117See ?computeMu and the code for more details.
118
119### Estimation of the other parameters
120
121The other parameters are estimated by solving an optimization problem.
122The following function builds and return an optimization algorithm object:
123
124```{r, results="show", include=TRUE, echo=TRUE}
125M <- computeMoments(io$X, io$Y)
126# X and Y must be provided if the moments matrix is not given
127algopt <- optimParams(K=2, link="probit", optargs=list(M=M))
128# Optimization starts at beta = mu, b = 0 and p = uniform distribution
129x0 <- list(beta = mu)
130theta <- algopt$run(x0)
131```
5859426b 132
d294ece1
BA
133Now theta is a list with three slots:
134 * $p$: estimated proportions,
135 * $\beta$: estimated regression matrix,
136 * $b$: estimated bias.
5859426b
BA
137
138### Monte-Carlo and bootstrap
139
d294ece1
BA
140The package provides a function to compare methods on several computations on random data.
141It takes in input a list of parameters, then a list of functions which output some quantities
142(on the first example, our "computeMu()" method versus flexmix way of estimating directions),
143and finally a method to prepare the arguments which will be given to the functions in the
144list just mentioned; this allows to run Monte-Carlo estimations with the exact same samples
145for each compared method. The two last arguments to "multiRun()" control the number of runs,
146and the number of cores (using the package parallel).
147
148```{r, results="show", include=TRUE, echo=TRUE}
149beta <- matrix(c(1,-2,3,1), ncol=2)
150io <- generateSampleIO(n=1000, p=1/2, beta=beta, b=c(0,0), "logit")
151mu <- normalize(beta)
152
153# Example 1: bootstrap + computeMu, morpheus VS flexmix; assumes fargs first 3 elts X,Y,K
154mr1 <- multiRun(list(X=io$X,Y=io$Y,optargs=list(K=2,jd_nvects=0)), list(
155 # morpheus
156 function(fargs) {
157 library(morpheus)
158 ind <- fargs$ind
159 computeMu(fargs$X[ind,],fargs$Y[ind],fargs$optargs)
160 },
161 # flexmix
162 function(fargs) {
163 library(flexmix)
164 source("../patch_Bettina/FLXMRglm.R")
165 ind <- fargs$ind
166 K <- fargs$optargs$K
167 dat = as.data.frame( cbind(fargs$Y[ind],fargs$X[ind,]) )
168 out = refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=K,
169 model=FLXMRglm(family="binomial") ) )
170 normalize( matrix(out@coef[1:(ncol(fargs$X)*K)], ncol=K) )
171 } ),
172 prepareArgs = function(fargs,index) {
173 # Always include the non-shuffled dataset
174 if (index == 1)
175 fargs$ind <- 1:nrow(fargs$X)
176 else
177 fargs$ind <- sample(1:nrow(fargs$X),replace=TRUE)
178 fargs
179 }, N=10, ncores=3)
180# The result is correct up to matrices columns permutations; align them:
181for (i in 1:2)
182 mr1[[i]] <- alignMatrices(mr1[[i]], ref=mu, ls_mode="exact")
183```
184
185Several plots are available: histograms, boxplots, or curves of coefficients.
186We illustrate boxplots and curves here (histograms function uses the same arguments,
187see ?plotHist).
188
189```
190# Second row, first column; morpheus on the left, flexmix on the right
191plotBox(mr1, 2, 1, "Target value: -1")
192```
193
194```{r, results="show", include=TRUE, echo=TRUE}
195# Example 2: Monte-Carlo + optimParams from X,Y, morpheus VS flexmix; first args n,p,beta,b
196mr2 <- multiRun(list(n=1000,p=1/2,beta=beta,b=c(0,0),optargs=list(link="logit")), list(
197 # morpheus
198 function(fargs) {
199 library(morpheus)
200 mu <- computeMu(fargs$X, fargs$Y, fargs$optargs)
201 optimParams(fargs$K,fargs$link,fargs$optargs)$run(list(beta=mu))$beta
202 },
203 # flexmix
204 function(fargs) {
205 library(flexmix)
206 source("../patch_Bettina/FLXMRglm.R")
207 dat <- as.data.frame( cbind(fargs$Y,fargs$X) )
208 out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=fargs$K,
209 model=FLXMRglm(family="binomial") ) )
210 sapply( seq_len(fargs$K), function(i) as.double( out@components[[1]][[i]][,1] ) )
211 } ),
212 prepareArgs = function(fargs,index) {
213 library(morpheus)
214 io = generateSampleIO(fargs$n, fargs$p, fargs$beta, fargs$b, fargs$optargs$link)
215 fargs$X = io$X
216 fargs$Y = io$Y
217 fargs$K = ncol(fargs$beta)
218 fargs$link = fargs$optargs$link
219 fargs$optargs$M = computeMoments(io$X,io$Y)
220 fargs
221 }, N=10, ncores=3)
222# As in example 1, align results:
223for (i in 1:2)
224 mr2[[i]] <- alignMatrices(mr2[[i]], ref=beta, ls_mode="exact")
225```
226
227```
228# Second argument = true parameters matrix; third arg = index of method (here "morpheus")
229plotCoefs(mr2, beta, 1)
230# Real params are on the continous line; estimations = dotted line
231```