Error when K > d for computeMu and optimParams
[morpheus.git] / vignettes / report.Rmd
CommitLineData
3d5b5060 1---
c83df166 2title: Use morpheus package
3d5b5060
BA
3
4output:
5 pdf_document:
6 number_sections: true
7 toc_depth: 1
8---
9
1b8eb253
BA
10\renewcommand{\P}{\mathrm{P}}
11\newcommand{\R}{\mathbb{R}}
12
3d5b5060
BA
13```{r setup, results="hide", include=FALSE}
14knitr::opts_chunk$set(echo = TRUE, include = TRUE,
15 cache = TRUE, comment="", cache.lazy = FALSE,
16 out.width = "100%", fig.align = "center")
17```
18
c83df166
BA
19## Introduction
20<!--Tell that we try to learn classification parameters in a non-EM way, using algebric manipulations.-->
3d5b5060 21
dad25cd2
BA
22*morpheus* is a contributed R package which attempts to find the parameters of a
23mixture of logistic classifiers.
24When the data under study come from several groups that have different characteristics,
25using mixture models is a very popular way to handle heterogeneity.
26Thus, many algorithms were developed to deal with various mixtures models.
27Most of them use likelihood methods or Bayesian methods that are likelihood dependent.
cff1083b 28*flexmix* is an R package which implements these kinds of algorithms.
3d5b5060 29
dad25cd2
BA
30However, one problem of such methods is that they can converge to local maxima,
31so several starting points must be explored.
32Recently, spectral methods were developed to bypass EM algorithms and they were proved
33able to recover the directions of the regression parameter
c83df166 34in models with known link function and random covariates (see [XX]).
dad25cd2
BA
35Our package extends such moment methods using least squares to get estimators of the
36whole parameters (with theoretical garantees, see [XX]).
cff1083b 37Currently it can handle only binary output $-$ which is a common case.
3d5b5060 38
c83df166
BA
39## Model
40
dad25cd2
BA
41Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
42A binary regression model assumes that for some link function $g$, the probability that
43$Y=1$ conditionally to $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
44$\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
45Popular examples of link functions are the logit link function where for any real $z$,
46$g(z)=e^z/(1+e^z)$ and the probit link function where $g(z)=\Phi(z),$ with $\Phi$
47the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
48Both are implemented in the package.
49
50If now we want to modelise heterogeneous populations, let $K$ be the number of
51populations and $\omega=(\omega_1,\cdots,\omega_K)$ their weights such that
52$\omega_{j}\geq 0$, $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega{j}=1$.
53Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
54by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
55$b_{j}\in\R$. Let $\omega =(\omega_{1},\ldots,\omega_{K})$,
56$b=(b_1,\cdots,b_K)$, $\beta=[\beta_{1} \vert \cdots,\vert \beta_K]$ the $d\times K$
57matrix of regression coefficients and denote $\theta=(\omega,\beta,b)$.
e36b1046 58The model of population mixture of binary regressions is given by:
dad25cd2 59
e36b1046
BA
60\begin{equation}
61\label{mixturemodel1}
1b8eb253 62\P_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(<\beta_k,x>+b_k).
e36b1046
BA
63\end{equation}
64
dad25cd2 65## Algorithm, theoretical garantees
e36b1046 66
dad25cd2
BA
67The algorithm uses spectral properties of some tensor matrices to estimate the model
68parameters $\Theta = (\omega, \beta, b)$. Under rather mild conditions it can be
69proved that the algorithm converges to the correct values (its speed is known too).
70For more informations on that subject, however, please refer to our article [XX].
71In this vignette let's rather focus on package usage.
3d5b5060 72
dad25cd2 73## Usage
85e0343a
BA
74<!--We assume that the random variable $X$ has a Gaussian distribution.
75We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
76identity $d\times d$ matrix. All results may be easily extended to the situation
77where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
78symetric $d\times d$ matrix. ***** TODO: take this into account? -->
e36b1046 79
5859426b
BA
80The two main functions are:
81 * computeMu(), which estimates the parameters directions, and
82 * optimParams(), which builds an object \code{o} to estimate all other parameters
83 when calling \code{o$run()}, starting from the directions obtained by the
84 previous function.
85A third function is useful to run Monte-Carlo or bootstrap estimations using
86different models in various contexts: multiRun(). We'll show example for all of them.
e36b1046 87
5859426b 88### Estimation of directions
e36b1046 89
5859426b
BA
90In a real situation you would have (maybe after some pre-processing) the matrices
91X and Y which contain vector inputs and binary output.
92However, a function is provided in the package to generate such data following a
93pre-defined law:
e36b1046 94
d294ece1
BA
95```{r, results="show", include=TRUE, echo=TRUE}
96library(morpheus)
5859426b 97io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2), b=c(0,0), link="probit")
d294ece1
BA
98# io$X and io$Y contain the sample data
99```
5859426b 100
d294ece1
BA
101$n$ is the total number of samples (lines in X, number of elements in Y)
102$p$ is a vector of proportions, of size $d-1$ (because the last proportion is deduced
103from the others: $p$ elements sums to 1) [TODO: omega or p?]
104$\beta$ is the matrix of linear coefficients, as written above in the model.
105$b$ is the vector of intercepts (as in linear regression, and as in the model above)
5859426b
BA
106link can be either "logit" or "probit", as mentioned earlier.
107
108This function outputs a list containing in particular the matrices X and Y, allowing to
109use the other functions (which all require either these, or the moments).
110
d294ece1
BA
111```{r, results="show", include=TRUE, echo=TRUE}
112mu <- computeMu(io$X, io$Y, optargs=list(K=2))
113```
5859426b 114
d294ece1 115The optional argument, "optargs", is a list which can provide
1b8eb253 116
d294ece1
BA
117 * the number of clusters $K$,
118 * the moments matrix $M$ (computed with the "computeMoments()" function),
119 * the joint-diagonalisation method ("uwedge" or "jedi"),
120 * the number of random vectors for joint-diagonalization.
1b8eb253 121
d294ece1
BA
122See ?computeMu and the code for more details.
123
124### Estimation of the other parameters
125
126The other parameters are estimated by solving an optimization problem.
127The following function builds and return an optimization algorithm object:
128
129```{r, results="show", include=TRUE, echo=TRUE}
130M <- computeMoments(io$X, io$Y)
131# X and Y must be provided if the moments matrix is not given
132algopt <- optimParams(K=2, link="probit", optargs=list(M=M))
133# Optimization starts at beta = mu, b = 0 and p = uniform distribution
134x0 <- list(beta = mu)
135theta <- algopt$run(x0)
136```
5859426b 137
d294ece1 138Now theta is a list with three slots:
1b8eb253 139
d294ece1
BA
140 * $p$: estimated proportions,
141 * $\beta$: estimated regression matrix,
142 * $b$: estimated bias.
5859426b
BA
143
144### Monte-Carlo and bootstrap
145
d294ece1
BA
146The package provides a function to compare methods on several computations on random data.
147It takes in input a list of parameters, then a list of functions which output some quantities
148(on the first example, our "computeMu()" method versus flexmix way of estimating directions),
149and finally a method to prepare the arguments which will be given to the functions in the
150list just mentioned; this allows to run Monte-Carlo estimations with the exact same samples
151for each compared method. The two last arguments to "multiRun()" control the number of runs,
152and the number of cores (using the package parallel).
153
154```{r, results="show", include=TRUE, echo=TRUE}
155beta <- matrix(c(1,-2,3,1), ncol=2)
156io <- generateSampleIO(n=1000, p=1/2, beta=beta, b=c(0,0), "logit")
157mu <- normalize(beta)
158
159# Example 1: bootstrap + computeMu, morpheus VS flexmix; assumes fargs first 3 elts X,Y,K
160mr1 <- multiRun(list(X=io$X,Y=io$Y,optargs=list(K=2,jd_nvects=0)), list(
161 # morpheus
162 function(fargs) {
163 library(morpheus)
164 ind <- fargs$ind
165 computeMu(fargs$X[ind,],fargs$Y[ind],fargs$optargs)
166 },
167 # flexmix
168 function(fargs) {
169 library(flexmix)
170 source("../patch_Bettina/FLXMRglm.R")
171 ind <- fargs$ind
172 K <- fargs$optargs$K
173 dat = as.data.frame( cbind(fargs$Y[ind],fargs$X[ind,]) )
174 out = refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=K,
175 model=FLXMRglm(family="binomial") ) )
176 normalize( matrix(out@coef[1:(ncol(fargs$X)*K)], ncol=K) )
177 } ),
178 prepareArgs = function(fargs,index) {
179 # Always include the non-shuffled dataset
180 if (index == 1)
181 fargs$ind <- 1:nrow(fargs$X)
182 else
183 fargs$ind <- sample(1:nrow(fargs$X),replace=TRUE)
184 fargs
185 }, N=10, ncores=3)
186# The result is correct up to matrices columns permutations; align them:
187for (i in 1:2)
188 mr1[[i]] <- alignMatrices(mr1[[i]], ref=mu, ls_mode="exact")
189```
190
191Several plots are available: histograms, boxplots, or curves of coefficients.
192We illustrate boxplots and curves here (histograms function uses the same arguments,
193see ?plotHist).
194
1b8eb253 195```{r, results="show", include=TRUE, echo=TRUE}
d294ece1
BA
196# Second row, first column; morpheus on the left, flexmix on the right
197plotBox(mr1, 2, 1, "Target value: -1")
198```
199
200```{r, results="show", include=TRUE, echo=TRUE}
201# Example 2: Monte-Carlo + optimParams from X,Y, morpheus VS flexmix; first args n,p,beta,b
202mr2 <- multiRun(list(n=1000,p=1/2,beta=beta,b=c(0,0),optargs=list(link="logit")), list(
203 # morpheus
204 function(fargs) {
205 library(morpheus)
206 mu <- computeMu(fargs$X, fargs$Y, fargs$optargs)
207 optimParams(fargs$K,fargs$link,fargs$optargs)$run(list(beta=mu))$beta
208 },
209 # flexmix
210 function(fargs) {
211 library(flexmix)
212 source("../patch_Bettina/FLXMRglm.R")
213 dat <- as.data.frame( cbind(fargs$Y,fargs$X) )
214 out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=fargs$K,
215 model=FLXMRglm(family="binomial") ) )
216 sapply( seq_len(fargs$K), function(i) as.double( out@components[[1]][[i]][,1] ) )
217 } ),
218 prepareArgs = function(fargs,index) {
219 library(morpheus)
220 io = generateSampleIO(fargs$n, fargs$p, fargs$beta, fargs$b, fargs$optargs$link)
221 fargs$X = io$X
222 fargs$Y = io$Y
223 fargs$K = ncol(fargs$beta)
224 fargs$link = fargs$optargs$link
225 fargs$optargs$M = computeMoments(io$X,io$Y)
226 fargs
227 }, N=10, ncores=3)
228# As in example 1, align results:
229for (i in 1:2)
230 mr2[[i]] <- alignMatrices(mr2[[i]], ref=beta, ls_mode="exact")
231```
232
1b8eb253 233```{r, results="show", include=TRUE, echo=TRUE}
d294ece1
BA
234# Second argument = true parameters matrix; third arg = index of method (here "morpheus")
235plotCoefs(mr2, beta, 1)
236# Real params are on the continous line; estimations = dotted line
237```