[morpheus.git] / vignettes / report.Rmd

---
title: Use morpheus package

output:
  pdf_document:
    number_sections: true
    toc_depth: 1
---

```{r setup,  results="hide", include=FALSE}
knitr::opts_chunk$set(echo = TRUE, include = TRUE,
  cache = TRUE, comment="", cache.lazy = FALSE,
  out.width = "100%", fig.align = "center")
```

## Introduction
<!--Tell that we try to learn classification parameters in a non-EM way, using algebric manipulations.-->

*morpheus* is a contributed R package which attempts to find the parameters of a
mixture of logistic classifiers.
When the data under study come from several groups that have different characteristics,
using mixture models is a very popular way to handle heterogeneity.
Thus, many algorithms were developed to deal with various mixtures models.
Most of them use likelihood methods or Bayesian methods that are likelihood dependent.
*flexmix* is an R package which implements these kinds of algorithms.

However, one problem of such methods is that they can converge to local maxima,
so several starting points must be explored.
Recently, spectral methods were developed to bypass EM algorithms and they were proved
able to recover the directions of the regression parameter
in models with known link function and random covariates (see [XX]).
Our package extends such moment methods using least squares to get estimators of the
whole parameters (with theoretical garantees, see [XX]).
Currently it can handle only binary output $-$ which is a common case.

## Model

Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
A binary regression model assumes that for some link function $g$, the probability that
$Y=1$ conditionally to $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
$\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
Popular examples of link functions are the logit link function where for any real $z$,
$g(z)=e^z/(1+e^z)$ and the probit link function where $g(z)=\Phi(z),$ with $\Phi$
the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
Both are implemented in the package.

If now we want to modelise heterogeneous populations, let $K$ be the number of
populations and $\omega=(\omega_1,\cdots,\omega_K)$ their weights such that
$\omega_{j}\geq 0$, $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega{j}=1$.
Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
$b_{j}\in\R$. Let $\omega =(\omega_{1},\ldots,\omega_{K})$,
$b=(b_1,\cdots,b_K)$, $\beta=[\beta_{1} \vert \cdots,\vert \beta_K]$ the $d\times K$
matrix of regression coefficients and denote $\theta=(\omega,\beta,b)$.
The model of population mixture of binary regressions is given by:

\begin{equation}
\label{mixturemodel1}
\PP_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(<\beta_k,x>+b_k).
\end{equation}

## Algorithm, theoretical garantees

The algorithm uses spectral properties of some tensor matrices to estimate the model
parameters $\Theta = (\omega, \beta, b)$. Under rather mild conditions it can be
proved that the algorithm converges to the correct values (its speed is known too).
For more informations on that subject, however, please refer to our article [XX].
In this vignette let's rather focus on package usage.

## Usage
<!--We assume that the random variable $X$ has a Gaussian distribution.
We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
identity $d\times d$ matrix. All results may be easily extended to the situation
where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
symetric $d\times d$ matrix. ***** TODO: take this into account? -->

The two main functions are:
 * computeMu(), which estimates the parameters directions, and
 * optimParams(), which builds an object \code{o} to estimate all other parameters
   when calling \code{o$run()}, starting from the directions obtained by the
   previous function.
A third function is useful to run Monte-Carlo or bootstrap estimations using
different models in various contexts: multiRun(). We'll show example for all of them.

### Estimation of directions

In a real situation you would have (maybe after some pre-processing) the matrices
X and Y which contain vector inputs and binary output.
However, a function is provided in the package to generate such data following a
pre-defined law:

```{r, results="show", include=TRUE, echo=TRUE}
library(morpheus)
io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2), b=c(0,0), link="probit")
# io$X and io$Y contain the sample data
```

$n$ is the total number of samples (lines in X, number of elements in Y)
$p$ is a vector of proportions, of size $d-1$ (because the last proportion is deduced
from the others: $p$ elements sums to 1) [TODO: omega or p?]
$\beta$ is the matrix of linear coefficients, as written above in the model.
$b$ is the vector of intercepts (as in linear regression, and as in the model above)
link can be either "logit" or "probit", as mentioned earlier.

This function outputs a list containing in particular the matrices X and Y, allowing to
use the other functions (which all require either these, or the moments).

```{r, results="show", include=TRUE, echo=TRUE}
mu <- computeMu(io$X, io$Y, optargs=list(K=2))
```

The optional argument, "optargs", is a list which can provide
 * the number of clusters $K$,
 * the moments matrix $M$ (computed with the "computeMoments()" function),
 * the joint-diagonalisation method ("uwedge" or "jedi"),
 * the number of random vectors for joint-diagonalization.
See ?computeMu and the code for more details.

### Estimation of the other parameters

The other parameters are estimated by solving an optimization problem.
The following function builds and return an optimization algorithm object:

```{r, results="show", include=TRUE, echo=TRUE}
M <- computeMoments(io$X, io$Y)
# X and Y must be provided if the moments matrix is not given
algopt <- optimParams(K=2, link="probit", optargs=list(M=M))
# Optimization starts at beta = mu, b = 0 and p = uniform distribution
x0 <- list(beta = mu)
theta <- algopt$run(x0)
```

Now theta is a list with three slots:
 * $p$: estimated proportions,
 * $\beta$: estimated regression matrix,
 * $b$: estimated bias.

### Monte-Carlo and bootstrap

The package provides a function to compare methods on several computations on random data.
It takes in input a list of parameters, then a list of functions which output some quantities
(on the first example, our "computeMu()" method versus flexmix way of estimating directions),
and finally a method to prepare the arguments which will be given to the functions in the
list just mentioned; this allows to run Monte-Carlo estimations with the exact same samples
for each compared method. The two last arguments to "multiRun()" control the number of runs,
and the number of cores (using the package parallel).

```{r, results="show", include=TRUE, echo=TRUE}
beta <- matrix(c(1,-2,3,1), ncol=2)
io <- generateSampleIO(n=1000, p=1/2, beta=beta, b=c(0,0), "logit")
mu <- normalize(beta)

# Example 1: bootstrap + computeMu, morpheus VS flexmix; assumes fargs first 3 elts X,Y,K
mr1 <- multiRun(list(X=io$X,Y=io$Y,optargs=list(K=2,jd_nvects=0)), list(
  # morpheus
  function(fargs) {
    library(morpheus)
    ind <- fargs$ind
    computeMu(fargs$X[ind,],fargs$Y[ind],fargs$optargs)
  },
  # flexmix
  function(fargs) {
    library(flexmix)
    source("../patch_Bettina/FLXMRglm.R")
    ind <- fargs$ind
    K <- fargs$optargs$K
    dat = as.data.frame( cbind(fargs$Y[ind],fargs$X[ind,]) )
    out = refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=K,
      model=FLXMRglm(family="binomial") ) )
    normalize( matrix(out@coef[1:(ncol(fargs$X)*K)], ncol=K) )
  } ),
  prepareArgs = function(fargs,index) {
    # Always include the non-shuffled dataset
    if (index == 1)
      fargs$ind <- 1:nrow(fargs$X)
    else
      fargs$ind <- sample(1:nrow(fargs$X),replace=TRUE)
    fargs
  }, N=10, ncores=3)
# The result is correct up to matrices columns permutations; align them:
for (i in 1:2)
  mr1[[i]] <- alignMatrices(mr1[[i]], ref=mu, ls_mode="exact")
```

Several plots are available: histograms, boxplots, or curves of coefficients.
We illustrate boxplots and curves here (histograms function uses the same arguments,
see ?plotHist).

```
# Second row, first column; morpheus on the left, flexmix on the right
plotBox(mr1, 2, 1, "Target value: -1")
```

```{r, results="show", include=TRUE, echo=TRUE}
# Example 2: Monte-Carlo + optimParams from X,Y, morpheus VS flexmix; first args n,p,beta,b
mr2 <- multiRun(list(n=1000,p=1/2,beta=beta,b=c(0,0),optargs=list(link="logit")), list(
  # morpheus
  function(fargs) {
    library(morpheus)
    mu <- computeMu(fargs$X, fargs$Y, fargs$optargs)
    optimParams(fargs$K,fargs$link,fargs$optargs)$run(list(beta=mu))$beta
  },
  # flexmix
  function(fargs) {
    library(flexmix)
    source("../patch_Bettina/FLXMRglm.R")
    dat <- as.data.frame( cbind(fargs$Y,fargs$X) )
    out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=fargs$K,
      model=FLXMRglm(family="binomial") ) )
    sapply( seq_len(fargs$K), function(i) as.double( out@components[[1]][[i]][,1] ) )
  } ),
  prepareArgs = function(fargs,index) {
    library(morpheus)
    io = generateSampleIO(fargs$n, fargs$p, fargs$beta, fargs$b, fargs$optargs$link)
    fargs$X = io$X
    fargs$Y = io$Y
    fargs$K = ncol(fargs$beta)
    fargs$link = fargs$optargs$link
    fargs$optargs$M = computeMoments(io$X,io$Y)
    fargs
  }, N=10, ncores=3)
# As in example 1, align results:
for (i in 1:2)
  mr2[[i]] <- alignMatrices(mr2[[i]], ref=beta, ls_mode="exact")
```

```
# Second argument = true parameters matrix; third arg = index of method (here "morpheus")
plotCoefs(mr2, beta, 1)
# Real params are on the continous line; estimations = dotted line
```
Commit	Line	Data
3d5b5060	1	---
c83df166	2	title: Use morpheus package
3d5b5060 BA	3
	4	output:
	5	pdf_document:
	6	number_sections: true
	7	toc_depth: 1
	8	---
	9
	10	```{r setup, results="hide", include=FALSE}
	11	knitr::opts_chunk$set(echo = TRUE, include = TRUE,
	12	cache = TRUE, comment="", cache.lazy = FALSE,
	13	out.width = "100%", fig.align = "center")
	14	```
	15
c83df166 BA	16	## Introduction
c83df166 BA	17	<!--Tell that we try to learn classification parameters in a non-EM way, using algebric manipulations.-->
3d5b5060	18
dad25cd2 BA	19	morpheus is a contributed R package which attempts to find the parameters of a
	20	mixture of logistic classifiers.
	21	When the data under study come from several groups that have different characteristics,
	22	using mixture models is a very popular way to handle heterogeneity.
	23	Thus, many algorithms were developed to deal with various mixtures models.
	24	Most of them use likelihood methods or Bayesian methods that are likelihood dependent.
cff1083b	25	flexmix is an R package which implements these kinds of algorithms.
3d5b5060	26
dad25cd2 BA	27	However, one problem of such methods is that they can converge to local maxima,
	28	so several starting points must be explored.
	29	Recently, spectral methods were developed to bypass EM algorithms and they were proved
	30	able to recover the directions of the regression parameter
c83df166	31	in models with known link function and random covariates (see [XX]).
dad25cd2 BA	32	Our package extends such moment methods using least squares to get estimators of the
dad25cd2 BA	33	whole parameters (with theoretical garantees, see [XX]).
cff1083b	34	Currently it can handle only binary output $-$ which is a common case.
3d5b5060	35
c83df166 BA	36	## Model
c83df166 BA	37
dad25cd2 BA	38	Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
	39	A binary regression model assumes that for some link function $g$, the probability that
	40	$Y=1$ conditionally to $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
	41	$\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
	42	Popular examples of link functions are the logit link function where for any real $z$,
	43	$g(z)=e^z/(1+e^z)$ and the probit link function where $g(z)=\Phi(z),$ with $\Phi$
	44	the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
	45	Both are implemented in the package.
	46
	47	If now we want to modelise heterogeneous populations, let $K$ be the number of
	48	populations and $\omega=(\omega_1,\cdots,\omega_K)$ their weights such that
	49	$\omega_{j}\geq 0$, $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega{j}=1$.
	50	Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
	51	by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
	52	$b_{j}\in\R$. Let $\omega =(\omega_{1},\ldots,\omega_{K})$,
	53	$b=(b_1,\cdots,b_K)$, $\beta=[\beta_{1} \vert \cdots,\vert \beta_K]$ the $d\times K$
	54	matrix of regression coefficients and denote $\theta=(\omega,\beta,b)$.
e36b1046	55	The model of population mixture of binary regressions is given by:
dad25cd2	56
e36b1046 BA	57	\begin{equation}
	58	\label{mixturemodel1}
	59	\PP_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(<\beta_k,x>+b_k).
	60	\end{equation}
	61
dad25cd2	62	## Algorithm, theoretical garantees
e36b1046	63
dad25cd2 BA	64	The algorithm uses spectral properties of some tensor matrices to estimate the model
	65	parameters $\Theta = (\omega, \beta, b)$. Under rather mild conditions it can be
	66	proved that the algorithm converges to the correct values (its speed is known too).
	67	For more informations on that subject, however, please refer to our article [XX].
	68	In this vignette let's rather focus on package usage.
3d5b5060	69
dad25cd2	70	## Usage
85e0343a BA	71	<!--We assume that the random variable $X$ has a Gaussian distribution.
	72	We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
	73	identity $d\times d$ matrix. All results may be easily extended to the situation
	74	where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
	75	symetric $d\times d$ matrix. ***** TODO: take this into account? -->
e36b1046	76
5859426b BA	77	The two main functions are:
	78	* computeMu(), which estimates the parameters directions, and
	79	* optimParams(), which builds an object \code{o} to estimate all other parameters
	80	when calling \code{o$run()}, starting from the directions obtained by the
	81	previous function.
	82	A third function is useful to run Monte-Carlo or bootstrap estimations using
	83	different models in various contexts: multiRun(). We'll show example for all of them.
e36b1046	84
5859426b	85	### Estimation of directions
e36b1046	86
5859426b BA	87	In a real situation you would have (maybe after some pre-processing) the matrices
	88	X and Y which contain vector inputs and binary output.
	89	However, a function is provided in the package to generate such data following a
	90	pre-defined law:
e36b1046	91
d294ece1 BA	92	```{r, results="show", include=TRUE, echo=TRUE}
d294ece1 BA	93	library(morpheus)
5859426b	94	io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2), b=c(0,0), link="probit")
d294ece1 BA	95	# io$X and io$Y contain the sample data
d294ece1 BA	96	```
5859426b	97
d294ece1 BA	98	$n$ is the total number of samples (lines in X, number of elements in Y)
	99	$p$ is a vector of proportions, of size $d-1$ (because the last proportion is deduced
	100	from the others: $p$ elements sums to 1) [TODO: omega or p?]
	101	$\beta$ is the matrix of linear coefficients, as written above in the model.
	102	$b$ is the vector of intercepts (as in linear regression, and as in the model above)
5859426b BA	103	link can be either "logit" or "probit", as mentioned earlier.
	104
	105	This function outputs a list containing in particular the matrices X and Y, allowing to
	106	use the other functions (which all require either these, or the moments).
	107
d294ece1 BA	108	```{r, results="show", include=TRUE, echo=TRUE}
	109	mu <- computeMu(io$X, io$Y, optargs=list(K=2))
	110	```
5859426b	111
d294ece1 BA	112	The optional argument, "optargs", is a list which can provide
	113	* the number of clusters $K$,
	114	* the moments matrix $M$ (computed with the "computeMoments()" function),
	115	* the joint-diagonalisation method ("uwedge" or "jedi"),
	116	* the number of random vectors for joint-diagonalization.
	117	See ?computeMu and the code for more details.
	118
	119	### Estimation of the other parameters
	120
	121	The other parameters are estimated by solving an optimization problem.
	122	The following function builds and return an optimization algorithm object:
	123
	124	```{r, results="show", include=TRUE, echo=TRUE}
	125	M <- computeMoments(io$X, io$Y)
	126	# X and Y must be provided if the moments matrix is not given
	127	algopt <- optimParams(K=2, link="probit", optargs=list(M=M))
	128	# Optimization starts at beta = mu, b = 0 and p = uniform distribution
	129	x0 <- list(beta = mu)
	130	theta <- algopt$run(x0)
	131	```
5859426b	132
d294ece1 BA	133	Now theta is a list with three slots:
	134	* $p$: estimated proportions,
	135	* $\beta$: estimated regression matrix,
	136	* $b$: estimated bias.
5859426b BA	137
	138	### Monte-Carlo and bootstrap
	139
d294ece1 BA	140	The package provides a function to compare methods on several computations on random data.
	141	It takes in input a list of parameters, then a list of functions which output some quantities
	142	(on the first example, our "computeMu()" method versus flexmix way of estimating directions),
	143	and finally a method to prepare the arguments which will be given to the functions in the
	144	list just mentioned; this allows to run Monte-Carlo estimations with the exact same samples
	145	for each compared method. The two last arguments to "multiRun()" control the number of runs,
	146	and the number of cores (using the package parallel).
	147
	148	```{r, results="show", include=TRUE, echo=TRUE}
	149	beta <- matrix(c(1,-2,3,1), ncol=2)
	150	io <- generateSampleIO(n=1000, p=1/2, beta=beta, b=c(0,0), "logit")
	151	mu <- normalize(beta)
	152
	153	# Example 1: bootstrap + computeMu, morpheus VS flexmix; assumes fargs first 3 elts X,Y,K
	154	mr1 <- multiRun(list(X=io$X,Y=io$Y,optargs=list(K=2,jd_nvects=0)), list(
	155	# morpheus
	156	function(fargs) {
	157	library(morpheus)
	158	ind <- fargs$ind
	159	computeMu(fargs$X[ind,],fargs$Y[ind],fargs$optargs)
	160	},
	161	# flexmix
	162	function(fargs) {
	163	library(flexmix)
	164	source("../patch_Bettina/FLXMRglm.R")
	165	ind <- fargs$ind
	166	K <- fargs$optargs$K
	167	dat = as.data.frame( cbind(fargs$Y[ind],fargs$X[ind,]) )
	168	out = refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=K,
	169	model=FLXMRglm(family="binomial") ) )
	170	normalize( matrix(out@coef[1:(ncol(fargs$X)*K)], ncol=K) )
	171	} ),
	172	prepareArgs = function(fargs,index) {
	173	# Always include the non-shuffled dataset
	174	if (index == 1)
	175	fargs$ind <- 1:nrow(fargs$X)
	176	else
	177	fargs$ind <- sample(1:nrow(fargs$X),replace=TRUE)
	178	fargs
	179	}, N=10, ncores=3)
	180	# The result is correct up to matrices columns permutations; align them:
	181	for (i in 1:2)
	182	mr1[[i]] <- alignMatrices(mr1[[i]], ref=mu, ls_mode="exact")
	183	```
	184
	185	Several plots are available: histograms, boxplots, or curves of coefficients.
	186	We illustrate boxplots and curves here (histograms function uses the same arguments,
	187	see ?plotHist).
	188
	189	```
	190	# Second row, first column; morpheus on the left, flexmix on the right
	191	plotBox(mr1, 2, 1, "Target value: -1")
	192	```
	193
	194	```{r, results="show", include=TRUE, echo=TRUE}
	195	# Example 2: Monte-Carlo + optimParams from X,Y, morpheus VS flexmix; first args n,p,beta,b
	196	mr2 <- multiRun(list(n=1000,p=1/2,beta=beta,b=c(0,0),optargs=list(link="logit")), list(
	197	# morpheus
	198	function(fargs) {
	199	library(morpheus)
	200	mu <- computeMu(fargs$X, fargs$Y, fargs$optargs)
	201	optimParams(fargs$K,fargs$link,fargs$optargs)$run(list(beta=mu))$beta
	202	},
	203	# flexmix
204	function(fargs) {
205	library(flexmix)
206	source("../patch_Bettina/FLXMRglm.R")
207	dat <- as.data.frame( cbind(fargs$Y,fargs$X) )
208	out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=fargs$K,
209	model=FLXMRglm(family="binomial") ) )
210	sapply( seq_len(fargs$K), function(i) as.double( out@components[[1]][[i]][,1] ) )
211	} ),
212	prepareArgs = function(fargs,index) {
213	library(morpheus)
214	io = generateSampleIO(fargs$n, fargs$p, fargs$beta, fargs$b, fargs$optargs$link)
215	fargs$X = io$X
216	fargs$Y = io$Y
217	fargs$K = ncol(fargs$beta)
218	fargs$link = fargs$optargs$link
219	fargs$optargs$M = computeMoments(io$X,io$Y)
220	fargs
221	}, N=10, ncores=3)
222	# As in example 1, align results:
223	for (i in 1:2)
224	mr2[[i]] <- alignMatrices(mr2[[i]], ref=beta, ls_mode="exact")
225	```
226
227	```
228	# Second argument = true parameters matrix; third arg = index of method (here "morpheus")
229	plotCoefs(mr2, beta, 1)
230	# Real params are on the continous line; estimations = dotted line
231	```