---
title: Use morpheus package

output:
  pdf_document:
    number_sections: true
    toc_depth: 1
---

```{r setup,  results="hide", include=FALSE}
knitr::opts_chunk$set(echo = TRUE, include = TRUE,
  cache = TRUE, comment="", cache.lazy = FALSE,
  out.width = "100%", fig.align = "center")
```

## Introduction
<!--Tell that we try to learn classification parameters in a non-EM way, using algebraic manipulations.-->

*morpheus* is a contributed R package which attempts to find the parameters of a
mixture of logistic classifiers.
When the data under study come from several groups with different characteristics,
mixture models are a very popular way to handle the heterogeneity.
Many algorithms have therefore been developed to deal with various mixture models.
Most of them use likelihood methods, or Bayesian methods that are likelihood dependent.
*flexmix* is an R package which implements these kinds of algorithms.

However, one problem with such methods is that they can converge to local maxima,
so several starting points must be explored.
Recently, spectral methods were developed to bypass EM algorithms, and they were proved
able to recover the directions of the regression parameters
in models with known link function and random covariates (see [XX]).
Our package extends such moment methods using least squares to obtain estimators of
all the parameters (with theoretical guarantees, see [XX]).
Currently it can handle only binary output, which is a common case.

## Model

Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
A binary regression model assumes that, for some link function $g$, the probability that
$Y=1$ conditionally on $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
$\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
Popular examples are the logit link function, where for any real $z$,
$g(z)=e^z/(1+e^z)$, and the probit link function, where $g(z)=\Phi(z)$ with $\Phi$
the cumulative distribution function of the standard normal $\mathcal{N}(0,1)$.
Both are implemented in the package.
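
As a minimal sketch of this single-population model, the two link functions can be written with the base R functions `plogis` and `pnorm`, and a binary output simulated from them. This is not the package's own code: the helper name `link_prob` and all numeric values are assumptions chosen for illustration.

```{r link-sketch}
# Illustrative sketch, not part of morpheus: names and values are assumptions.
# g is the logistic CDF for the logit link, the standard normal CDF for probit.
link_prob <- function(z, link = c("logit", "probit")) {
  link <- match.arg(link)
  if (link == "logit") plogis(z) else pnorm(z)
}

set.seed(1)
n <- 1000; d <- 3
beta <- c(1, -2, 0.5)               # regression coefficients (assumed values)
b <- 0.3                            # intercept (assumed value)
X <- matrix(rnorm(n * d), n, d)     # random covariates, one row per observation
p <- link_prob(drop(X %*% beta) + b, "logit")   # P(Y = 1 | X = x)
Y <- rbinom(n, size = 1, prob = p)  # binary output
```

Switching the second argument to `"probit"` gives the probit version of the same model.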

To model heterogeneous populations, let $K$ be the number of
populations and $\omega=(\omega_1,\ldots,\omega_K)$ their weights, such that
$\omega_{j}\geq 0$ for $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega_{j}=1$.
For $j=1,\ldots,K$, denote the regression coefficients in the $j$-th population
by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
$b_{j}\in\R$. Let $b=(b_1,\ldots,b_K)$, let
$\beta=[\beta_{1} \vert \cdots \vert \beta_K]$ be the $d\times K$
matrix of regression coefficients, and denote $\theta=(\omega,\beta,b)$.
The mixture of binary regressions model is then given by:

\begin{equation}
\label{mixturemodel1}
\PP_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(\langle \beta_k,x \rangle+b_k).
\end{equation}
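
The mixture probability above can be evaluated directly in base R. The sketch below is only an illustration under assumed values: the helper `mixture_prob` and the parameters are hypothetical and not taken from the package. It also shows the equivalent two-stage simulation, where a population is drawn first and the output second.

```{r mixture-sketch}
# Illustrative sketch, not part of morpheus: names and values are assumptions.
# P(Y = 1 | X = x) = sum_k omega_k * g(<beta_k, x> + b_k)
mixture_prob <- function(X, omega, beta, b, g = plogis) {
  # X: n x d covariates; beta: d x K matrix; omega, b: vectors of length K
  Z <- sweep(X %*% beta, 2, b, "+")  # n x K linear predictors
  P <- g(Z)
  dim(P) <- dim(Z)                   # keep the n x K shape explicitly
  drop(P %*% omega)                  # weighted sum over the K populations
}

set.seed(2)
n <- 1000; d <- 2; K <- 2
omega <- c(0.3, 0.7)                 # mixture weights, summing to 1
beta <- cbind(c(1, 0), c(0, 1))      # one column of coefficients per population
b <- c(0, -0.5)                      # one intercept per population
X <- matrix(rnorm(n * d), n, d)
p <- mixture_prob(X, omega, beta, b)

# Equivalent two-stage simulation: draw the population, then the output.
pop <- sample(K, n, replace = TRUE, prob = omega)
eta <- rowSums(X * t(beta[, pop])) + b[pop]
Y <- rbinom(n, 1, plogis(eta))
```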

## Algorithm, theoretical guarantees

The algorithm uses spectral properties of some tensor matrices to estimate the model
parameters $\theta = (\omega, \beta, b)$. Under rather mild conditions it can be
proved that the algorithm converges to the correct values (its rate of convergence is
also known). For more information on that subject, please refer to our article [XX].
In this vignette we focus instead on package usage.
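
To give some intuition for why moments carry information about the regression directions, here is a small base R sketch. It is not the morpheus algorithm. It assumes a single population ($K=1$), standard Gaussian covariates $X\sim\mathcal{N}(0,I_d)$, and a logit link. Under these assumptions Stein's identity gives $E[YX]=E[g'(\langle\beta,X\rangle+b)]\,\beta$, so the empirical first-order moment points in the direction of $\beta$. All names and values are illustrative.

```{r moment-sketch}
# Illustrative sketch, not the morpheus algorithm: K = 1, X ~ N(0, I_d) assumed.
set.seed(3)
n <- 1e5; d <- 3
beta <- c(1, -2, 0.5)                # true coefficients (assumed values)
b <- 0.2                             # true intercept (assumed value)
X <- matrix(rnorm(n * d), n, d)
Y <- rbinom(n, 1, plogis(drop(X %*% beta) + b))

# Empirical first-order moment, an estimate of E[Y X]
# (Y is recycled down the columns, so row i of X is multiplied by Y[i]).
M1 <- colMeans(Y * X)

# M1 is approximately proportional to beta, so only the direction is compared.
cosine <- sum(M1 * beta) / sqrt(sum(M1^2) * sum(beta^2))
```

With several populations the first-order moment only yields a weighted combination of the $\beta_k$, which is why higher-order moments are needed to separate them.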

## Usage
<!--We assume that the random variable $X$ has a Gaussian distribution.
We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
identity $d\times d$ matrix. All results may be easily extended to the situation
where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
symmetric $d\times d$ matrix. ***** TODO: take this into account? -->

TODO

3) Experiments: show package usage

\subsection{Experiments}
In this section, we first evaluate our algorithm using the mean squared error (MSE). We then compare experimentally our moment method (morpheus package \cite{Loum_Auder}) with the likelihood method (flexmix package \cite{bg-papers:Gruen+Leisch:2007a}).

TODO.........