---
title: Use morpheus package

output:
  pdf_document:
    number_sections: true
    toc_depth: 1
---

```{r setup,  results="hide", include=FALSE}
knitr::opts_chunk$set(echo = TRUE, include = TRUE,
  cache = TRUE, comment="", cache.lazy = FALSE,
  out.width = "100%", fig.align = "center")
```

## Introduction
<!--Tell that we try to learn classification parameters in a non-EM way, using algebraic manipulations.-->

*morpheus* is a contributed R package that estimates the parameters of a mixture of logistic classifiers.
When the data under study come from several groups with different characteristics, mixture models are a popular way to handle the heterogeneity.
Many algorithms have therefore been developed for various mixture models. Most of them rely on likelihood methods, or on Bayesian methods that are likelihood-dependent.
*flexmix* is an R package that implements algorithms of this kind.

However, such methods can converge to local maxima, so several starting points must be explored.
Recently, spectral methods were developed to bypass EM algorithms, and they were proved able to recover the directions of the regression parameters
in models with a known link function and random covariates (see [XX]).
Our package extends such moment methods, using least squares to obtain estimators of all the parameters (with theoretical guarantees, see [XX]).
Currently it handles only binary output, which is a common case.

## Model

TODO: adapt

Let $[n]$ denote the set $\lbrace 1,2,\ldots,n\rbrace$ and $e_i\in\mathbb{R}^d$ the $i$-th canonical basis vector of $\mathbb{R}^d$. Let also $I_d\in\mathbb{R}^{d\times d}$ denote the identity matrix. The tensor product of $p$ Euclidean spaces $\mathbb{R}^{d_i}$, $i\in [p]$, is denoted $\bigotimes_{i=1}^p\mathbb{R}^{d_i}$. $T$ is called a real $p$-th order tensor if $T\in \bigotimes_{i=1}^p\mathbb{R}^{d_i}$. When all $d_i=d$: for $p=1$, $T$ is a vector in $\mathbb{R}^d$, and for $p=2$, $T$ is a $d\times d$ real matrix. The $(i_1,i_2,\ldots,i_p)$-th coordinate of $T$ with respect to the canonical basis is denoted $T[i_1,i_2,\ldots,i_p]$, $i_1,i_2,\ldots,i_p\in [d]$.\\

\noindent
Let $X\in \mathbb{R}^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output. \\

\noindent
A binary regression model assumes that, for some link function $g$, the probability that $Y=1$ conditionally on $X=x$ is given by $g(\langle \beta , x \rangle +b)$, where $\beta\in \mathbb{R}^{d}$ is the vector of regression coefficients and $b\in\mathbb{R}$ is the intercept. Popular examples are the logit link function, where $g(z)=e^z/(1+e^z)$ for any real $z$, and the probit link function, where $g(z)=\Phi(z)$ with $\Phi$ the cumulative distribution function of the standard normal $\mathcal{N}(0,1)$. \\
To model heterogeneous populations, let $K$ be the number of populations and $\omega=(\omega_1,\ldots,\omega_K)$ their weights, such that $\omega_{j}\geq 0$ for $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega_{j}=1$. For $j=1,\ldots,K$, denote the regression coefficients in the $j$-th population by $\beta_{j}\in\mathbb{R}^{d}$ and the intercept by $b_{j}\in\mathbb{R}$. Let $b=(b_1,\ldots,b_K)$, let $\beta=[\beta_{1} \vert \cdots \vert \beta_K]$ be the $d\times K$ matrix of regression coefficients, and denote $\theta=(\omega,\beta,b)$.
The mixture of binary regressions model is then given by:
\begin{equation}
\label{mixturemodel1}
\mathbb{P}_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(\langle\beta_k,x\rangle+b_k).
\end{equation}
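
As an illustration of the model above, the following sketch evaluates the conditional probability $\sum_{k}\omega_k g(\langle\beta_k,x\rangle+b_k)$ and draws a sample from a two-component mixture of logistic regressions with standard Gaussian covariates. The function names (`mixture_prob`, `simulate_mixture`) and all parameter values are hypothetical, chosen for this example only; they are not part of the morpheus API.

```r
# P_theta(Y = 1 | X = x_i) for each row x_i of X:
# sum_k omega_k * g(<beta_k, x_i> + b_k).
# X: n x d matrix; omega: K weights; beta: d x K matrix; b: K intercepts.
mixture_prob <- function(X, omega, beta, b, g = plogis) {
  eta <- sweep(X %*% beta, 2, b, "+")   # n x K linear predictors
  P <- g(eta)
  dim(P) <- dim(eta)                    # keep the n x K shape explicitly
  as.vector(P %*% omega)                # weighted sum over components
}

# Draw (X, Y) from the mixture: pick a latent population for each
# individual, then draw Y from that population's binary regression.
simulate_mixture <- function(n, omega, beta, b, g = plogis) {
  d <- nrow(beta)
  K <- length(omega)
  X <- matrix(rnorm(n * d), nrow = n, ncol = d)        # X ~ N(0, I_d)
  z <- sample.int(K, n, replace = TRUE, prob = omega)  # latent labels
  # <beta_{z_i}, x_i> + b_{z_i} for each individual i
  eta <- rowSums(X * t(beta[, z, drop = FALSE])) + b[z]
  Y <- rbinom(n, size = 1, prob = g(eta))
  list(X = X, Y = Y)
}

# Hypothetical parameters: d = 3 covariates, K = 2 populations
set.seed(1)
omega <- c(0.4, 0.6)
beta  <- cbind(c(1, -2, 0), c(-1, 0.5, 2))
b     <- c(0.5, -0.5)
dat <- simulate_mixture(1000, omega, beta, b)
p   <- mixture_prob(dat$X, omega, beta, b)
```

Passing `g = pnorm` instead of the default `plogis` gives the probit version of the same model.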

\noindent
We assume that the random variable $X$ has a Gaussian distribution. We focus on the case $X\sim \mathcal{N}(0,I_d)$. All results extend easily to the case $X\sim \mathcal{N}(m,\Sigma)$, with $m\in \mathbb{R}^{d}$ and $\Sigma$ a symmetric positive definite $d\times d$ matrix. \\
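
One way to see the reduction to the standard case: if $X\sim\mathcal{N}(m,\Sigma)$ and $\Sigma=LL^\top$, then $Z=L^{-1}(X-m)\sim\mathcal{N}(0,I_d)$ and $\langle\beta_k,X\rangle+b_k=\langle L^\top\beta_k,Z\rangle+b_k+\langle\beta_k,m\rangle$. The sketch below checks this identity numerically. The helper names are hypothetical, and $m$ and $\Sigma$ are assumed known here, whereas in practice they would have to be estimated.

```r
# Whiten covariates: rows of the result are z_i = L^{-1} (x_i - m),
# where Sigma = L L^T is the Cholesky factorization.
whiten <- function(X, m, Sigma) {
  R <- chol(Sigma)            # upper triangular, Sigma = t(R) %*% R, so L = t(R)
  Xc <- sweep(X, 2, m, "-")   # center each row
  Xc %*% solve(R)             # z_i^T = (x_i - m)^T R^{-1}
}

# Parameters of the equivalent model written in terms of Z.
to_standard_params <- function(beta, b, m, Sigma) {
  R <- chol(Sigma)
  list(beta = R %*% beta,                       # L^T beta_k, column by column
       b = b + as.vector(crossprod(beta, m)))   # b_k + <beta_k, m>
}

# Hypothetical setting: d = 3, K = 2
set.seed(2)
d <- 3; n <- 5000
m <- c(1, -1, 2)
A <- matrix(rnorm(d * d), d, d)
Sigma <- crossprod(A) + diag(d)                 # symmetric positive definite
X <- sweep(matrix(rnorm(n * d), n, d) %*% chol(Sigma), 2, m, "+")

beta <- cbind(c(1, -2, 0), c(-1, 0.5, 2))
b <- c(0.5, -0.5)

Z <- whiten(X, m, Sigma)
std <- to_standard_params(beta, b, m, Sigma)

# Linear predictors agree in both parametrizations
eta_x <- sweep(X %*% beta, 2, b, "+")
eta_z <- sweep(Z %*% std$beta, 2, std$b, "+")
max_diff <- max(abs(eta_x - eta_z))
```

Here `max_diff` should be at the level of floating-point rounding, and `cov(Z)` should be close to the identity matrix.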

## Algorithm
<!--As in the article.-->

TODO: find it...

The R package developed is called \verb"morpheus" \cite{Loum_Auder} and is divided into two main parts:
\begin{enumerate}
	\item the computation of the directions matrix $\mu$, based on the empirical
		cross-moments as described in the previous sections;
	\item the optimization of all parameters (including $\mu$), using the initially estimated
		directions as a starting point.
\end{enumerate}
The former is a straightforward translation of the mathematical formulas (file R/computeMu.R),
while the latter calls R's \verb"constrOptim()" function on the objective function and its
derivative (file R/optimParams.R). For usage examples, please refer to the package help.
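
To give some intuition for the first part, consider the simplest case of a single population ($K=1$) with $X\sim\mathcal{N}(0,I_d)$. Stein's lemma then gives $\mathbb{E}[YX]=\mathbb{E}[g'(\langle\beta,X\rangle+b)]\,\beta$, so the first empirical cross-moment already points in the direction of $\beta$. The sketch below illustrates only this simplified case, in base R and with hypothetical parameter values. It is not the code of R/computeMu.R, which computes a whole matrix of directions, and it does not cover the optimization step.

```r
set.seed(3)
n <- 20000; d <- 4
beta <- c(1, -2, 0.5, 1.5)   # hypothetical regression coefficients
b <- 0.3                     # hypothetical intercept

# Single-population logistic model with standard Gaussian covariates
X <- matrix(rnorm(n * d), n, d)
Y <- rbinom(n, 1, plogis(as.vector(X %*% beta) + b))

# Empirical first cross-moment E[Y X]: row i of X is scaled by Y_i
# (Y is recycled down each column), then columns are averaged.
M1 <- colMeans(Y * X)

# Only the direction is identified by this moment, so normalize both
mu_hat  <- M1 / sqrt(sum(M1^2))
mu_true <- beta / sqrt(sum(beta^2))

# Cosine of the angle between estimated and true directions
cosine <- sum(mu_hat * mu_true)
```

With this sample size `cosine` should be close to 1. The norm of $\beta$ and the intercept $b$ are not recovered by this moment, which is why a second step optimizing over all parameters is needed.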

## Experiments
<!--Show package usage.-->
In this section, we first evaluate our algorithm using the mean squared error (MSE). We then compare experimentally our moment method (morpheus package \cite{Loum_Auder}) with the likelihood method (flexmix package \cite{bg-papers:Gruen+Leisch:2007a}).
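
The first evaluation step can be sketched as follows: the MSE $\mathbb{E}\Vert\hat\beta-\beta\Vert^2$ of an estimator is approximated by averaging the squared error over independent simulated samples. To keep the example self-contained, it uses a single logistic regression fitted with `glm()` as a stand-in estimator. The function names and settings are hypothetical, and this is not the experimental protocol of the article.

```r
# Monte Carlo approximation of the MSE E||beta_hat - beta||^2.
# estimator: function(X, Y) returning an estimate of beta.
mc_mse <- function(n_rep, n, beta, b, estimator) {
  d <- length(beta)
  sq_err <- replicate(n_rep, {
    X <- matrix(rnorm(n * d), n, d)                       # X ~ N(0, I_d)
    Y <- rbinom(n, 1, plogis(as.vector(X %*% beta) + b))  # logistic output
    sum((estimator(X, Y) - beta)^2)                       # squared error
  })
  mean(sq_err)
}

# Stand-in estimator for K = 1: maximum likelihood via glm()
glm_estimator <- function(X, Y) {
  fit <- glm(Y ~ X, family = binomial(link = "logit"))
  unname(coef(fit)[-1])   # drop the intercept, keep the d slopes
}

set.seed(4)
beta <- c(1, -1, 0.5)     # hypothetical true coefficients
mse_small <- mc_mse(50, 200, beta, 0.2, glm_estimator)
mse_large <- mc_mse(50, 2000, beta, 0.2, glm_estimator)
```

The MSE is expected to decrease as the sample size grows, so `mse_large` should be smaller than `mse_small`. For a mixture with $K>1$, the estimated components would first have to be matched to the true ones, since they are identified only up to a permutation.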

TODO.........