vignettes/report.Rmd

   1 ---
   2 title: Use morpheus package
   3
   4 output:
   5   pdf_document:
   6     number_sections: true
   7     toc_depth: 1
   8 ---
   9
  10 ```{r setup,  results="hide", include=FALSE}
  11 knitr::opts_chunk$set(echo = TRUE, include = TRUE,
  12   cache = TRUE, comment="", cache.lazy = FALSE,
  13   out.width = "100%", fig.align = "center")
  14 ```
  15
  16 ## Introduction
  17 <!--Tell that we try to learn classification parameters in a non-EM way, using algebraic manipulations.-->
  18
  19 *morpheus* is a contributed R package which attempts to find the parameters of a
  20 mixture of logistic classifiers.
  21 When the data under study come from several groups that have different characteristics,
  22 using mixture models is a very popular way to handle heterogeneity.
  23 Many algorithms have therefore been developed to handle various mixture models.
  24 Most of them rely on likelihood methods, or on Bayesian methods that are likelihood-dependent.
  25 *flexmix* is an R package which implements these kinds of algorithms.
  26
  27 However, one problem with such methods is that they can converge to local maxima,
  28 so several starting points must be explored.
  29 Recently, spectral methods were developed to bypass EM algorithms, and they were proved
  30 able to recover the directions of the regression parameters
  31 in models with known link function and random covariates (see [XX]).
  32 Our package extends such moment methods, using least squares to obtain estimators of
  33 all the parameters (with theoretical guarantees, see [XX]).
  34 Currently it handles only binary outputs, which is a common case.
  35
  36 ## Model
  37
  38 Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
  39 A binary regression model assumes that for some link function $g$, the probability that
  40 $Y=1$ conditionally on $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
  41 $\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
  42 Popular examples of link functions are the logit link function where for any real $z$,
  43 $g(z)=e^z/(1+e^z)$ and the probit link function where $g(z)=\Phi(z),$ with $\Phi$
  44 the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
  45 Both are implemented in the package.
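
As a quick illustration, the two link functions above can be written in a few lines of base R. This is only a sketch for checking values by hand: the names `logit_link` and `probit_link` are hypothetical and are not functions of the package.

```{r}
# Logit link as written in the text: g(z) = e^z / (1 + e^z)
logit_link <- function(z) exp(z) / (1 + exp(z))
# Probit link: g(z) = Phi(z), the standard normal cumulative distribution function
probit_link <- function(z) pnorm(z)

z <- c(-2, 0, 2)
logit_link(z)    # same values as stats::plogis(z)
probit_link(z)   # equals 0.5 at z = 0
```

Both functions map the real line to $(0,1)$ and take the value $1/2$ at $z=0$.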
  46
  47 To model heterogeneous populations, let $K$ be the number of
  48 populations and $\omega=(\omega_1,\ldots,\omega_K)$ their weights, such that
  49 $\omega_{j}\geq 0$, $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega_{j}=1$.
  50 Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
  51 by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
  52 $b_{j}\in\R$. Let $b=(b_1,\ldots,b_K)$, let
  53 $\beta=[\beta_{1} \vert \cdots \vert \beta_K]$ be the $d\times K$
  54 matrix of regression coefficients, and denote $\theta=(\omega,\beta,b)$.
  55 The mixture of binary regressions model is given by:
  56
  57 \begin{equation}
  58 \label{mixturemodel1}
  59 \PP_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(\langle \beta_k, x \rangle + b_k).
  60 \end{equation}
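
To make the formula concrete, here is a minimal base R sketch that evaluates the mixture probability for a given $x$. The helper `mixture_prob` is hypothetical (it is not part of the package), and the parameter values are chosen only for illustration.

```{r}
# Hypothetical helper (not part of morpheus): P(Y = 1 | X = x) under the mixture model
mixture_prob <- function(x, omega, beta, b, g = plogis) {
  # beta is d x K; crossprod(beta, x) gives the K inner products <beta_k, x>
  eta <- as.vector(crossprod(beta, x)) + b
  sum(omega * g(eta))
}

omega <- c(0.5, 0.5)
beta  <- matrix(c(1, 0, 0, 1), ncol = 2)  # columns are beta_1 and beta_2
b     <- c(0, 0)
mixture_prob(c(0, 0), omega, beta, b, g = pnorm)  # 0.5, since g(0) = 0.5 in both components
```

With $x=(1,0)$ and the same parameters, the first component contributes $\Phi(1)/2$ and the second $\Phi(0)/2$.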
  61
  62 ## Algorithm, theoretical guarantees
  63
  64 The algorithm uses spectral properties of some tensor matrices to estimate the model
  65 parameters $\theta = (\omega, \beta, b)$. Under rather mild conditions it can be
  66 proved that the algorithm converges to the correct values (its rate of convergence is also known).
  67 For more information on that subject, please refer to our article [XX].
  68 In this vignette we focus instead on package usage.
  69
  70 ## Usage
  71 <!--We assume that the random variable $X$ has a Gaussian distribution.
  72 We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
  73 identity $d\times d$ matrix. All results may be easily extended to the situation
  74 where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
  75 symmetric $d\times d$ matrix. ***** TODO: take this into account? -->
  76
  77 The two main functions are:
  78
  79  * `computeMu()`, which estimates the parameter directions, and
  80  * `optimParams()`, which builds an object `o` that estimates all other parameters when `o$run()` is called, starting from the directions obtained by `computeMu()`.
  81
  82 A third function, `multiRun()`, is useful for running Monte-Carlo or bootstrap estimations
  83 using different models in various contexts. We will show examples for all of them.
  84
  85 ### Estimation of directions
  86
  87 In a real situation you would have (maybe after some pre-processing) the data
  88 X and Y, which contain the input vectors and the binary outputs.
  89 However, the package provides a function to generate such data following a
  90 pre-defined law:
  91
  92     io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2), b=c(0,0), link="probit")
  93
  94  * `n` is the total number of samples (rows in X, number of elements in Y);
  95  * `p` is a vector of proportions, of size K-1 (because the last proportion is deduced from
  96    the others: the proportions sum to 1) [TODO: omega or p?];
  97  * `beta` is the matrix of linear coefficients, as written above in the model;
  98  * `b` is the vector of intercepts (as in linear regression, and as in the model above);
  99  * `link` can be either "logit" or "probit", as mentioned earlier.
 100
 101 This function outputs a list containing in particular X and Y, which can then be
 102 passed to the other functions (all of which require either these or the moments).
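
To clarify the role of each argument, the following base R sketch simulates data from the mixture model of the previous section. It is not the implementation of `generateSampleIO()`: the function `simulate_mixture_io` is hypothetical, it takes the full weight vector `omega` (of size K) rather than `p`, and it assumes standard Gaussian covariates, which is an assumption made here for illustration.

```{r}
# Hypothetical base R sketch; NOT the implementation of generateSampleIO()
simulate_mixture_io <- function(n, omega, beta, b, g = pnorm) {
  d <- nrow(beta)
  K <- ncol(beta)
  X <- matrix(rnorm(n * d), nrow = n, ncol = d)        # assumed N(0, I_d) covariates
  k <- sample.int(K, n, replace = TRUE, prob = omega)  # population of each sample
  # Linear predictor <beta_k, x_i> + b_k, using the population k of sample i
  eta <- rowSums(X * t(beta[, k, drop = FALSE])) + b[k]
  Y <- rbinom(n, size = 1, prob = g(eta))              # binary outputs
  list(X = X, Y = Y, k = k)
}

set.seed(1)
sim <- simulate_mixture_io(1000, omega = c(0.5, 0.5),
                           beta = matrix(c(1, 0, 0, 1), ncol = 2), b = c(0, 0))
dim(sim$X)     # 1000 rows (samples), 2 columns (covariates)
table(sim$Y)   # counts of 0 and 1 outputs
```

The returned list mimics the description above only in that it contains `X` and `Y`; the extra element `k` (the population labels) is kept here for inspection.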
 103
 104 TODO: computeMu(), explain input/output
 105
 106 ### Estimation of other parameters
 107
 108 TODO: just run optimParams$run(...)
 109
 110 ### Monte-Carlo and bootstrap
 111
 112 TODO: show example comparison with flexmix, show plots.