vignettes/report.Rmd

   1 ---
   2 title: Use morpheus package
   3
   4 output:
   5   pdf_document:
   6     number_sections: true
   7     toc_depth: 1
   8 ---
   9
  10 \renewcommand{\P}{\mathrm{P}}
  11 \newcommand{\R}{\mathbb{R}}
  12
  13 ```{r setup,  results="hide", include=FALSE}
  14 knitr::opts_chunk$set(echo = TRUE, include = TRUE,
  15   cache = TRUE, comment="", cache.lazy = FALSE,
  16   out.width = "100%", fig.align = "center")
  17 ```
  18
  19 ## Introduction
  20 <!--Tell that we try to learn classification parameters in a non-EM way, using algebric manipulations.-->
  21
  22 *morpheus* is a contributed R package which attempts to find the parameters of a
  23 mixture of logistic classifiers.
  24 When the data under study come from several groups that have different characteristics,
  25 using mixture models is a very popular way to handle heterogeneity.
  26 Thus, many algorithms were developed to deal with various mixtures models.
  27 Most of them use likelihood methods or Bayesian methods that are likelihood dependent.
  28 *flexmix* is an R package which implements these kinds of algorithms.
  29
  30 However, one problem of such methods is that they can converge to local maxima,
  31 so several starting points must be explored.
  32 Recently, spectral methods were developed to bypass EM algorithms and they were proved
  33 able to recover the directions of the regression parameter
  34 in models with known link function and random covariates (see [XX]).
  35 Our package extends such moment methods using least squares to get estimators of the
  36 whole parameters (with theoretical garantees, see [XX]).
  37 Currently it can handle only binary output $-$ which is a common case.
  38
  39 ## Model
  40
  41 Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
  42 A binary regression model assumes that for some link function $g$, the probability that
  43 $Y=1$ conditionally to $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
  44 $\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
  45 Popular examples of link functions are the logit link function where for any real $z$,
  46 $g(z)=e^z/(1+e^z)$ and the probit link function where $g(z)=\Phi(z),$ with $\Phi$
  47 the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
  48 Both are implemented in the package.
  49
  50 If now we want to modelise heterogeneous populations, let $K$ be the number of
  51 populations and $\omega=(\omega_1,\cdots,\omega_K)$ their weights such that
  52 $\omega_{j}\geq 0$, $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega{j}=1$.
  53 Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
  54 by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
  55 $b_{j}\in\R$. Let $\omega =(\omega_{1},\ldots,\omega_{K})$,
  56 $b=(b_1,\cdots,b_K)$, $\beta=[\beta_{1} \vert \cdots,\vert \beta_K]$ the $d\times K$
  57 matrix of regression coefficients and denote $\theta=(\omega,\beta,b)$.
  58 The model of population mixture of binary regressions is given by:
  59
  60 \begin{equation}
  61 \label{mixturemodel1}
  62 \P_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(<\beta_k,x>+b_k).
  63 \end{equation}
  64
  65 ## Algorithm, theoretical garantees
  66
  67 The algorithm uses spectral properties of some tensor matrices to estimate the model
  68 parameters $\Theta = (\omega, \beta, b)$. Under rather mild conditions it can be
  69 proved that the algorithm converges to the correct values (its speed is known too).
  70 For more informations on that subject, however, please refer to our article [XX].
  71 In this vignette let's rather focus on package usage.
  72
  73 ## Usage
  74 <!--We assume that the random variable $X$ has a Gaussian distribution.
  75 We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
  76 identity $d\times d$ matrix. All results may be easily extended to the situation
  77 where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
  78 symetric $d\times d$ matrix. ***** TODO: take this into account? -->
  79
  80 The two main functions are:
  81  * computeMu(), which estimates the parameters directions, and
  82  * optimParams(), which builds an object \code{o} to estimate all other parameters
  83    when calling \code{o$run()}, starting from the directions obtained by the
  84    previous function.
  85 A third function is useful to run Monte-Carlo or bootstrap estimations using
  86 different models in various contexts: multiRun(). We'll show example for all of them.
  87
  88 ### Estimation of directions
  89
  90 In a real situation you would have (maybe after some pre-processing) the matrices
  91 X and Y which contain vector inputs and binary output.
  92 However, a function is provided in the package to generate such data following a
  93 pre-defined law:
  94
  95 ```{r, results="show", include=TRUE, echo=TRUE}
  96 library(morpheus)
  97 io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2), b=c(0,0), link="probit")
  98 # io$X and io$Y contain the sample data
  99 ```
 100
 101 $n$ is the total number of samples (lines in X, number of elements in Y)
 102 $p$ is a vector of proportions, of size $d-1$ (because the last proportion is deduced
 103 from the others: $p$ elements sums to 1) [TODO: omega or p?]
 104 $\beta$ is the matrix of linear coefficients, as written above in the model.
 105 $b$ is the vector of intercepts (as in linear regression, and as in the model above)
 106 link can be either "logit" or "probit", as mentioned earlier.
 107
 108 This function outputs a list containing in particular the matrices X and Y, allowing to
 109 use the other functions (which all require either these, or the moments).
 110
 111 ```{r, results="show", include=TRUE, echo=TRUE}
 112 mu <- computeMu(io$X, io$Y, optargs=list(K=2))
 113 ```
 114
 115 The optional argument, "optargs", is a list which can provide
 116
 117  * the number of clusters $K$,
 118  * the moments matrix $M$ (computed with the "computeMoments()" function),
 119  * the joint-diagonalisation method ("uwedge" or "jedi"),
 120  * the number of random vectors for joint-diagonalization.
 121
 122 See ?computeMu and the code for more details.
 123
 124 ### Estimation of the other parameters
 125
 126 The other parameters are estimated by solving an optimization problem.
 127 The following function builds and return an optimization algorithm object:
 128
 129 ```{r, results="show", include=TRUE, echo=TRUE}
 130 M <- computeMoments(io$X, io$Y)
 131 # X and Y must be provided if the moments matrix is not given
 132 algopt <- optimParams(K=2, link="probit", optargs=list(M=M))
 133 # Optimization starts at beta = mu, b = 0 and p = uniform distribution
 134 x0 <- list(beta = mu)
 135 theta <- algopt$run(x0)
 136 ```
 137
 138 Now theta is a list with three slots:
 139
 140  * $p$: estimated proportions,
 141  * $\beta$: estimated regression matrix,
 142  * $b$: estimated bias.
 143
 144 ### Monte-Carlo and bootstrap
 145
 146 The package provides a function to compare methods on several computations on random data.
 147 It takes in input a list of parameters, then a list of functions which output some quantities
 148 (on the first example, our "computeMu()" method versus flexmix way of estimating directions),
 149 and finally a method to prepare the arguments which will be given to the functions in the
 150 list just mentioned; this allows to run Monte-Carlo estimations with the exact same samples
 151 for each compared method. The two last arguments to "multiRun()" control the number of runs,
 152 and the number of cores (using the package parallel).
 153
 154 ```{r, results="show", include=TRUE, echo=TRUE}
 155 beta <- matrix(c(1,-2,3,1), ncol=2)
 156 io <- generateSampleIO(n=1000, p=1/2, beta=beta, b=c(0,0), "logit")
 157 mu <- normalize(beta)
 158
 159 # Example 1: bootstrap + computeMu, morpheus VS flexmix; assumes fargs first 3 elts X,Y,K
 160 mr1 <- multiRun(list(X=io$X,Y=io$Y,optargs=list(K=2,jd_nvects=0)), list(
 161   # morpheus
 162   function(fargs) {
 163     library(morpheus)
 164     ind <- fargs$ind
 165     computeMu(fargs$X[ind,],fargs$Y[ind],fargs$optargs)
 166   },
 167   # flexmix
 168   function(fargs) {
 169     library(flexmix)
 170     source("../patch_Bettina/FLXMRglm.R")
 171     ind <- fargs$ind
 172     K <- fargs$optargs$K
 173     dat = as.data.frame( cbind(fargs$Y[ind],fargs$X[ind,]) )
 174     out = refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=K,
 175       model=FLXMRglm(family="binomial") ) )
 176     normalize( matrix(out@coef[1:(ncol(fargs$X)*K)], ncol=K) )
 177   } ),
 178   prepareArgs = function(fargs,index) {
 179     # Always include the non-shuffled dataset
 180     if (index == 1)
 181       fargs$ind <- 1:nrow(fargs$X)
 182     else
 183       fargs$ind <- sample(1:nrow(fargs$X),replace=TRUE)
 184     fargs
 185   }, N=10, ncores=3)
 186 # The result is correct up to matrices columns permutations; align them:
 187 for (i in 1:2)
 188   mr1[[i]] <- alignMatrices(mr1[[i]], ref=mu, ls_mode="exact")
 189 ```
 190
 191 Several plots are available: histograms, boxplots, or curves of coefficients.
 192 We illustrate boxplots and curves here (histograms function uses the same arguments,
 193 see ?plotHist).
 194
 195 ```{r, results="show", include=TRUE, echo=TRUE}
 196 # Second row, first column; morpheus on the left, flexmix on the right
 197 plotBox(mr1, 2, 1, "Target value: -1")
 198 ```
 199
 200 ```{r, results="show", include=TRUE, echo=TRUE}
 201 # Example 2: Monte-Carlo + optimParams from X,Y, morpheus VS flexmix; first args n,p,beta,b
 202 mr2 <- multiRun(list(n=1000,p=1/2,beta=beta,b=c(0,0),optargs=list(link="logit")), list(
 203   # morpheus
 204   function(fargs) {
 205     library(morpheus)
 206     mu <- computeMu(fargs$X, fargs$Y, fargs$optargs)
 207     optimParams(fargs$K,fargs$link,fargs$optargs)$run(list(beta=mu))$beta
 208   },
 209   # flexmix
 210   function(fargs) {
 211     library(flexmix)
 212     source("../patch_Bettina/FLXMRglm.R")
 213     dat <- as.data.frame( cbind(fargs$Y,fargs$X) )
 214     out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=fargs$K,
 215       model=FLXMRglm(family="binomial") ) )
 216     sapply( seq_len(fargs$K), function(i) as.double( out@components[[1]][[i]][,1] ) )
 217   } ),
 218   prepareArgs = function(fargs,index) {
 219     library(morpheus)
 220     io = generateSampleIO(fargs$n, fargs$p, fargs$beta, fargs$b, fargs$optargs$link)
 221     fargs$X = io$X
 222     fargs$Y = io$Y
 223     fargs$K = ncol(fargs$beta)
 224     fargs$link = fargs$optargs$link
 225     fargs$optargs$M = computeMoments(io$X,io$Y)
 226     fargs
 227   }, N=10, ncores=3)
 228 # As in example 1, align results:
 229 for (i in 1:2)
 230   mr2[[i]] <- alignMatrices(mr2[[i]], ref=beta, ls_mode="exact")
 231 ```
 232
 233 ```{r, results="show", include=TRUE, echo=TRUE}
 234 # Second argument = true parameters matrix; third arg = index of method (here "morpheus")
 235 plotCoefs(mr2, beta, 1)
 236 # Real params are on the continous line; estimations = dotted line
 237 ```