| 1 | --- |
title: Using the morpheus package
| 3 | |
| 4 | output: |
| 5 | pdf_document: |
| 6 | number_sections: true |
| 7 | toc_depth: 1 |
| 8 | --- |
| 9 | |
| 10 | \renewcommand{\P}{\mathrm{P}} |
| 11 | \newcommand{\R}{\mathbb{R}} |
| 12 | |
| 13 | ```{r setup, results="hide", include=FALSE} |
| 14 | knitr::opts_chunk$set(echo = TRUE, include = TRUE, |
| 15 | cache = TRUE, comment="", cache.lazy = FALSE, |
| 16 | out.width = "100%", fig.align = "center") |
| 17 | ``` |
| 18 | |
| 19 | ## Introduction |
<!--Explain that we try to learn classification parameters in a non-EM way, using algebraic manipulations.-->
| 21 | |
| 22 | *morpheus* is a contributed R package which attempts to find the parameters of a |
| 23 | mixture of logistic classifiers. |
When the data under study come from several groups with different characteristics,
mixture models are a very popular way to handle heterogeneity.
Thus, many algorithms have been developed to deal with various mixture models.
Most of them rely on likelihood methods, or on Bayesian methods which are likelihood-dependent as well.
*flexmix* is an R package which implements algorithms of this kind.
| 29 | |
However, one drawback of such methods is that they can converge to local maxima,
so several starting points must be explored.
Recently, spectral methods were developed to bypass EM algorithms; they were proved
able to recover the directions of the regression parameters
in models with known link function and random covariates (see [XX]).
Our package extends these moment methods, using least squares to obtain estimators of
all the parameters (with theoretical guarantees, see [XX]).
Currently it handles only binary outputs, which is a common case.
| 38 | |
| 39 | ## Model |
| 40 | |
| 41 | Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output. |
A binary regression model assumes that for some link function $g$, the probability that
$Y=1$ conditional on $X=x$ is given by $g(\langle \beta, x \rangle + b)$, where
$\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
Popular examples of link functions are the logit link function, where for any real $z$,
$g(z)=e^z/(1+e^z)$, and the probit link function, where $g(z)=\Phi(z)$, with $\Phi$
the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
| 48 | Both are implemented in the package. |
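
Both links are also available in base R, which gives a quick way to check the formulas
(`plogis` and `pnorm` compute them directly; this snippet is only illustrative, the
package evaluates them internally):

```{r, eval=FALSE}
z <- seq(-3, 3, length.out=5)
exp(z) / (1 + exp(z))  # logit link; equivalently plogis(z)
pnorm(z)               # probit link: standard normal CDF
```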
| 49 | |
To model heterogeneous populations, let $K$ be the number of
populations and $\omega=(\omega_1,\ldots,\omega_K)$ their weights, such that
$\omega_{j}\geq 0$ for $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega_{j}=1$.
Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
$b_{j}\in\R$. Let $b=(b_1,\ldots,b_K)$ and let
$\beta=[\beta_{1} \vert \cdots \vert \beta_K]$ be the $d\times K$
matrix of regression coefficients; denote $\theta=(\omega,\beta,b)$.
| 58 | The model of population mixture of binary regressions is given by: |
| 59 | |
| 60 | \begin{equation} |
| 61 | \label{mixturemodel1} |
\P_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k\, g(\langle \beta_k,x \rangle + b_k).
| 63 | \end{equation} |
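
To make the notation concrete, here is a small sketch evaluating the right-hand side of
the model above for a two-component example. The helper `mixtureProb` is purely
illustrative and is not part of the package:

```{r, eval=FALSE}
# Illustrative helper (not a package function):
# P_theta(Y=1 | X=x) = sum_k omega_k * g(<beta_k, x> + b_k)
mixtureProb <- function(x, omega, beta, b, g=plogis) {
  sum(omega * g(as.vector(t(beta) %*% x) + b))
}
omega <- c(0.4, 0.6)                  # weights, sum to 1
beta  <- matrix(c(1,0, 0,1), ncol=2)  # d x K matrix, here d = K = 2
b     <- c(0, 0)                      # intercepts
mixtureProb(c(1,-1), omega, beta, b)
```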
| 64 | |
## Algorithm and theoretical guarantees
| 66 | |
The algorithm uses spectral properties of some moment tensors to estimate the model
parameters $\theta = (\omega, \beta, b)$. Under rather mild conditions, it can be
proved that the estimator converges to the correct values, with a known rate.
For more information on that subject, please refer to our article [XX].
In this vignette we rather focus on package usage.
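
In practice the whole estimation boils down to three calls, detailed in the next
sections (a sketch only; `X`, `Y`, `K` and `link` are assumed to be already defined):

```{r, eval=FALSE}
M  <- computeMoments(X, Y)                     # empirical moments
mu <- computeMu(X, Y, optargs=list(K=K, M=M))  # directions of the beta_k
o  <- optimParams(K, link, optargs=list(M=M))  # optimization object
theta <- o$run(list(beta=mu))                  # all parameters: p, beta, b
```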
| 72 | |
| 73 | ## Usage |
<!--We assume that the random variable $X$ has a Gaussian distribution.
We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
identity $d\times d$ matrix. All results may be easily extended to the situation
where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
symmetric $d\times d$ matrix. ***** TODO: take this into account? -->
| 79 | |
The two main functions are:

* `computeMu()`, which estimates the parameter directions, and
* `optimParams()`, which builds an object `o` estimating all the other parameters
  when calling `o$run()`, starting from the directions obtained by the
  previous function.

A third function, `multiRun()`, is useful to run Monte-Carlo or bootstrap estimations
using different models in various contexts. We'll show examples of all of them.
| 87 | |
| 88 | ### Estimation of directions |
| 89 | |
In a real situation you would have (maybe after some pre-processing) a matrix
X of vector inputs and a vector Y of binary outputs.
However, a function is provided in the package to generate such data following a
pre-defined distribution:
| 94 | |
| 95 | ```{r, results="show", include=TRUE, echo=TRUE} |
| 96 | library(morpheus) |
io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2),
                       b=c(0,0), link="probit")
| 98 | # io$X and io$Y contain the sample data |
| 99 | ``` |
| 100 | |
* $n$ is the total number of samples (number of rows of X, number of elements of Y);
* $p$ is the vector of proportions ($\omega$ in the model above), of size $K-1$,
  because the last proportion is deduced from the others: the proportions sum to 1;
* $\beta$ is the matrix of regression coefficients, as written above in the model;
* $b$ is the vector of intercepts (as in linear regression, and as in the model above);
* link can be either "logit" or "probit", as mentioned earlier.
| 107 | |
This function outputs a list containing in particular the matrix X and the vector Y,
which can then be fed to the other functions (all of which require either these, or the moments).
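
A quick look at the generated object:

```{r, eval=FALSE}
dim(io$X)    # 10000 x 2: one row per sample
table(io$Y)  # counts of the two classes
```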
| 110 | |
| 111 | ```{r, results="show", include=TRUE, echo=TRUE} |
| 112 | mu <- computeMu(io$X, io$Y, optargs=list(K=2)) |
| 113 | ``` |
| 114 | |
The optional argument `optargs` is a list which can provide:

* the number of clusters $K$,
* the moments matrix $M$ (computed with the `computeMoments()` function),
* the joint-diagonalization method ("uwedge" or "jedi"),
* the number of random vectors for joint-diagonalization.

See `?computeMu` and the code for more details.
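
For instance, moments can be computed once and reused. In the sketch below, the element
name `jd_method` is our assumption for the joint-diagonalization method; check
`?computeMu` for the exact spelling:

```{r, eval=FALSE}
M  <- computeMoments(io$X, io$Y)
mu <- computeMu(io$X, io$Y,
  optargs=list(K=2, M=M, jd_method="uwedge", jd_nvects=0))
```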
| 123 | |
| 124 | ### Estimation of the other parameters |
| 125 | |
The other parameters are estimated by solving an optimization problem.
The following function builds and returns an optimization algorithm object:
| 128 | |
| 129 | ```{r, results="show", include=TRUE, echo=TRUE} |
| 130 | M <- computeMoments(io$X, io$Y) |
| 131 | # X and Y must be provided if the moments matrix is not given |
| 132 | algopt <- optimParams(K=2, link="probit", optargs=list(M=M)) |
| 133 | # Optimization starts at beta = mu, b = 0 and p = uniform distribution |
| 134 | x0 <- list(beta = mu) |
| 135 | theta <- algopt$run(x0) |
| 136 | ``` |
| 137 | |
Now `theta` is a list with three slots:

* $p$: the estimated proportions,
* $\beta$: the estimated matrix of regression coefficients,
* $b$: the estimated intercepts.
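
Since the sample was drawn with known parameters, the estimates can be compared to the
truth (up to a permutation of the mixture components):

```{r, eval=FALSE}
theta$p     # should be close to c(1/2, 1/2)
theta$beta  # close to matrix(c(1,0,0,1), ncol=2), up to column permutation
theta$b     # close to c(0, 0)
```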
| 143 | |
| 144 | ### Monte-Carlo and bootstrap |
| 145 | |
The package provides a function to compare methods over several runs on random data.
It takes as input a list of parameters, then a list of functions which output some quantities
(in the first example, our `computeMu()` method versus the flexmix way of estimating directions),
and finally a method to prepare the arguments which will be given to the functions in the
list just mentioned; this allows running Monte-Carlo estimations with the exact same samples
for each compared method. The last two arguments to `multiRun()` control the number of runs
and the number of cores (using the *parallel* package).
| 153 | |
| 154 | ```{r, results="show", include=TRUE, echo=TRUE} |
| 155 | beta <- matrix(c(1,-2,3,1), ncol=2) |
| 156 | io <- generateSampleIO(n=1000, p=1/2, beta=beta, b=c(0,0), "logit") |
| 157 | mu <- normalize(beta) |
| 158 | |
# Example 1: bootstrap + computeMu, morpheus VS flexmix; fargs contains X, Y and optargs
| 160 | mr1 <- multiRun(list(X=io$X,Y=io$Y,optargs=list(K=2,jd_nvects=0)), list( |
| 161 | # morpheus |
| 162 | function(fargs) { |
| 163 | library(morpheus) |
| 164 | ind <- fargs$ind |
| 165 | computeMu(fargs$X[ind,],fargs$Y[ind],fargs$optargs) |
| 166 | }, |
| 167 | # flexmix |
| 168 | function(fargs) { |
| 169 | library(flexmix) |
| 170 | source("../patch_Bettina/FLXMRglm.R") |
| 171 | ind <- fargs$ind |
| 172 | K <- fargs$optargs$K |
    dat <- as.data.frame( cbind(fargs$Y[ind],fargs$X[ind,]) )
    out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=K,
| 175 | model=FLXMRglm(family="binomial") ) ) |
| 176 | normalize( matrix(out@coef[1:(ncol(fargs$X)*K)], ncol=K) ) |
| 177 | } ), |
| 178 | prepareArgs = function(fargs,index) { |
| 179 | # Always include the non-shuffled dataset |
| 180 | if (index == 1) |
| 181 | fargs$ind <- 1:nrow(fargs$X) |
| 182 | else |
| 183 | fargs$ind <- sample(1:nrow(fargs$X),replace=TRUE) |
| 184 | fargs |
| 185 | }, N=10, ncores=3) |
# The result is correct up to column permutations of the matrices; align them:
| 187 | for (i in 1:2) |
| 188 | mr1[[i]] <- alignMatrices(mr1[[i]], ref=mu, ls_mode="exact") |
| 189 | ``` |
| 190 | |
Several types of plot are available: histograms, boxplots, or curves of coefficients.
We illustrate boxplots and curves here (the histograms function uses the same arguments;
see `?plotHist`).
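
As a sketch, the histogram counterpart of the boxplot below would be obtained with the
same indices:

```{r, eval=FALSE}
plotHist(mr1, 2, 1)  # same row/column arguments as plotBox below
```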
| 194 | |
| 195 | ```{r, results="show", include=TRUE, echo=TRUE} |
| 196 | # Second row, first column; morpheus on the left, flexmix on the right |
| 197 | plotBox(mr1, 2, 1, "Target value: -1") |
| 198 | ``` |
| 199 | |
| 200 | ```{r, results="show", include=TRUE, echo=TRUE} |
| 201 | # Example 2: Monte-Carlo + optimParams from X,Y, morpheus VS flexmix; first args n,p,beta,b |
| 202 | mr2 <- multiRun(list(n=1000,p=1/2,beta=beta,b=c(0,0),optargs=list(link="logit")), list( |
| 203 | # morpheus |
| 204 | function(fargs) { |
| 205 | library(morpheus) |
| 206 | mu <- computeMu(fargs$X, fargs$Y, fargs$optargs) |
| 207 | optimParams(fargs$K,fargs$link,fargs$optargs)$run(list(beta=mu))$beta |
| 208 | }, |
| 209 | # flexmix |
| 210 | function(fargs) { |
| 211 | library(flexmix) |
| 212 | source("../patch_Bettina/FLXMRglm.R") |
| 213 | dat <- as.data.frame( cbind(fargs$Y,fargs$X) ) |
| 214 | out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=fargs$K, |
| 215 | model=FLXMRglm(family="binomial") ) ) |
| 216 | sapply( seq_len(fargs$K), function(i) as.double( out@components[[1]][[i]][,1] ) ) |
| 217 | } ), |
| 218 | prepareArgs = function(fargs,index) { |
| 219 | library(morpheus) |
    io <- generateSampleIO(fargs$n, fargs$p, fargs$beta, fargs$b, fargs$optargs$link)
    fargs$X <- io$X
    fargs$Y <- io$Y
    fargs$K <- ncol(fargs$beta)
    fargs$link <- fargs$optargs$link
    fargs$optargs$M <- computeMoments(io$X,io$Y)
| 226 | fargs |
| 227 | }, N=10, ncores=3) |
| 228 | # As in example 1, align results: |
| 229 | for (i in 1:2) |
| 230 | mr2[[i]] <- alignMatrices(mr2[[i]], ref=beta, ls_mode="exact") |
| 231 | ``` |
| 232 | |
| 233 | ```{r, results="show", include=TRUE, echo=TRUE} |
| 234 | # Second argument = true parameters matrix; third arg = index of method (here "morpheus") |
| 235 | plotCoefs(mr2, beta, 1) |
# True parameters are on the continuous line; estimates on the dotted line
| 237 | ``` |