---
title: Use morpheus package

output:
  pdf_document:
    number_sections: true
    toc_depth: 1
---

\renewcommand{\P}{\mathrm{P}}
\newcommand{\R}{\mathbb{R}}

```{r setup, results="hide", include=FALSE}
knitr::opts_chunk$set(echo = TRUE, include = TRUE,
  cache = TRUE, comment="", cache.lazy = FALSE,
  out.width = "100%", fig.align = "center")
```

## Introduction
<!--Tell that we try to learn classification parameters in a non-EM way, using algebraic manipulations.-->

*morpheus* is a contributed R package which attempts to find the parameters of a
mixture of logistic classifiers.
When the data under study come from several groups that have different characteristics,
using mixture models is a very popular way to handle heterogeneity.
Thus, many algorithms were developed to deal with various mixture models.
Most of them use likelihood methods or Bayesian methods that are likelihood-dependent.
*flexmix* is an R package which implements these kinds of algorithms.

However, one problem of such methods is that they can converge to local maxima,
so several starting points must be explored.
Recently, spectral methods were developed to bypass EM algorithms, and they were proved
able to recover the directions of the regression parameter
in models with known link function and random covariates (see [XX]).
Our package extends such moment methods, using least squares to get estimators of the
whole set of parameters (with theoretical guarantees, see [XX]).
Currently it can only handle binary outputs, which is a common case.

## Model

Let $X\in \R^{d}$ be the vector of covariates and $Y\in \{0,1\}$ be the binary output.
A binary regression model assumes that for some link function $g$, the probability that
$Y=1$ conditionally to $X=x$ is given by $g(\langle \beta, x \rangle +b)$, where
$\beta\in \R^{d}$ is the vector of regression coefficients and $b\in\R$ is the intercept.
Popular examples of link functions are the logit link function, where for any real $z$,
$g(z)=e^z/(1+e^z)$, and the probit link function, where $g(z)=\Phi(z)$, with $\Phi$
the cumulative distribution function of the standard normal ${\cal N}(0,1)$.
Both are implemented in the package.
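
For intuition, here is a minimal sketch of these two link functions in plain R (an illustration only, not the package's internal code):

```{r, results="show", include=TRUE, echo=TRUE}
# Logit link: g(z) = e^z / (1 + e^z)
logit_link <- function(z) exp(z) / (1 + exp(z))
# Probit link: g(z) = Phi(z), the standard normal CDF
probit_link <- function(z) pnorm(z)
c(logit_link(0), probit_link(0))  # both equal 0.5 at z = 0
```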

To model heterogeneous populations, let $K$ be the number of
populations and $\omega=(\omega_1,\ldots,\omega_K)$ their weights, such that
$\omega_{j}\geq 0$ for $j=1,\ldots,K$ and $\sum_{j=1}^{K}\omega_{j}=1$.
Define, for $j=1,\ldots,K$, the regression coefficients in the $j$-th population
by $\beta_{j}\in\R^{d}$ and the intercept in the $j$-th population by
$b_{j}\in\R$. Let $b=(b_1,\ldots,b_K)$, $\beta=[\beta_{1} \vert \cdots \vert \beta_K]$
the $d\times K$ matrix of regression coefficients, and denote $\theta=(\omega,\beta,b)$.
The model of population mixture of binary regressions is given by:

\begin{equation}
\label{mixturemodel1}
\P_{\theta}(Y=1\vert X=x)=\sum^{K}_{k=1}\omega_k g(\langle \beta_k,x\rangle+b_k).
\end{equation}

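As an illustration, the mixture probability above can be evaluated directly; here is a small sketch in plain R (the helper mixture_prob is not part of the package):

```{r, results="show", include=TRUE, echo=TRUE}
# Evaluate P(Y=1 | X=x) for a K-component mixture with probit link
mixture_prob <- function(x, omega, beta, b, g = pnorm) {
  # beta is a d x K matrix; each column is one component's coefficient vector
  sum(omega * g(as.vector(t(beta) %*% x) + b))
}
mixture_prob(x = c(1, 1), omega = c(0.5, 0.5),
             beta = matrix(c(1, 0, 0, 1), ncol = 2), b = c(0, 0))
```
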
## Algorithm and theoretical guarantees

The algorithm uses spectral properties of some tensor matrices to estimate the model
parameters $\theta = (\omega, \beta, b)$. Under rather mild conditions, it can be
proved that the algorithm converges to the correct values (its convergence speed is known too).
For more information on that subject, please refer to our article [XX].
In this vignette, we rather focus on package usage.

## Usage
<!--We assume that the random variable $X$ has a Gaussian distribution.
We now focus on the situation where $X\sim \mathcal{N}(0,I_d)$, $I_d$ being the
identity $d\times d$ matrix. All results may be easily extended to the situation
where $X\sim \mathcal{N}(m,\Sigma)$, $m\in \R^{d}$, $\Sigma$ a positive and
symmetric $d\times d$ matrix. ***** TODO: take this into account? -->

The two main functions are:

* computeMu(), which estimates the parameter directions, and
* optimParams(), which builds an object `o` used to estimate all the other parameters
  when calling `o$run()`, starting from the directions obtained by the previous function.

A third function, multiRun(), is useful to run Monte-Carlo or bootstrap estimations
using different models in various contexts. We'll show examples for all of them.
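
In short, a typical session chains these calls (a sketch only, not evaluated here; it assumes a matrix X and a vector Y are already available, and each step is detailed in the next subsections):

```{r, eval=FALSE}
# 1. Estimate the directions of the regression coefficients
mu <- computeMu(X, Y, optargs=list(K=2))
# 2. Estimate weights, coefficients and intercepts, starting from these directions
o <- optimParams(K=2, link="logit", optargs=list(M=computeMoments(X, Y)))
theta <- o$run(list(beta=mu))
```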

### Estimation of directions

In a real situation you would have (maybe after some pre-processing) the matrix
X and the vector Y, which contain the vector inputs and the binary outputs.
However, a function is provided in the package to generate such data following a
pre-defined law:

```{r, results="show", include=TRUE, echo=TRUE}
library(morpheus)
io <- generateSampleIO(n=10000, p=1/2, beta=matrix(c(1,0,0,1),ncol=2), b=c(0,0), link="probit")
# io$X and io$Y contain the sample data
```

The arguments are as follows:

* $n$ is the total number of samples (rows of X, number of elements in Y);
* $p$ is the vector of proportions, of size $K-1$ (the last proportion is deduced
  from the others, since all proportions sum to 1) [TODO: omega or p?];
* $\beta$ is the matrix of linear coefficients, as written above in the model;
* $b$ is the vector of intercepts (as in linear regression, and as in the model above);
* link can be either "logit" or "probit", as mentioned earlier.

This function outputs a list containing in particular X and Y, which can then be fed
to the other functions (all of them require either these, or the moments).
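
A quick look at the simulated data (assuming, as described above, that io$X is an $n\times d$ matrix and io$Y a vector of $n$ binary values):

```{r, results="show", include=TRUE, echo=TRUE}
dim(io$X)     # 10000 x 2: n samples of dimension d = 2
length(io$Y)  # 10000 binary outputs
table(io$Y)   # counts of 0s and 1s
```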

```{r, results="show", include=TRUE, echo=TRUE}
mu <- computeMu(io$X, io$Y, optargs=list(K=2))
```

The optional argument, "optargs", is a list which can provide:

* the number of clusters $K$,
* the moments matrix $M$ (computed with the "computeMoments()" function),
* the joint-diagonalization method ("uwedge" or "jedi"),
* the number of random vectors for joint-diagonalization.

See ?computeMu and the code for more details.
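
For instance, the moments matrix can be computed once and passed explicitly (a sketch only, not evaluated here; K, M and jd_nvects are the optargs element names used elsewhere in this vignette):

```{r, eval=FALSE}
# Reuse a precomputed moments matrix and set the number of random vectors
# for joint-diagonalization (here zero, as in the multiRun example below)
M <- computeMoments(io$X, io$Y)
mu2 <- computeMu(io$X, io$Y, optargs=list(K=2, M=M, jd_nvects=0))
```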

### Estimation of the other parameters

The other parameters are estimated by solving an optimization problem.
The following function builds and returns an optimization algorithm object:

```{r, results="show", include=TRUE, echo=TRUE}
M <- computeMoments(io$X, io$Y)
# X and Y must be provided if the moments matrix is not given
algopt <- optimParams(K=2, link="probit", optargs=list(M=M))
# Optimization starts at beta = mu, b = 0 and p = uniform distribution
x0 <- list(beta = mu)
theta <- algopt$run(x0)
```

Now theta is a list with three slots:

* $p$: the estimated proportions,
* $\beta$: the estimated matrix of regression coefficients,
* $b$: the estimated vector of intercepts.
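
These slots can be inspected directly, for example:

```{r, results="show", include=TRUE, echo=TRUE}
theta$p     # estimated proportions (the true value used above is (1/2, 1/2))
theta$beta  # estimated d x K matrix of regression coefficients
theta$b     # estimated intercepts
```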

### Monte-Carlo and bootstrap

The package provides a function, multiRun(), to compare methods over repeated computations
on random data. It takes as input a list of parameters, then a list of functions which each
output some quantities (in the first example below, our "computeMu()" method versus the
flexmix way of estimating directions), and finally a method to prepare the arguments passed
to the functions in that list; this allows Monte-Carlo estimations to be run with the exact
same samples for each compared method. The two last arguments of "multiRun()" control the
number of runs and the number of cores (using the parallel package).

```{r, results="show", include=TRUE, echo=TRUE}
beta <- matrix(c(1,-2,3,1), ncol=2)
io <- generateSampleIO(n=1000, p=1/2, beta=beta, b=c(0,0), "logit")
mu <- normalize(beta)

# Example 1: bootstrap + computeMu, morpheus VS flexmix; fargs contains X, Y and optargs
mr1 <- multiRun(list(X=io$X, Y=io$Y, optargs=list(K=2, jd_nvects=0)), list(
  # morpheus
  function(fargs) {
    library(morpheus)
    ind <- fargs$ind
    computeMu(fargs$X[ind,], fargs$Y[ind], fargs$optargs)
  },
  # flexmix
  function(fargs) {
    library(flexmix)
    source("../patch_Bettina/FLXMRglm.R")
    ind <- fargs$ind
    K <- fargs$optargs$K
    dat <- as.data.frame( cbind(fargs$Y[ind], fargs$X[ind,]) )
    out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=K,
      model=FLXMRglm(family="binomial") ) )
    normalize( matrix(out@coef[1:(ncol(fargs$X)*K)], ncol=K) )
  } ),
  prepareArgs = function(fargs, index) {
    # Always include the non-shuffled dataset
    if (index == 1)
      fargs$ind <- 1:nrow(fargs$X)
    else
      fargs$ind <- sample(1:nrow(fargs$X), replace=TRUE)
    fargs
  }, N=10, ncores=3)
# The result is correct up to a permutation of the matrix columns; align them:
for (i in 1:2)
  mr1[[i]] <- alignMatrices(mr1[[i]], ref=mu, ls_mode="exact")
```

Several plots are available: histograms, boxplots, or curves of coefficients.
We illustrate boxplots and curves here (the histograms function takes the same
arguments; see ?plotHist).

```{r, results="show", include=TRUE, echo=TRUE}
# Second row, first column; morpheus on the left, flexmix on the right
plotBox(mr1, 2, 1, "Target value: -1")
```
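
The corresponding histograms could be drawn with the same arguments (a sketch, not evaluated here; see ?plotHist for the exact signature):

```{r, eval=FALSE}
# Histograms of the same estimated coefficient (same arguments as plotBox)
plotHist(mr1, 2, 1, "Target value: -1")
```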

```{r, results="show", include=TRUE, echo=TRUE}
# Example 2: Monte-Carlo + optimParams from X,Y, morpheus VS flexmix; first args n,p,beta,b
mr2 <- multiRun(list(n=1000, p=1/2, beta=beta, b=c(0,0), optargs=list(link="logit")), list(
  # morpheus
  function(fargs) {
    library(morpheus)
    mu <- computeMu(fargs$X, fargs$Y, fargs$optargs)
    optimParams(fargs$K, fargs$link, fargs$optargs)$run(list(beta=mu))$beta
  },
  # flexmix
  function(fargs) {
    library(flexmix)
    source("../patch_Bettina/FLXMRglm.R")
    dat <- as.data.frame( cbind(fargs$Y, fargs$X) )
    out <- refit( flexmix( cbind(V1, 1 - V1) ~ 0+., data=dat, k=fargs$K,
      model=FLXMRglm(family="binomial") ) )
    sapply( seq_len(fargs$K), function(i) as.double( out@components[[1]][[i]][,1] ) )
  } ),
  prepareArgs = function(fargs, index) {
    library(morpheus)
    io <- generateSampleIO(fargs$n, fargs$p, fargs$beta, fargs$b, fargs$optargs$link)
    fargs$X <- io$X
    fargs$Y <- io$Y
    fargs$K <- ncol(fargs$beta)
    fargs$link <- fargs$optargs$link
    fargs$optargs$M <- computeMoments(io$X, io$Y)
    fargs
  }, N=10, ncores=3)
# As in example 1, align results:
for (i in 1:2)
  mr2[[i]] <- alignMatrices(mr2[[i]], ref=beta, ls_mode="exact")
```

```{r, results="show", include=TRUE, echo=TRUE}
# Second argument = true parameters matrix; third arg = index of method (here "morpheus")
plotCoefs(mr2, beta, 1)
# True parameters are drawn as a continuous line; estimations as a dotted line
```