vignettes/simul.tex

   1 \section{Simulations}
   2
   3 \subsection{R package}
   4
   5 The developed R-package is called \verb"morpheus" \cite{Loum_Auder} and divided into two main parts:
   6 \begin{enumerate}
   7     \item the computation of the directions matrix $\mu$, based on the empirical
   8         cross-moments as described in the previous sections;
   9     \item the optimization of all parameters (including $\mu$), using the initially estimated
  10         directions as a starting point.
  11 \end{enumerate}
  12 The former is a straightforward translation of the mathematical formulas (file R/computeMu.R),
  13 while the latter calls R constrOptim() method on the objective function expression and its
  14 derivative (file R/optimParams.R). For usage examples, please refer to the package help.
  15
  16 \subsection{Experiments}
  17 In this section, we evaluate our algorithm in a first step using mean squared error (MSE). In a second step, we compare experimentally our moments method (morpheus package \cite{Loum_Auder}) and the likelihood method (with felxmix package \cite{bg-papers:Gruen+Leisch:2007a}).
  18 We arbitrarily choose the parameters for the simulations, which should be discovered by the algorithms (ours, and the likelihood algorithm).
  19
  20 Experiment 1 (dimension 2):
  21 \begin{align*}
  22     K &= 2\\
  23     p &= (0.5, 0.5)\\
  24     b &= (-0.2, 0.5)\\
  25     \beta &=
  26         \begin{pmatrix}
  27         1 & 3\\
  28         -2 & 1
  29         \end{pmatrix}
  30 \end{align*}
  31
  32 Experiment 2 (dimension 5):
  33 \begin{align*}
  34     K &= 2\\
  35     p &= (0.5, 0.5)\\
  36     b &= (-0.2, 0.5)\\
  37     \beta &=
  38         \begin{pmatrix}
  39             1 & 2\\
  40             2 & -3\\
  41             -1 & 0\\
  42             0 & 1\\
  43             3 & 0
  44         \end{pmatrix}
  45 \end{align*}
  46
  47 Experiment 3 (dimension 10):
  48 \begin{align*}
  49     K &= 3\\
  50     p &= (0.3, 0.3, 0.4)\\
  51     b &= (-0.2, 0, 0.5)\\
  52     \beta &=
  53         \begin{pmatrix}
  54             1 & 2 & -1\\
  55             2 & -3 & 1\\
  56             -1 & 0 & 3\\
  57             0 & 1 & -1\\
  58             3 & 0 & 0\\
  59             4 & -1 & 0\\
  60             -1 & -4 & 2\\
  61             -3 & 3 & 0\\
  62             0 & 2 & 1\\
  63             2 & 0 & -2
  64         \end{pmatrix}
  65 \end{align*}
  66
  67 For all three experiments we use both logit and probit links.
  68 Computations are always run on the same data both for our package and flexmix $-$ which is a reference for this kind of estimation, using an iterative algorithm to maximize the log-likelihood. Results are aggregated over $N=1000$ Monte-Carlo runs.\\
  69
  70
  71 \noindent \textbf{Mean squared error (MSE)}.
  72 Graphical representations of the MSE versus the sample size ($n$) are given in figures~\ref{mse_2}, ~\ref{mse_5} and \ref{mse_10}. In each figure, we represent the MSE associated with each parameter vector versus the sample size. We can see that the goodness of the estimation depends on  the sample size and that enough observations is needed to properly estimate the parameters. We have from figure~\ref{mse_2} and figure~\ref{mse_5} that for our moment method, with dimension less or equal to $5,$ the necessary sample size is around of $10^5.$ For large dimension, figures~\ref{mse_10} show that $10^6$ is not enough to estimate the parameters vectors $\beta.$ Indeed our estimation method use the spectral estimator, which need more enough data, as starting point.\\
  73
  74
  75
  76
  77
  78
  79 \begin{figure}[!!ht!!]
  80 \centering
  81     \includegraphics[keepaspectratio=true,height=3.5cm,width=8cm]{logit_2.png}\\
  82     \includegraphics[keepaspectratio=true,height=3.5cm,width=8cm]{probit_2.png}
  83     \caption{Experiment 1; Top: \textit{logit} link function $\Big($(a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$)$\Big)$, bottom: \textit{probit} link function: $\Big($(c) MSE$(\widehat{\beta})$, (d) MSE($(\widehat{b}$,$\widehat{p}$)$\Big)$.}
  84     \label{mse_2}
  85 \end{figure}
  86
  87 \begin{figure}[!!ht!!]
  88 \centering
  89     \includegraphics[keepaspectratio=true,height=3.5cm,width=8cm]{logit_5.png}\\
  90     \includegraphics[keepaspectratio=true,height=3.5cm,width=8cm]{probit_5.png}
  91     \caption{Experiment 2; Top: \textit{logit} link function $\Big($(a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$)$\Big)$, bottom: \textit{probit} link function: $\Big($ (c) MSE$(\widehat{\beta})$, (d) MSE($(\widehat{b}$,$\widehat{p}$)$\Big)$.}
  92     \label{mse_5}
  93 \end{figure}
  94
  95
  96 \begin{figure}[!!ht!!]
  97 \centering
  98     \includegraphics[keepaspectratio=true,height=3.5cm,width=8cm]{logit_10.png}\\
  99     \includegraphics[keepaspectratio=true,height=3.5cm,width=8cm]{probit_10.png}
 100     \caption{Experiment 3; Top: \textit{logit} link function $\Big($(a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$)$\Big)$, bottom: \textit{probit} link function: $\Big($ (c) MSE$(\widehat{\beta})$, (d) MSE($(\widehat{b}$,$\widehat{p}$)$\Big)$.}
 101     \label{mse_10}
 102 \end{figure}
 103
 104 %\begin{figure}[!!ht!!]
 105 %\centering
 106 %   \includegraphics[keepaspectratio=true,height=4cm,width=8cm]{Figures/logit_2.png}
 107 %   \caption{Experiment 1: \textit{logit} link function for our algorithm; (a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$).}
 108 %   \label{mse_logit_2}
 109 %\end{figure}
 110 %
 111 %\begin{figure}[!!ht!!]
 112 %\centering
 113 %   \includegraphics[keepaspectratio=true,height=4cm,width=8cm]{Figures/logit_5.png}
 114 %   \caption{Experiment 2: \textit{logit} link functionfor our algorithm; (a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$).}
 115 %   \label{mse_logit_5}
 116 %\end{figure}
 117 %
 118 %
 119 %\begin{figure}[!!ht!!]
 120 %\centering
 121 %   \includegraphics[keepaspectratio=true,height=4cm,width=8cm]{Figures/logit_10.png}
 122 %   \caption{Experiment 3: \textit{logit} link function for our algorithm; (a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$).}
 123 %   \label{mse_logit_10}
 124 %\end{figure}
 125 %
 126 %\begin{figure}[!!ht!!]
 127 %\centering
 128 %   \includegraphics[keepaspectratio=true,height=4cm,width=8cm]{Figures/probit_2.png}
 129 %   \caption{Experiment 1: \textit{probit} link function for our algorithm; (a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$).}
 130 %   \label{mse_probit_2}
 131 %\end{figure}
 132 %
 133 %\begin{figure}[!!ht!!]
 134 %\centering
 135 %   \includegraphics[keepaspectratio=true,height=4cm,width=8cm]{Figures/probit_5.png}
 136 %   \caption{Experiment 2: \textit{probit} link function for our algorithm; (a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$).}
 137 %   \label{mse_probit_5}
 138 %\end{figure}
 139 %
 140 %\begin{figure}[!!ht!!]
 141 %\centering
 142 %   \includegraphics[keepaspectratio=true,height=4cm,width=8cm]{Figures/probit_10.png}
 143 %   \caption{Experiment 3: \textit{probit} link function for our algorithm; (a) MSE($\widehat{\beta}$), (b) MSE($\widehat{b}$,$\widehat{p}$).}
 144 %   \label{mse_probit_10}
 145 %\end{figure}
 146
 147
 148
 149
 150
 151 \noindent \textbf{Algorithms performance}.
 152 To evaluate algorithms performance, the total number of sample points is fixed to $n = 10^5.$ This value still enough to observe correct performances, yet small enough to remain realistic.
 153
 154 On the figures~\ref{res_logit} and \ref{res_probit}, all (true) parameters ($p, b, \beta$) are re-ordered in a real vector, of size $K \times (d+2) - 1$. This vector is plotted as a line to improve visualization experience, but it must be noted that this does not represent any curve data. Dotted lines corresponds to the computed values plus or minus one standard deviation, and computed values themselves are represented with long dashed lines. The leftmost column corresponds to experiment 1 ($d=2$), the middle one to experiment 2 ($d=5$), and the rightmost column corresponds to experiment 3 ($d=10$).
 155
 156
 157
 158
 159 \noindent \textit{Figure \ref{res_logit}: logit link.}
 160 While most of the times flexmix finds a better solution than our proposed algorithm (smaller variance), both methods are good on average for $d \leq 5$.
 161 The case $d=10$ is not handled well neither by the flexmix package nor by our package (the latter showing even poorer accuracy).
 162 Indeed in this relatively high dimension the number of observations should be much higher.
 163
 164 \begin{figure}[!!ht!!]
 165 \centering
 166     \includegraphics[keepaspectratio=true,height=8cm,width=12cm]{res_logit.png}
 167     \caption{\textit{Logit} link function. Top: our package, bottom: flexmix. From left to right: experiment 1, 2 and 3 respectively}
 168     \label{res_logit}
 169 \end{figure}
 170 \noindent \textit{Figure \ref{res_probit}: probit link.}
 171 In this case our algorithm performs slightly better than its flexmix counterpart for $d \leq 5$.
 172 However, again, the variance in the case $d=10$ is way too high $-$ in fact even the average value is generally wrong, when coefficients are non-zero.
 173 We can increase $n$ by a factor $100$ to obtain more accurate results.\\
 174 \begin{figure}[!!ht!!]
 175 \centering
 176     \includegraphics[keepaspectratio=true,height=8cm,width=12cm]{res_probit.png}
 177     \caption{\textit{Probit} link function. Top: our package, bottom: flexmix. From left to right: experiment 1, 2 and 3 respectively}
 178     \label{res_probit}
 179 \end{figure}
 180
 181
 182
 183 % TODO: improve placement: see https://tex.stackexchange.com/questions/8625/force-figure-placement-in-text
 184
 185 Considering both links with $d \leq 5$, the algorithms performances are comparable with a global small advantage to our method.
 186 The case $d=10$ is handled better by the flexmix algorithm, although clearly not well.
 187 Finally, concerning our method we observe a tendancy to overestimate large parameters while underestimating small ones.
 188 This observation is inversed for flexmix.\\
 189
 190 \noindent \textbf{Computational time}.
 191 It must be noted that our algorithm timing does not depend on $n$, since it operates on matrices of size $O(d \times K)$.
 192 Thus it can be more suitable for very large datasets, where the variability is clearly reduced. Figure~\ref{timing} (time by seconds versus $\log_{10}n$) illustrate this fact: the timings clearly favor our package $-$ because increasing $n$
 193 has almost no impact on the running time. However, flexmix timings are not that high:
 194 just a few minutes for the longest run, on average, on one million sample points.
 195 To obtain the data shown on the figure we averaged $1000$ runs on random parameters.\\
 196
 197 \begin{figure}
 198 \centering
 199     \includegraphics[height=8cm,width=10cm]{timings.png}
 200     \caption{Timings versus sample size. Dotted lines: flexmix; others: our algorithm. Black for $d=2,$ red for $d=5$ and blue for $d=10.$}
 201     \label{timing}
 202 \end{figure}
 203
 204
 205 %\begin{figure}[h!]
 206 %\centering
 207 %   \includegraphics[height=6cm,width=14cm]{Figures/timings_logit.png}
 208 %   \caption{\textit{Logit} link function. Timings versus sample size. Our algorithm on the left, flexmix on the right. $d=2$ in long-dashed line, $d=5$ in dotted line, $d=10$ in solid line.}
 209 %   \label{timing_logit}
 210 %\end{figure}
 211 %
 212 %
 213 %\begin{figure}[h!]
 214 %\centering
 215 %   \includegraphics[height=6cm,width=14cm]{Figures/timings_probit.png}
 216 %   \caption{\textit{Probit} link function. Timings versus sample size. Our algorithm on the left, flexmix on the right. $d=2$ in long-dashed line, $d=5$ in dotted line, $d=10$ in solid line.}
 217 %   \label{timing_probit}
 218 %\end{figure}