src/Mixstore/StaticBundle/Resources/views/about.html.twig

{% extends "::base.html.twig" %}

{% block title %}{{ parent() }}about{% endblock %}

{% block header %}
{{ parent() }}
<link rel="stylesheet" href="{{ asset('mixstore/css/static/about.css') }}">
{% endblock %}

{% block content %}

<div id="maintext" class="row">

<div class="col-xs-12 borderbottom">

<h2>Origins</h2>

In the late 1990s, three researchers wrote MATLAB code to classify data using
mixture models. Initially named XEM, for "EM-algorithms on miXture models",
it was soon renamed mixmod and rewritten in C++ starting in 2001.
Since then, mixmod has been extended in several directions, including:
<ul>
    <li>supervised classification</li>
    <li>categorical data handling</li>
    <li>heterogeneous data handling</li>
</ul>
...and the code is still evolving. {# still in constant evolution #}
More details can be found on the <a href="http://www.mixmod.org">dedicated website</a>.

There are now many packages related to mixture models, each specialized in
some domain. Although mixmod can (arguably) be considered one of the first of its kind,
it would be rather arbitrary to give it a central position.
That is why mixmod is "only" one part of Mixstore.

{# (mixmod can do more things: point to the website + doc...) #}

<h2>Summary</h2>

Mixstore is a website that gathers libraries dedicated to modeling data as
a mixture of probabilistic components. The computed mixture can be used
for various purposes, including:
<ul>
    <li>density estimation</li>
    <li>clustering (unsupervised classification)</li>
    <li>(supervised) classification</li>
    <li>regression, ...</li>
</ul>

<h2>Example</h2>

<p>
To start using any of the software packages in the store, we need a dataset.
Here we choose an old classic: the Iris dataset, introduced by Ronald Fisher in 1936.
Despite being a classic, this dataset is not so easy to analyze, as we will see below.
</p>

<p>
The <a href="http://en.wikipedia.org/wiki/Iris_flower_data_set">Iris dataset</a>
contains 150 rows, each composed of 4 continuous attributes
corresponding to flower measurements. Three species are equally represented
(50 rows each): Iris Setosa, Versicolor and Virginica.
</p>

<p>
    <figure>
    <img src="{{ asset('mixstore/images/iris_pca.png') }}" alt="PCA components of iris dataset"/><br/>
    <caption>The first two PCA components of the Iris dataset (image found
    <a href="http://www.wanderinformatiker.at/unipages/general/img/iris_pca1.png">here</a>)</caption>
    </figure>
</p>

<p>
As the figure suggests, the goal with this dataset is to discriminate between Iris species.
That is, we want a way to answer two questions:
"are two given elements in the same group?" and "which group does a given element belong to?".
</p>

<p>
The Mixstore packages take a more general approach: they (try to) learn the data-generating
process, and then deduce the composition of the groups. The two questions above can then
be answered easily using the mathematical formulas that describe the classes.
Although this approach has several advantages (low sensitivity to outliers, a likelihood
that can be used to rank models...), finding an adequate model is challenging.
We will not go into the details of model selection here.
{# This is a more general and harder problem. #}
</p>

</div>

<div class="col-xs-12">

<p>
Density for 2 groups:
££f^{(2)}(x) = \pi_1^{(2)} g_1^{(2)}(x) + \pi_2^{(2)} g_2^{(2)}(x)££
where £g_i^{(2)}(x) = (2 \pi)^{-d/2} \left| \Sigma_i^{(2)} \right|^{-1/2} \exp\left( -\frac{1}{2} \, {}^T(x - \mu_i^{(2)}) (\Sigma_i^{(2)})^{-1} (x - \mu_i^{(2)}) \right)£.<br/>
Here £d = 4£ is the dimension and £x = (x_1,x_2,x_3,x_4)£, with the following correspondence:
<ul>
    <li>£x_1£: sepal length;</li>
    <li>£x_2£: sepal width;</li>
    <li>£x_3£: petal length;</li>
    <li>£x_4£: petal width.</li>
</ul>
</p>
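
<p>
The formula above is enough to answer the second question. As a sketch (the notation £t_i(x)£
is introduced here for illustration only), the posterior probability that an element £x£
belongs to group £i£ follows from Bayes' rule:
££t_i(x) = \frac{\pi_i^{(2)} g_i^{(2)}(x)}{f^{(2)}(x)} = \frac{\pi_i^{(2)} g_i^{(2)}(x)}{\pi_1^{(2)} g_1^{(2)}(x) + \pi_2^{(2)} g_2^{(2)}(x)}, \qquad i = 1, 2££
so that £t_1(x) + t_2(x) = 1£. A common rule, assumed here rather than prescribed by any
particular package, assigns £x£ to the group with the largest £t_i(x)£; two elements are then
in the same group when they receive the same label.
</p>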

</div>

<div class="col-xs-12 col-sm-6">
\begin{align*}
\pi_1^{(2)} &= 0.33\\
\mu_1^{(2)} &= (5.01 \quad 3.43 \quad 1.46 \quad 0.25)\\
\Sigma_1^{(2)} &=
    \begin{pmatrix}
    0.15&0.13&0.02&0.01\\
    0.13&0.18&0.02&0.01\\
    0.02&0.02&0.03&0.01\\
    0.01&0.01&0.01&0.01
    \end{pmatrix}
\end{align*}
</div>

<div class="col-xs-12 col-sm-6">
\begin{align*}
\pi_2^{(2)} &= 0.67\\
\mu_2^{(2)} &= (6.26 \quad 2.87 \quad 4.91 \quad 1.68)\\
\Sigma_2^{(2)} &=
    \begin{pmatrix}
    0.40&0.11&0.40&0.14\\
    0.11&0.11&0.12&0.07\\
    0.40&0.12&0.61&0.26\\
    0.14&0.07&0.26&0.17
    \end{pmatrix}
\end{align*}
</div>

<div class="col-xs-12 borderbottom">
Penalized log-likelihood (BIC): <b>-561.73</b>
</div>

<div class="col-xs-12">

<p>
Density for 3 groups:
££f^{(3)}(x) = \pi_1^{(3)} g_1^{(3)}(x) + \pi_2^{(3)} g_2^{(3)}(x) + \pi_3^{(3)} g_3^{(3)}(x)££
(The cluster densities £g_i^{(3)}£ use the same parameterization.)<br/>
</p>

</div>

<div class="col-xs-12 col-md-4">
\begin{align*}
\pi_1^{(3)} &= 0.33\\
\mu_1^{(3)} &= (5.01 \quad 3.43 \quad 1.46 \quad 0.25)\\
\Sigma_1^{(3)} &=
    \begin{pmatrix}
    0.13&0.11&0.02&0.01\\
    0.11&0.15&0.01&0.01\\
    0.02&0.01&0.03&0.01\\
    0.01&0.01&0.01&0.01
    \end{pmatrix}
\end{align*}
</div>

<div class="col-xs-12 col-md-4">
\begin{align*}
\pi_2^{(3)} &= 0.30\\
\mu_2^{(3)} &= (5.91 \quad 2.78 \quad 4.20 \quad 1.30)\\
\Sigma_2^{(3)} &=
    \begin{pmatrix}
    0.23&0.08&0.15&0.04\\
    0.08&0.08&0.07&0.03\\
    0.15&0.07&0.17&0.05\\
    0.04&0.03&0.05&0.03
    \end{pmatrix}
\end{align*}
</div>

<div class="col-xs-12 col-md-4">
\begin{align*}
\pi_3^{(3)} &= 0.37\\
\mu_3^{(3)} &= (6.55 \quad 2.95 \quad 5.48 \quad 1.96)\\
\Sigma_3^{(3)} &=
    \begin{pmatrix}
    0.43&0.11&0.33&0.07\\
    0.11&0.12&0.09&0.06\\
    0.33&0.09&0.36&0.09\\
    0.07&0.06&0.09&0.09
    \end{pmatrix}
\end{align*}
</div>

<div class="col-xs-12 borderbottom">
Penalized log-likelihood (BIC): <b>-562.55</b>
</div>

<div class="col-xs-12">

<p>
As stated at the outset, the dataset is difficult to cluster: although we know there are
3 species, 2 of them are almost indistinguishable. That is why the penalized log-likelihood values are very close.
A method is usually considered good on the Iris dataset when it finds 3 clusters,
but 2 is also a correct answer.
</p>
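
<p>
For reference, here is a sketch of how such a criterion is typically computed. Assuming the
convention in which larger values are better (the sign convention is not stated on this page),
the BIC of a model with £K£ groups fitted to £n£ observations is
££\mathrm{BIC}^{(K)} = 2 \log L^{(K)} - \nu^{(K)} \log n££
where £L^{(K)}£ is the maximized likelihood and £\nu^{(K)}£ the number of free parameters.
If the covariance matrices are left unconstrained (an assumption; the fitted model family is
not specified here), then in dimension £d£
££\nu^{(K)} = (K - 1) + K d + K \, \frac{d(d+1)}{2}££
which gives £\nu^{(2)} = 29£ and £\nu^{(3)} = 44£ for £d = 4£. A third group therefore has to
improve the log-likelihood enough to pay for 15 extra parameters; under the assumed convention,
the two values reported above suggest that it only just fails to do so.
</p>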

</div>

</div>

{% endblock %}

{% block javascripts %}
{{ parent() }}
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
    tex2jax: {
        inlineMath: [['£','£']],
        displayMath: [['££','££']],
        skipTags: ["script","noscript","style"]//,"textarea","pre","code"]
    }
});
</script>
<script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>
{% endblock %}