updated report for 01/03
[talweg.git] / reports / report_2017-03-01.ipynb
Commit Line Data
09cf9c19
BA
1{
2 "cells": [
3 {
4 "cell_type": "code",
5 "execution_count": null,
6 "metadata": {
7 "collapsed": false
8 },
9 "outputs": [],
10 "source": [
11 "library(talweg)"
12 ]
13 },
14 {
15 "cell_type": "code",
16 "execution_count": null,
17 "metadata": {
18 "collapsed": false
19 },
20 "outputs": [],
21 "source": [
22 "data = getData(ts_data=\"../data/pm10_mesures_H_loc.csv\", exo_data=\"../data/meteo_extra_noNAs.csv\",\n",
23 " input_tz = \"Europe/Paris\", working_tz=\"Europe/Paris\", predict_at=7)"
24 ]
25 },
26 {
27 "cell_type": "markdown",
28 "metadata": {},
29 "source": [
56999439
BA
30 "## Introduction\n",
31 "\n",
32 "I ran a few trials with different configurations of the \"Neighbors\" method (the only one we have discussed).<br>The best settings appear to be\n",
33 "\n",
34 " * simtype=\"mix\": use both endogenous and exogenous similarities (window optimized by cross-validation)\n",
35 " * same_season=FALSE: the cross-validation indices do not take seasons into account\n",
36 " * mix_strategy=\"mult\": multiply the weights (instead of switching some of them off)\n",
37 "\n",
38 "I systematically compared it with two other approaches: persistence, and the average of all the futures of similar past days; in each case without jump prediction (except for Neighbors, where the jump is predicted from the computed weights).\n",
39 "\n",
40 "I then show the errors, a few predicted/measured curves, a few filament plots, and finally histograms of some of the weights. In the filament plots, the left half shows the days similar to the current day, while the right half shows their following days: these are the neighborhoods as used by the algorithm.\n",
41 "\n",
42 "<h2 style=\"color:blue;font-size:2em\">Heating-related pollution</h2>"
09cf9c19
BA
43 ]
44 },
45 {
46 "cell_type": "code",
47 "execution_count": null,
48 "metadata": {
49 "collapsed": false
50 },
51 "outputs": [],
52 "source": [
53 "p_ch_nn = getForecast(data, seq(as.Date(\"2015-01-18\"),as.Date(\"2015-01-24\"),\"days\"), Inf, 17,\n",
54 " \"Neighbors\", \"Neighbors\", simtype=\"mix\", same_season=FALSE, mix_strategy=\"mult\")\n",
55 "p_ch_pz = getForecast(data, seq(as.Date(\"2015-01-18\"),as.Date(\"2015-01-24\"),\"days\"), Inf, 17,\n",
56 " \"Persistence\", \"Zero\")\n",
57 "p_ch_az = getForecast(data, seq(as.Date(\"2015-01-18\"),as.Date(\"2015-01-24\"),\"days\"), Inf, 17,\n",
58 " \"Average\", \"Zero\")"
59 ]
60 },
61 {
62 "cell_type": "code",
63 "execution_count": null,
64 "metadata": {
65 "collapsed": false
66 },
67 "outputs": [],
68 "source": [
69 "e_ch_nn = getError(data, p_ch_nn, 17)\n",
70 "e_ch_pz = getError(data, p_ch_pz, 17)\n",
71 "e_ch_az = getError(data, p_ch_az, 17)\n",
72 "options(repr.plot.width=9, repr.plot.height=6)\n",
56999439
BA
73 "plotError(list(e_ch_nn, e_ch_pz, e_ch_az), cols=c(1,2,colors()[258]))\n",
74 "\n",
75 "# Black: Neighbors, red: persistence, green: average"
76 ]
77 },
78 {
79 "cell_type": "markdown",
80 "metadata": {},
81 "source": [
82 "In this case the Neighbors method does fairly clearly better than a simple average."
09cf9c19
BA
83 ]
84 },
85 {
86 "cell_type": "code",
87 "execution_count": null,
88 "metadata": {
89 "collapsed": false
90 },
91 "outputs": [],
92 "source": [
93 "par(mfrow=c(1,2))\n",
94 "options(repr.plot.width=9, repr.plot.height=4)\n",
95 "plotPredReal(data, p_ch_nn, 3)\n",
56999439
BA
96 "plotPredReal(data, p_ch_nn, 4)\n",
97 "\n",
98 "# Blue: predicted, black: measured"
99 ]
100 },
101 {
102 "cell_type": "markdown",
103 "metadata": {},
104 "source": [
105 "The more atypical (polluted) the day to forecast, the smoother the prediction."
09cf9c19
BA
106 ]
107 },
108 {
109 "cell_type": "code",
110 "execution_count": null,
111 "metadata": {
112 "collapsed": false
113 },
114 "outputs": [],
115 "source": [
116 "par(mfrow=c(1,2))\n",
117 "plotFilaments(data, p_ch_nn$getIndexInData(3))\n",
118 "plotFilaments(data, p_ch_nn$getIndexInData(4))"
119 ]
120 },
56999439
BA
121 {
122 "cell_type": "markdown",
123 "metadata": {},
124 "source": [
125 "Many similar curves in the low-pollution case, very few for a polluted day."
126 ]
127 },
09cf9c19
BA
128 {
129 "cell_type": "code",
130 "execution_count": null,
131 "metadata": {
132 "collapsed": false
133 },
134 "outputs": [],
135 "source": [
136 "par(mfrow=c(1,3))\n",
137 "plotSimils(p_ch_nn, 3)\n",
138 "plotSimils(p_ch_nn, 4)\n",
56999439
BA
139 "plotSimils(p_ch_nn, 5)\n",
140 "\n",
141 "# Unpolluted day on the left, polluted day in the middle, another polluted day on the right"
09cf9c19
BA
142 ]
143 },
144 {
145 "cell_type": "markdown",
146 "metadata": {},
147 "source": [
56999439
BA
148 "Most weights are very close to zero, except for day 5, which is a different kind of day (see below)."
149 ]
150 },
151 {
152 "cell_type": "code",
153 "execution_count": null,
154 "metadata": {
155 "collapsed": false
156 },
157 "outputs": [],
158 "source": [
159 "par(mfrow=c(1,2))\n",
160 "plotPredReal(data, p_ch_nn, 5)\n",
161 "plotFilaments(data, p_ch_nn$getIndexInData(5))"
162 ]
163 },
164 {
165 "cell_type": "markdown",
166 "metadata": {},
167 "source": [
168 "<h2 style=\"color:blue;font-size:2em\">Pollution from agricultural spreading</h2>"
09cf9c19
BA
169 ]
170 },
171 {
172 "cell_type": "code",
173 "execution_count": null,
174 "metadata": {
175 "collapsed": false
176 },
177 "outputs": [],
178 "source": [
179 "p_ep_nn = getForecast(data, seq(as.Date(\"2015-03-15\"),as.Date(\"2015-03-21\"),\"days\"), Inf, 17,\n",
180 " \"Neighbors\", \"Neighbors\", simtype=\"mix\", same_season=FALSE, mix_strategy=\"mult\")\n",
181 "p_ep_pz = getForecast(data, seq(as.Date(\"2015-03-15\"),as.Date(\"2015-03-21\"),\"days\"), Inf, 17,\n",
182 " \"Persistence\", \"Zero\")\n",
183 "p_ep_az = getForecast(data, seq(as.Date(\"2015-03-15\"),as.Date(\"2015-03-21\"),\"days\"), Inf, 17,\n",
184 " \"Average\", \"Zero\")"
185 ]
186 },
187 {
188 "cell_type": "code",
189 "execution_count": null,
190 "metadata": {
191 "collapsed": false
192 },
193 "outputs": [],
194 "source": [
195 "e_ep_nn = getError(data, p_ep_nn, 17)\n",
196 "e_ep_pz = getError(data, p_ep_pz, 17)\n",
197 "e_ep_az = getError(data, p_ep_az, 17)\n",
198 "options(repr.plot.width=9, repr.plot.height=6)\n",
56999439
BA
199 "plotError(list(e_ep_nn, e_ep_pz, e_ep_az), cols=c(1,2,colors()[258]))\n",
200 "\n",
201 "# Black: Neighbors, red: persistence, green: average"
202 ]
203 },
204 {
205 "cell_type": "markdown",
206 "metadata": {},
207 "source": [
208 "This time both naive methods make fewer errors on average than Neighbors. Is the prediction too hard?"
09cf9c19
BA
209 ]
210 },
211 {
212 "cell_type": "code",
213 "execution_count": null,
214 "metadata": {
215 "collapsed": false
216 },
217 "outputs": [],
218 "source": [
219 "par(mfrow=c(1,2))\n",
220 "options(repr.plot.width=9, repr.plot.height=4)\n",
56999439
BA
221 "plotPredReal(data, p_ep_nn, 4)\n",
222 "plotPredReal(data, p_ep_nn, 6)"
223 ]
224 },
225 {
226 "cell_type": "markdown",
227 "metadata": {},
228 "source": [
229 "Left: a \"well\" predicted day; right: the error peak (day 6)."
09cf9c19
BA
230 ]
231 },
232 {
233 "cell_type": "code",
234 "execution_count": null,
235 "metadata": {
236 "collapsed": false
237 },
238 "outputs": [],
239 "source": [
240 "par(mfrow=c(1,2))\n",
56999439
BA
241 "plotFilaments(data, p_ep_nn$getIndexInData(4))\n",
242 "plotFilaments(data, p_ep_nn$getIndexInData(6))"
09cf9c19
BA
243 ]
244 },
245 {
246 "cell_type": "code",
247 "execution_count": null,
248 "metadata": {
249 "collapsed": false
250 },
251 "outputs": [],
252 "source": [
56999439 253 "par(mfrow=c(1,2))\n",
09cf9c19 254 "plotSimils(p_ep_nn, 4)\n",
56999439
BA
255 "plotSimils(p_ep_nn, 6)"
256 ]
257 },
258 {
259 "cell_type": "markdown",
260 "metadata": {},
261 "source": [
262 "Same observation about the weights: they are concentrated near zero for predictions with few neighbors."
09cf9c19
BA
263 ]
264 },
265 {
266 "cell_type": "markdown",
267 "metadata": {},
268 "source": [
269 "## Unpolluted week"
270 ]
271 },
272 {
273 "cell_type": "code",
274 "execution_count": null,
275 "metadata": {
276 "collapsed": false
277 },
278 "outputs": [],
279 "source": [
280 "p_np_nn = getForecast(data, seq(as.Date(\"2015-04-26\"),as.Date(\"2015-05-02\"),\"days\"), Inf, 17,\n",
281 " \"Neighbors\", \"Neighbors\", simtype=\"mix\", same_season=FALSE, mix_strategy=\"mult\")\n",
282 "p_np_pz = getForecast(data, seq(as.Date(\"2015-04-26\"),as.Date(\"2015-05-02\"),\"days\"), Inf, 17,\n",
283 " \"Persistence\", \"Zero\")\n",
284 "p_np_az = getForecast(data, seq(as.Date(\"2015-04-26\"),as.Date(\"2015-05-02\"),\"days\"), Inf, 17,\n",
285 " \"Average\", \"Zero\")"
286 ]
287 },
288 {
289 "cell_type": "code",
290 "execution_count": null,
291 "metadata": {
292 "collapsed": false
293 },
294 "outputs": [],
295 "source": [
296 "e_np_nn = getError(data, p_np_nn, 17)\n",
297 "e_np_pz = getError(data, p_np_pz, 17)\n",
298 "e_np_az = getError(data, p_np_az, 17)\n",
299 "options(repr.plot.width=9, repr.plot.height=6)\n",
56999439
BA
300 "plotError(list(e_np_nn, e_np_pz, e_np_az), cols=c(1,2,colors()[258]))\n",
301 "\n",
302 "# Black: Neighbors, red: persistence, green: average"
303 ]
304 },
305 {
306 "cell_type": "markdown",
307 "metadata": {},
308 "source": [
309 "The \"Average\" and \"Neighbors\" methods perform comparably; persistence gives poor results."
09cf9c19
BA
310 ]
311 },
312 {
313 "cell_type": "code",
314 "execution_count": null,
315 "metadata": {
316 "collapsed": false
317 },
318 "outputs": [],
319 "source": [
320 "par(mfrow=c(1,2))\n",
321 "options(repr.plot.width=9, repr.plot.height=4)\n",
322 "plotPredReal(data, p_np_nn, 3)\n",
56999439
BA
323 "plotPredReal(data, p_np_nn, 6)"
324 ]
325 },
326 {
327 "cell_type": "markdown",
328 "metadata": {},
329 "source": [
330 "Even the \"good\" predictions (left) are still too smooth."
09cf9c19
BA
331 ]
332 },
333 {
334 "cell_type": "code",
335 "execution_count": null,
336 "metadata": {
337 "collapsed": false
338 },
339 "outputs": [],
340 "source": [
341 "par(mfrow=c(1,2))\n",
342 "plotFilaments(data, p_np_nn$getIndexInData(3))\n",
56999439
BA
343 "plotFilaments(data, p_np_nn$getIndexInData(6))"
344 ]
345 },
346 {
347 "cell_type": "markdown",
348 "metadata": {},
349 "source": [
350 "\"Typical\" days, hence many neighbors."
09cf9c19
BA
351 ]
352 },
353 {
354 "cell_type": "code",
355 "execution_count": null,
356 "metadata": {
357 "collapsed": false
358 },
359 "outputs": [],
360 "source": [
361 "par(mfrow=c(1,3))\n",
362 "plotSimils(p_np_nn, 3)\n",
363 "plotSimils(p_np_nn, 4)\n",
56999439
BA
364 "plotSimils(p_np_nn, 6)"
365 ]
366 },
367 {
368 "cell_type": "markdown",
369 "metadata": {},
370 "source": [
371 "Ideal weight distribution: a few above 0.3-0.4, the rest very close to zero."
372 ]
373 },
374 {
375 "cell_type": "markdown",
376 "metadata": {},
377 "source": [
378 "## Summary\n",
379 "\n",
380 "A hard problem: we barely do better than a naive average of the days that followed similar days in the past.\n",
381 "\n",
382 "How can the method be improved?"
09cf9c19
BA
383 ]
384 }
385 ],
386 "metadata": {
387 "kernelspec": {
388 "display_name": "R",
389 "language": "R",
390 "name": "ir"
391 },
392 "language_info": {
393 "codemirror_mode": "r",
394 "file_extension": ".r",
395 "mimetype": "text/x-r-source",
396 "name": "R",
397 "pygments_lexer": "r",
398 "version": "3.3.2"
399 }
400 },
401 "nbformat": 4,
402 "nbformat_minor": 2
403}
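
The notebook's introduction describes `mix_strategy="mult"` as multiplying the endogenous and exogenous similarity weights, and compares the resulting Neighbors forecast with a plain average of next-day curves. The following is a minimal base-R sketch of that idea only; it does not use or reproduce the `talweg` implementation. The toy data, the weight values, the normalisation step and the names `mix_mult`, `w_endo`, `w_exo` and `tomorrows` are all assumptions made for illustration.

```r
# Hypothetical toy data: 5 past days, each with a 24-hour "next day" curve.
set.seed(1)
tomorrows <- matrix(rnorm(5 * 24, mean = 20, sd = 5), nrow = 5)

# Hypothetical similarity weights of each past day to the current day.
w_endo <- c(0.9, 0.1, 0.5, 0.0, 0.3)  # similarity on the PM10 series itself
w_exo  <- c(0.8, 0.2, 0.1, 0.6, 0.4)  # similarity on the exogenous (weather) variables

# "mult" mixing as described in the text: element-wise product of the weights.
# Normalising to sum 1 is an assumption of this sketch, not a documented behaviour.
mix_mult <- function(w1, w2) {
  w <- w1 * w2
  if (sum(w) == 0) stop("all combined weights are zero")
  w / sum(w)
}

w <- mix_mult(w_endo, w_exo)

# Neighbors-style forecast: weighted mean of the next-day curves.
forecast_neighbors <- as.vector(w %*% tomorrows)

# Average-style baseline: unweighted mean of the same curves.
forecast_average <- colMeans(tomorrows)
```

With multiplication, a past day only keeps a large weight if it is similar on both criteria: here day 4 has a high exogenous weight but an endogenous weight of zero, so its combined weight is zero. This matches the weight histograms discussed in the notebook, where most combined weights sit near zero.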