on the way to R6 class + remove truncated days (simplifications)
[talweg.git] / reports / report_2017-03-01.13h_average.ipynb
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "library(talweg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#PM10 series and exogenous weather data shipped with the package\n",
    "ts_data = read.csv(system.file(\"extdata\",\"pm10_mesures_H_loc.csv\",package=\"talweg\"))\n",
    "exo_data = read.csv(system.file(\"extdata\",\"meteo_extra_noNAs.csv\",package=\"talweg\"))\n",
    "data = getData(ts_data, exo_data, input_tz = \"Europe/Paris\", working_tz=\"Europe/Paris\", predict_at=13)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "I ran a few trials in different configurations for the \"Neighbors\" method (the only one we have discussed).<br>The best settings appear to be\n",
    "\n",
    " * simtype=\"mix\": both endogenous and exogenous similarities are used (window optimized by cross-validation)\n",
    " * same_season=FALSE: the cross-validation indices do not take seasons into account\n",
    " * mix_strategy=\"mult\": the weights are multiplied (instead of switching some of them off)\n",
    "\n",
    "(these are the default values).\n",
    "\n",
    "I systematically compared it with two other approaches: persistence, and the average of the next days of \"similar\" days over the whole past; in each case without jump prediction (except for Neighbors, where the jump prediction is based on the computed weights).\n",
    "\n",
    "I then display the errors, a few predicted/measured curves, a few filaments, and finally the histograms of some weights. In the filament plots, the left half of the graph shows the days similar to the current day, while the right half shows their next days: these are the neighborhoods as used by the algorithm.\n",
    "\n",
    "<h2 style=\"color:blue;font-size:2em\">Pollution from heating</h2>"
   ]
  },
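  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the `mix_strategy=\"mult\"` idea above concrete, here is a minimal sketch of how endogenous and exogenous similarity weights could be multiplied. It is an illustration only, not the talweg implementation: the Gaussian kernel, the function names `gaussian_weights` and `mix_weights`, and the toy distances are all assumptions.\n",
    "\n",
    "With multiplication, a past day gets a large weight only if it is close to the current day on both criteria; in the toy data below, only the second day is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Hypothetical sketch, NOT the talweg implementation\n",
    "#Gaussian kernel turning distances into similarity weights (assumed form)\n",
    "gaussian_weights = function(distances, window) exp(-(distances / window)^2)\n",
    "\n",
    "#Mixing by multiplication: product of endo and exo weights, then normalization\n",
    "mix_weights = function(d_endo, d_exo, h_endo, h_exo)\n",
    "{\n",
    "  w = gaussian_weights(d_endo, h_endo) * gaussian_weights(d_exo, h_exo)\n",
    "  w / sum(w)\n",
    "}\n",
    "\n",
    "#Toy distances from the current day to three past days\n",
    "d_endo = c(0.5, 1.0, 3.0)\n",
    "d_exo = c(2.0, 0.5, 3.0)\n",
    "w = mix_weights(d_endo, d_exo, h_endo=1, h_exo=1)\n",
    "w"
   ]
  },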
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "indices_ch = seq(as.Date(\"2015-01-18\"),as.Date(\"2015-01-24\"),\"days\")\n",
    "p_ch_nn = computeForecast(data, indices_ch, \"Neighbors\", \"Neighbors\", simtype=\"mix\")\n",
    "p_ch_pz = computeForecast(data, indices_ch, \"Persistence\", \"Zero\", same_day=TRUE)\n",
    "p_ch_az = computeForecast(data, indices_ch, \"Average\", \"Zero\") #, memory=183)\n",
    "#p_ch_zz = computeForecast(data, indices_ch, \"Zero\", \"Zero\")"
   ]
  },
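  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For intuition about the two baselines fitted above, here is a minimal, self-contained sketch on synthetic data. It assumes that persistence repeats a past day (yesterday, or the same weekday one week earlier, which is how `same_day` is read here) and that \"similar\" days are past days sharing the weekday of the target day. The notebook does not spell out these details, and the helper names `persistence_forecast` and `average_forecast` are hypothetical, not part of talweg."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Hypothetical sketch on synthetic data, NOT the talweg implementation\n",
    "set.seed(1)\n",
    "#28 days x 24 hourly values (one row per day)\n",
    "curves = matrix(rnorm(28 * 24, mean=20, sd=5), nrow=28)\n",
    "\n",
    "#Persistence: repeat a past day (lag=1: yesterday; lag=7: same weekday last week)\n",
    "persistence_forecast = function(curves, day, lag=1) curves[day - lag, ]\n",
    "\n",
    "#Average: mean curve over all earlier days sharing the weekday of 'day'\n",
    "average_forecast = function(curves, day)\n",
    "{\n",
    "  past = seq(day - 7, 1, by=-7)\n",
    "  colMeans(curves[past, , drop=FALSE])\n",
    "}\n",
    "\n",
    "p_pers = persistence_forecast(curves, 22, lag=7)\n",
    "p_avg = average_forecast(curves, 22)"
   ]
  },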
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_ch_nn = computeError(data, p_ch_nn)\n",
    "e_ch_pz = computeError(data, p_ch_pz)\n",
    "e_ch_az = computeError(data, p_ch_az)\n",
    "#e_ch_zz = computeError(data, p_ch_zz)\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_ch_nn, e_ch_pz, e_ch_az), cols=c(1,2,colors()[258]))\n",
    "\n",
    "#Black: neighbors, red: persistence, green: average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotPredReal(data, p_ch_nn, 3)\n",
    "plotPredReal(data, p_ch_nn, 4)\n",
    "\n",
    "#Blue: predicted, black: observed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Very smooth predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotPredReal(data, p_ch_az, 3)\n",
    "plotPredReal(data, p_ch_az, 4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f3_ch = computeFilaments(data, p_ch_nn$getIndexInData(3), plot=TRUE)\n",
    "f4_ch = computeFilaments(data, p_ch_nn$getIndexInData(4), plot=TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(2,2))\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotFilamentsBox(data, f3_ch$indices)\n",
    "plotFilamentsBox(data, f3_ch$indices+1)\n",
    "plotFilamentsBox(data, f4_ch$indices)\n",
    "plotFilamentsBox(data, f4_ch$indices+1)\n",
    "\n",
    "#Top: day 3 + next day (4); bottom: day 4 + next day (5)\n",
    "#Left: first 24h; right: following 24h"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Few neighbors; the curves are fairly isolated (especially the next days)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotRelativeVariability(data, f3_ch$indices)\n",
    "plotRelativeVariability(data, f4_ch$indices)\n",
    "\n",
    "#Variability over 60 random curves in red; over our 60 neighbors (+ next days) in black"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the neighbors to be informative, the black curve should lie clearly below the red one."
   ]
  },
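  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make this criterion concrete, here is a small sketch on synthetic data. It assumes that variability is measured as the hour-by-hour standard deviation of a set of curves, divided by the standard deviation over all curves; this definition and the name `relative_variability` are assumptions for illustration, not taken from `plotRelativeVariability`.\n",
    "\n",
    "When the selected neighbors really are similar to each other, their ratio falls well below that of a random sample of the same size, which is the gap one would like to see between the black and red curves."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Hypothetical sketch on synthetic data, NOT the talweg implementation\n",
    "set.seed(42)\n",
    "#200 days x 24 hourly values\n",
    "all_curves = matrix(rnorm(200 * 24), nrow=200)\n",
    "\n",
    "#Pretend the first 60 days are good neighbors: shrink them toward a common curve\n",
    "neighbors = 1:60\n",
    "all_curves[neighbors, ] = 0.3 * all_curves[neighbors, ]\n",
    "\n",
    "#Per-hour standard deviation of the selected curves, relative to all curves\n",
    "relative_variability = function(curves, idx)\n",
    "  apply(curves[idx, , drop=FALSE], 2, sd) / apply(curves, 2, sd)\n",
    "\n",
    "rv_neighbors = relative_variability(all_curves, neighbors)\n",
    "rv_random = relative_variability(all_curves, sample(nrow(all_curves), 60))\n",
    "c(neighbors=mean(rv_neighbors), random=mean(rv_random))"
   ]
  },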
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_ch_nn, 3)\n",
    "plotSimils(p_ch_nn, 4)\n",
    "\n",
    "#Unpolluted on the left, polluted on the right"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The weights are more concentrated around 0 for a more polluted day."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_ch_nn$getParams(3)$window\n",
    "p_ch_nn$getParams(4)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2 style=\"color:blue;font-size:2em\">Pollution from agricultural spreading</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "indices_ep = seq(as.Date(\"2015-03-15\"),as.Date(\"2015-03-21\"),\"days\")\n",
    "p_ep_nn = computeForecast(data, indices_ep, \"Neighbors\", \"Neighbors\", simtype=\"mix\")\n",
    "p_ep_pz = computeForecast(data, indices_ep, \"Persistence\", \"Zero\", same_day=TRUE)\n",
    "p_ep_az = computeForecast(data, indices_ep, \"Average\", \"Zero\") #, memory=183)\n",
    "#p_ep_zz = computeForecast(data, indices_ep, \"Zero\", \"Zero\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_ep_nn = computeError(data, p_ep_nn)\n",
    "e_ep_pz = computeError(data, p_ep_pz)\n",
    "e_ep_az = computeError(data, p_ep_az)\n",
    "#e_ep_zz = computeError(data, p_ep_zz)\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_ep_nn, e_ep_pz, e_ep_az), cols=c(1,2,colors()[258]))\n",
    "\n",
    "#Black: neighbors, red: persistence, green: average"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Neighbors and Average perform comparably; Persistence does worse."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotPredReal(data, p_ep_nn, 6)\n",
    "plotPredReal(data, p_ep_nn, 3)\n",
    "\n",
    "#Blue: predicted, black: observed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Left: a \"well\" predicted day (day 6); right: the error peak (day 3)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotPredReal(data, p_ep_az, 6)\n",
    "plotPredReal(data, p_ep_az, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Average: a different kind of forecast."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f6_ep = computeFilaments(data, p_ep_nn$getIndexInData(6), plot=TRUE)\n",
    "f3_ep = computeFilaments(data, p_ep_nn$getIndexInData(3), plot=TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(2,2))\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotFilamentsBox(data, f6_ep$indices)\n",
    "plotFilamentsBox(data, f6_ep$indices+1)\n",
    "plotFilamentsBox(data, f3_ep$indices)\n",
    "plotFilamentsBox(data, f3_ep$indices+1)\n",
    "\n",
    "#Top: day 6 + next day (7); bottom: day 3 + next day (4)\n",
    "#Left: first 24h; right: following 24h"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotRelativeVariability(data, f6_ep$indices)\n",
    "plotRelativeVariability(data, f3_ep$indices)\n",
    "\n",
    "#Variability over 60 random curves in red; over our 60 neighbors (+ next days) in black"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here again, the black curve should lie clearly below the red one..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_ep_nn, 6)\n",
    "plotSimils(p_ep_nn, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Same observation about the weights: they are concentrated near zero for predictions with few neighbors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_ep_nn$getParams(6)$window\n",
    "p_ep_nn$getParams(3)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2 style=\"color:blue;font-size:2em\">Unpolluted week</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "indices_np = seq(as.Date(\"2015-04-26\"),as.Date(\"2015-05-02\"),\"days\")\n",
    "p_np_nn = computeForecast(data, indices_np, \"Neighbors\", \"Neighbors\", simtype=\"mix\")\n",
    "p_np_pz = computeForecast(data, indices_np, \"Persistence\", \"Zero\", same_day=FALSE)\n",
    "p_np_az = computeForecast(data, indices_np, \"Average\", \"Zero\") #, memory=183)\n",
    "#p_np_zz = computeForecast(data, indices_np, \"Zero\", \"Zero\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_np_nn = computeError(data, p_np_nn)\n",
    "e_np_pz = computeError(data, p_np_pz)\n",
    "e_np_az = computeError(data, p_np_az)\n",
    "#e_np_zz = computeError(data, p_np_zz)\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_np_nn, e_np_pz, e_np_az), cols=c(1,2,colors()[258]))\n",
    "\n",
    "#Black: neighbors, red: persistence, green: average"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"Average\" and \"Neighbors\" methods perform identically; persistence gives poor results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotPredReal(data, p_np_nn, 5)\n",
    "plotPredReal(data, p_np_nn, 3)\n",
    "\n",
    "#Blue: predicted, black: observed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotPredReal(data, p_np_az, 5)\n",
    "plotPredReal(data, p_np_az, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f5_np = computeFilaments(data, p_np_nn$getIndexInData(5), plot=TRUE)\n",
    "f3_np = computeFilaments(data, p_np_nn$getIndexInData(3), plot=TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\"Typical\" days, hence many neighbors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(2,2))\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotFilamentsBox(data, f5_np$indices)\n",
    "plotFilamentsBox(data, f5_np$indices+1)\n",
    "plotFilamentsBox(data, f3_np$indices)\n",
    "plotFilamentsBox(data, f3_np$indices+1)\n",
    "\n",
    "#Top: day 5 + next day (6); bottom: day 3 + next day (4)\n",
    "#Left: first 24h; right: following 24h"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotRelativeVariability(data, f5_np$indices)\n",
    "plotRelativeVariability(data, f3_np$indices)\n",
    "\n",
    "#Variability over 60 random curves in red; over our 60 neighbors (+ next days) in black"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Better than in the other cases, but still fairly difficult."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_np_nn, 5)\n",
    "plotSimils(p_np_nn, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The distribution of the weights is hard to interpret."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_np_nn$getParams(5)$window\n",
    "p_np_nn$getParams(3)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "A hard problem: we barely do better than a naive average of the next days of similar days in the past, which is not far from predicting a constant series equal to the last observed value (the \"zero\" method). Persistence sometimes gives good results but is too unstable (it is sensitive to the <code>same_day</code> argument).\n",
    "\n",
    "How can the method be improved?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.3.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}