a few adjustments: TODO: bash script to re-run reports
[talweg.git] / reports / report.ipynb
1 {
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {},
6 "source": [
7 "\n",
8 "\n",
9 "<h2>Introduction</h2>\n",
10 "\n",
11 "I ran a few experiments with different configurations of the \"Neighbors\" method\n",
12 "(the only one we have discussed).<br>The best settings appear to be\n",
13 "\n",
14 " * simtype=\"exo\" or \"mix\": exogenous similarities, with or without endogenous ones (window optimized by cross-validation)\n",
15 " * same_season=FALSE: the cross-validation indices do not take seasons into account\n",
16 " * mix_strategy=\"mult\": the weights are multiplied (instead of switching some of them off)\n",
17 "\n",
18 "I systematically compared against a naive approach: the average of the next days of the\n",
19 "\"similar\" days over the whole past; in every case without jump prediction (except for Neighbors:\n",
20 "prediction based on the computed weights).\n",
21 "\n",
22 "I then show the errors, a few predicted/measured curves, a few filaments, and finally the\n",
23 "histograms of some weights. In the filament plots, the left half of the graph shows the days\n",
24 "similar to the current day, while the right half shows their next days:\n",
25 "these are the neighborhoods as used in the algorithm.\n",
26 "\n"
27 ]
28 },
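 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "\n",
 "A minimal base-R sketch of the idea described above, on simulated data and without calling talweg:\n",
 "each past day gets a weight from a Gaussian kernel on its distance to the current day, the\n",
 "endogenous and exogenous weights are multiplied (as with mix_strategy=\"mult\"), and the forecast\n",
 "is the weighted average of the next days. The kernel, its scaling, the window values and all\n",
 "variable names are assumptions made for illustration; they are not taken from the package.\n",
 "\n",
 "```r\n",
 "set.seed(1)\n",
 "n = 30                                      # simulated history: 30 days of 24 hourly values\n",
 "days = matrix(rnorm(n * 24, mean = 20, sd = 5), nrow = n)\n",
 "exo = matrix(rnorm(n * 3), nrow = n)        # 3 made-up exogenous variables per day\n",
 "past = 1:(n - 2)                            # candidate neighbors: their next day is already observed\n",
 "\n",
 "# Gaussian kernel weights from squared distances to a reference row (window h is assumed)\n",
 "kernel_weights = function(x, ref, h) {\n",
 "  d2 = rowSums(sweep(x, 2, ref)^2)          # sweep() subtracts ref from every row\n",
 "  exp(-d2 / (h^2 * mean(d2)))\n",
 "}\n",
 "\n",
 "w_endo = kernel_weights(days[past, ], days[n, ], h = 1)\n",
 "w_exo = kernel_weights(exo[past, ], exo[n, ], h = 1)\n",
 "w = w_endo * w_exo                          # 'mult' strategy: multiply the two sets of weights\n",
 "w = w / sum(w)                              # normalize so that the weights sum to 1\n",
 "\n",
 "forecast_nn = colSums(w * days[past + 1, ]) # weighted average of the next days\n",
 "forecast_naive = colMeans(days[past + 1, ]) # naive baseline: unweighted average\n",
 "```\n",
 "\n",
 "Both forecasts are vectors of 24 hourly values; with uniform weights the two coincide, which is\n",
 "why the naive average is a natural baseline for the weighted version.\n"
 ]
 },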
29 {
30 "cell_type": "code",
31 "execution_count": null,
32 "metadata": {
33 "collapsed": false
34 },
35 "outputs": [],
36 "source": [
37 "library(talweg)\n",
38 "\n",
39 "ts_data = read.csv(system.file(\"extdata\",\"pm10_mesures_H_loc_report.csv\",package=\"talweg\"))\n",
40 "exo_data = read.csv(system.file(\"extdata\",\"meteo_extra_noNAs.csv\",package=\"talweg\"))\n",
41 "data = getData(ts_data, exo_data, input_tz = \"Europe/Paris\", working_tz=\"Europe/Paris\", predict_at=7)\n",
42 "\n",
43 "indices_ch = seq(as.Date(\"2015-01-18\"),as.Date(\"2015-01-24\"),\"days\")\n",
44 "indices_ep = seq(as.Date(\"2015-03-15\"),as.Date(\"2015-03-21\"),\"days\")\n",
45 "indices_np = seq(as.Date(\"2015-04-26\"),as.Date(\"2015-05-02\"),\"days\")\n",
46 "\n",
47 "H = 3 #predict from 2pm to 4pm"
48 ]
49 },
50 {
51 "cell_type": "markdown",
52 "metadata": {},
53 "source": [
54 "\n",
55 "\n",
56 "<h2 style=\"color:blue;font-size:2em\">Heating-related pollution</h2>"
57 ]
58 },
59 {
60 "cell_type": "code",
61 "execution_count": null,
62 "metadata": {
63 "collapsed": false
64 },
65 "outputs": [],
66 "source": [
67 "p_nn_exo = computeForecast(data, indices_ch, \"Neighbors\", \"Neighbors\", simtype=\"exo\", horizon=H)\n",
68 "p_nn_mix = computeForecast(data, indices_ch, \"Neighbors\", \"Neighbors\", simtype=\"mix\", horizon=H)\n",
69 "p_az = computeForecast(data, indices_ch, \"Average\", \"Zero\", horizon=H) #, memory=183)\n",
70 "p_pz = computeForecast(data, indices_ch, \"Persistence\", \"Zero\", horizon=H, same_day=TRUE)"
71 ]
72 },
73 {
74 "cell_type": "code",
75 "execution_count": null,
76 "metadata": {
77 "collapsed": false
78 },
79 "outputs": [],
80 "source": [
81 "e_nn_exo = computeError(data, p_nn_exo, H)\n",
82 "e_nn_mix = computeError(data, p_nn_mix, H)\n",
83 "e_az = computeError(data, p_az, H)\n",
84 "e_pz = computeError(data, p_pz, H)\n",
85 "\n",
86 "options(repr.plot.width=9, repr.plot.height=7)\n",
87 "plotError(list(e_nn_mix, e_pz, e_az, e_nn_exo), cols=c(1,2,colors()[258], 4))\n",
88 "\n",
89 "# Black: neighbors_mix, blue: neighbors_exo, green: average, red: persistence\n",
90 "\n",
91 "i_np = which.min(e_nn_exo$abs$indices)\n",
92 "i_p = which.max(e_nn_exo$abs$indices)"
93 ]
94 },
95 {
96 "cell_type": "code",
97 "execution_count": null,
98 "metadata": {
99 "collapsed": false
100 },
101 "outputs": [],
102 "source": [
103 "options(repr.plot.width=9, repr.plot.height=4)\n",
104 "par(mfrow=c(1,2))\n",
105 "\n",
106 "plotPredReal(data, p_nn_exo, i_np); title(paste(\"PredReal nn exo day\",i_np))\n",
107 "plotPredReal(data, p_nn_exo, i_p); title(paste(\"PredReal nn exo day\",i_p))\n",
108 "\n",
109 "plotPredReal(data, p_nn_mix, i_np); title(paste(\"PredReal nn mix day\",i_np))\n",
110 "plotPredReal(data, p_nn_mix, i_p); title(paste(\"PredReal nn mix day\",i_p))\n",
111 "\n",
112 "plotPredReal(data, p_az, i_np); title(paste(\"PredReal az day\",i_np))\n",
113 "plotPredReal(data, p_az, i_p); title(paste(\"PredReal az day\",i_p))\n",
114 "\n",
115 "# Blue: predicted, black: measured"
116 ]
117 },
118 {
119 "cell_type": "code",
120 "execution_count": null,
121 "metadata": {
122 "collapsed": false
123 },
124 "outputs": [],
125 "source": [
126 "par(mfrow=c(1,2))\n",
127 "f_np_exo = computeFilaments(data, p_nn_exo, i_np, plot=TRUE); title(paste(\"Filaments nn exo day\",i_np))\n",
128 "f_p_exo = computeFilaments(data, p_nn_exo, i_p, plot=TRUE); title(paste(\"Filaments nn exo day\",i_p))\n",
129 "\n",
130 "f_np_mix = computeFilaments(data, p_nn_mix, i_np, plot=TRUE); title(paste(\"Filaments nn mix day\",i_np))\n",
131 "f_p_mix = computeFilaments(data, p_nn_mix, i_p, plot=TRUE); title(paste(\"Filaments nn mix day\",i_p))"
132 ]
133 },
134 {
135 "cell_type": "code",
136 "execution_count": null,
137 "metadata": {
138 "collapsed": false
139 },
140 "outputs": [],
141 "source": [
142 "par(mfrow=c(1,2))\n",
143 "plotFilamentsBox(data, f_np_exo); title(paste(\"FilBox nn exo day\",i_np))\n",
144 "plotFilamentsBox(data, f_p_exo); title(paste(\"FilBox nn exo day\",i_p))\n",
145 "\n",
146 "plotFilamentsBox(data, f_np_mix); title(paste(\"FilBox nn mix day\",i_np))\n",
147 "plotFilamentsBox(data, f_p_mix); title(paste(\"FilBox nn mix day\",i_p))"
148 ]
149 },
150 {
151 "cell_type": "code",
152 "execution_count": null,
153 "metadata": {
154 "collapsed": false
155 },
156 "outputs": [],
157 "source": [
158 "par(mfrow=c(1,2))\n",
159 "plotRelVar(data, f_np_exo); title(paste(\"StdDev nn exo day\",i_np))\n",
160 "plotRelVar(data, f_p_exo); title(paste(\"StdDev nn exo day\",i_p))\n",
161 "\n",
162 "plotRelVar(data, f_np_mix); title(paste(\"StdDev nn mix day\",i_np))\n",
163 "plotRelVar(data, f_p_mix); title(paste(\"StdDev nn mix day\",i_p))\n",
164 "\n",
165 "# Global variability in red; over the 60 neighbors (+ next days) in black"
166 ]
167 },
168 {
169 "cell_type": "code",
170 "execution_count": null,
171 "metadata": {
172 "collapsed": false
173 },
174 "outputs": [],
175 "source": [
176 "par(mfrow=c(1,2))\n",
177 "plotSimils(p_nn_exo, i_np); title(paste(\"Weights nn exo day\",i_np))\n",
178 "plotSimils(p_nn_exo, i_p); title(paste(\"Weights nn exo day\",i_p))\n",
179 "\n",
180 "plotSimils(p_nn_mix, i_np); title(paste(\"Weights nn mix day\",i_np))\n",
181 "plotSimils(p_nn_mix, i_p); title(paste(\"Weights nn mix day\",i_p))\n",
182 "\n",
183 "# Less polluted on the left, more polluted on the right"
184 ]
185 },
186 {
187 "cell_type": "code",
188 "execution_count": null,
189 "metadata": {
190 "collapsed": false
191 },
192 "outputs": [],
193 "source": [
194 "# Windows selected in (0,10] / endo on the left, exo on the right\n",
195 "p_nn_exo$getParams(i_np)$window\n",
196 "p_nn_exo$getParams(i_p)$window\n",
197 "\n",
198 "p_nn_mix$getParams(i_np)$window\n",
199 "p_nn_mix$getParams(i_p)$window"
200 ]
201 },
202 {
203 "cell_type": "markdown",
204 "metadata": {},
205 "source": [
206 "\n",
207 "\n",
208 "<h2 style=\"color:blue;font-size:2em\">Pollution from agricultural spreading</h2>"
209 ]
210 },
211 {
212 "cell_type": "code",
213 "execution_count": null,
214 "metadata": {
215 "collapsed": false
216 },
217 "outputs": [],
218 "source": [
219 "p_nn_exo = computeForecast(data, indices_ep, \"Neighbors\", \"Neighbors\", simtype=\"exo\", horizon=H)\n",
220 "p_nn_mix = computeForecast(data, indices_ep, \"Neighbors\", \"Neighbors\", simtype=\"mix\", horizon=H)\n",
221 "p_az = computeForecast(data, indices_ep, \"Average\", \"Zero\", horizon=H) #, memory=183)\n",
222 "p_pz = computeForecast(data, indices_ep, \"Persistence\", \"Zero\", horizon=H, same_day=TRUE)"
223 ]
224 },
225 {
226 "cell_type": "code",
227 "execution_count": null,
228 "metadata": {
229 "collapsed": false
230 },
231 "outputs": [],
232 "source": [
233 "e_nn_exo = computeError(data, p_nn_exo, H)\n",
234 "e_nn_mix = computeError(data, p_nn_mix, H)\n",
235 "e_az = computeError(data, p_az, H)\n",
236 "e_pz = computeError(data, p_pz, H)\n",
237 "options(repr.plot.width=9, repr.plot.height=7)\n",
238 "plotError(list(e_nn_mix, e_pz, e_az, e_nn_exo), cols=c(1,2,colors()[258], 4))\n",
239 "\n",
240 "# Black: neighbors_mix, blue: neighbors_exo, green: average, red: persistence\n",
241 "\n",
242 "i_np = which.min(e_nn_exo$abs$indices)\n",
243 "i_p = which.max(e_nn_exo$abs$indices)"
244 ]
245 },
246 {
247 "cell_type": "code",
248 "execution_count": null,
249 "metadata": {
250 "collapsed": false
251 },
252 "outputs": [],
253 "source": [
254 "options(repr.plot.width=9, repr.plot.height=4)\n",
255 "par(mfrow=c(1,2))\n",
256 "\n",
257 "plotPredReal(data, p_nn_exo, i_np); title(paste(\"PredReal nn exo day\",i_np))\n",
258 "plotPredReal(data, p_nn_exo, i_p); title(paste(\"PredReal nn exo day\",i_p))\n",
259 "\n",
260 "plotPredReal(data, p_nn_mix, i_np); title(paste(\"PredReal nn mix day\",i_np))\n",
261 "plotPredReal(data, p_nn_mix, i_p); title(paste(\"PredReal nn mix day\",i_p))\n",
262 "\n",
263 "plotPredReal(data, p_az, i_np); title(paste(\"PredReal az day\",i_np))\n",
264 "plotPredReal(data, p_az, i_p); title(paste(\"PredReal az day\",i_p))\n",
265 "\n",
266 "# Blue: predicted, black: measured"
267 ]
268 },
269 {
270 "cell_type": "code",
271 "execution_count": null,
272 "metadata": {
273 "collapsed": false
274 },
275 "outputs": [],
276 "source": [
277 "par(mfrow=c(1,2))\n",
278 "f_np_exo = computeFilaments(data, p_nn_exo, i_np, plot=TRUE); title(paste(\"Filaments nn exo day\",i_np))\n",
279 "f_p_exo = computeFilaments(data, p_nn_exo, i_p, plot=TRUE); title(paste(\"Filaments nn exo day\",i_p))\n",
280 "\n",
281 "f_np_mix = computeFilaments(data, p_nn_mix, i_np, plot=TRUE); title(paste(\"Filaments nn mix day\",i_np))\n",
282 "f_p_mix = computeFilaments(data, p_nn_mix, i_p, plot=TRUE); title(paste(\"Filaments nn mix day\",i_p))"
283 ]
284 },
285 {
286 "cell_type": "code",
287 "execution_count": null,
288 "metadata": {
289 "collapsed": false
290 },
291 "outputs": [],
292 "source": [
293 "par(mfrow=c(1,2))\n",
294 "plotFilamentsBox(data, f_np_exo); title(paste(\"FilBox nn exo day\",i_np))\n",
295 "plotFilamentsBox(data, f_p_exo); title(paste(\"FilBox nn exo day\",i_p))\n",
296 "\n",
297 "plotFilamentsBox(data, f_np_mix); title(paste(\"FilBox nn mix day\",i_np))\n",
298 "plotFilamentsBox(data, f_p_mix); title(paste(\"FilBox nn mix day\",i_p))"
299 ]
300 },
301 {
302 "cell_type": "code",
303 "execution_count": null,
304 "metadata": {
305 "collapsed": false
306 },
307 "outputs": [],
308 "source": [
309 "par(mfrow=c(1,2))\n",
310 "plotRelVar(data, f_np_exo); title(paste(\"StdDev nn exo day\",i_np))\n",
311 "plotRelVar(data, f_p_exo); title(paste(\"StdDev nn exo day\",i_p))\n",
312 "\n",
313 "plotRelVar(data, f_np_mix); title(paste(\"StdDev nn mix day\",i_np))\n",
314 "plotRelVar(data, f_p_mix); title(paste(\"StdDev nn mix day\",i_p))\n",
315 "\n",
316 "# Global variability in red; over the 60 neighbors (+ next days) in black"
317 ]
318 },
319 {
320 "cell_type": "code",
321 "execution_count": null,
322 "metadata": {
323 "collapsed": false
324 },
325 "outputs": [],
326 "source": [
327 "par(mfrow=c(1,2))\n",
328 "plotSimils(p_nn_exo, i_np); title(paste(\"Weights nn exo day\",i_np))\n",
329 "plotSimils(p_nn_exo, i_p); title(paste(\"Weights nn exo day\",i_p))\n",
330 "\n",
331 "plotSimils(p_nn_mix, i_np); title(paste(\"Weights nn mix day\",i_np))\n",
332 "plotSimils(p_nn_mix, i_p); title(paste(\"Weights nn mix day\",i_p))\n",
333 "\n",
334 "# Less polluted on the left, more polluted on the right"
335 ]
336 },
337 {
338 "cell_type": "code",
339 "execution_count": null,
340 "metadata": {
341 "collapsed": false
342 },
343 "outputs": [],
344 "source": [
345 "# Windows selected in (0,10] / endo on the left, exo on the right\n",
346 "p_nn_exo$getParams(i_np)$window\n",
347 "p_nn_exo$getParams(i_p)$window\n",
348 "\n",
349 "p_nn_mix$getParams(i_np)$window\n",
350 "p_nn_mix$getParams(i_p)$window"
351 ]
352 },
353 {
354 "cell_type": "markdown",
355 "metadata": {},
356 "source": [
357 "\n",
358 "\n",
359 "<h2 style=\"color:blue;font-size:2em\">Non-polluted week</h2>"
360 ]
361 },
362 {
363 "cell_type": "code",
364 "execution_count": null,
365 "metadata": {
366 "collapsed": false
367 },
368 "outputs": [],
369 "source": [
370 "p_nn_exo = computeForecast(data, indices_np, \"Neighbors\", \"Neighbors\", simtype=\"exo\", horizon=H)\n",
371 "p_nn_mix = computeForecast(data, indices_np, \"Neighbors\", \"Neighbors\", simtype=\"mix\", horizon=H)\n",
372 "p_az = computeForecast(data, indices_np, \"Average\", \"Zero\", horizon=H) #, memory=183)\n",
373 "p_pz = computeForecast(data, indices_np, \"Persistence\", \"Zero\", horizon=H, same_day=FALSE)"
374 ]
375 },
376 {
377 "cell_type": "code",
378 "execution_count": null,
379 "metadata": {
380 "collapsed": false
381 },
382 "outputs": [],
383 "source": [
384 "e_nn_exo = computeError(data, p_nn_exo, H)\n",
385 "e_nn_mix = computeError(data, p_nn_mix, H)\n",
386 "e_az = computeError(data, p_az, H)\n",
387 "e_pz = computeError(data, p_pz, H)\n",
388 "options(repr.plot.width=9, repr.plot.height=7)\n",
389 "plotError(list(e_nn_mix, e_pz, e_az, e_nn_exo), cols=c(1,2,colors()[258], 4))\n",
390 "\n",
391 "# Black: neighbors_mix, blue: neighbors_exo, green: average, red: persistence\n",
392 "\n",
393 "i_np = which.min(e_nn_exo$abs$indices)\n",
394 "i_p = which.max(e_nn_exo$abs$indices)"
395 ]
396 },
397 {
398 "cell_type": "code",
399 "execution_count": null,
400 "metadata": {
401 "collapsed": false
402 },
403 "outputs": [],
404 "source": [
405 "options(repr.plot.width=9, repr.plot.height=4)\n",
406 "par(mfrow=c(1,2))\n",
407 "\n",
408 "plotPredReal(data, p_nn_exo, i_np); title(paste(\"PredReal nn exo day\",i_np))\n",
409 "plotPredReal(data, p_nn_exo, i_p); title(paste(\"PredReal nn exo day\",i_p))\n",
410 "\n",
411 "plotPredReal(data, p_nn_mix, i_np); title(paste(\"PredReal nn mix day\",i_np))\n",
412 "plotPredReal(data, p_nn_mix, i_p); title(paste(\"PredReal nn mix day\",i_p))\n",
413 "\n",
414 "plotPredReal(data, p_az, i_np); title(paste(\"PredReal az day\",i_np))\n",
415 "plotPredReal(data, p_az, i_p); title(paste(\"PredReal az day\",i_p))\n",
416 "\n",
417 "# Blue: predicted, black: measured"
418 ]
419 },
420 {
421 "cell_type": "code",
422 "execution_count": null,
423 "metadata": {
424 "collapsed": false
425 },
426 "outputs": [],
427 "source": [
428 "par(mfrow=c(1,2))\n",
429 "f_np_exo = computeFilaments(data, p_nn_exo, i_np, plot=TRUE); title(paste(\"Filaments nn exo day\",i_np))\n",
430 "f_p_exo = computeFilaments(data, p_nn_exo, i_p, plot=TRUE); title(paste(\"Filaments nn exo day\",i_p))\n",
431 "\n",
432 "f_np_mix = computeFilaments(data, p_nn_mix, i_np, plot=TRUE); title(paste(\"Filaments nn mix day\",i_np))\n",
433 "f_p_mix = computeFilaments(data, p_nn_mix, i_p, plot=TRUE); title(paste(\"Filaments nn mix day\",i_p))"
434 ]
435 },
436 {
437 "cell_type": "code",
438 "execution_count": null,
439 "metadata": {
440 "collapsed": false
441 },
442 "outputs": [],
443 "source": [
444 "par(mfrow=c(1,2))\n",
445 "plotFilamentsBox(data, f_np_exo); title(paste(\"FilBox nn exo day\",i_np))\n",
446 "plotFilamentsBox(data, f_p_exo); title(paste(\"FilBox nn exo day\",i_p))\n",
447 "\n",
448 "plotFilamentsBox(data, f_np_mix); title(paste(\"FilBox nn mix day\",i_np))\n",
449 "plotFilamentsBox(data, f_p_mix); title(paste(\"FilBox nn mix day\",i_p))"
450 ]
451 },
452 {
453 "cell_type": "code",
454 "execution_count": null,
455 "metadata": {
456 "collapsed": false
457 },
458 "outputs": [],
459 "source": [
460 "par(mfrow=c(1,2))\n",
461 "plotRelVar(data, f_np_exo); title(paste(\"StdDev nn exo day\",i_np))\n",
462 "plotRelVar(data, f_p_exo); title(paste(\"StdDev nn exo day\",i_p))\n",
463 "\n",
464 "plotRelVar(data, f_np_mix); title(paste(\"StdDev nn mix day\",i_np))\n",
465 "plotRelVar(data, f_p_mix); title(paste(\"StdDev nn mix day\",i_p))\n",
466 "\n",
467 "# Global variability in red; over the 60 neighbors (+ next days) in black"
468 ]
469 },
470 {
471 "cell_type": "code",
472 "execution_count": null,
473 "metadata": {
474 "collapsed": false
475 },
476 "outputs": [],
477 "source": [
478 "par(mfrow=c(1,2))\n",
479 "plotSimils(p_nn_exo, i_np); title(paste(\"Weights nn exo day\",i_np))\n",
480 "plotSimils(p_nn_exo, i_p); title(paste(\"Weights nn exo day\",i_p))\n",
481 "\n",
482 "plotSimils(p_nn_mix, i_np); title(paste(\"Weights nn mix day\",i_np))\n",
483 "plotSimils(p_nn_mix, i_p); title(paste(\"Weights nn mix day\",i_p))\n",
484 "\n",
485 "# Less polluted on the left, more polluted on the right"
486 ]
487 },
488 {
489 "cell_type": "code",
490 "execution_count": null,
491 "metadata": {
492 "collapsed": false
493 },
494 "outputs": [],
495 "source": [
496 "# Windows selected in (0,10] / endo on the left, exo on the right\n",
497 "p_nn_exo$getParams(i_np)$window\n",
498 "p_nn_exo$getParams(i_p)$window\n",
499 "\n",
500 "p_nn_mix$getParams(i_np)$window\n",
501 "p_nn_mix$getParams(i_p)$window"
502 ]
503 },
504 {
505 "cell_type": "markdown",
506 "metadata": {},
507 "source": [
508 "\n",
509 "\n",
510 "<h2>Summary</h2>\n",
511 "\n",
512 "This is a hard problem: we hardly do better than a naive average of the next days of\n",
513 "similar days in the past, which is not far from predicting a constant series equal to the\n",
514 "last observed value (\"zero\" method). Persistence sometimes gives good results\n",
515 "but is too unstable (it is sensitive to the <code>same_day</code> argument).\n",
516 "\n",
517 "How can the method be improved?"
518 ]
519 }
520 ],
521 "metadata": {
522 "kernelspec": {
523 "display_name": "R",
524 "language": "R",
525 "name": "ir"
526 },
527 "language_info": {
528 "codemirror_mode": "r",
529 "file_extension": ".r",
530 "mimetype": "text/x-r-source",
531 "name": "R",
532 "pygments_lexer": "r",
533 "version": "3.3.3"
534 }
535 },
536 "nbformat": 4,
537 "nbformat_minor": 2
538 }