[talweg.git] / reports / report.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "<h2>Introduction</h2>\n",
    "\n",
    "I ran a few experiments with different configurations of the \"Neighbors\" method\n",
    "(the only one we have discussed).<br>The best settings seem to be\n",
    "\n",
    " * simtype=\"exo\" or \"mix\": exogenous similarities, with or without endogenous ones (window optimized by cross-validation)\n",
    " * same_season=FALSE: the cross-validation indices do not take seasons into account\n",
    " * mix_strategy=\"mult\": the weights are multiplied (instead of switching some of them off)\n",
    "\n",
    "I systematically compared against a naive approach: the average of the days following the\n",
    "\"similar\" days over the whole past; in every case without predicting the jump (except for Neighbors:\n",
    "prediction based on the computed weights).\n",
    "\n",
    "I then display the errors, a few predicted/measured curves, a few filaments, and the\n",
    "histograms of a few weights. In the filament plots, the left half of the graph\n",
    "shows the days similar to the current day, while the right half shows the\n",
    "following days: these are therefore the neighborhoods as used by the algorithm.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "library(talweg)\n",
    "\n",
    "ts_data = read.csv(system.file(\"extdata\",\"pm10_mesures_H_loc_report.csv\",package=\"talweg\"))\n",
    "exo_data = read.csv(system.file(\"extdata\",\"meteo_extra_noNAs.csv\",package=\"talweg\"))\n",
    "data = getData(ts_data, exo_data, input_tz = \"Europe/Paris\", working_tz=\"Europe/Paris\", predict_at=7)\n",
    "\n",
    "indices_ch = seq(as.Date(\"2015-01-18\"),as.Date(\"2015-01-24\"),\"days\")\n",
    "indices_ep = seq(as.Date(\"2015-03-15\"),as.Date(\"2015-03-21\"),\"days\")\n",
    "indices_np = seq(as.Date(\"2015-04-26\"),as.Date(\"2015-05-02\"),\"days\")\n",
    "\n",
    "H = 3 #predict from 2pm to 4pm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "<h2 style=\"color:blue;font-size:2em\">Heating-related pollution</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "p_nn_exo = computeForecast(data, indices_ch, \"Neighbors\", \"Neighbors\", simtype=\"exo\", horizon=H)\n",
    "p_nn_mix = computeForecast(data, indices_ch, \"Neighbors\", \"Neighbors\", simtype=\"mix\", horizon=H)\n",
    "p_az = computeForecast(data, indices_ch, \"Average\", \"Zero\", horizon=H) #, memory=183)\n",
    "p_pz = computeForecast(data, indices_ch, \"Persistence\", \"Zero\", horizon=H, same_day=TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_nn_exo = computeError(data, p_nn_exo, H)\n",
    "e_nn_mix = computeError(data, p_nn_mix, H)\n",
    "e_az = computeError(data, p_az, H)\n",
    "e_pz = computeError(data, p_pz, H)\n",
    "\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_nn_mix, e_pz, e_az, e_nn_exo), cols=c(1,2,colors()[258], 4))\n",
    "\n",
    "# Black: neighbors_mix, blue: neighbors_exo, green: average, red: persistence\n",
    "\n",
    "i_np = which.min(e_nn_exo$abs$indices)\n",
    "i_p = which.max(e_nn_exo$abs$indices)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "par(mfrow=c(1,2))\n",
    "\n",
    "plotPredReal(data, p_nn_exo, i_np); title(paste(\"PredReal nn exo day\",i_np))\n",
    "plotPredReal(data, p_nn_exo, i_p); title(paste(\"PredReal nn exo day\",i_p))\n",
    "\n",
    "plotPredReal(data, p_nn_mix, i_np); title(paste(\"PredReal nn mix day\",i_np))\n",
    "plotPredReal(data, p_nn_mix, i_p); title(paste(\"PredReal nn mix day\",i_p))\n",
    "\n",
    "plotPredReal(data, p_az, i_np); title(paste(\"PredReal az day\",i_np))\n",
    "plotPredReal(data, p_az, i_p); title(paste(\"PredReal az day\",i_p))\n",
    "\n",
    "# Blue: predicted, black: measured"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f_np_exo = computeFilaments(data, p_nn_exo, i_np, plot=TRUE); title(paste(\"Filaments nn exo day\",i_np))\n",
    "f_p_exo = computeFilaments(data, p_nn_exo, i_p, plot=TRUE); title(paste(\"Filaments nn exo day\",i_p))\n",
    "\n",
    "f_np_mix = computeFilaments(data, p_nn_mix, i_np, plot=TRUE); title(paste(\"Filaments nn mix day\",i_np))\n",
    "f_p_mix = computeFilaments(data, p_nn_mix, i_p, plot=TRUE); title(paste(\"Filaments nn mix day\",i_p))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotFilamentsBox(data, f_np_exo); title(paste(\"FilBox nn exo day\",i_np))\n",
    "plotFilamentsBox(data, f_p_exo); title(paste(\"FilBox nn exo day\",i_p))\n",
    "\n",
    "plotFilamentsBox(data, f_np_mix); title(paste(\"FilBox nn mix day\",i_np))\n",
    "plotFilamentsBox(data, f_p_mix); title(paste(\"FilBox nn mix day\",i_p))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotRelVar(data, f_np_exo); title(paste(\"StdDev nn exo day\",i_np))\n",
    "plotRelVar(data, f_p_exo); title(paste(\"StdDev nn exo day\",i_p))\n",
    "\n",
    "plotRelVar(data, f_np_mix); title(paste(\"StdDev nn mix day\",i_np))\n",
    "plotRelVar(data, f_p_mix); title(paste(\"StdDev nn mix day\",i_p))\n",
    "\n",
    "# Overall variability in red; over the 60 neighbors (+ following days) in black"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_nn_exo, i_np); title(paste(\"Weights nn exo day\",i_np))\n",
    "plotSimils(p_nn_exo, i_p); title(paste(\"Weights nn exo day\",i_p))\n",
    "\n",
    "plotSimils(p_nn_mix, i_np); title(paste(\"Weights nn mix day\",i_np))\n",
    "plotSimils(p_nn_mix, i_p); title(paste(\"Weights nn mix day\",i_p))\n",
    "\n",
    "# Less polluted on the left, more polluted on the right"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_nn_exo$getParams(i_np)$window\n",
    "p_nn_exo$getParams(i_p)$window\n",
    "\n",
    "p_nn_mix$getParams(i_np)$window\n",
    "p_nn_mix$getParams(i_p)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "<h2 style=\"color:blue;font-size:2em\">Pollution from agricultural spreading</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "p_nn_exo = computeForecast(data, indices_ep, \"Neighbors\", \"Neighbors\", simtype=\"exo\", horizon=H)\n",
    "p_nn_mix = computeForecast(data, indices_ep, \"Neighbors\", \"Neighbors\", simtype=\"mix\", horizon=H)\n",
    "p_az = computeForecast(data, indices_ep, \"Average\", \"Zero\", horizon=H) #, memory=183)\n",
    "p_pz = computeForecast(data, indices_ep, \"Persistence\", \"Zero\", horizon=H, same_day=TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_nn_exo = computeError(data, p_nn_exo, H)\n",
    "e_nn_mix = computeError(data, p_nn_mix, H)\n",
    "e_az = computeError(data, p_az, H)\n",
    "e_pz = computeError(data, p_pz, H)\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_nn_mix, e_pz, e_az, e_nn_exo), cols=c(1,2,colors()[258], 4))\n",
    "\n",
    "# Black: neighbors_mix, blue: neighbors_exo, green: average, red: persistence\n",
    "\n",
    "i_np = which.min(e_nn_exo$abs$indices)\n",
    "i_p = which.max(e_nn_exo$abs$indices)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "par(mfrow=c(1,2))\n",
    "\n",
    "plotPredReal(data, p_nn_exo, i_np); title(paste(\"PredReal nn exo day\",i_np))\n",
    "plotPredReal(data, p_nn_exo, i_p); title(paste(\"PredReal nn exo day\",i_p))\n",
    "\n",
    "plotPredReal(data, p_nn_mix, i_np); title(paste(\"PredReal nn mix day\",i_np))\n",
    "plotPredReal(data, p_nn_mix, i_p); title(paste(\"PredReal nn mix day\",i_p))\n",
    "\n",
    "plotPredReal(data, p_az, i_np); title(paste(\"PredReal az day\",i_np))\n",
    "plotPredReal(data, p_az, i_p); title(paste(\"PredReal az day\",i_p))\n",
    "\n",
    "# Blue: predicted, black: measured"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f_np_exo = computeFilaments(data, p_nn_exo, i_np, plot=TRUE); title(paste(\"Filaments nn exo day\",i_np))\n",
    "f_p_exo = computeFilaments(data, p_nn_exo, i_p, plot=TRUE); title(paste(\"Filaments nn exo day\",i_p))\n",
    "\n",
    "f_np_mix = computeFilaments(data, p_nn_mix, i_np, plot=TRUE); title(paste(\"Filaments nn mix day\",i_np))\n",
    "f_p_mix = computeFilaments(data, p_nn_mix, i_p, plot=TRUE); title(paste(\"Filaments nn mix day\",i_p))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotFilamentsBox(data, f_np_exo); title(paste(\"FilBox nn exo day\",i_np))\n",
    "plotFilamentsBox(data, f_p_exo); title(paste(\"FilBox nn exo day\",i_p))\n",
    "\n",
    "plotFilamentsBox(data, f_np_mix); title(paste(\"FilBox nn mix day\",i_np))\n",
    "plotFilamentsBox(data, f_p_mix); title(paste(\"FilBox nn mix day\",i_p))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotRelVar(data, f_np_exo); title(paste(\"StdDev nn exo day\",i_np))\n",
    "plotRelVar(data, f_p_exo); title(paste(\"StdDev nn exo day\",i_p))\n",
    "\n",
    "plotRelVar(data, f_np_mix); title(paste(\"StdDev nn mix day\",i_np))\n",
    "plotRelVar(data, f_p_mix); title(paste(\"StdDev nn mix day\",i_p))\n",
    "\n",
    "# Overall variability in red; over the 60 neighbors (+ following days) in black"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_nn_exo, i_np); title(paste(\"Weights nn exo day\",i_np))\n",
    "plotSimils(p_nn_exo, i_p); title(paste(\"Weights nn exo day\",i_p))\n",
    "\n",
    "plotSimils(p_nn_mix, i_np); title(paste(\"Weights nn mix day\",i_np))\n",
    "plotSimils(p_nn_mix, i_p); title(paste(\"Weights nn mix day\",i_p))\n",
    "\n",
    "# Less polluted on the left, more polluted on the right"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_nn_exo$getParams(i_np)$window\n",
    "p_nn_exo$getParams(i_p)$window\n",
    "\n",
    "p_nn_mix$getParams(i_np)$window\n",
    "p_nn_mix$getParams(i_p)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "<h2 style=\"color:blue;font-size:2em\">Unpolluted week</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "p_nn_exo = computeForecast(data, indices_np, \"Neighbors\", \"Neighbors\", simtype=\"exo\", horizon=H)\n",
    "p_nn_mix = computeForecast(data, indices_np, \"Neighbors\", \"Neighbors\", simtype=\"mix\", horizon=H)\n",
    "p_az = computeForecast(data, indices_np, \"Average\", \"Zero\", horizon=H) #, memory=183)\n",
    "p_pz = computeForecast(data, indices_np, \"Persistence\", \"Zero\", horizon=H, same_day=FALSE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_nn_exo = computeError(data, p_nn_exo, H)\n",
    "e_nn_mix = computeError(data, p_nn_mix, H)\n",
    "e_az = computeError(data, p_az, H)\n",
    "e_pz = computeError(data, p_pz, H)\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_nn_mix, e_pz, e_az, e_nn_exo), cols=c(1,2,colors()[258], 4))\n",
    "\n",
    "# Black: neighbors_mix, blue: neighbors_exo, green: average, red: persistence\n",
    "\n",
    "i_np = which.min(e_nn_exo$abs$indices)\n",
    "i_p = which.max(e_nn_exo$abs$indices)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "par(mfrow=c(1,2))\n",
    "\n",
    "plotPredReal(data, p_nn_exo, i_np); title(paste(\"PredReal nn exo day\",i_np))\n",
    "plotPredReal(data, p_nn_exo, i_p); title(paste(\"PredReal nn exo day\",i_p))\n",
    "\n",
    "plotPredReal(data, p_nn_mix, i_np); title(paste(\"PredReal nn mix day\",i_np))\n",
    "plotPredReal(data, p_nn_mix, i_p); title(paste(\"PredReal nn mix day\",i_p))\n",
    "\n",
    "plotPredReal(data, p_az, i_np); title(paste(\"PredReal az day\",i_np))\n",
    "plotPredReal(data, p_az, i_p); title(paste(\"PredReal az day\",i_p))\n",
    "\n",
    "# Blue: predicted, black: measured"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f_np_exo = computeFilaments(data, p_nn_exo, i_np, plot=TRUE); title(paste(\"Filaments nn exo day\",i_np))\n",
    "f_p_exo = computeFilaments(data, p_nn_exo, i_p, plot=TRUE); title(paste(\"Filaments nn exo day\",i_p))\n",
    "\n",
    "f_np_mix = computeFilaments(data, p_nn_mix, i_np, plot=TRUE); title(paste(\"Filaments nn mix day\",i_np))\n",
    "f_p_mix = computeFilaments(data, p_nn_mix, i_p, plot=TRUE); title(paste(\"Filaments nn mix day\",i_p))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotFilamentsBox(data, f_np_exo); title(paste(\"FilBox nn exo day\",i_np))\n",
    "plotFilamentsBox(data, f_p_exo); title(paste(\"FilBox nn exo day\",i_p))\n",
    "\n",
    "plotFilamentsBox(data, f_np_mix); title(paste(\"FilBox nn mix day\",i_np))\n",
    "plotFilamentsBox(data, f_p_mix); title(paste(\"FilBox nn mix day\",i_p))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotRelVar(data, f_np_exo); title(paste(\"StdDev nn exo day\",i_np))\n",
    "plotRelVar(data, f_p_exo); title(paste(\"StdDev nn exo day\",i_p))\n",
    "\n",
    "plotRelVar(data, f_np_mix); title(paste(\"StdDev nn mix day\",i_np))\n",
    "plotRelVar(data, f_p_mix); title(paste(\"StdDev nn mix day\",i_p))\n",
    "\n",
    "# Overall variability in red; over the 60 neighbors (+ following days) in black"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_nn_exo, i_np); title(paste(\"Weights nn exo day\",i_np))\n",
    "plotSimils(p_nn_exo, i_p); title(paste(\"Weights nn exo day\",i_p))\n",
    "\n",
    "plotSimils(p_nn_mix, i_np); title(paste(\"Weights nn mix day\",i_np))\n",
    "plotSimils(p_nn_mix, i_p); title(paste(\"Weights nn mix day\",i_p))\n",
    "\n",
    "# Less polluted on the left, more polluted on the right"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_nn_exo$getParams(i_np)$window\n",
    "p_nn_exo$getParams(i_p)$window\n",
    "\n",
    "p_nn_mix$getParams(i_np)$window\n",
    "p_nn_mix$getParams(i_p)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "<h2>Summary</h2>\n",
    "\n",
    "This is a hard problem: we barely do better than a naive average of the days following the\n",
    "similar days in the past, which is not far from predicting a constant series equal to the\n",
    "last observed value (\"zero\" method). Persistence sometimes gives good results\n",
    "but is too unstable (it is sensitive to the <code>same_day</code> argument).\n",
    "\n",
    "How can the method be improved?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
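The notebook describes the "Neighbors" forecast only in words: similar past days receive weights, endogenous and exogenous weights are multiplied under mix_strategy="mult", and the forecast is built from the days that followed the similar ones. The R sketch below illustrates that idea on toy data. It is not talweg's implementation: the Gaussian kernel, the helper names `kernel_weights` and `neighbors_forecast`, and the toy matrices are all assumptions made for illustration.

```r
# Hypothetical helper (not part of talweg): Gaussian kernel similarity between
# a reference vector `current` and each row of the matrix `past`.
kernel_weights <- function(past, current, window) {
  d2 <- rowSums(sweep(past, 2, current)^2)  # squared Euclidean distance per past day
  exp(-d2 / window^2)                       # a larger window gives flatter weights
}

# Hypothetical helper: weighted average of the days that followed each past day.
# Endogenous and exogenous weights are multiplied, mirroring the "mult" idea.
neighbors_forecast <- function(endo, exo, tomorrows, cur_endo, cur_exo, w_endo, w_exo) {
  w <- kernel_weights(endo, cur_endo, w_endo) * kernel_weights(exo, cur_exo, w_exo)
  w <- w / sum(w)                           # normalize so the weights sum to 1
  drop(w %*% tomorrows)                     # one forecast value per horizon step
}

# Toy data: 3 past days, 2 endogenous values and 1 exogenous value per day,
# and a 3-step horizon for the day following each past day.
endo      <- rbind(c(10, 12), c(30, 33), c(11, 13))
exo       <- matrix(c(5, 15, 6), ncol = 1)
tomorrows <- rbind(c(12, 14, 15), c(35, 36, 38), c(13, 15, 16))

pred  <- neighbors_forecast(endo, exo, tomorrows, c(10.5, 12.5), 5.5, 5, 5)
naive <- colMeans(tomorrows)                # unweighted average of all following days
print(pred)
print(naive)
```

In this toy case the second past day is far from the current day, so its weight is negligible and the forecast is close to the average of the first and third following days, whereas the unweighted average is pulled up by the second one.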