name instead of year; ipynb generator debugged, with logging
[talweg.git] / reports / report_2017-03-01.13h.ipynb
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "library(talweg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Load the PM10 measurements and the exogenous (meteorological) data shipped with the package\n",
    "ts_data = read.csv(system.file(\"extdata\",\"pm10_mesures_H_loc_report.csv\",package=\"talweg\"))\n",
    "exo_data = read.csv(system.file(\"extdata\",\"meteo_extra_noNAs.csv\",package=\"talweg\"))\n",
    "# Build the data object used by every forecast below (predict_at=13)\n",
    "data = getData(ts_data, exo_data, input_tz = \"Europe/Paris\", working_tz=\"Europe/Paris\", predict_at=13)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "I ran a few tests in different configurations for the \"Neighbors\" method (the only one we have discussed).<br>The best settings appear to be\n",
    "\n",
    " * simtype=\"mix\": both endogenous and exogenous similarities are used (window optimized by cross-validation)\n",
    " * same_season=FALSE: the cross-validation indices do not take seasons into account\n",
    " * mix_strategy=\"mult\": the weights are multiplied (instead of switching some of them off)\n",
    "\n",
    "(default values).\n",
    "\n",
    "I systematically compared it with two other approaches: persistence, and the average of the next days of \"similar\" days over the whole past; in each case without jump prediction (except for Neighbors: prediction based on the computed weights).\n",
    "\n",
    "Next I show the errors, a few predicted/measured curves, a few filaments, and then the histograms of some weights. In the filament plots, the left half of the plot corresponds to the days similar to the current day, while the right half shows their next days: these are therefore the neighborhoods as used in the algorithm.\n",
    "\n",
    "<h2 style=\"color:blue;font-size:2em\">Heating-related pollution</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# One week of forecasts: Neighbors versus the Persistence and Average baselines\n",
    "# (\"Zero\" as last name: no jump prediction)\n",
    "indices_ch = seq(as.Date(\"2015-01-18\"),as.Date(\"2015-01-24\"),\"days\")\n",
    "p_ch_nn = computeForecast(data, indices_ch, \"Neighbors\", \"Neighbors\", simtype=\"mix\")\n",
    "p_ch_pz = computeForecast(data, indices_ch, \"Persistence\", \"Zero\", same_day=TRUE)\n",
    "p_ch_az = computeForecast(data, indices_ch, \"Average\", \"Zero\") #, memory=183)\n",
    "#p_ch_zz = computeForecast(data, indices_ch, \"Zero\", \"Zero\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_ch_nn = computeError(data, p_ch_nn)\n",
    "e_ch_pz = computeError(data, p_ch_pz)\n",
    "e_ch_az = computeError(data, p_ch_az)\n",
    "#e_ch_zz = computeError(data, p_ch_zz)\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_ch_nn, e_ch_pz, e_ch_az), cols=c(1,2,colors()[258]))\n",
    "\n",
    "#Black: neighbors, red: persistence, green: average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotPredReal(data, p_ch_nn, 3)\n",
    "plotPredReal(data, p_ch_nn, 4)\n",
    "\n",
    "#Blue: predicted, black: observed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotPredReal(data, p_ch_az, 3)\n",
    "plotPredReal(data, p_ch_az, 4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f3_ch = computeFilaments(data, p_ch_nn$getIndexInData(3), plot=TRUE)\n",
    "f4_ch = computeFilaments(data, p_ch_nn$getIndexInData(4), plot=TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotFilamentsBox(data, f3_ch)\n",
    "plotFilamentsBox(data, f4_ch)\n",
    "\n",
    "#Left: day 3 + next day (4); right: day 4 + next day (5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotRelVar(data, f3_ch)\n",
    "plotRelVar(data, f4_ch)\n",
    "\n",
    "#Global variability in red; over our 60 neighbors (+ next days) in black"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_ch_nn, 3)\n",
    "plotSimils(p_ch_nn, 4)\n",
    "\n",
    "#Non-polluted on the left, polluted on the right"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_ch_nn$getParams(3)$window\n",
    "p_ch_nn$getParams(4)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2 style=\"color:blue;font-size:2em\">Pollution from agricultural spreading</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "indices_ep = seq(as.Date(\"2015-03-15\"),as.Date(\"2015-03-21\"),\"days\")\n",
    "p_ep_nn = computeForecast(data, indices_ep, \"Neighbors\", \"Neighbors\", simtype=\"mix\")\n",
    "p_ep_pz = computeForecast(data, indices_ep, \"Persistence\", \"Zero\", same_day=TRUE)\n",
    "p_ep_az = computeForecast(data, indices_ep, \"Average\", \"Zero\") #, memory=183)\n",
    "#p_ep_zz = computeForecast(data, indices_ep, \"Zero\", \"Zero\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_ep_nn = computeError(data, p_ep_nn)\n",
    "e_ep_pz = computeError(data, p_ep_pz)\n",
    "e_ep_az = computeError(data, p_ep_az)\n",
    "#e_ep_zz = computeError(data, p_ep_zz)\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_ep_nn, e_ep_pz, e_ep_az), cols=c(1,2,colors()[258]))\n",
    "\n",
    "#Black: neighbors, red: persistence, green: average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotPredReal(data, p_ep_nn, 6)\n",
    "plotPredReal(data, p_ep_nn, 3)\n",
    "\n",
    "#Blue: predicted, black: observed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotPredReal(data, p_ep_az, 6)\n",
    "plotPredReal(data, p_ep_az, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f6_ep = computeFilaments(data, p_ep_nn$getIndexInData(6), plot=TRUE)\n",
    "f3_ep = computeFilaments(data, p_ep_nn$getIndexInData(3), plot=TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotFilamentsBox(data, f6_ep)\n",
    "plotFilamentsBox(data, f3_ep)\n",
    "\n",
    "#Left: day 6 + next day (7); right: day 3 + next day (4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotRelVar(data, f6_ep)\n",
    "plotRelVar(data, f3_ep)\n",
    "\n",
    "#Global variability in red; over our 60 neighbors (+ next days) in black"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_ep_nn, 6)\n",
    "plotSimils(p_ep_nn, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_ep_nn$getParams(6)$window\n",
    "p_ep_nn$getParams(3)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2 style=\"color:blue;font-size:2em\">Non-polluted week</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "indices_np = seq(as.Date(\"2015-04-26\"),as.Date(\"2015-05-02\"),\"days\")\n",
    "p_np_nn = computeForecast(data, indices_np, \"Neighbors\", \"Neighbors\", simtype=\"mix\")\n",
    "p_np_pz = computeForecast(data, indices_np, \"Persistence\", \"Zero\", same_day=FALSE)\n",
    "p_np_az = computeForecast(data, indices_np, \"Average\", \"Zero\") #, memory=183)\n",
    "#p_np_zz = computeForecast(data, indices_np, \"Zero\", \"Zero\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "e_np_nn = computeError(data, p_np_nn)\n",
    "e_np_pz = computeError(data, p_np_pz)\n",
    "e_np_az = computeError(data, p_np_az)\n",
    "#e_np_zz = computeError(data, p_np_zz)\n",
    "options(repr.plot.width=9, repr.plot.height=7)\n",
    "plotError(list(e_np_nn, e_np_pz, e_np_az), cols=c(1,2,colors()[258]))\n",
    "\n",
    "#Black: neighbors, red: persistence, green: average"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "options(repr.plot.width=9, repr.plot.height=4)\n",
    "plotPredReal(data, p_np_nn, 5)\n",
    "plotPredReal(data, p_np_nn, 3)\n",
    "\n",
    "#Blue: predicted, black: observed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotPredReal(data, p_np_az, 5)\n",
    "plotPredReal(data, p_np_az, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "f5_np = computeFilaments(data, p_np_nn$getIndexInData(5), plot=TRUE)\n",
    "f3_np = computeFilaments(data, p_np_nn$getIndexInData(3), plot=TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotFilamentsBox(data, f5_np)\n",
    "plotFilamentsBox(data, f3_np)\n",
    "\n",
    "#Left: day 5 + next day (6); right: day 3 + next day (4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotRelVar(data, f5_np)\n",
    "plotRelVar(data, f3_np)\n",
    "\n",
    "#Global variability in red; over our 60 neighbors (+ next days) in black"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "par(mfrow=c(1,2))\n",
    "plotSimils(p_np_nn, 5)\n",
    "plotSimils(p_np_nn, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Windows selected in (0,10] / endo on the left, exo on the right\n",
    "p_np_nn$getParams(5)$window\n",
    "p_np_nn$getParams(3)$window"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "This is a hard problem: we hardly do better than a naive average of the next days of similar days in the past, which is not far from predicting a constant series equal to the last observed value (\"zero\" method). Persistence sometimes gives good results but is too unstable (it is sensitive to the <code>same_day</code> argument).\n",
    "\n",
    "How can the method be improved?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}