new version, persistence -7 days
[talweg.git] / reports / report_2017-03-01.ipynb
1{
2 "cells": [
3 {
4 "cell_type": "code",
5 "execution_count": null,
6 "metadata": {
7 "collapsed": false
8 },
9 "outputs": [],
10 "source": [
11 "library(talweg)"
12 ]
13 },
14 {
15 "cell_type": "code",
16 "execution_count": null,
17 "metadata": {
18 "collapsed": false
19 },
20 "outputs": [],
21 "source": [
22 "data = getData(ts_data=\"../data/pm10_mesures_H_loc.csv\", exo_data=\"../data/meteo_extra_noNAs.csv\",\n",
23 " input_tz = \"Europe/Paris\", working_tz=\"Europe/Paris\", predict_at=7)"
24 ]
25 },
26 {
27 "cell_type": "markdown",
28 "metadata": {},
29 "source": [
30 "## Introduction\n",
31 "\n",
32 "I ran a few trials in different configurations for the \"Neighbors\" method (the only one we have discussed).<br>The best settings appear to be\n",
33 "\n",
34 " * simtype=\"mix\": both endogenous and exogenous similarities are used (window optimized by cross-validation)\n",
35 " * same_season=FALSE: the cross-validation indices do not take seasons into account\n",
36 " * mix_strategy=\"mult\": the weights are multiplied (instead of switching some of them off)\n",
37 "\n",
38 "I systematically compared against two other approaches: persistence, and the average of all the futures of similar past days; each time without jump prediction (except for Neighbors, where the prediction is based on the computed weights).\n",
39 "\n",
40 "I then display the errors, a few predicted/measured curves, a few filaments, and finally histograms of some of the weights. In the filament plots, the left half of the graph shows the days similar to the current day, while the right half shows their following days: these are the neighborhoods as used by the algorithm.\n",
41 "\n",
42 "<h2 style=\"color:blue;font-size:2em\">Pollution from heating</h2>"
43 ]
44 },
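{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of mix_strategy=\"mult\" can be pictured with a minimal sketch. This is not talweg's actual implementation: the names sketch_mix_weights, w_endo and w_exo, the toy values and the normalization step are assumptions made for illustration only. The idea is to multiply the endogenous and exogenous similarity weights element-wise, then forecast with the weighted mean of the neighbors' next-day curves."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Hypothetical helper (not part of talweg): combine two weight vectors by product\n",
"sketch_mix_weights = function(w_endo, w_exo) {\n",
"  stopifnot(length(w_endo) == length(w_exo))\n",
"  w = w_endo * w_exo\n",
"  # Assumption: fall back to uniform weights if every product is zero\n",
"  if (sum(w) == 0) return(rep(1/length(w), length(w)))\n",
"  w / sum(w)\n",
"}\n",
"\n",
"# Toy example: 3 past days, 4 time steps per next-day curve (one row per day)\n",
"sketch_w = sketch_mix_weights(c(0.9, 0.5, 0.1), c(0.8, 0.1, 0.6))\n",
"sketch_tomorrows = rbind(c(10,12,14,12), c(20,22,24,22), c(30,32,34,32))\n",
"# Weighted mean of the next-day curves\n",
"sketch_forecast = as.vector(sketch_w %*% sketch_tomorrows)"
]
},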
45 {
46 "cell_type": "code",
47 "execution_count": null,
48 "metadata": {
49 "collapsed": false
50 },
51 "outputs": [],
52 "source": [
53 "indices = seq(as.Date(\"2015-01-18\"),as.Date(\"2015-01-24\"),\"days\")\n",
54 "p_ch_nn = getForecast(data,indices,\"Neighbors\",\"Neighbors\",simtype=\"mix\",same_season=FALSE,mix_strategy=\"mult\")\n",
55 "p_ch_pz = getForecast(data, indices, \"Persistence\", \"Zero\")\n",
56 "p_ch_az = getForecast(data, indices, \"Average\", \"Zero\")\n",
57 "p_ch_zz = getForecast(data, indices, \"Zero\", \"Zero\")"
58 ]
59 },
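{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, the two naive baselines can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the code behind getForecast: the function names are hypothetical, the 7-day lag is taken from the commit title of this version, and treating \"similar days\" as \"same weekday\" is a guess. The input is assumed to be a matrix with one row per day."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Hypothetical baselines (not talweg's code); days has one row per day\n",
"sketch_persistence = function(days, i, lag=7) {\n",
"  stopifnot(i > lag)\n",
"  days[i - lag, ]  # repeat the curve observed lag days earlier\n",
"}\n",
"sketch_average = function(days, i, period=7) {\n",
"  stopifnot(i > period)\n",
"  # Assumed notion of similar days: same weekday, strictly in the past\n",
"  past = seq(i - period, 1, by=-period)\n",
"  colMeans(days[past, , drop=FALSE])\n",
"}\n",
"\n",
"# Toy data: 15 days, 4 time steps per day\n",
"sketch_days = matrix(1:60, nrow=15, byrow=TRUE)\n",
"sketch_p = sketch_persistence(sketch_days, 15)\n",
"sketch_a = sketch_average(sketch_days, 15)"
]
},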
60 {
61 "cell_type": "code",
62 "execution_count": null,
63 "metadata": {
64 "collapsed": false
65 },
66 "outputs": [],
67 "source": [
68 "e_ch_nn = getError(data, p_ch_nn)\n",
69 "e_ch_pz = getError(data, p_ch_pz)\n",
70 "e_ch_az = getError(data, p_ch_az)\n",
71 "options(repr.plot.width=9, repr.plot.height=6)\n",
72 "plotError(list(e_ch_nn, e_ch_pz, e_ch_az), cols=c(1,2,colors()[258]))\n",
73 "\n",
74 "#Black: neighbors, red: persistence, green: average"
75 ]
76 },
77 {
78 "cell_type": "markdown",
79 "metadata": {},
80 "source": [
81 "The Neighbors method does fairly clearly better than a simple average in this case."
82 ]
83 },
84 {
85 "cell_type": "code",
86 "execution_count": null,
87 "metadata": {
88 "collapsed": false
89 },
90 "outputs": [],
91 "source": [
92 "par(mfrow=c(1,2))\n",
93 "options(repr.plot.width=9, repr.plot.height=4)\n",
94 "plotPredReal(data, p_ch_nn, 3)\n",
95 "plotPredReal(data, p_ch_nn, 4)\n",
96 "\n",
97 "#Blue: predicted, black: observed"
98 ]
99 },
100 {
101 "cell_type": "markdown",
102 "metadata": {},
103 "source": [
104 "The more atypical (polluted) the day to forecast, the smoother the predictions."
105 ]
106 },
107 {
108 "cell_type": "code",
109 "execution_count": null,
110 "metadata": {
111 "collapsed": false
112 },
113 "outputs": [],
114 "source": [
115 "par(mfrow=c(1,2))\n",
116 "plotFilaments(data, p_ch_nn$getIndexInData(3))\n",
117 "plotFilaments(data, p_ch_nn$getIndexInData(4))"
118 ]
119 },
120 {
121 "cell_type": "markdown",
122 "metadata": {},
123 "source": [
124 "Many similar curves in the lightly polluted case, very few for a polluted day."
125 ]
126 },
127 {
128 "cell_type": "code",
129 "execution_count": null,
130 "metadata": {
131 "collapsed": false
132 },
133 "outputs": [],
134 "source": [
135 "par(mfrow=c(1,3))\n",
136 "plotSimils(p_ch_nn, 3)\n",
137 "plotSimils(p_ch_nn, 4)\n",
138 "plotSimils(p_ch_nn, 5)\n",
139 "\n",
140 "#Unpolluted on the left, polluted in the middle, another polluted day on the right"
141 ]
142 },
143 {
144 "cell_type": "markdown",
145 "metadata": {},
146 "source": [
147 "Most weights are very close to zero, except for day 5, which is a different type of day (see below)."
148 ]
149 },
150 {
151 "cell_type": "code",
152 "execution_count": null,
153 "metadata": {
154 "collapsed": false
155 },
156 "outputs": [],
157 "source": [
158 "par(mfrow=c(1,2))\n",
159 "plotPredReal(data, p_ch_nn, 5)\n",
160 "plotFilaments(data, p_ch_nn$getIndexInData(5))"
161 ]
162 },
163 {
164 "cell_type": "markdown",
165 "metadata": {},
166 "source": [
167 "<h2 style=\"color:blue;font-size:2em\">Pollution from fertilizer spreading</h2>"
168 ]
169 },
170 {
171 "cell_type": "code",
172 "execution_count": null,
173 "metadata": {
174 "collapsed": false
175 },
176 "outputs": [],
177 "source": [
178 "indices = seq(as.Date(\"2015-03-15\"),as.Date(\"2015-03-21\"),\"days\")\n",
179 "p_ep_nn = getForecast(data,indices,\"Neighbors\",\"Neighbors\",simtype=\"mix\",same_season=FALSE,mix_strategy=\"mult\")\n",
180 "p_ep_pz = getForecast(data, indices, \"Persistence\", \"Zero\")\n",
181 "p_ep_az = getForecast(data, indices, \"Average\", \"Zero\")\n",
182 "p_ep_zz = getForecast(data, indices, \"Zero\", \"Zero\")\n",
183 "p_ep_lz = getForecast(data, indices, \"Level\", \"Zero\")"
184 ]
185 },
186 {
187 "cell_type": "code",
188 "execution_count": null,
189 "metadata": {
190 "collapsed": false
191 },
192 "outputs": [],
193 "source": [
194 "e_ep_nn = getError(data, p_ep_nn)\n",
195 "e_ep_pz = getError(data, p_ep_pz)\n",
196 "e_ep_az = getError(data, p_ep_az)\n",
197 "options(repr.plot.width=9, repr.plot.height=6)\n",
198 "plotError(list(e_ep_nn, e_ep_pz, e_ep_az), cols=c(1,2,colors()[258]))\n",
199 "\n",
200 "#Black: neighbors, red: persistence, green: average"
201 ]
202 },
203 {
204 "cell_type": "markdown",
205 "metadata": {},
206 "source": [
207 "This time the two naive methods make fewer errors on average than Neighbors. Is the prediction too difficult?"
208 ]
209 },
210 {
211 "cell_type": "code",
212 "execution_count": null,
213 "metadata": {
214 "collapsed": false
215 },
216 "outputs": [],
217 "source": [
218 "par(mfrow=c(1,2))\n",
219 "options(repr.plot.width=9, repr.plot.height=4)\n",
220 "plotPredReal(data, p_ep_nn, 4)\n",
221 "plotPredReal(data, p_ep_nn, 6)"
222 ]
223 },
224 {
225 "cell_type": "markdown",
226 "metadata": {},
227 "source": [
228 "On the left a \"well\" predicted day, on the right the error peak (day 6)."
229 ]
230 },
231 {
232 "cell_type": "code",
233 "execution_count": null,
234 "metadata": {
235 "collapsed": false
236 },
237 "outputs": [],
238 "source": [
239 "par(mfrow=c(1,2))\n",
240 "plotFilaments(data, p_ep_nn$getIndexInData(4))\n",
241 "plotFilaments(data, p_ep_nn$getIndexInData(6))"
242 ]
243 },
244 {
245 "cell_type": "code",
246 "execution_count": null,
247 "metadata": {
248 "collapsed": false
249 },
250 "outputs": [],
251 "source": [
252 "par(mfrow=c(1,2))\n",
253 "plotSimils(p_ep_nn, 4)\n",
254 "plotSimils(p_ep_nn, 6)"
255 ]
256 },
257 {
258 "cell_type": "markdown",
259 "metadata": {},
260 "source": [
261 "Same observation about the weights: they are concentrated near zero for predictions with few neighbors."
262 ]
263 },
264 {
265 "cell_type": "markdown",
266 "metadata": {},
267 "source": [
268 "## Unpolluted week"
269 ]
270 },
271 {
272 "cell_type": "code",
273 "execution_count": null,
274 "metadata": {
275 "collapsed": false
276 },
277 "outputs": [],
278 "source": [
279 "indices = seq(as.Date(\"2015-04-26\"),as.Date(\"2015-05-02\"),\"days\")\n",
280 "p_np_nn = getForecast(data,indices,\"Neighbors\",\"Neighbors\",simtype=\"mix\",same_season=FALSE,mix_strategy=\"mult\")\n",
281 "p_np_pz = getForecast(data, indices, \"Persistence\", \"Zero\")\n",
282 "p_np_az = getForecast(data, indices, \"Average\", \"Zero\")\n",
283 "p_np_zz = getForecast(data, indices, \"Zero\", \"Zero\")\n",
284 "p_np_lz = getForecast(data, indices, \"Level\", \"Zero\")"
285 ]
286 },
287 {
288 "cell_type": "code",
289 "execution_count": null,
290 "metadata": {
291 "collapsed": false
292 },
293 "outputs": [],
294 "source": [
295 "e_np_nn = getError(data, p_np_nn)\n",
296 "e_np_pz = getError(data, p_np_pz)\n",
297 "e_np_az = getError(data, p_np_az)\n",
298 "options(repr.plot.width=9, repr.plot.height=6)\n",
299 "plotError(list(e_np_nn, e_np_pz, e_np_az), cols=c(1,2,colors()[258]))\n",
300 "\n",
301 "#Black: neighbors, red: persistence, green: average"
302 ]
303 },
304 {
305 "cell_type": "markdown",
306 "metadata": {},
307 "source": [
308 "The \"Average\" and \"Neighbors\" methods perform comparably; persistence gives poor results."
309 ]
310 },
311 {
312 "cell_type": "code",
313 "execution_count": null,
314 "metadata": {
315 "collapsed": false
316 },
317 "outputs": [],
318 "source": [
319 "par(mfrow=c(1,2))\n",
320 "options(repr.plot.width=9, repr.plot.height=4)\n",
321 "plotPredReal(data, p_np_nn, 3)\n",
322 "plotPredReal(data, p_np_nn, 6)"
323 ]
324 },
325 {
326 "cell_type": "markdown",
327 "metadata": {},
328 "source": [
329 "The \"good\" predictions (on the left) are nonetheless too smooth."
330 ]
331 },
332 {
333 "cell_type": "code",
334 "execution_count": null,
335 "metadata": {
336 "collapsed": false
337 },
338 "outputs": [],
339 "source": [
340 "par(mfrow=c(1,2))\n",
341 "plotFilaments(data, p_np_nn$getIndexInData(3))\n",
342 "plotFilaments(data, p_np_nn$getIndexInData(6))"
343 ]
344 },
345 {
346 "cell_type": "markdown",
347 "metadata": {},
348 "source": [
349 "\"Typical\" days, hence many neighbors."
350 ]
351 },
352 {
353 "cell_type": "code",
354 "execution_count": null,
355 "metadata": {
356 "collapsed": false
357 },
358 "outputs": [],
359 "source": [
360 "par(mfrow=c(1,3))\n",
361 "plotSimils(p_np_nn, 3)\n",
362 "plotSimils(p_np_nn, 4)\n",
363 "plotSimils(p_np_nn, 6)"
364 ]
365 },
366 {
367 "cell_type": "markdown",
368 "metadata": {},
369 "source": [
370 "Ideal weight distribution: a few above 0.3-0.4, the rest very close to zero."
371 ]
372 },
373 {
374 "cell_type": "markdown",
375 "metadata": {},
376 "source": [
377 "## Summary\n",
378 "\n",
379 "A hard problem: we hardly do better than a naive average of the following days of similar past days.\n",
380 "\n",
381 "How can the method be improved?"
382 ]
383 }
384 ],
385 "metadata": {
386 "kernelspec": {
387 "display_name": "R",
388 "language": "R",
389 "name": "ir"
390 },
391 "language_info": {
392 "codemirror_mode": "r",
393 "file_extension": ".r",
394 "mimetype": "text/x-r-source",
395 "name": "R",
396 "pygments_lexer": "r",
397 "version": "3.3.2"
398 }
399 },
400 "nbformat": 4,
401 "nbformat_minor": 2
402}