\documentclass[12pt, a4paper]{article}

\usepackage[margin=2.5cm]{geometry}
\usepackage[utf8]{inputenc} % input encoding
\usepackage[T1]{fontenc} % output encoding
\usepackage{eurosym}
\usepackage{lmodern, microtype} % goes OK with T1 fontenc
%\usepackage[authoryear, round]{natbib}
\usepackage{natbib}
\usepackage{color, tikz, graphicx, subfig}
\usepackage{amssymb, amsmath, amsthm}
\usepackage{setspace, lineno, url, xcolor}
\usepackage{savetrees}

\newcommand{\todo}[1]{\textcolor{blue}{TODO: #1}} % macro for todo entries

% Style options
\renewcommand\familydefault{\sfdefault} % Use with sans serif font
\setlength{\bibsep}{0.0pt} % Compact bibliography (natbib)

\title{Disaggregated Electricity Forecasting using Clustering of Individual Consumers \\
 {\normalsize \color{gray} IRSDI - RESEARCH INITIATIVE IN INDUSTRIAL DATA SCIENCE}}

\author{Benjamin Auder \and
 Jairo Cugliari \and
 Yannig Goude \and
 Jean-Michel Poggi
}
\date{\normalsize\today
\vspace{-1.2\baselineskip}}



\begin{document}
\maketitle

%\begin{abstract}

%\end{abstract}

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% S E C T I O N
%
\section{Context}

\subsection{Industrial}

Electricity load forecasting is crucial for utilities, both for production
planning and for marketing offers. The increasing deployment of smart grid
infrastructure requires more flexible, data-driven forecasting methods that
adapt largely automatically to new data sets. New metering infrastructures
such as smart meters provide new and potentially massive information about
individual (household, small and medium enterprise) consumption. As an example,
in France, ERDF (Électricité Réseau Distribution France, the French manager of
the public electricity distribution network) deployed 250 000 smart meters,
covering a rural and an urban territory and providing half-hourly household
energy consumption. ERDF plans to install 35 million of them over the
French territory by the end of 2020, and exploiting such an amount of data
is an exciting but challenging task (see \url{http://www.erdf.fr/Linky}).

We propose to build clustering tools useful for forecasting the load
consumption. The idea is to disaggregate the global signal in such a way that
the sum of disaggregated forecasts significantly improves the prediction of the
whole global signal. The strategy is in three steps: first we cluster curves
to define super-consumers, then we build a hierarchy of partitions, within which
the best one is finally selected with respect to a disaggregated forecast
criterion. The proposed strategy is applied to a dataset of individual
consumers from the French electricity provider EDF. Disaggregation provides a
substantial gain of $16$\,\% in forecast accuracy compared with the 1-cluster
approach, while preserving meaningful classes of consumers.

\subsection{Academic}

For seasonal univariate continuous-time series arising in economics, it is often
natural to segment the series in time into consecutive curves, for example days, which
are then treated as a discrete time series of functions. In particular, in the
electrical context, the shape of the curves carries rich information about the
calendar day type, the meteorological conditions or the existence of special
electricity tariffs. Using the information contained in the shape of the load
curves leads to a very elegant formulation of functional forecasting.


%Electricity load experts naturally look at daily demand data as time functions
%called load curves. In a recent paper, \cite{shang2013} uses a functional time
%series approach for forecasting short-term electricity demand. This paper is
%illustrated by the half-hourly electricity demand from Monday to Sunday in South
%Australia. The strategy is also to consider a seasonal univariate time series as
%a time series of curves, then to reduce the dimensionality of curves by applying
%a functional principal component analysis and finally, following
%\cite{shang2011}, the principal component scores are forecasted using
%univariate ARIMA models. In addition, since data points in the daily electricity
%demand are sequentially observed, a forecast updating method based on
%nonparametric bootstrap approach is proposed to improve the accuracy of point
%forecasts. With respect to this strategy, the scheme we propose handles the
%forecasting problem in a functional way avoiding the hour by hour processing and
%considers a more flexible way to construct the distribution leading to the
%prediction interval.

Building on the information contained in the shape of the load curves,
\cite{antoniadis2012prevision} proposed a flexible nonparametric function-valued
forecast model called KWF (\textit{Kernel + Wavelet + Functional}), well suited
to handling nonstationary series. The predictor can be seen as a weighted average
of the futures of past situations, where the weights increase with the similarity
between the past situations and the current one. In addition, this strategy
provides simultaneous multiple-horizon predictions for a global forecast.

However, there is a need for local electricity load forecasting at different levels of the grid.
Bottom-up approaches, based on a two-stage process combining clustering and forecasting
methods, are a promising perspective. The first stage
consists in building classes in a population such that each class can be
forecast sufficiently well but corresponds to a different load shape or reacts
differently to exogenous variables like temperature or prices (see e.g.
\cite{labeeuw} in the context of demand response). The second stage consists in
aggregating the class forecasts to forecast the total or any subtotal of the population
consumption. For example, identifying and forecasting the consumption of a
sub-population reactive to an incentive is an important need when optimizing a
demand response program.

\section{Past work}

Few papers consider the problem of clustering individual consumption for
forecasting (e.g. \cite{iwafune2014short, Alzate, carevic2010applications, MisitiElec}).
Recently, \cite{energycon} proposed to build clustering tools useful for two
tasks simultaneously: clustering individual customers and forecasting the load
consumption. The idea is to disaggregate the global signal in such a way that
the sum of disaggregated forecasts significantly improves the prediction of the
whole global signal. The general strategy is in three steps: first we cluster
individual curves to define super-consumers, then we build a hierarchy of
partitions, within which the best one is finally selected with respect to a
disaggregated forecast criterion. The predictions are made with the KWF model,
which can be used as an off-the-shelf tool.

While this work has ended with the specification of an algorithm, a current need is a real proof that it scales up. A first step in this direction was taken in
\cite{auder2014}.
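
As an illustration, the three-step strategy can be written as follows. The
notation is introduced here for exposition only, and the error criterion is an
assumption, not necessarily the one used in \cite{energycon}. Let $Y_{i,t}$ be
the load of consumer $i \in \{1,\dots,N\}$ at time $t$ and
$Y_t = \sum_{i=1}^{N} Y_{i,t}$ the global signal. For a partition
$\mathcal{P}_K = \{C_1,\dots,C_K\}$ of the consumers taken from the hierarchy,
the super-consumers and the disaggregated forecast are
\begin{equation*}
S_{k,t} = \sum_{i \in C_k} Y_{i,t}, \qquad
\widehat{Y}_t^{(K)} = \sum_{k=1}^{K} \widehat{S}_{k,t},
\end{equation*}
where $\widehat{S}_{k,t}$ is the forecast (for instance by KWF) of the $k$-th
super-consumer. One possible selection rule keeps the partition that minimizes
a forecast error over a test period $\mathcal{T}$, for instance the mean
absolute percentage error, assuming $Y_t > 0$:
\begin{equation*}
K^\star = \arg\min_{K} \; \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}}
\frac{\bigl| Y_t - \widehat{Y}_t^{(K)} \bigr|}{Y_t}.
\end{equation*}
The case $K = 1$ corresponds to forecasting the global signal directly, which
is the reference against which any gain from disaggregation is measured.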


% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% S E C T I O N
%
\section{Aims}

The method proposed in \cite{energycon} has been successfully tested on a small data set of EDF clients. With the current deployment of smart meters in France, the available volume of individual data is increasing day after day. There is therefore a genuine need to measure how well existing methods scale up.

This project has two aims. First, we will evaluate the capacity of the strategy developed in \cite{energycon} to scale up and cope with the growing volume of data. Second, we will study how to adapt the KWF prediction method to take exogenous variables into account. In our particular problem, the exogenous variables can be any meteorological measurements that affect the load demand and are available at the moment of the prediction.


% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% S E C T I O N
%

\section{Means considered}

\subsection{Methods}
\paragraph{Clustering analysis.} In general, clustering methods look for groups of individuals in the data such that individuals belonging to the same group are more similar to each other than to those from other groups. Many methods exist to cluster data:
hierarchical, center-based, probabilistic, etc. Almost all of them depend heavily
on the choice of a similarity measure between individuals. For this challenge we plan
to compare individuals in terms of their wavelet spectrum signature. Thanks to this strategy,
nonstationary signals can be fairly compared. Moreover, the signals need not be
measured on the same temporal grid. However, in order to obtain relevant results,
the wavelet signatures should be corrected using exogenous information (e.g. the
information provided as client characteristics).
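
To fix ideas, a center-based method can be sketched as follows, with notation
introduced here for illustration only. Given curves $z_1,\dots,z_N$ and a
dissimilarity $D$ between them, one looks for clusters $C_1,\dots,C_K$ and
representatives $m_1,\dots,m_K$ minimizing the within-cluster dissimilarity
\begin{equation*}
\min_{C_1,\dots,C_K,\; m_1,\dots,m_K} \; \sum_{k=1}^{K} \sum_{i \in C_k} D(z_i, m_k).
\end{equation*}
With $D$ the squared Euclidean distance and $m_k$ the cluster mean, this is the
k-means criterion; restricting each $m_k$ to be one of the observed curves
gives a k-medoids criterion, which accepts an arbitrary dissimilarity. Which
variant best suits the wavelet-based comparison is left open here.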

\paragraph{Wavelet analysis.} Since the objects to analyze (load curves) can be viewed
as functions of time, functional data analysis techniques are one possible choice to
represent these objects. From a stochastic point of view, the functions are realizations
of a nonstationary random process. Wavelet transforms can be used to extract
relevant information about the functions in both time and frequency. With an
appropriate representation of the objects, it is then possible to construct
a meaningful distance between load curves.
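
As a sketch of such a construction (one possible choice, with notation
introduced here for illustration), consider a load curve $z$ sampled on a grid
of $2^J$ points and its discrete wavelet expansion in an orthonormal basis,
\begin{equation*}
z(t) \approx c_0 \, \phi(t) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j-1} d_{j,k} \, \psi_{j,k}(t),
\end{equation*}
where $\phi$ is the scaling function and $\psi_{j,k}$ is the wavelet at scale
$j$ and position $k$. Orthonormality yields a decomposition of the energy of
the curve across scales,
\begin{equation*}
\|z\|^2 \approx c_0^2 + \sum_{j=0}^{J-1} E_j(z), \qquad
E_j(z) = \sum_{k=0}^{2^j-1} d_{j,k}^2,
\end{equation*}
and two curves can then be compared through their energy per scale:
\begin{equation*}
D(z, z') = \Bigl( \sum_{j=0}^{J-1} \bigl( E_j(z) - E_j(z') \bigr)^2 \Bigr)^{1/2}.
\end{equation*}
Because $E_j$ aggregates the coefficients over the positions $k$, this
dissimilarity reflects how much variation a curve shows at each scale rather
than exactly when that variation occurs.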

\paragraph{Forecasting with KWF.}
The basic idea of nonparametric forecasting is that similar cases in the past
have similar future consequences. For example, the electricity consumption is
divided into blocks of one day. Then, using a dissimilarity measure, the
blocks similar to the last observed block are searched for in the past and a vector
of weights is built. Finally, the forecast of the next day is obtained as a
weighted average of the days that followed the most similar past blocks, using
this vector of weights. From the statistical point of view, the model is a
kernel estimate of the regression function of the last block against all the
blocks in the past. In \cite{antoniadis2006functional} this basic model is
extended to the case of stationary functional random variables. But in the
context of electrical power demand, the stationarity hypothesis generally
fails: an evolving mean level and the existence of groups that may be seen as
classes of stationarity have to be taken into account. Corrections for
these two main nonstationary features are introduced in
\cite{antoniadis2012prevision}, defining a flexible nonparametric function-valued
forecast model called KWF (\textit{Kernel + Wavelet + Functional}), well suited
to handling nonstationary series. The predictor can be seen as a weighted average
of the futures of past situations, where the weights increase with the similarity
between the past situations and the current one. Again, the similarity is defined
through the wavelet decompositions of the two segments.
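
In formulas, and with notation introduced here for illustration, let
$Z_1,\dots,Z_n$ denote the observed daily blocks. The basic kernel predictor of
the next block described above reads
\begin{equation*}
\widehat{Z}_{n+1} = \sum_{m=1}^{n-1} w_{n,m} \, Z_{m+1}, \qquad
w_{n,m} = \frac{K\bigl( D(Z_n, Z_m) / h \bigr)}
               {\sum_{l=1}^{n-1} K\bigl( D(Z_n, Z_l) / h \bigr)},
\end{equation*}
where $K$ is a nonnegative kernel, $h > 0$ a bandwidth and $D$ a dissimilarity
computed from the wavelet coefficients of the two blocks. The weights are
nonnegative and sum to one, so the forecast is a weighted average of the days
$Z_{m+1}$ that followed past days $Z_m$ similar to $Z_n$. This is a sketch of
the stationary case only; the corrections for an evolving mean level and for
classes of days are not shown.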


\subsection{Technology} % to be employed (hardware and software)


The volume of data to deal with in this project can be handled with standard
but recent tools for data analysis.
The specific software tools will be a statistical programming language such as \texttt{R}, with some popular
libraries (\texttt{data.table}, \texttt{dplyr}) and specific packages for wavelet analysis. All these elements are open source.

When the computational burden grows, we have direct access to larger computation capacities.

All the tools developed in the project will be made available under open source software licences.

\subsection{Research team}

The proposed team for developing this project is composed of three
academic members:
\begin{itemize}
\item Benjamin Auder, LMO, Univ Paris Saclay
\item Jairo Cugliari, ERIC, Univ Lyon
\item Jean-Michel Poggi, LMO, Univ Paris Saclay, Univ Paris Descartes
\end{itemize}

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% S E C T I O N
%
\section{Data description}
\begin{itemize}
\item A first dataset, already used in \cite{energycon}, could be used, at least in a first step, to calibrate the method.
\item Simulated data could be obtained at EDF following \cite{bondu15} or any simulation method preserving the confidentiality
of individual consumers. Any amount of such data could then be produced to benchmark the scalability of our approach.
\item Irish data provided by the Irish Commission for Energy Regulation, consisting of 2000 individual consumption curves (small and
medium enterprise and residential) at a half-hourly resolution, as well as pre- and post-experiment surveys (see \cite{Cer_a, Cer_b}).
\end{itemize}



% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% S E C T I O N
%
\section{Budget}
The expected global budget for the project is 15 000 \euro, which includes a one-day workshop.

\paragraph{Internal budget} The members of the research team are based in the Paris area and in Lyon.
Our way of working includes regular video and audio conferences as well as several in-person meetings.

We plan to present the work at international conferences, both data science and energy oriented.

Last, a stress test of how the proposed method scales up will require renting computing time on a specialized platform. We have access to
the Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules (\url{http://cc.in2p3.fr/}) through the laboratory ERIC, Lyon 2.

\paragraph{Workshop organization on Individual Electricity Consumers}
A one-day workshop dedicated to Individual Electricity Consumers, including
sessions on data, packages and methods, could be organized in September
2017, and could be proposed to the French Statistical Society (SFdS) as a
satellite meeting of the Journées de Statistique 2018, which will be held on
the campus of EDF Lab in May 2018.


\begin{center}
\begin{tabular}{lr} \hline
\textbf{Internal budget} & \textbf{10 000 \euro}\\
\; Travel & 3 000 \euro\\
\; Conference fees & 3 000 \euro\\
\; Internal meetings & 2 000 \euro\\
\; Hiring of high performance computing time & 2 000 \euro\\
\textbf{Workshop organization} & \textbf{5 000 \euro} \\
\; Invitations of researchers & 3 000 \euro\\
\; Organization of the workshop & 2 000 \euro\\ \hline
\textbf{Global budget} & \textbf{15 000 \euro} \\ \hline
\end{tabular}
\end{center}


% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% S E C T I O N
%
\section{Vitas}

\paragraph{Benjamin Auder} is a CNRS Research Engineer at LMO, University Paris-Sud Orsay in France.
He obtained his PhD in statistics in 2011 at Université Pierre et Marie Curie, Paris.
His main research areas are clustering, dimensionality reduction, manifold learning and machine learning,
in addition to software development and implementation issues of algorithmic solutions.

(\url{http://auder.net/page-upsud/})

\paragraph{Jairo Cugliari} is an Assistant Professor of Statistics at the University of Lyon in France. He obtained his PhD in statistics
in 2011 at the University Paris-Sud 11 Orsay. His main research areas are functional data analysis methods
for classification and prediction in applied statistical problems.

(\url{http://eric.univ-lyon2.fr/~jcugliari/})



\paragraph{Jean-Michel Poggi} is a Professor of Statistics at the University of Paris Descartes
and at the University Paris-Sud Orsay in France. His main research areas are
tree-based methods for classification and regression, nonparametric time
series forecasting, wavelet methods and applied statistical modeling in the energy
and environment fields. His publications combine theoretical and practical
contributions together with industrial applications and software development.

\noindent
He is an elected member of the ISI, was President of the French Statistical
Society (SFdS), and is Vice-President of FENStatS, Vice-President of ENBIS and President of ECAS.

(\url{http://www.math.u-psud.fr/~poggi/})

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% S E C T I O N
%
\section{Associated industrial company} % And members


\paragraph{Yannig Goude} is a research engineer and project manager at EDF R\&D and an associate
professor at the University Paris-Sud Orsay, France. He obtained his PhD in statistics and probability
in 2008 at the University Paris-Sud 11 Orsay. His research interests are electricity load forecasting and,
more generally, time series analysis and forecasting, nonparametric models and expert aggregation.

(\url{https://fr.linkedin.com/in/yannig-goude-768b3980})

\bibliographystyle{plain}
\bibliography{biblio_irsdi} %,predintervals,rapportfinal}

\end{document}



\bibitem{Alzate}
C.~Alzate and M.~Sinn,
 Improved electricity load forecasting via kernel spectral clustering of
 smartmeter,
 \emph{International Conference on Data Mining}, vol. 948, pp. 943 -- 948,
 2013

\bibitem{antoniadis2006functional}
A.~Antoniadis, E.~Paparoditis and T.~Sapatinas,
 A functional wavelet-kernel approach for time series prediction,
 \emph{Journal of the Royal Statistical Society, Series B},
 vol. 68(5), pp. 837 -- 857, 2006

\bibitem{antoniadis2013clustering}
A.~Antoniadis, X.~Brossat, J.~Cugliari, and J.-M.~Poggi,
 Clustering functional data using wavelets,
 \emph{International Journal of Wavelets, Multiresolution and Information
 Processing},
 vol. 11(1), 2013

\bibitem{antoniadis2012prevision}
A.~Antoniadis, X.~Brossat, J.~Cugliari, and J.-M.~Poggi,
 Pr\'{e}vision d'un processus \`{a} valeurs fonctionnelles en pr\'{e}sence de
 non stationnarit\'{e}s. Application \`{a} la consommation
 d'\'{e}lectricit\'{e},
 \emph{Journal de la Soci\'{e}t\'{e} Fran\c{c}aise de Statistique},
 vol. 153(2), pp. 52 -- 78, 2012

\bibitem{brabec2015statistical}
Brabec, M. and Kon{\'a}r, O. and Mal{\`y}, M. and Kasanick{\`y}, I and Pelik{\'a}n, E.,
 Statistical models for disaggregation and reaggregation of natural gas
 consumption data,
 \emph{Journal of Applied Statistics}, vol. 42(5), pp. 921--937, 2015

\bibitem{carevic2010applications}
Carevi{\'c}, S. and Capuder, T. and Delimar, M.
 Applications of clustering algorithms in long-term load forecasting
 \emph{Proceedings Energy Conference and Exhibition (EnergyCon),
 2010 IEEE International} 688--693, 2010

\bibitem{Chicco}
G. Chicco
 Overview and performance assessment of the clustering methods for electrical
 load pattern grouping, Energy, 42, 68 -- 80, 2012.

\bibitem{Figueiredo}
Figueiredo, V., Rodrigues, F., Vale, Z., Gouveia, J. B.
 An electric energy consumer characterization framework based on data mining
 techniques.
 Power Systems, IEEE Transactions on, 20(2), 596--602, 2005

\bibitem{iwafune2014short}
Iwafune, Y., Yagita, Y., Ikegami, T., Ogimoto K.
 Short-term forecasting of residential building load for distributed energy
 management
 \emph{Proceedings Energy Conference (ENERGYCON), 2014 IEEE International}
 1197--1204, 2014

\bibitem{kaufmanpj}
Kaufman, L. and Rousseeuw, P
 Finding groups in data: An introduction to cluster analysis,
 Hoboken NJ John Wiley \& Sons Inc, 1990

\bibitem{Kwac}
J. Kwac, Flora, J., Rajagopal, R.
 Household Energy Consumption Segmentation Using Hourly Data
 Smart Grid, IEEE Transactions on, 5, 420--430, 2014

\bibitem{labeeuw}
Labeeuw, W., Stragier, J., and Deconinck, G.
 Potential of active demand reduction with residential wet appliances:
 A case study for Belgium.
 Smart Grid, IEEE Transactions on, 6(1), 315--323, 2015

\bibitem{Liao}
Warren Liao, T.
 Clustering of time series data--a survey
 Pattern recognition, 38(11), 1857--1874, 2005

\bibitem{MisitiElec}
M.~Misiti, Y.~Misiti, G.~Oppenheim, and J.-M.~Poggi,
 Optimized Clusters for Disaggregated Electricity Load Forecasting,
 \emph{REVSTAT -- Statistical Journal}, vol. 8(2), pp. 105 -- 124, 2010

\bibitem{Mutanen}
 Mutanen, A., Ruska, M., Repo, S., Jarventausta, P.
 Customer classification and load profiling method for distribution systems.
 Power Delivery, IEEE Transactions on, 26(3), 1755--1763, 2011

%\bibitem{Piao}
%Piao, M., Lee, H. G., Park, J. H., Ryu, K. H.
% Application of Classification Methods for Forecasting Mid-Term
% Power Load Patterns.
% In Advanced Intelligent Computing Theories and Applications. Springer, 2008

\bibitem{Rasanen}
T. R\"{a}s\"{a}nen, D. Voukantsis, H. Niska, K. Karatzas, M. Kolehmainen
 Data-based method for creating electricity use load profiles using large
 amount of customer-specific hourly measured electricity use data
 Applied Energy, 87(11), 3538--3545, 2010

\bibitem{Rhodes}
J.D. Rhodes, W.J. Cole, C.R. Upshaw, T.F. Edgar, M.E. Webber
 Clustering analysis of residential electricity demand profiles
 Preprint submitted to Applied Energy, March 18, 2014

\bibitem{steinley2008new}
D. Steinley and M. Brusco,
 A new variable weighting and selection procedure for k-means cluster analysis.
 \emph{Multivariate Behavioral Research}, 43:32, 2008.

\bibitem{wijaya2015forecasting}
Wijaya, T. K., Sinn, M., and Chen, B.,
 Forecasting Uncertainty in Electricity Demand,
 \emph{AAAI-15 Workshop on Computational Sustainability, EPFL-CONF-203769},
 2015

\bibitem{Zhou}
K. Zhou, S. Yang, C. Shen
 A review of electric load classification in smart grid environment,
 Renewable and Sustainable Energy Reviews, 24, 103 -- 110, 2013.
