\documentclass[12pt,a4paper]{article}

\usepackage[margin=2.5cm]{geometry}
\usepackage[utf8]{inputenc} % in encoding
\usepackage[T1]{fontenc} % out-encoding

\usepackage{lmodern, microtype} % goes OK with T1 fontenc
%\usepackage[authoryear, round]{natbib}

\usepackage{color, tikz, graphicx, subfig}
\usepackage{amssymb, amsmath, amsthm}
\usepackage{setspace, lineno, url, xcolor}
\usepackage{savetrees}

\newcommand{\todo}[1]{\textcolor{blue}{TODO: #1}} % macro for todo entries

\renewcommand\familydefault{\sfdefault} % Use with sans serif font
\setlength{\bibsep}{0.0pt} % Compact bibliography (natbib)

\title{Disaggregated Electricity Forecasting using Clustering of Individual Consumers \\
{\normalsize \color{gray} IRSDI - RESEARCH INITIATIVE IN INDUSTRIAL DATA SCIENCE}}

\author{Benjamin Auder \and Jairo Cugliari \and Jean-Michel Poggi}

\date{\normalsize\today \vspace{-1.2\baselineskip}}
41 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
47 \subsection{Industrial
}
Electricity load forecasting is crucial for utilities, both for production
planning and for marketing offers. The increasing deployment of smart grid
infrastructure requires more flexible, data-driven forecasting methods that
adapt almost automatically to new data sets. New metering infrastructures
such as smart meters provide new and potentially massive amounts of
information about individual (household, small and medium enterprise)
consumption. As an example, in France, ERDF (Electricite Reseau Distribution
de France, the French manager of the public electricity distribution network)
deployed 250000 smart meters, covering a rural and an urban territory and
providing half-hourly household energy consumption each day. ERDF plans to
install 35 million of them over the French territory by the end of 2020, and
exploiting such an amount of data is an exciting but challenging task (see
\url{http://www.erdf.fr/Linky}).
We propose to build clustering tools useful for forecasting the load
consumption. The idea is to disaggregate the global signal in such a way that
the sum of the disaggregated forecasts significantly improves the prediction
of the global signal. The strategy has three steps: first we cluster curves to
define super-consumers, then we build a hierarchy of partitions, and finally
we select the best partition with respect to a disaggregated forecast
criterion. The proposed strategy is applied to a dataset of individual
consumers from the French electricity provider EDF. Disaggregation yields a
substantial gain of $16\%$ in forecast accuracy compared to the 1-cluster
approach, while preserving meaningful classes of consumers.
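
To fix ideas, the disaggregated forecast can be sketched as follows. The notation is introduced here for illustration only and is an assumption, not taken from the cited works: $Y_t$ denotes the global load at time $t$, a partition into $K$ super-consumers has aggregated loads $Y_t^{(1)}, \dots, Y_t^{(K)}$, $\widehat{Y}(K)$ is the resulting forecast of the global load, and $\operatorname{Err}$ is any forecast error measure (for instance the mean absolute percentage error).
\begin{align*}
Y_t &= \sum_{k=1}^{K} Y_t^{(k)}, \\
\widehat{Y}_{t+1}(K) &= \sum_{k=1}^{K} \widehat{Y}_{t+1}^{(k)}, \\
K^\star &\in \operatorname*{arg\,min}_{K}\; \operatorname{Err}\bigl(Y, \widehat{Y}(K)\bigr).
\end{align*}
With $K = 1$ this reduces to forecasting the global signal directly, which is the 1-cluster baseline mentioned above.
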

In the context of seasonal univariate continuous-time series in economics, it
is often natural to segment the series in time into consecutive curves, for
example days, which are then treated as a discrete time series of functions.
In particular, in the electrical context, the shape of the curves carries rich
information about the calendar day type, the meteorological conditions or the
existence of special electricity tariffs. Using the information contained in
the shape of the load curves leads to a very elegant formulation of functional
forecasting.

%Electricity load experts naturally look at daily demand data as time functions
%called load curves. In a recent paper, \cite{shang2013} uses a functional time
%series approach for forecasting short-term electricity demand. This paper is
%illustrated by the half-hourly electricity demand from Monday to Sunday in South
%Australia. The strategy is also to consider a seasonal univariate time series as
%a time series of curves, then to reduce the dimensionality of curves by applying
%a functional principal component analysis and finally, following
%\cite{shang2011}, the principal component scores are forecasted using a
%univariate ARIMA models. In addition, since data points in the daily electricity
%demand are sequentially observed, a forecast updating method based on
%nonparametric bootstrap approach is proposed to improve the accuracy of point
%forecasts. With respect to this strategy, the scheme we propose handles the
%forecasting problem in a functional way avoiding the hour by hour processing and
%considers a more flexible way to construct the distribution leading to the

Using the information contained in the shape of the load curves,
\cite{antoniadis2012prevision} proposed a flexible nonparametric
function-valued forecast model called KWF (\textit{Kernel + Wavelet +
Functional}), well suited to handling nonstationary series. The predictor can
be seen as a weighted average of the futures of past situations, where the
weights increase with the similarity between the past situations and the
current one. In addition, this strategy provides simultaneous multiple-horizon
predictions for a global forecast.

However, there is a need for local electricity load forecasting at different
levels of the grid. Bottom-up approaches, based on a two-stage process
combining clustering and forecasting methods, are a promising perspective. The
first stage consists in building classes in a population such that each class
can be forecast sufficiently well but corresponds to a different load shape or
reacts differently to exogenous variables such as temperature or prices (see
e.g. \cite{labeeuw} in the context of demand response). The second stage
consists in aggregating the forecasts to forecast the total or any subtotal of
the population consumption. For example, identifying and forecasting the
consumption of a sub-population that reacts to an incentive is an important
need for optimizing a demand response program.

Few papers consider the problem of clustering individual consumption for
forecasting (e.g. \cite{iwafune2014short, Alzate, carevic2010applications, MisitiElec}).
Recently, \cite{energycon} proposed to build clustering tools useful for two
tasks simultaneously: clustering individual customers and forecasting the load
consumption. The idea is to disaggregate the global signal in such a way that
the sum of the disaggregated forecasts significantly improves the prediction
of the global signal. The general strategy has three steps: first we cluster
individual curves to define super-consumers, then we build a hierarchy of
partitions, and finally we select the best partition with respect to a
disaggregated forecast criterion. The predictions are made with the KWF model,
which can be used as an off-the-shelf tool.

While this work ended with the specification of an algorithm, a current need is a real proof of upscaling. A first step in this direction was done in

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The method proposed in \cite{energycon} has been successfully tested on a small data set of EDF clients. With the current deployment of smart meters in France, the available volume of individual data is increasing day after day. There is therefore a genuine need to measure how well the existing methods scale up.

This project's aim is twofold. First, we will evaluate the capacity of the strategy developed in \cite{energycon} to scale up and cope with the growing volume of data. Second, we will study how to adapt the KWF prediction method to take an exogenous variable into account. In our particular problem, the exogenous variable can be any meteorological measurement that affects the load demand and is available at the time of the prediction.
142 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
147 \section{Means considered
}

\paragraph{Clustering analysis.} In general, clustering methods look for groups of individuals in the data such that individuals belonging to the same group are more similar to each other than to those of other groups. Many methods exist to cluster data:
hierarchical, center-based, probabilistic, etc. Almost all of them depend heavily
on the choice of a similarity measure between individuals. For this challenge we plan
to compare individuals in terms of their wavelet spectrum signature. Thanks to this strategy,
nonstationary signals can be fairly compared. Moreover, the signals need not be
measured on the same temporal grid. However, in order to obtain relevant results,
the wavelet signatures should be corrected using exogenous information (e.g. the
information provided as client characteristics).

\paragraph{Wavelet analysis.} Since the objects to analyze (load curves) can be viewed
as functions of time, functional data analysis techniques are a natural choice to
represent them. From a stochastic point of view, the functions are realizations
of a nonstationary random process. The wavelet transform can be used to extract
relevant information about the functions in both time and frequency. With an
appropriate representation of the objects, it is then possible to construct
a meaningful distance between load curves.
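
As an illustration, one possible distance of this kind is sketched below. It is an assumption made for concreteness, not necessarily the measure that will be retained: $d^{x}_{j,k}$ denotes the detail coefficient of the curve $x$ at scale $j$ and location $k$ in a discrete wavelet transform with scales $j_0, \dots, J-1$, where $j_0$ is the coarsest scale.
\begin{equation*}
D(x, y) = \sum_{j=j_0}^{J-1} 2^{-j/2} \Bigl( \sum_{k=0}^{2^j - 1} \bigl( d^{x}_{j,k} - d^{y}_{j,k} \bigr)^2 \Bigr)^{1/2}.
\end{equation*}
Only detail coefficients enter $D$, so two curves that differ by a constant level are considered close. The factor $2^{-j/2}$ down-weights the finer scales, which carry more coefficients.
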

\paragraph{Forecasting with KWF.}
The basic idea of nonparametric forecasting is that similar cases in the past
have similar future consequences. For example, the electricity consumption is
divided into blocks of one day. Then, using a dissimilarity measure, the
blocks similar to the last observed block are searched for in the past and a
vector of weights is built. Finally, the forecast of the next day is obtained
as a weighted average of the days that followed the most similar blocks, using
this vector of weights. From a statistical point of view, the model is a
kernel estimate of the regression function of the last block against all the
blocks in the past. In \cite{antoniadis2006functional} this basic model is
extended to the case of stationary functional random variables. But in the
context of electrical power demand, the hypothesis of stationarity generally
fails: an evolving mean level and the existence of groups that may be seen as
classes of stationarity have to be considered. Corrections that take
these two main nonstationary features into account are proposed in
\cite{antoniadis2012prevision}, defining a flexible nonparametric
function-valued forecast model called KWF (\textit{Kernel + Wavelet +
Functional}), well suited to handling nonstationary series. The predictor can
be seen as a weighted average of the futures of past situations, where the
weights increase with the similarity between the past situations and the
current one. Again, the similarity is defined through the wavelet
decompositions of the two segments.
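
In formula form, the basic predictor described above can be sketched as follows. The notation is an assumption introduced for illustration: $Z_1, \dots, Z_n$ are the observed daily blocks, $D$ is a dissimilarity between blocks (for instance computed from their wavelet coefficients), $\mathcal{K}$ is a nonnegative kernel and $h_n > 0$ is a bandwidth.
\begin{equation*}
\widehat{Z}_{n+1}(t) = \sum_{m=1}^{n-1} w_{m,n}\, Z_{m+1}(t),
\qquad
w_{m,n} = \frac{\mathcal{K}\bigl(D(Z_n, Z_m)/h_n\bigr)}{\sum_{m'=1}^{n-1} \mathcal{K}\bigl(D(Z_n, Z_{m'})/h_n\bigr)}.
\end{equation*}
Provided the denominator is positive, the weights are nonnegative and sum to one. With a decreasing kernel (for instance Gaussian), the days $m$ most similar to the last observed day $n$ contribute most, through their successors $Z_{m+1}$.
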

\subsection{Technology} % to be employed (hardware and software)}

The volume of data to deal with in this project can be handled with standard
but recent data analysis tools.
The specific software tools will be a statistical programming language such as \texttt{R}, with some popular
libraries (\texttt{data.table}, \texttt{dplyr}) and specific packages for wavelet analysis. All these elements are open source.

When the computational burden grows, we have direct access to larger computing capacities.

All the tools developed in the project will be made available under open source software licences.

\subsection{Research team}

The proposed team for developing this project is composed of three members:
\begin{itemize}
\item Benjamin Auder, LMO, Univ Paris Saclay
\item Jairo Cugliari, ERIC, Univ Lyon
\item Jean-Michel Poggi, LMO, Univ Paris Saclay, Univ Paris Descartes
\end{itemize}

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Data description}

\begin{itemize}
\item A first dataset, already used in \cite{energycon}, could be used, at least in a first step, to calibrate the method.
\item Simulated data could be obtained at EDF following \cite{bondu15}, or any simulation method preserving the confidentiality
of individual consumers. Any amount of such data could then be produced to benchmark the scalability of our approach.
\item Irish data provided by the Irish Commission for Energy Regulation, consisting of 2000 individual consumption curves (small and
medium enterprises and residential) at a half-hourly resolution, together with pre- and post-experiment surveys (see \cite{Cer_a, Cer_b}).
\end{itemize}

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The expected global budget for the project is 15 000 \euro, which includes a 1-day workshop.

\paragraph{Internal budget} The members of the research team are based in the Paris area and in Lyon.
Our way of working includes video and audio conferences on a regular basis as well as several in-person meetings.

We plan to present the work at international conferences, at both data science and energy-oriented meetings.

Last, a stress test of how well the proposed method scales up will require renting computing time on a specialized platform. We have access to
the Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules (\url{http://cc.in2p3.fr/}) through the laboratory ERIC, Lyon 2.

\paragraph{Workshop organization on Individual Electricity Consumers}
A 1-day workshop dedicated to Individual Electricity Consumers, including
sessions on data, packages and methods, could be organized in September
2017, and could be proposed to the French Statistical Society (SFdS) as a
satellite meeting of the Journées de Statistique 2018, which will be held on
the campus of EDF Lab in May 2018.

\begin{tabular}{lr} \hline
\textbf{Internal budget} & \textbf{10 000 \euro}\\
\; Travels & 3 000 \euro\\
\; Conference fees & 3 000 \euro\\
\; Internal meetings & 2 000 \euro\\
\; Hiring of high performance computing time & 2 000 \euro\\
\textbf{Workshop organization} & \textbf{5 000 \euro} \\
\; Invitations of researchers & 3 000 \euro\\
\; Organization workshop & 2 000 \euro\\
\hline
\textbf{Global budget} & \textbf{15 000 \euro} \\
\hline
\end{tabular}

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\paragraph{Benjamin Auder} is a CNRS Research Engineer at LMO, University Paris-Sud Orsay, France.
He obtained his PhD in statistics in 2011 at Université Pierre et Marie Curie, Paris.
His main research areas are clustering, dimensionality reduction, manifold learning and machine learning,
in addition to software development and implementation issues of algorithmic solutions.

(\url{http://auder.net/page-upsud/})

\paragraph{Jairo Cugliari} is Assistant Professor of Statistics at the University of Lyon, France. He obtained his PhD in statistics
in 2011 at University Paris-Sud 11, Orsay. His main research areas are functional data analysis methods
for classification and prediction in applied statistical problems.

(\url{http://eric.univ-lyon2.fr/~jcugliari/})

\paragraph{Jean-Michel Poggi} is Professor of Statistics at the University of Paris Descartes
and at University Paris-Sud Orsay, France. His main research areas are
tree-based methods for classification and regression, nonparametric time
series forecasting, wavelet methods and applied statistical modeling in the energy
and environment fields. His publications combine theoretical and practical
contributions together with industrial applications and software development.

He is an elected member of the ISI, was President of the French Statistical
Society (SFdS), and is Vice-President of FENStatS, Vice-President of ENBIS and President of ECAS.

(\url{http://www.math.u-psud.fr/~poggi/})
299 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
303 \section{Associated industrial company
} % And members

\paragraph{Yannig Goude} is a research engineer and project manager at EDF R\&D and an associate
professor at University Paris-Sud Orsay, France. He obtained his PhD in statistics and probability
in 2008 at University Paris-Sud 11, Orsay. His research interests are electricity load forecasting and,
more generally, time series analysis and forecasting, nonparametric models and expert aggregation.

(\url{https://fr.linkedin.com/in/yannig-goude-768b3980})

\bibliographystyle{plain}
\bibliography{biblio_irsdi} %,predintervals,rapportfinal}

C.~Alzate and M.~Sinn,
Improved electricity load forecasting via kernel spectral clustering of
\emph{International Conference on Data Mining}, vol. 948, pp. 943--948,

\bibitem{antoniadis2006functional}
A.~Antoniadis, E.~Paparoditis and T.~Sapatinas,
A functional wavelet-kernel approach for time series prediction,
\emph{Journal of the Royal Statistical Society, Series B},
vol. 68(5), pp. 837--857, 2006

\bibitem{antoniadis2013clustering}
A.~Antoniadis, X.~Brossat, J.~Cugliari, and J.-M.~Poggi,
Clustering functional data using wavelets,
\emph{International Journal of Wavelets, Multiresolution and Information Processing}

\bibitem{antoniadis2012prevision}
A.~Antoniadis, X.~Brossat, J.~Cugliari, J.-M.~Poggi,
Pr\'{e}vision d'un processus \`{a} valeurs fonctionnelles en pr\'{e}sence de
non stationnarit\'{e}s. Application \`{a} la consommation
d'\'{e}lectricit\'{e},
\emph{Journal de la Soci\'{e}t\'{e} Fran\c{c}aise de Statistique},
vol. 153(2), pp. 52--78, 2012

\bibitem{brabec2015statistical}
M.~Brabec, O.~Kon{\'a}r, M.~Mal{\`y}, I.~Kasanick{\`y} and E.~Pelik{\'a}n,
Statistical models for disaggregation and reaggregation of natural gas
\emph{Journal of Applied Statistics}, vol. 42(5), pp. 921--937, 2015

\bibitem{carevic2010applications}
S.~Carevi{\'c}, T.~Capuder and M.~Delimar,
Applications of clustering algorithms in long-term load forecasting,
\emph{Proceedings Energy Conference and Exhibition (EnergyCon),
2010 IEEE International}, pp. 688--693, 2010

Overview and performance assessment of the clustering methods for electrical
load pattern grouping,
\emph{Energy}, vol. 42, pp. 68--80, 2012

V.~Figueiredo, F.~Rodrigues, Z.~Vale and J.~B.~Gouveia,
An electric energy consumer characterization framework based on data mining
\emph{Power Systems, IEEE Transactions on}, vol. 20(2), pp. 596--602, 2005

\bibitem{iwafune2014short}
Y.~Iwafune, Y.~Yagita, T.~Ikegami and K.~Ogimoto,
Short-term forecasting of residential building load for distributed energy
\emph{Proceedings Energy Conference (ENERGYCON), 2014 IEEE International}

L.~Kaufman and P.~Rousseeuw,
Finding groups in data: An introduction to cluster analysis,
Hoboken NJ, John Wiley \& Sons Inc, 1990

J.~Kwac, J.~Flora and R.~Rajagopal,
Household Energy Consumption Segmentation Using Hourly Data,
\emph{Smart Grid, IEEE Transactions on}, vol. 5, pp. 420--430, 2014

W.~Labeeuw, J.~Stragier and G.~Deconinck,
Potential of active demand reduction with residential wet appliances:
A case study for Belgium,
\emph{Smart Grid, IEEE Transactions on}, vol. 6(1), pp. 315--323, 2015

Clustering of time series data--a survey,
\emph{Pattern recognition}, vol. 38(11), pp. 1857--1874, 2005

M.~Misiti, Y.~Misiti, G.~Oppenheim, and J.-M.~Poggi,
Optimized Clusters for Disaggregated Electricity Load Forecasting,
\emph{REVSTAT -- Statistical Journal}, vol. 8(2), pp. 105--124, 2010

A.~Mutanen, M.~Ruska, S.~Repo and P.~Jarventausta,
Customer classification and load profiling method for distribution systems,
\emph{Power Delivery, IEEE Transactions on}, vol. 26(3), pp. 1755--1763, 2011

%Piao, M., Lee, H. G., Park, J. H., Ryu, K. H.
% Application of Classification Methods for Forecasting Mid-Term
% Power Load Patterns.
% In Advanced Intelligent Computing Theories and Applications. Springer, 2008

T.~R\"{a}s\"{a}nen, D.~Voukantsis, H.~Niska, K.~Karatzas and M.~Kolehmainen,
Data-based method for creating electricity use load profiles using large
amount of customer-specific hourly measured electricity use data,
\emph{Applied Energy}, vol. 87(11), pp. 3538--3545, 2010

J.D.~Rhodes, W.J.~Cole, C.R.~Upshaw, T.F.~Edgar and M.E.~Webber,
Clustering analysis of residential electricity demand profiles,
Preprint submitted to Applied Energy, March 18, 2014

\bibitem{steinley2008new}
D.~Steinley and M.~Brusco,
A new variable weighting and selection procedure for k-means cluster analysis,
\emph{Multivariate Behavioral Research}, 43:32, 2008

\bibitem{wijaya2015forecasting}
T.~K.~Wijaya, M.~Sinn and B.~Chen,
Forecasting Uncertainty in Electricity Demand,
\emph{AAAI-15 Workshop on Computational Sustainability, EPFL-CONF-203769},

K.~Zhou, S.~Yang and C.~Shen,
A review of electric load classification in smart grid environment,
\emph{Renewable and Sustainable Energy Reviews}, vol. 24, pp. 103--110, 2013