contrat/2016_IRSDIproject_v3.tex

   1 \documentclass[12pt, a4paper]{article}
   2
   3 \usepackage[margin=2.5cm]{geometry}
   4 \usepackage[utf8]{inputenc}       % in encoding
   5 \usepackage[T1]{fontenc}          % out-encoding f
   6 \usepackage{eurosym}
   7 \usepackage{lmodern, microtype}   % goes OK with T1 fontenc
   8 %\usepackage[authoryear, round]{natbib}
   9 \usepackage{natbib}
  10 \usepackage{color, tikz, graphicx, subfig}
  11 \usepackage{amssymb, amsmath, amsthm}
  12 \usepackage{setspace, lineno, url, xcolor}
  13 \usepackage{savetrees}
  14
  15 \newcommand{\todo}[1]{\textcolor{blue}{TODO: #1}} % macro for todo entries
  16
  17 % Style options
  18 \renewcommand\familydefault{\sfdefault} % Use with sans serif font
  19 \setlength{\bibsep}{0.0pt}              % Compact bibliography (natbib)
  20
  21 \title{Disaggregated Electricity Forecasting using Clustering of Individual Consumers \\
  22       {\normalsize \color{gray}  IRSDI - RESEARCH INITIATIVE IN INDUSTRIAL DATA SCIENCE}}
  23
  24 \author{Benjamin Auder    \and
  25         Jairo Cugliari    \and
  26         Yannig Goude      \and
  27         Jean-Michel Poggi
  28 }
  29 \date{\normalsize\today
  30 \vspace{-1.2\baselineskip}}
  31
  32
  33
  34 \begin{document}
  35 \maketitle
  36
  37 %\begin{abstract}
  38
  39 %\end{abstract}
  40
  41 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  42 %
  43 %                                        S E C T I O N
  44 %
  45 \section{Context}
  46
  47 \subsection{Industrial}
  48
  49 Electricity load forecasting is crucial for utilities, both for production
  50 planning and for marketing offers. The increasing deployment of smart grid
  51 infrastructure calls for more flexible, data-driven forecasting methods
  52 that adapt largely automatically to new data sets.
  53 New metering infrastructures such as smart meters provide new and
  54 potentially massive amounts of information about individual (household,
  55 small and medium enterprise) consumption. As an example, in France,
  56 ERDF (Électricité Réseau Distribution France, the manager of
  57 the public electricity distribution network) deployed 250\,000 smart meters,
  58 covering a rural and an urban territory and recording household energy
  59 consumption at a half-hourly resolution. ERDF plans to install 35 million
  60 of them over the French territory by the end of 2020. Exploiting such an
  61 amount of data is an exciting but challenging task
  62 (see \url{http://www.erdf.fr/Linky}).
  63 We propose to build clustering tools that are useful for load forecasting.
  64 The idea is to disaggregate the global signal in such a way that
  65 the sum of disaggregated forecasts significantly improves the prediction of the
  66 global signal. The strategy has three steps: first we cluster curves
  67 to define super-consumers, then we build a hierarchy of partitions, and
  68 finally we select the best partition with respect to a disaggregated forecast
  69 criterion. The proposed strategy is applied to a dataset of individual
  70 consumers from the French electricity provider EDF. Disaggregation provides
  71 a substantial gain of $16\,\%$ in forecast accuracy compared to the 1-cluster
  72 approach while preserving meaningful classes of consumers.
  73
  74 \subsection{Academic}
  75
  76 A seasonal univariate continuous-time series, as often met in economics, is
  77 naturally segmented in time into consecutive curves, for example days, which
  78 are then treated as a discrete time series of functions. In particular, in the
  79 electrical context, the shape of the curves carries rich information about the
  80 calendar day type, the meteorological conditions or the existence of special
  81 electricity tariffs. Using the information contained in the shape of the load
  82 curves leads to a very elegant formulation of functional forecasting.
  83
  84
  85 %Electricity load experts naturally look at daily demand data as time functions
  86 %called load curves. In a recent paper, \cite{shang2013} uses a functional time
  87 %series approach for forecasting short-term electricity demand. This paper is
  88 %illustrated by the half-hourly electricity demand from Monday to Sunday in South
  89 %Australia. The strategy is also to consider a seasonal univariate time series as
  90 %a time series of curves, then to reduce the dimensionality of curves by applying
  91 %a functional principal component analysis and finally, following
  92 %\cite{shang2011}, the principal component scores are forecasted using a
  93 %univariate ARIMA models. In addition, since data points in the daily electricity
  94 %demand are sequentially observed, a forecast updating method based on
  95 %nonparametric bootstrap approach is proposed to improve the accuracy of point
  96 %forecasts. With respect to this strategy, the scheme we propose handles the
  97 %forecasting problem in a functional way avoiding the hour by hour processing and
  98 %considers a more flexible way to construct the distribution leading to the
  99 %prediction interval.
 100
 101 Building on the information contained in the shape of the load curves,
 102 \cite{antoniadis2012prevision} proposed a flexible nonparametric
 103 function-valued forecast model called KWF
 104 (\textit{Kernel + Wavelet + Functional}), well suited to nonstationary series.
 105 The predictor can be seen as a weighted average
 106 of futures of past situations, where the weights increase with the similarity
 107 between the past situations and the current one. In addition, this strategy
 108 provides simultaneous predictions at multiple horizons for a global forecast.
 109
 110 However, there is a need for local electricity load forecasting at different levels of the grid.
 111 Bottom-up approaches, based on a two-stage process combining clustering and forecasting
 112 methods, are a promising perspective. The first stage
 113 consists in building classes in a population such that each class can be
 114 forecast sufficiently well but corresponds to a different load shape or reacts
 115 differently to exogenous variables like temperature or prices (see e.g.
 116 \cite{labeeuw} in the context of demand response). The second stage consists in
 117 aggregating forecasts to forecast the total or any subtotal of the population
 118 consumption. For example, identifying and forecasting the consumption of a
 119 sub-population reactive to an incentive is an important need when optimizing a
 120 demand response program.
 121
 122 \section{Past work}
 123
 124 Few papers consider the problem of clustering individual consumption for
 125 forecasting (e.g. \cite{iwafune2014short, Alzate, carevic2010applications, MisitiElec}). Recently, \cite{energycon} proposed to build clustering tools useful for two tasks simultaneously: clustering individual customers and forecasting the load consumption. The idea is to disaggregate the global signal in such a way that the sum of disaggregated forecasts significantly improves the prediction of the global signal. The general strategy has three steps: first individual curves are clustered to define super-consumers, then a hierarchy of partitions is built, and finally the best partition is selected with respect to a disaggregated forecast criterion. The predictions are made with the KWF model, which can be used as an off-the-shelf tool.
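
     In symbols (the notation here is an assumption made for illustration, not
     taken from the cited papers): for a partition of the consumers into $K$
     clusters, let $Z^{(k)}$ be the aggregate load of cluster $k$ and
     $\widehat{Z}^{(k)}_{n+1}$ its forecast for the next block. The disaggregated
     forecast of the global load $Z = \sum_{k=1}^{K} Z^{(k)}$ is
     \begin{equation*}
       \widehat{Z}^{[K]}_{n+1} = \sum_{k=1}^{K} \widehat{Z}^{(k)}_{n+1},
     \end{equation*}
     and the partition retained within the hierarchy is the one minimizing a
     forecast error measure $\mathrm{Err}$ (for instance a mean absolute
     percentage error on a test period, which is an assumption here):
     \begin{equation*}
       K^{\star} = \arg\min_{K} \, \mathrm{Err}\bigl( \widehat{Z}^{[K]}, Z \bigr).
     \end{equation*}
     The case $K = 1$ corresponds to forecasting the global signal directly.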
 126
 127 While this work ended with the specification of an algorithm, evidence that it scales up to real data volumes is still needed. A first step in this direction was taken in
 128 \cite{auder2014}.
 129
 130
 131 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 132 %
 133 %                                        S E C T I O N
 134 %
 135 \section{Aims}
 136
 137 The method proposed in \cite{energycon} has been successfully tested on a small data set of EDF clients. With the current deployment of smart meters in France, the available volume of individual data is increasing day after day. There is therefore a genuine need to measure how well existing methods scale up.
 138
 139 The aim of this project is twofold. First, we will evaluate the capacity of the strategy developed in \cite{energycon} to scale up and cope with the growing volume of data. Second, we will study how to adapt the KWF prediction method to take an exogenous variable into account. In our particular problem, the exogenous variables can be any meteorological measurement that affects the load demand and is available at the time of the prediction.
 140
 141
 142 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 143 %
 144 %                                        S E C T I O N
 145 %
 146
 147 \section{Means considered}
 148
 149 \subsection{Methods}
 150 \paragraph{Clustering analysis.} In general, clustering methods look for groups of individuals in data in such a way that those belonging to the same group are more similar to each other than to those from other groups. Many methods exist to cluster data:
 151 hierarchical, center-based, probabilistic, etc. Almost all of them depend heavily
 152 on the choice of a similarity measure between individuals. For this challenge we plan
 153 to compare individuals in terms of their wavelet spectrum signature. Thanks to this strategy,
 154 nonstationary signals may be fairly compared. Moreover, the signals need not be
 155 measured on the same temporal grid. However, in order to obtain relevant results,
 156 the wavelet signatures should be corrected using exogenous information (e.g. the
 157 client characteristics).
 158
 159 \paragraph{Wavelet analysis.} Since the objects to analyze (load curves) can be viewed
 160 as functions of time, functional data analysis techniques are one possible choice to
 161 represent these objects. From a stochastic point of view, the functions are realizations
 162 of a nonstationary random process. The wavelet transform can be used to extract
 163 relevant information about the functions both in time and in frequency. With an
 164 appropriate representation of the objects, it is then possible to construct
 165 a meaningful distance between load curves.
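
     One possible construction, given only as an illustrative sketch with assumed
     notation: let $d_{j,k}(z)$ denote the detail coefficients at scale $j$ and
     location $k$ of the discrete wavelet transform of a curve $z$ sampled on
     $2^{J}$ points. The energy of $z$ at scale $j$ and its relative contribution
     are
     \begin{equation*}
       E_j(z) = \sum_{k} d_{j,k}(z)^2,
       \qquad
       r_j(z) = \frac{E_j(z)}{\sum_{j'=0}^{J-1} E_{j'}(z)},
     \end{equation*}
     and a dissimilarity between two load curves $x$ and $y$ can be taken as the
     Euclidean distance between their signatures,
     \begin{equation*}
       d(x, y) = \Bigl( \sum_{j=0}^{J-1} \bigl( r_j(x) - r_j(y) \bigr)^2 \Bigr)^{1/2}.
     \end{equation*}
     Because the signature only records how the energy is spread across scales,
     two curves with the same multiscale shape but different mean levels are
     considered close.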
 166
 167 \paragraph{Forecasting with KWF.}
 168 The basic idea of nonparametric forecasting is that similar cases in the past
 169 have similar future consequences. For example, the electricity consumption is
 170 divided into blocks of one day. Then, using a dissimilarity measure, the
 171 blocks similar to the last observed block are searched for in the past and a vector
 172 of weights is built. Finally, the forecast of the next day is obtained as a
 173 weighted average of the days that followed the most similar blocks, using this
 174 vector of weights. From the statistical point of view, the model is a kernel
 175 estimate of the regression function of the last block against all the
 176 blocks in the past. In \cite{antoniadis2006functional}, this basic model is
 177 extended to the case of stationary functional random variables. But in the
 178 context of electrical power demand, the hypothesis of stationarity generally
 179 fails: an evolving mean level and the existence of groups that may be seen as
 180 classes of stationarity are to be considered. Corrections that take
 181 these two main nonstationary features into account are proposed in
 182 \cite{antoniadis2012prevision}, defining a flexible nonparametric function-valued
 183 forecast model called KWF (\textit{Kernel + Wavelet + Functional}), well suited
 184 to nonstationary series. The predictor can be seen as a weighted average
 185 of futures of past situations, where the weights increase with the similarity
 186 between the past situations and the current one. Again, the similarity is defined
 187 through the wavelet decompositions of the two segments.
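
     As a notational sketch of the basic predictor (the symbols below are
     illustrative assumptions, not quoted from the cited papers), write
     $Z_1, \dots, Z_n$ for the observed daily blocks, $D$ for the wavelet-based
     dissimilarity, $K$ for a nonnegative kernel and $h_n > 0$ for a bandwidth.
     The kernel forecast of the next block then reads
     \begin{equation*}
       \widehat{Z}_{n+1}(t) = \sum_{m=1}^{n-1} w_{n,m}\, Z_{m+1}(t),
       \qquad
       w_{n,m} = \frac{K\bigl( D(Z_n, Z_m) / h_n \bigr)}
                      {\sum_{m'=1}^{n-1} K\bigl( D(Z_n, Z_{m'}) / h_n \bigr)},
     \end{equation*}
     so that the weights are nonnegative, sum to one, and are largest for the past
     blocks $Z_m$ closest to the last observed block $Z_n$. The corrections for
     nonstationarity mentioned above (evolving mean level, groups of days) modify
     this basic form and are not reproduced here.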
 188
 189
 190 \subsection{Technology}  % to be employed (hardware y software)}
 191
 192
 193 The volume of data to deal with in this project can be handled with standard
 194 but recent tools for data analysis.
 195 The specific software tools will be a statistical programming language such as \texttt{R} with some popular
 196 libraries (\texttt{data.table}, \texttt{dplyr}) and specific packages for wavelet analysis. All these elements are open source.
 197
 198 When the computational burden grows, we have direct access to larger computation capacities.
 199
 200 All the tools developed in the project will be made available under open source software licences.
 201
 202 \subsection{Research team}
 203
 204 The proposed team for developing this project is composed of three
 205 academic members:
 206 \begin{itemize}
 207 \item Benjamin Auder, LMO, Univ Paris Saclay
 208 \item Jairo Cugliari, ERIC, Univ Lyon
 209 \item Jean-Michel Poggi, LMO, Univ Paris Saclay, Univ Paris Descartes
 210 \end{itemize}
 211
 212 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 213 %
 214 %                                        S E C T I O N
 215 %
 216 \section{Data description}
 217 \begin{itemize}
 218 \item A first dataset, already used in \cite{energycon}, could be used, at least in a first step, to calibrate the method.
 219 \item Simulated data could be obtained at EDF following \cite{bondu15} or any simulation method preserving the confidentiality
 220 of individual consumers. Any amount of such data could then be produced to benchmark the scalability of our approach.
 221 \item Irish data provided by the Irish Commission for Energy Regulation, consisting of the consumption of 2000 individual customers (small and
 222 medium enterprise and residential) at a half-hourly resolution as well as pre- and post-experiment surveys (see \cite{Cer_a, Cer_b}).
 223 \end{itemize}
 224
 225
 226
 227 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 228 %
 229 %                                        S E C T I O N
 230 %
 231 \section{Budget}
 232 The expected global budget for the project is 15\,000~\euro, which includes a one-day workshop.
 233
 234 \paragraph{Internal budget} The members of the research team are based in the Paris area and in Lyon.
 235 Our way of working includes regular video and audio conferences as well as several in-person meetings.
 236
 237 We plan to present the work at international conferences, both data science and energy-oriented meetings.
 238
 239 Finally, stress-testing the scalability of the proposed method will require renting computing time on a specialized platform. We have access to
 240 the Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules (\url{http://cc.in2p3.fr/}) through the laboratory ERIC, Lyon 2.
 241
 242 \paragraph{Workshop organization on Individual Electricity Consumers}
 243 A one-day workshop dedicated to Individual Electricity Consumers, including
 244 sessions on data, packages and methods, could be organized in September
 245 2017, and could be proposed to the French Statistical Society (SFdS) as a
 246 satellite meeting of the Journées de Statistique 2018, which will be held on
 247 the campus of EDF Lab in May 2018.
 248
 249
 250 \begin{center}
 251 \begin{tabular}{lr} \hline
 252 \textbf{Internal budget}      & \textbf{10 000 \euro}\\
 253 \; Travels                    &  3 000 \euro\\
 254 \; Conference fees            & 3 000 \euro\\
 255 \; Internal meetings          & 2 000 \euro\\
 256 \; Hiring of high performance computing time & 2 000 \euro\\
 257 \textbf{Workshop organization} & \textbf{5 000 \euro} \\
 258 \; Invitations of researchers & 3 000 \euro\\
 259 \; Organization of the workshop & 2 000 \euro\\ \hline
 260 \textbf{Global budget}        & \textbf{15 000 \euro} \\ \hline
 261 \end{tabular}
 262 \end{center}
 263
 264
 265 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 266 %
 267 %                                        S E C T I O N
 268 %
 269 \section{Vitae}
 270
 271 \paragraph{Benjamin Auder} is a CNRS Research Engineer at LMO, University Paris-Sud, Orsay, France.
 272 He obtained his PhD in statistics in 2011 at Université Pierre et Marie Curie, Paris.
 273 His main research areas are clustering, dimensionality reduction, manifold learning and machine learning,
 274 in addition to software development and implementation issues of algorithmic solutions.
 275
 276 (\url{http://auder.net/page-upsud/})
 277
 278 \paragraph{Jairo Cugliari} is an Assistant Professor of Statistics at the University of Lyon, France. He obtained his PhD in statistics
 279 in 2011 at University Paris-Sud 11, Orsay. His main research areas are functional data analysis methods
 280 for classification and prediction in applied statistical problems.
 281
 282 (\url{http://eric.univ-lyon2.fr/~jcugliari/})
 283
 284
 285
 286 \paragraph{Jean-Michel Poggi} is a Professor of Statistics at the University of Paris Descartes
 287 and at University Paris-Sud, Orsay, France. His main research areas are
 288 tree-based methods for classification and regression, nonparametric time
 289 series forecasting, wavelet methods and applied statistical modeling in energy
 290 and environment fields. His publications combine theoretical and practical
 291 contributions together with industrial applications and software development.
 292
 293 \noindent
 294 He is an elected member of the ISI, was President of the French Statistical
 295 Society (SFdS), and is Vice-President of FENStatS, Vice-President of ENBIS and President of ECAS.
 296
 297 (\url{http://www.math.u-psud.fr/~poggi/})
 298
 299 % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 300 %
 301 %                                        S E C T I O N
 302 %
 303 \section{Associated industrial company} % And members
 304
 305
 306 \paragraph{Yannig Goude} is a research engineer and project manager at EDF R\&D and an associate
 307 professor at University Paris-Sud, Orsay, France. He obtained his PhD in statistics and probability
 308 in 2008 at University Paris-Sud 11, Orsay. His research interests are electricity load forecasting and,
 309 more generally, time series analysis and forecasting, nonparametric models and expert aggregation.
 310
 311 (\url{https://fr.linkedin.com/in/yannig-goude-768b3980})
 312
 313 \bibliographystyle{plain}
 314 \bibliography{biblio_irsdi} %,predintervals,rapportfinal}
 315
 316 \end{document}
 317
 318
 319
 320 \bibitem{Alzate}
 321 C.~Alzate and M.~Sinn,
 322   Improved electricity load forecasting via kernel spectral clustering of
 323   smart meters,
 324   \emph{International Conference on Data Mining}, pp. 943 -- 948,
 325   2013
 326
 327 \bibitem{antoniadis2006functional}
 328 A.~Antoniadis, E.~Paparoditis and T.~Sapatinas,
 329   A functional wavelet-kernel approach for time series prediction,
 330   \emph{Journal of the Royal Statistical Society, Series B},
 331   vol. 68(5), pp. 837 -- 857, 2006
 332
 333 \bibitem{antoniadis2013clustering}
 334 A.~Antoniadis, X.~Brossat, J.~Cugliari, and J.-M.~Poggi,
 335   Clustering functional data using wavelets,
 336   \emph{International Journal of Wavelets, Multiresolution and Information
 337         Processing},
 338   vol. 11(1), 2013
 339
 340 \bibitem{antoniadis2012prevision}
 341 A. Antoniadis, X. Brossat, J. Cugliari, J.-M. Poggi,
 342   Pr\'{e}vision d'un processus \`{a} valeurs fonctionnelles en pr\'{e}sence de
 343   non stationnarit\'{e}s. Application \`{a} la consommation
 344   d'\'{e}lectricit\'{e}
 345   Journal de la Soci\'{e}t\'{e} Fran\c{c}aise de Statistique,
 346   Vol. 153, No. 2, 52--78, 2012
 347
 348 \bibitem{brabec2015statistical}
 349 Brabec, M. and Kon{\'a}r, O. and Mal{\`y}, M. and Kasanick{\`y}, I and Pelik{\'a}n, E.,
 350   Statistical models for disaggregation and reaggregation of natural gas
 351   consumption data,
 352   \emph{Journal of Applied Statistics}, vol. 42(5), pp. 921--937, 2015
 353
 354 \bibitem{carevic2010applications}
 355 Carevi{\'c}, S. and Capuder, T. and Delimar, M.
 356   Applications of clustering algorithms in long-term load forecasting
 357   \emph{Proceedings Energy Conference and Exhibition (EnergyCon),
 358   2010 IEEE International} 688--693, 2010
 359
 360 \bibitem{Chicco}
 361 G. Chicco
 362   Overview and performance assessment of the clustering methods for electrical
 363   load pattern grouping, Energy , 42, 68 -- 80, 2012.
 364
 365 \bibitem{Figueiredo}
 366 Figueiredo, V., Rodrigues, F., Vale, Z., Gouveia, J. B.
 367   An electric energy consumer characterization framework based on data mining
 368   techniques.
 369   Power Systems, IEEE Transactions on, 20(2), 596--602, 2005
 370
 371 \bibitem{iwafune2014short}
 372 Iwafune, Y., Yagita, Y., Ikegami, T., Ogimoto K.
 373   Short-term forecasting of residential building load for distributed energy
 374   management
 375   \emph{Proceedings Energy Conference (ENERGYCON), 2014 IEEE International}
 376   1197--1204, 2014
 377
 378 \bibitem{kaufmanpj}
 379 Kaufman, L. and Rousseeuw, P
 380   Finding groups in data: An introduction to cluster analysis,
 381   Hoboken NJ John Wiley \& Sons Inc, 1990
 382
 383 \bibitem{Kwac}
 384 J. Kwac,  Flora, J., Rajagopal, R.
 385   Household Energy Consumption Segmentation Using Hourly Data
 386   Smart Grid, IEEE Transactions on, 5, 420--430, 2014
 387
 388 \bibitem{labeeuw}
 389 Labeeuw, W., Stragier, J., and Deconinck, G.
 390   Potential of active demand reduction with residential wet appliances:
 391   A case study for Belgium.
 392   Smart Grid, IEEE Transactions on, 6(1), 315--323, 2015
 393
 394 \bibitem{Liao}
 395 Warren Liao, T.
 396   Clustering of time series data--a survey
 397   Pattern recognition, 38(11), 1857--1874, 2005
 398
 399 \bibitem{MisitiElec}
 400 M.~Misiti, Y.~Misiti, G.~Oppenheim, and J.-M.~Poggi,
 401   Optimized Clusters for Disaggregated Electricity Load Forecasting,
 402   \emph{REVSTAT -- Statistical Journal}, vol. 8(2), pp. 105 -- 124, 2010
 403
 404 \bibitem{Mutanen}
 405   Mutanen, A., Ruska, M., Repo, S., Jarventausta, P.
 406   Customer classification and load profiling method for distribution systems.
 407   Power Delivery, IEEE Transactions on, 26(3), 1755--1763, 2011
 408
 409 %\bibitem{Piao}
 410 %Piao, M., Lee, H. G., Park, J. H., Ryu, K. H.
 411 %  Application of Classification Methods for Forecasting Mid-Term
 412 %  Power Load Patterns.
 413 %  In Advanced Intelligent Computing Theories and Applications. Springer, 2008
 414
 415 \bibitem{Rasanen}
 416 T.~R\"{a}s\"{a}nen, D.~Voukantsis, H.~Niska, K.~Karatzas, M.~Kolehmainen,
 417   Data-based method for creating electricity use load profiles using large
 418   amount of customer-specific hourly measured electricity use data
 419   Applied Energy, 87(11), 3538--3545, 2010
 420
 421 \bibitem{Rhodes}
 422 J.D. Rhodes, W.J. Cole, C.R. Upshaw, T.F. Edgar, M.E. Webber
 423   Clustering analysis of residential electricity demand profiles
 424   Preprint submitted to Applied Energy, March 18, 2014
 425
 426 \bibitem{steinley2008new}
 427 D. Steinley and M. Brusco,
 428 A new variable weighting and selection procedure for k-means cluster analysis.
 429 \emph{Multivariate Behavioral Research}, 43:32, 2008.
 430
 431 \bibitem{wijaya2015forecasting}
 432 Wijaya, T. K., Sinn, M., and Chen, B.,
 433   Forecasting Uncertainty in Electricity Demand,
 434   \emph{AAAI-15 Workshop on Computational Sustainability, EPFL-CONF-203769},
 435         2015
 436
 437 \bibitem{Zhou}
 438 K. Zhou, S. Yang, C. Shen
 439   A review of electric load classification in smart grid environment,
 440   Renewable and Sustainable Energy Reviews, 24, 103 -- 110, 2013.
 441