0) Download, compile and install Benjamin's cgds library

   git clone git@auder.net:cgds
   cd cgds
   bash makeMakefile.sh src
   make src
   sudo make install

Make sure that the install destination is listed in the LD_LIBRARY_PATH environment variable.

1) Compile the source code for the first clustering stage

   mkdir -p build/stage1/src
   cd build/stage1/src
   cmake ../../../stage1/src
   make

2) Stage 2: presumably repeat step 1) with stage2 in place of stage1 (not confirmed yet).


NOTE: openmpi, mpich (MPI compiler), libxml and libgsl must be installed.


Usage (stage 1):

First, serialize the input data:

   ppam.exe serialize inputfile_edf outputfile_edf 1 0

# 1 indicates that the data is stored by column
# 0 means that all rows are processed

Then run the clustering:

   mpirun -np nbProcess ppam.exe cluster ifilename nbSeriesInChunk nbClusters randomize p_for_dissims

## e.g.   mpirun -np 4 ./ppam.exe cluster ~/tmp/2009.bin 5000 200 1 2
## (4 processes, 5000 series per chunk, 200 clusters, random dispatch, L_2 distance)

Where:
   nbProcess = number of MPI processes run simultaneously
   ifilename = path to the serialized dataset (see the serialization step above and the note below)
   nbSeriesInChunk = number of time series in each chunk, processed sequentially
   nbClusters = number of clusters
   randomize = 1 to dispatch the time series at random, 0 to process them in order
   p_for_dissims = the p of the L_p distance used to compute dissimilarities (e.g. 2 for the Euclidean distance)
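One way to picture randomize and nbSeriesInChunk is the minimal C sketch below. It assumes
that the series indices are shuffled (Fisher-Yates) when randomize = 1 and then cut into
consecutive chunks of at most nbSeriesInChunk series. The function names are hypothetical,
and how ppam actually assigns chunks to MPI processes is not shown here (see src/main.c).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill idx[0..n-1] with 0..n-1: shuffled with Fisher-Yates if randomize != 0,
 * left in order otherwise. rand() % i is slightly biased (and limited by
 * RAND_MAX), which is acceptable for an illustration. */
void make_dispatch_order(size_t *idx, size_t n, int randomize)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    if (!randomize)
        return;
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i; /* 0 <= j <= i - 1 */
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Number of chunks needed for n series with at most chunk_size per chunk. */
size_t nb_chunks(size_t n, size_t chunk_size)
{
    assert(chunk_size > 0);
    return (n + chunk_size - 1) / chunk_size;
}

/* Positions [*first, *last) of chunk k within the dispatch order. */
void chunk_bounds(size_t k, size_t n, size_t chunk_size, size_t *first, size_t *last)
{
    assert(chunk_size > 0);
    *first = k * chunk_size;
    *last = (*first + chunk_size < n) ? *first + chunk_size : n;
}
```

For instance, 12000 series with nbSeriesInChunk = 5000 would give nb_chunks(12000, 5000) = 3
chunks, of 5000, 5000 and 2000 series. Call srand() once beforehand for a reproducible shuffle.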
The results are stored in ppamResult.xml (curve IDs and ranks), while ppamFinalSeries.bin
contains the curves used in the last clustering step. The ranks in ppamResult.xml refer to the
curves in ppamFinalSeries.bin.


Note: the [de]serialization format is custom. If you plan to test the package, consider
writing your own [de]serialization routines in the src/TimeSeries/ folder.
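
If you write your own [de]serialization, a minimal C sketch of a round-trippable binary layout
might look like the following. Both the layout (a uint32 number of series, a uint32 series
length, then the values as doubles, series after series, in native byte order) and the
function names are hypothetical; this is NOT the format produced by ppam.exe serialize.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout (NOT ppam's format):
 *   uint32 nb_series, uint32 length, then nb_series * length doubles,
 *   all in the machine's native byte order. */

/* Write the header and the values; returns 0 on success, -1 on error. */
int write_series_bin(FILE *f, const double *data, uint32_t nb_series, uint32_t length)
{
    assert(f != NULL);
    size_t count = (size_t)nb_series * length;
    if (fwrite(&nb_series, sizeof nb_series, 1, f) != 1 ||
        fwrite(&length, sizeof length, 1, f) != 1 ||
        fwrite(data, sizeof *data, count, f) != count)
        return -1;
    return 0;
}

/* Read a file written by write_series_bin. Returns a malloc'ed array of
 * nb_series * length doubles (caller frees), or NULL on error. */
double *read_series_bin(FILE *f, uint32_t *nb_series, uint32_t *length)
{
    assert(f != NULL);
    if (fread(nb_series, sizeof *nb_series, 1, f) != 1 ||
        fread(length, sizeof *length, 1, f) != 1)
        return NULL;
    size_t count = (size_t)*nb_series * *length;
    if (count == 0)
        return NULL; /* an empty dataset is treated as an error here */
    double *data = malloc(count * sizeof *data);
    if (data == NULL)
        return NULL;
    if (fread(data, sizeof *data, count, f) != count) {
        free(data);
        return NULL;
    }
    return data;
}
```

Storing the dimensions in a header lets the reader allocate the array before reading the
values; a real format would also need to fix the byte order if files move between machines.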

See src/main.c for further details.