0) Download and compile Benjamin's cgds library

    git clone git@auder.net:cgds
    cd cgds
    bash makeMakefile.sh src
    make src
    sudo make install

Make sure that the install destination is listed in the LD_LIBRARY_PATH environment variable.
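
One possible way to do this from a POSIX shell is sketched below. It assumes the library was installed under /usr/local/lib, which is only a guess: check the install target of the cgds Makefile for the actual destination. The helper name add_to_ld_path is hypothetical.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH
# unless it is already listed there.
add_to_ld_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# /usr/local/lib is an assumed install destination; adjust to the real one.
add_to_ld_path /usr/local/lib
```

Put the call in your shell startup file if the setting should persist across sessions.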

1) Compile the source code for stage 1 clustering

    mkdir -p build/stage1/src
    cd build/stage1/src
    cmake ../../../stage1/src
    make

2) Repeat the previous lines for stage 2 (unconfirmed)


NOTE: openmpi, mpich (compiler for MPI), libxml and libgsl need to be installed.
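
A quick way to check that the build tools are reachable before compiling is sketched below. The command names mpicc, mpirun, cmake and make are the usual ones but may differ on your distribution, and the helper name missing_tools is hypothetical. The libxml and libgsl libraries are not checked here.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool names; adjust to your MPI distribution if needed.
missing=$(missing_tools mpicc mpirun cmake make)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All build tools found."
fi
```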


Usage (stage 1):

Serialize the input data using

    ppam.exe serialize inputfile_edf outputfile_edf 1 0

    # 1 indicates that the data is stored by column
    # 0 means process all the rows

Then run the clustering with

    mpirun -np nbProcess ppam.exe cluster ifilename nbSeriesInChunk nbClusters randomize p_for_dissims

    # ex. mpirun -np 4 ./ppam.exe cluster ~/tmp/2009.bin 5000 200 1 2

Where:
    nbProcess = number of simultaneous processes
    ifilename = path to the serialized dataset (see the note on serialization below)
    nbSeriesInChunk = number of time-series to process sequentially
    nbClusters = number of clusters
    randomize = 1 to dispatch time-series at random, 0 to process them in order
    p_for_dissims = the 'p' of the L_p distance used to compute dissimilarities
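
For reference, the L_p distance between two series x and y of equal length is usually defined as (sum_i |x_i - y_i|^p)^(1/p). The sketch below computes it with awk for two space-separated series. It only illustrates the role of p_for_dissims and is not taken from the package, which may for instance skip the final 1/p root. The function name lp_distance is hypothetical.

```shell
# Hypothetical helper: L_p distance between two space-separated
# series of equal length.
# Usage: lp_distance P "x1 x2 ..." "y1 y2 ..."
lp_distance() {
    awk -v p="$1" -v xs="$2" -v ys="$3" 'BEGIN {
        n = split(xs, x, " ")
        m = split(ys, y, " ")
        if (n != m) { print "series lengths differ" > "/dev/stderr"; exit 1 }
        sum = 0
        for (i = 1; i <= n; i++) {
            d = x[i] - y[i]
            if (d < 0) d = -d          # absolute difference
            sum += d ^ p
        }
        printf "%g\n", sum ^ (1 / p)
    }'
}

lp_distance 2 "0 0" "3 4"    # p = 2: Euclidean distance
lp_distance 1 "0 0" "3 4"    # p = 1: Manhattan distance
```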


The results are stored in ppamResult.xml (curve ids and ranks), while ppamFinalSeries.bin
contains the curves used in the last clustering step. The ranks in ppamResult.xml refer to the
curves in ppamFinalSeries.bin.


Note: [de]serialization is custom. Consider writing your own
in the src/TimeSeries/ folder if you plan to test the package.

See also src/main.c for details.