0) Download and compile Benjamin's specific library (cgds)

    git clone git@auder.net:cgds
    cd cgds
    bash makeMakefile.sh src
    make src
    sudo make install

Make sure that the install destination is listed in the LD_LIBRARY_PATH
environment variable.

1) Compile the source code for stage 1 clustering

    mkdir -p build/stage1/src
    cd build/stage1/src
    cmake ../../../stage1/src
    make

2) #repeat previous lines for stage 2 ???

Note: openmpi, mpich (compiler for MPI), libxml, and libgsl must be installed.

Usage (stage 1):

Serialize the input data using

    ppam.exe serialize inputfile_edf outputfile_edf 1 0
    # 1 indicates data is by column
    # 0 means process all the rows

Then run the clustering:

    mpirun -np nbProcess ppam.exe cluster ifilename nbSeriesInChunk nbClusters randomize p_for_dissims
    ## ex. > mpirun -np 4 ./ppam.exe cluster ~/tmp/2009.bin 5000 200 1 2

Where:
    nbProcess       = number of simultaneous processes
    ifilename       = path to the serialized dataset (see the note below)
    nbSeriesInChunk = number of time-series to process sequentially
    nbClusters      = number of clusters
    randomize       = 1 to dispatch time-series at random, 0 to process them in order
    p_for_dissims   = the 'p' of the L_p distance used to compute dissimilarities

The results are stored in ppamResult.xml (curve ids and ranks), while
ppamFinalSeries.bin contains the curves used in the last clustering step.
The ranks in ppamResult.xml refer to the curves in ppamFinalSeries.bin.

Note: the [de]serialization is custom. Consider writing your own in the
src/TimeSeries/ folder if you plan to test the package.
See also src/main.c for the details.