0) Download and compile Benjamin's cgds library

   git clone git@auder.net:cgds
   cd cgds
   bash makeMakefile.sh src
   make src
   sudo make install

Make sure that the install destination is listed in the LD_LIBRARY_PATH environment variable.
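
A minimal sketch of that step, assuming the library ended up under /usr/local/lib
(this path is an assumption; the real destination depends on the cgds Makefile):

```shell
# Assumption: cgds was installed under /usr/local/lib; adjust to the real destination.
CGDS_LIBDIR=/usr/local/lib

# Append the directory only if it is not already listed.
case ":${LD_LIBRARY_PATH:-}:" in
  *":${CGDS_LIBDIR}:"*) ;;
  *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CGDS_LIBDIR}" ;;
esac

echo "$LD_LIBRARY_PATH"
```

Put these lines in your shell startup file to make the setting permanent.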

1) Compile source code for 1st stage clustering

   mkdir -p build/stage1/src
   cd build/stage1/src
   cmake ../../../stage1/src
   make

2) Repeat the previous lines for stage 2 (unconfirmed)


NOTE: openmpi, mpich (compiler for MPI), libxml and libgsl need to be installed.
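
The dependencies above can be checked roughly as follows. This is only a sketch: the
command names mpicc and mpirun and the pkg-config module names gsl and libxml-2.0 are
assumptions (libxml is taken to mean libxml2), not something this README states.

```shell
# Report whether a command is available on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# MPI compiler wrapper and launcher, plus cmake which is used for the build.
for tool in mpicc mpirun cmake; do
  if have_cmd "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done

# Assumption: the libraries are registered with pkg-config under these names.
if have_cmd pkg-config; then
  for lib in gsl libxml-2.0; do
    if pkg-config --exists "$lib"; then
      echo "found:   $lib"
    else
      echo "missing: $lib"
    fi
  done
fi
```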


Usage (stage 1):

Serialize the input data using

   ppam.exe serialize inputfile_edf outputfile_edf 1 0 

# 1 indicates that the data is organized by column
# 0 means that all the rows are processed

Then run the clustering using

   mpirun -np nbProcess ppam.exe cluster ifilename nbSeriesInChunk nbClusters randomize p_for_dissims

## ex. > mpirun -np 4 ./ppam.exe cluster ~/tmp/2009.bin 5000 200 1 2 

Where:
   nbProcess = number of simultaneous processes
   ifilename = path to the serialized dataset (see the note below)
   nbSeriesInChunk = number of time-series to process sequentially
   nbClusters = number of clusters
   randomize = 1 to dispatch time-series at random, 0 to process them in order
   p_for_dissims = the 'p' of the L_p distance used to compute dissimilarities (p = 2 gives the Euclidean distance)
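
To avoid mixing up the positional arguments, the command line can be assembled by a small
helper. The function name ppam_cluster_cmd is hypothetical and not part of the package;
it only prints the command in the order documented above and does not run it.

```shell
# Hypothetical helper: print the clustering command line in the documented order.
ppam_cluster_cmd() {
  if [ "$#" -ne 6 ]; then
    echo "usage: ppam_cluster_cmd nbProcess ifilename nbSeriesInChunk nbClusters randomize p_for_dissims" >&2
    return 1
  fi
  # randomize is documented as a 0/1 flag.
  case "$5" in
    0|1) ;;
    *) echo "randomize must be 0 or 1" >&2; return 1 ;;
  esac
  echo "mpirun -np $1 ./ppam.exe cluster $2 $3 $4 $5 $6"
}

# Same values as the example above; the command is only printed, not executed.
ppam_cluster_cmd 4 "$HOME/tmp/2009.bin" 5000 200 1 2
```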


The results are stored in ppamResult.xml (curve ids and ranks), while ppamFinalSeries.bin
contains the curves used in the last clustering step. The ranks in ppamResult.xml refer to the
curves in ppamFinalSeries.bin.


Note: the [de]serialization is custom. Consider writing your own
in the src/TimeSeries/ folder if you plan to test the package.

See also src/main.c for the details.