0) Download and compile Benjamin's cgds library

   git clone git@auder.net:cgds
   cd cgds
   bash makeMakefile.sh src
   make src
   sudo make install

Make sure that the install destination is listed in the LD_LIBRARY_PATH environment variable.
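
A minimal sketch of that check, assuming the library was installed under /usr/local/lib (the actual destination depends on the cgds Makefile; CGDS_LIB_DIR is a hypothetical variable introduced here for illustration). It prepends the directory only when it is not already listed:

```shell
# Assumption: cgds was installed under /usr/local/lib; override CGDS_LIB_DIR otherwise.
CGDS_LIB_DIR="${CGDS_LIB_DIR:-/usr/local/lib}"

# Prepend the directory only if it is not already part of LD_LIBRARY_PATH.
case ":${LD_LIBRARY_PATH:-}:" in
  *":$CGDS_LIB_DIR:"*) ;;  # already listed, nothing to do
  *) LD_LIBRARY_PATH="$CGDS_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
export LD_LIBRARY_PATH
```

Put these lines in your shell startup file if the setting should persist across sessions.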

1) Compile source code for 1st stage clustering

   mkdir -p build/stage1/src
   cd build/stage1/src
   cmake ../../../stage1/src
   make

2) Repeat the previous lines for stage 2 (to be confirmed)


NOTA: openmpi, mpich (compiler for MPI), libxml and libgsl must be installed.
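
The prerequisites can be checked before compiling with a small sketch like the one below. The tool names are assumptions: mpicc and mpirun are the usual MPI front ends, xml2-config is shipped by the libxml2 development package and gsl-config by the GSL one. Your distribution may name or package them differently. count_missing is a hypothetical helper, not part of this package.

```shell
# Print one line per command and return the number of commands not found on PATH.
count_missing() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Assumed tool names, see the paragraph above.
count_missing mpicc mpirun xml2-config gsl-config || true
```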


Usage (stage 1) :

Serialize input data using 

   ppam.exe serialize inputfile_edf outputfile_edf 1 0 

# 1 indicates that the data is stored by column
# 0 means that all rows are processed

Then run the clustering with

   mpirun -np nbProcess ppam.exe cluster ifilename nbSeriesInChunk nbClusters randomize p_for_dissims

## ex. > mpirun -np 4 ./ppam.exe cluster ~/tmp/2009.bin 5000 200 1 2 

Where :
   nbProcess = number of simultaneous processes
   ifilename = path to serialized dataset (see the serialization note below)
   nbSeriesInChunk = number of time-series to process sequentially
   nbClusters = number of clusters
   randomize = 1 to dispatch time-series at random, 0 to process them in order
   p_for_dissims = the 'p' of L_p distance used to compute dissimilarities
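
For reference, assuming the usual definition of the L_p distance between two series x and y of length n (whether the implementation applies the final p-th root is not stated here; see the source code):

```latex
d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
```

With p_for_dissims = 2, as in the example above, this is the Euclidean distance. With p_for_dissims = 1 it is the sum of absolute differences.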


The results are stored in ppamResult.xml (curve ids and ranks), while ppamFinalSeries.bin
contains the curves used in the last clustering step. The ranks in ppamResult.xml refer to the
curves in ppamFinalSeries.bin.
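
The two steps can be chained in a small wrapper, sketched below under several assumptions: ppam.exe sits in the current directory, both result files are written to the current directory, and the file names input.edf and 2009.bin are placeholders. run and ppam_pipeline are hypothetical helpers, not part of this package. With DRY_RUN=1 the commands are only printed.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: serialize, cluster, then check the two result files.
# Arguments: raw_input serialized_output nbProcess nbSeriesInChunk nbClusters
ppam_pipeline() {
  run ./ppam.exe serialize "$1" "$2" 1 0 || return 1
  # randomize = 1 and p_for_dissims = 2, as in the example above
  run mpirun -np "$3" ./ppam.exe cluster "$2" "$4" "$5" 1 2 || return 1
  [ "${DRY_RUN:-0}" = "1" ] && return 0
  # Assumption: both result files are written to the current directory.
  for f in ppamResult.xml ppamFinalSeries.bin; do
    [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
  done
}

# Dry run: only prints the two commands.
DRY_RUN=1
ppam_pipeline input.edf 2009.bin 4 5000 200
```

Unset DRY_RUN (or set it to 0) to actually run the commands.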


Note : [de]serialization is custom. Consider writing your own
in the src/TimeSeries/ folder if you plan to test the package.

See also src/main.c for the details.