Data to Download:
 Indexing Data with Missing Values: data sets that were used to test the GammaEXT and GammaAPX techniques for indexing data with missing values.
 Clustering Data: data sets that were used to test the GARDENHD clustering technique in multidimensional spaces.
The data sets provided below have been used to test the GARDENHD clustering technique for multidimensional data. The experiments are described in the paper "Clustering High-Dimensional Data Using an Efficient and Effective Data Space Reduction". The data sets included below are normalized and formatted according to the GARDENHD input format (see the GARDENHD User Manual).
DOWNLOADS:
 Download the real data set: covtypeNorm.zip
 Download the synthetic data set: animalsNorm.zip
 Download the synthetic data generator: DataGenerator.zip
(Unzip this archive and double-click DataGenertor.bat to generate sets of synthetic data.)
More detailed information about the data sets:

 Real Data Set "covtype"

The real data set “covtype” was obtained from the UCI Machine Learning Repository (www.ics.uci.edu/~mlearn/MLRepository.html). It has 581,012 points with 54 dimensions (the 55th dimension records the class information of objects). There are 7 classes in this data set, each of which represents one type of tree.
 Synthetic Data Set "animals"

The synthetic data set “animals”, with 500,000 points, was produced by the “animals.c” program obtained from the UCI Machine Learning Repository (www.ics.uci.edu/~mlearn/MLRepository.html). This data set has 72 dimensions (the 73rd dimension records the class information). There are 4 classes in this data set, each of which represents one type of animal.
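
Both data sets described above keep the class information in the position right after the last dimension (the 55th value for “covtype”, the 73rd for “animals”). The following minimal Python sketch shows how such records could be separated into coordinates and class labels once they are in memory. The function name `split_class_column` is hypothetical, and the sketch assumes that each record is already available as a sequence of numeric values with an integer class label; parsing the GARDENHD input format itself is not shown here (see the GARDENHD User Manual for the actual file layout).

```python
def split_class_column(rows, n_dims):
    """Split records into coordinates and class labels.

    Assumption (not taken from the GARDENHD manual): each record is a
    sequence of n_dims numeric coordinates followed by one class label
    that can be read as an integer.
    """
    features, classes = [], []
    for row in rows:
        if len(row) != n_dims + 1:
            raise ValueError(
                "expected %d values per record, got %d" % (n_dims + 1, len(row))
            )
        # The first n_dims values are the coordinates of the point.
        features.append([float(v) for v in row[:n_dims]])
        # The value right after the last dimension is the class label.
        classes.append(int(row[n_dims]))
    return features, classes
```

Under these assumptions, “covtype” records would be split with `n_dims=54` and “animals” records with `n_dims=72`.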
 Synthetic Data Sets "CenterCorners"

There are two groups of synthetic data sets. The first group contains 10 data sets with 100,000 points each and dimensionality varying from 10 to 100. The second group contains five 10-dimensional data sets in which the number of points varies from 100,000 to 500,000. All data sets in both groups have the “CenterCorners” distribution, in which one generated hyperrectangle is placed in the center of the space and the others in 10 different corners (the origin, the far corner, and 8 randomly selected corners). All generated hyperrectangles have the same density. Moreover, each hyperrectangle has a uniform internal distribution and represents a different class of data. Thus, each point is assigned to one of 11 classes. These synthetic data sets were produced by the DataGenertor.bat program provided above.
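
The “CenterCorners” layout described above can be illustrated with a short Python sketch. This is not the code of the provided generator. It is a minimal reconstruction under stated assumptions: the data space is the unit hypercube, every hyperrectangle is a cube with the same side length (`side`, a hypothetical parameter), and equal density is approximated by giving every hyperrectangle the same volume and (up to rounding) the same number of points. The function name `center_corners` and the `seed` parameter are likewise hypothetical.

```python
import random


def center_corners(n_points, n_dims, side=0.2, n_random_corners=8, seed=0):
    """Generate points in [0, 1]^n_dims with a CenterCorners-style layout.

    Returns (points, labels). Label 0 is the center box, label 1 the box
    at the origin, label 2 the box at the far corner, and the remaining
    labels are randomly selected corners.
    """
    rng = random.Random(seed)
    n_corners = 2 + n_random_corners
    if 2 ** n_dims < n_corners:
        raise ValueError("not enough distinct corners in this dimensionality")

    # Corners are encoded as tuples of 0/1 flags, one flag per dimension.
    corners = [(0,) * n_dims, (1,) * n_dims]  # origin and far corner
    while len(corners) < n_corners:
        candidate = tuple(rng.randint(0, 1) for _ in range(n_dims))
        if candidate not in corners:
            corners.append(candidate)

    # Lower bound of every box; with side < 1/3 the boxes do not overlap.
    lows = [[0.5 - side / 2] * n_dims]
    lows += [[bit * (1 - side) for bit in corner] for corner in corners]

    points, labels = [], []
    for i in range(n_points):
        label = i % len(lows)  # round-robin keeps the box sizes equal
        low = lows[label]
        # Uniform internal distribution inside the chosen box.
        points.append([lo + rng.random() * side for lo in low])
        labels.append(label)
    return points, labels
```

For example, `center_corners(100000, 10)` would yield 100,000 points in 10 dimensions, each assigned to one of 11 classes. Normalization and writing the result in the GARDENHD input format are left out of this sketch.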
List of Data Sets (the "CenterCorners" sets were produced by the generator):

Normalized and cleaned set of real data "covtype" (100 MB uncompressed).

Normalized and cleaned set of synthetic data "animals" (223 MB uncompressed).

Normalized synthetic set of 100,000 points in 10-dimensional space with "CenterCorners" distribution (9 MB uncompressed).

Normalized synthetic set of 100,000 points in 20-dimensional space with "CenterCorners" distribution (18 MB uncompressed).

Normalized synthetic set of 100,000 points in 30-dimensional space with "CenterCorners" distribution (28 MB uncompressed).

Normalized synthetic set of 100,000 points in 40-dimensional space with "CenterCorners" distribution (37 MB uncompressed).

Normalized synthetic set of 100,000 points in 50-dimensional space with "CenterCorners" distribution (46 MB uncompressed).

Normalized synthetic set of 100,000 points in 60-dimensional space with "CenterCorners" distribution (55 MB uncompressed).

Normalized synthetic set of 100,000 points in 70-dimensional space with "CenterCorners" distribution (64 MB uncompressed).

Normalized synthetic set of 100,000 points in 80-dimensional space with "CenterCorners" distribution (73 MB uncompressed).

Normalized synthetic set of 100,000 points in 90-dimensional space with "CenterCorners" distribution (82 MB uncompressed).

Normalized synthetic set of 100,000 points in 100-dimensional space with "CenterCorners" distribution (91 MB uncompressed).

Normalized synthetic set of 100,000 points in 10-dimensional space with "CenterCorners" distribution (9 MB uncompressed).

Normalized synthetic set of 200,000 points in 10-dimensional space with "CenterCorners" distribution (19 MB uncompressed).

Normalized synthetic set of 300,000 points in 10-dimensional space with "CenterCorners" distribution (28 MB uncompressed).

Normalized synthetic set of 400,000 points in 10-dimensional space with "CenterCorners" distribution (38 MB uncompressed).

Normalized synthetic set of 500,000 points in 10-dimensional space with "CenterCorners" distribution (47 MB uncompressed).
