Developed Software to Download:
- DStar - Storage and retrieval system for data-intensive analytical computing.
- GARDENHD - Software package for multi-dimensional space reduction and data clustering.
- MDI Simulator - Software package with multi-dimensional access methods based on superimposed space partitioning: a) a static pre-partitioning of the space using the Gamma, Theta, or Pyramid Technique; and b) dynamic sub-partitioning of the space using B+-trees, KDB-trees, or R-trees.

| SOFTWARE | GARDENHD |
| PLATFORM | Microsoft Visual Studio 6.0 - Microsoft Windows XP |
| IMPLEMENTED | Ying Lai |
| EMAIL | laiying@iit.edu |
| SCHOOL | Illinois Institute of Technology |
| DEPARTMENT | Computer Science |
| ADVISOR | Dr. Ratko Orlandic |
| DATE | April 15, 2005 |
DOWNLOADS:
GARDENHD is an efficient clustering algorithm for high-dimensional data organized around the notion of data space reduction, i.e., the process of detecting dense areas (dense cells) in the space. The algorithm effectively and efficiently eliminates the empty areas that characterize large and typically sparse high-dimensional spaces. Besides data space reduction, the algorithm employs an efficient adjacency-connected agglomeration of dense cells into larger clusters. GARDENHD is a hybrid of cell-based and density-based clustering. However, unlike typical clustering methods in its class, it recursively partitions sparse regions of the space using a new space-partitioning strategy that greatly facilitates data space reduction.
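The two steps named above, detecting dense cells and merging adjacent dense cells into clusters, can be illustrated with a minimal Python sketch. This is not the GARDENHD implementation: it assumes a fixed uniform grid over the unit hypercube instead of GARDENHD's recursive space-partitioning strategy, and the function names, the `cells_per_dim` and `min_points` parameters, and the face-sharing definition of adjacency are all illustrative assumptions.

```python
from collections import defaultdict, deque


def dense_cells(points, cells_per_dim, min_points):
    """Count points per grid cell in [0, 1]^d and keep cells with >= min_points.

    Assumption: a uniform grid, not GARDENHD's own partitioning strategy.
    """
    counts = defaultdict(int)
    for p in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        cell = tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in p)
        counts[cell] += 1
    return {cell: n for cell, n in counts.items() if n >= min_points}


def merge_adjacent(dense):
    """Label dense cells so that face-adjacent cells share a cluster id.

    Two cells are treated as adjacent when they differ by 1 in exactly one
    coordinate (an assumed definition of adjacency for this sketch).
    """
    labels = {}
    next_label = 0
    for start in dense:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first walk over connected dense cells
            cell = queue.popleft()
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels


# Hypothetical 2-D sample: two dense areas and one isolated point.
points = [
    (0.05, 0.05), (0.06, 0.07),   # cell (0, 0)
    (0.15, 0.05), (0.16, 0.06),   # cell (1, 0), adjacent to (0, 0)
    (0.85, 0.85), (0.86, 0.88),   # cell (8, 8)
    (0.50, 0.50),                 # cell (5, 5), too sparse to keep
]
dense = dense_cells(points, cells_per_dim=10, min_points=2)
clusters = merge_adjacent(dense)
```

In this sample the cells (0, 0) and (1, 0) end up in one cluster, (8, 8) forms a second cluster, and the sparse cell (5, 5) is dropped. Only occupied cells are ever stored, which is why the approach stays tractable even though the total number of cells grows exponentially with the number of dimensions.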
Effective data space reduction implies the ability to compactly represent the essential properties of the data distribution. Even in high-dimensional spaces, the compact representation of data produced by GARDENHD can preserve all essential properties of the data, including the locations, shapes, and densities of clusters. As a result, many data mining processes can be performed efficiently on this representation alone. By providing good insight into the data distribution, GARDENHD can also help organize data in storage in a way that supports various kinds of retrieval. It can enable a close-to-optimal assignment of data to pages and a significant reduction of the search space before persistent storage is accessed.
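As a rough illustration of how a compact summary can reduce the search space before storage is touched, the Python sketch below filters a range query against an in-memory map of non-empty cells. The uniform grid, the `summary` layout (cell coordinates mapped to point counts), and the function name `candidate_cells` are assumptions made for this example and do not describe GARDENHD's actual data structures.

```python
def candidate_cells(summary, query_low, query_high, cells_per_dim):
    """Return the non-empty cells whose extent intersects the query box.

    summary maps a cell's integer coordinates to its point count; cells that
    are absent from the map are empty and never need to be read from storage.
    """
    width = 1.0 / cells_per_dim
    hits = []
    for cell in summary:
        # A cell spans [c * width, (c + 1) * width) along each dimension.
        if all(c * width < hi and (c + 1) * width > lo
               for c, lo, hi in zip(cell, query_low, query_high)):
            hits.append(cell)
    return hits


# Hypothetical summary of a 2-D space split into 10 x 10 cells.
summary = {(0, 0): 2, (1, 0): 2, (8, 8): 2, (5, 5): 1}

near_origin = candidate_cells(summary, (0.0, 0.0), (0.12, 0.12), 10)
empty_region = candidate_cells(summary, (0.25, 0.25), (0.4, 0.4), 10)
```

Here `near_origin` contains only the cells (0, 0) and (1, 0), and `empty_region` is an empty list, so the second query could be answered without any access to persistent storage.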