UIS for Data-Intensive Analytical Systems
 
Frequently Asked Questions
 
 

How does this project relate to current issues?

The development of data grids, which will form a broad information-technology infrastructure for scientific research, is viewed as one of the most important undertakings in the computational sciences today. As a generic platform for data-intensive scientific computing, this infrastructure will bridge two technologies that will have a significant impact on future scientific research: data mining and data storage. Data mining is emerging as the primary way of analyzing large sets of multi-dimensional scientific data. Data storage technology must provide high-performance access to this data regardless of the storage media and how they are interconnected.

Briefly describe the problem and your approach.

In this project, we are concerned with developing data storage systems suited to data-intensive scientific computing in fields such as high-energy physics, molecular biology, environmental sciences, and astronomy. Our approach is based on certain principles of organizing and accessing multi-dimensional data in storage. To follow these principles, the data storage system must acquire greater knowledge about the data it holds. In effect, a storage system suited to data-intensive analytical computing must itself incorporate elements of data mining. With advanced indexing and clustering techniques for multi-dimensional data and practical methods of dealing with incomplete data, the data storage system can achieve high-performance data access and provide applications with useful insights into the data.

What are the goals of the research?

This project has four goals:

  1. Enhance the infrastructure for scientific research by developing new indexing and clustering techniques for efficient storage and analysis of multi-dimensional data;
  2. Establish a scientific basis for confidence in these techniques and in certain principles of organizing and accessing multi-dimensional data in storage;
  3. Advance the knowledge and technologies required for the development of high-performance intelligent storage systems; and
  4. Provide a useful paradigm, in the form of a prototype system, for achieving tighter integration between data storage and data mining technologies.

Briefly describe the techniques you are developing.

The basic processes of the prototype data engine we are developing are data clustering, space partitioning, initial and incremental data loading, and data retrieval. To support these processes, the system employs new retrieval and clustering techniques designed to satisfy two sets of requirements: those of the storage system and those of data-mining applications. The techniques are designed to operate in high-dimensional spaces without dimensionality reduction, a capability made possible by our new space-partitioning schemes.
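
To make the processes listed above more concrete, here is a minimal sketch of how space partitioning, initial and incremental loading, and range retrieval can fit together. It uses a generic k-d-tree-style median split as a stand-in and is not the project's own partitioning scheme, which is not described here. All names (Node, CAPACITY, insert, load, range_query) and the bucket size are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
CAPACITY = 4  # hypothetical bucket size; a real system would tune this


@dataclass
class Node:
    points: List[Point] = field(default_factory=list)  # only leaves hold points
    dim: int = 0          # dimension this node splits on
    split: float = 0.0    # split value: left < split <= right
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def _split(node: Node, depth: int) -> None:
    """Partition an overflowing leaf at the median of one dimension."""
    dim = depth % len(node.points[0])  # cycle through all dimensions
    values = sorted(p[dim] for p in node.points)
    split = values[len(values) // 2]
    left = [p for p in node.points if p[dim] < split]
    right = [p for p in node.points if p[dim] >= split]
    if not left:
        return  # too many equal values on this dimension; keep an oversized leaf
    node.dim, node.split, node.points = dim, split, []
    node.left, node.right = Node(points=left), Node(points=right)


def insert(root: Node, p: Point) -> None:
    """Incremental loading: route p to its leaf and split the leaf on overflow."""
    node, depth = root, 0
    while not node.is_leaf():
        node = node.left if p[node.dim] < node.split else node.right
        depth += 1
    node.points.append(p)
    if len(node.points) > CAPACITY:
        _split(node, depth)


def load(points: Sequence[Point]) -> Node:
    """Initial loading, done here by repeated insertion for simplicity."""
    root = Node()
    for p in points:
        insert(root, p)
    return root


def range_query(node: Node, lo: Point, hi: Point) -> List[Point]:
    """Retrieval: return all points inside the box [lo, hi], pruning partitions."""
    if node.is_leaf():
        return [p for p in node.points
                if all(l <= x <= h for l, x, h in zip(lo, p, hi))]
    found: List[Point] = []
    if lo[node.dim] < node.split:    # box overlaps the left partition
        found += range_query(node.left, lo, hi)
    if hi[node.dim] >= node.split:   # box overlaps the right partition
        found += range_query(node.right, lo, hi)
    return found
```

The sketch works directly on the original coordinates, with no dimensionality reduction. Plain median splits of this kind are known to prune poorly as the number of dimensions grows, which is the kind of limitation that motivates specialized partitioning schemes.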


©2007 Dstar