The Australian National University
Mathematical Sciences Institute (MSI)
Advanced Computation and Modelling Program

Computational Aspects of Data Mining

Dr Markus Hegland

Description of Project

Data mining is a new area of research that applies methods from statistics, machine learning and computational mathematics to the analysis of very large data sets. The main challenges in data mining include:
  • High dimensional data
  • Large number of data points (trillions)
  • Ill-defined goals of searches
We have access to very large data sets and collaborate with several institutions, such as the NSW Health Department, the HIC, the NRMA and the Taxation Office. The honours projects address the computational challenges of finding information in such large data sets.
The aim of the projects is to develop, analyze and implement data mining algorithms that can extract useful information from very large data sets, possibly on the order of gigabytes or even terabytes. In particular, the algorithms must scale well with the number of data points, even when the data are high-dimensional.
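To illustrate what scaling with the number of data points can mean in practice, here is a minimal sketch (a generic example, not the group's own method) of a least-squares fit that reads the data once, chunk by chunk. It keeps only the small matrix X^T X and the vector X^T y, so its memory use does not depend on the number of data points and its work grows linearly with it. The function name `streaming_least_squares` and the simulated chunked data source are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def streaming_least_squares(chunks, n_features):
    """Fit y ~ X @ w in a single pass over (X, y) chunks.

    Only an n_features x n_features Gram matrix and a length-n_features
    vector are stored, independent of the total number of data points.
    """
    gram = np.zeros((n_features, n_features))  # accumulates X^T X
    rhs = np.zeros(n_features)                 # accumulates X^T y
    for X, y in chunks:
        gram += X.T @ X
        rhs += X.T @ y
    # Solve the normal equations (X^T X) w = X^T y.
    return np.linalg.solve(gram, rhs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])

    def simulated_chunks(n_chunks=100, chunk_size=10_000):
        # Hypothetical data source: chunks are generated lazily, so the
        # full data set never has to be held in memory at once.
        for _ in range(n_chunks):
            X = rng.standard_normal((chunk_size, 3))
            y = X @ w_true + 0.01 * rng.standard_normal(chunk_size)
            yield X, y

    print(streaming_least_squares(simulated_chunks(), 3))
```

The same accumulate-then-solve pattern works for any fixed set of basis functions: replace the columns of X with those functions evaluated at the data points.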
Depending on the student's interests, the work can include:
  • experimental exploration of computational performance using both real and simulated data
  • development and analysis of algorithms and software
  • mathematical error and performance analysis
Of particular interest to our group is the use of sparse grids for data mining.
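The page does not say which sparse grid construction the group uses. Assuming the standard regular sparse grid built from a hierarchical piecewise-linear basis without boundary points, the following sketch counts grid points and compares them with a full tensor-product grid of the same mesh width 2^-n. A full grid in d dimensions has (2^n - 1)^d points, while the sparse grid grows only like 2^n n^(d-1). This slower growth is why sparse grids are attractive for high-dimensional data. The helper names are hypothetical.

```python
from math import comb


def full_grid_points(level: int, dim: int) -> int:
    """Interior points of a full grid with mesh width 2**-level in each direction."""
    return (2**level - 1) ** dim


def sparse_grid_points(level: int, dim: int) -> int:
    """Interior points of the regular sparse grid of the given level.

    The grid is the union of hierarchical subspaces W_l with
    |l|_1 <= level + dim - 1 (all l_i >= 1). For i = sum(l_i - 1),
    there are comb(i + dim - 1, dim - 1) such subspaces, each with 2**i points.
    """
    return sum(2**i * comb(i + dim - 1, dim - 1) for i in range(level))


if __name__ == "__main__":
    level = 6
    for dim in (1, 2, 5, 10):
        print(f"d={dim:2d}  full={full_grid_points(level, dim):>20,}  "
              f"sparse={sparse_grid_points(level, dim):>10,}")
```

In one dimension the two counts coincide. As d grows, the full grid quickly becomes infeasible while the sparse grid stays manageable.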

Requirements

The student should have completed Math2320 Analysis I and have a basic background in computational modelling or mathematics.