ANU Home | Search ANU
The Australian National University
Mathematical Sciences Institute (MSI)
Events - Special Year 2005
Printer Friendly Version of this Document

2005 Special Year on Computational Mathematics

Workshop on High-dimensional Approximation


The talks were held in the Manning Clark Centre, building #26a.
On Tuesday the talks were in JD101 of the John Dedman Building building #27.

Monday Tuesday Wednesday Thursday
Woźniakowski Hegland Smola Collins
Gerstner Newsam Chapelle Wildenhues
Holtz Roberts Garcke Hamaekers
Sloan Kohn McLachlan Tian
Waterhouse Koch Kowalczyk Nitsche
Kuo Pittelkow Griebel

Some of the presentations can be found in the list of abstracts.

Monday, 14th of February

Manning Clark Centre, Theatre T5

8:40 Registration
8:55 Markus Hegland Opening Remarks
Chair Ian Sloan
9:00 Henryk Woźniakowski
Columbia University and
University of Warsaw
Tractability of Multivariate Linear Problems for Weighted Spaces of Functions
9:45 Thomas Gerstner
Universität Bonn
Valuation of performance-dependent options
Part I: Framework and computation using sparse grid quadrature
10:30 Morning Tea
11:15 Markus Holtz
Universität Bonn
Valuation of performance-dependent options
Part II: Dimension reduction by orthant decomposition of hyperplane arrangements
12:00 Lunch break
Chair Thomas Gerstner
14:00 Ian Sloan
University of New South Wales
Numerical Integration in Hundreds of Dimensions - the Lattice Side of the Story
14:45 Ben Waterhouse
University of New South Wales
Randomly-shifted lattice rules on the unit cube for unbounded integrands in high dimensions
15:30 Afternoon Tea
16:15 Frances Kuo
University of New South Wales
Construction of semi-extensible lattice sequences for multivariate integration

Tuesday, 15th of February

John Dedman Building, JD101

Chair Michael Griebel
9:00 Markus Hegland
Australian National University
Concentration and Function Approximation
9:45 Garry Newsam
DSTO
Geometric Models for Graphs based on Finite Dimensional Embeddings
10:30 Morning Tea
11:15 Steve Roberts
Australian National University
Using Sparse Grids to Approximate Probability Density Functions
12:00 Lunch break
Chair Rick Beatson
13:30 Robert Kohn
University of New South Wales
Variable Selection and Model Averaging in Nonparametric Generalised Linear Models
14:15 Inge Koch
University of New South Wales
Dimension Reduction and Independent Component Analysis
15:00 Afternoon Tea
19:00 Workshop dinner at the Little Saigon

Wednesday, 16th of February

Manning Clark Centre, Theatre T5

Chair Geoff McLachlan
9:00 Alex Smola
National ICT Australia
Exponential Families for Estimation
9:45 Olivier Chapelle
MPI Tübingen
Feature selection for Support Vector Machines
10:30 Morning Tea
11:15 Jochen Garcke
Australian National University
Semi-supervised Learning with Sparse Grids
12:00 Lunch break
Chair Alex Smola
14:00 Geoff McLachlan
University of Queensland
On High-Dimensional Statistical Learning
14:45 Adam Kowalczyk
National ICT Australia
Learning Curves for Classification of Anti-Learning Data
15:30 Afternoon Tea
16:15 Yvonne Pittelkow
Australian National University
Challenges in the analysis of high dimensional gene expression data

Thursday, 17th of February

Manning Clark Centre, Theatre T6

Chair Garry Newsam
9:00 Michael Collins
Australian National University
Constructing molecular potential energy surfaces by interpolation
9:45 Ralf Wildenhues
Universität Bonn
Separable approximation for the Schrödinger equation
10:30 Morning Tea
11:15 Jan Hamaekers
Universität Bonn
Fast Fourier Transformation with hierarchical bases on Sparse Grids for the Schrödinger equation
12:00 Lunch break
Chair Markus Hegland
14:00 Tianhai Tian
University of Queensland
Discrete stochastic simulation and its applications
14:45 Andrej Nitsche
ETH Zürich
Tensor Product Approximation Spaces & Elliptic PDEs
15:30 Afternoon Tea
16:00 Michael Griebel
Universität Bonn
Sparse Grids
19:00 MSI colloquium dinner

Abstracts

After reviewing the basics of feature selection and Support Vector Machines (SVM), we will introduce a new method of feature selection for non-linear SVMs. The method is based upon finding those features which minimize the so-called radius / margin bound, which is an upper-bound on the leave-one-out error. This search can be efficiently performed via gradient descent. The resulting algorithms are shown to be superior to some standard feature selection algorithms on both toy data and real-life problems.


Michael Collins, Australian National University

Constructing Molecular Potential Energy Surfaces by Interpolation
Presentation (PDF)

The molecular potential energy surface governs the motion of the atomic nuclei for a molecule in an isolated electronic state. For a molecule of N atoms, this surface is a function of 3N-6 internal coordinates which determine the shape of the molecule. For molecules undergoing chemical reaction, the surface is a relatively complicated function of these many coordinates. Methods have now been developed which allow us to construct this surface as an interpolation of Taylor expansions of the surface around molecular configurations scattered throughout the accessible space.

At some molecular configurations two (or more) electronic states can become energetically degenerate. The motion of the atoms is then coupled with that of the electrons. We must then construct, by interpolation, surfaces for all the elements of a potential energy matrix which describes the atomic motion on all the surfaces and the forces which produce a change in electronic state. The problem is complicated by symmetry considerations and an unknown phase.


Jochen Garcke, Australian National University

Semi-supervised Learning with Sparse Grids
Presentation (PDF)

Recently several algorithms were proposed how to use unlabeled data in the learning phase of classification methods to improve the generalization properties. They exploit the geometry of the marginal distribution and learn the manifold formed by the data. One assumes here, that nearby data points have the same class, also known as the cluster assumption, which states that the decision boundary should not cross high density regions. The learning task is then formulated as a minimization problem in a Reproducing Kernel Hilbert Space and generalisations of Support Vector Machines are used to obtain the classifier.

In this paper, we apply the sparse grid combination method to the regularization problem. The problem is discretized and solved on a sequence of conventional grids with uniform mesh sizes in each coordinate direction. Thus the classifier is build on sparse grid points and not on data points. For standard classification problems it was shown that this approach scales only linear with the number of data to be classified. We will present first experimental results and discuss the complexity of this approach for the semi-supervised learning case. 


Performance-dependent options are financial derivatives whose payoff depends on the performance of one asset in comparison to other assets. The martingale approach yields a fair price of such options as a multidimensional integral whose dimension is the number of assets under consideration. Usually, the integrand is discontinuous which makes accurate solutions difficult to achieve by numerical approaches. However, in the case of performance-dependent options is possible to reformulate the problem such that a solution can also be achieved by integrating several multivariate normal distributions and thus smooth integrands. The number of of integration problems to be solved depends exponentially on the dimension, though. Still, for moderate-dimensional problems sparse grid quadrature methods can provide highly accurate option prices as is shown in part one of this talk.


Michael Griebel, Universität Bonn

Sparse Grids
Presentation (PDF)

The numerical treatment of high(er) dimensional problems suffers in general from the so-called curse of dimensionality. In special cases, i.e. for special function classes, this exponential dependence of O(n-r/d) of the achieved accuracy on the invested work n can be substantially reduced. Here, r denotes smoothness and d dimensionality. This is e.g. the case for spaces of functions with bounded mixed derivatives. The associated numerical schemes involve a series expansion in a multiscale basis for the one-dimensional problem. Then, a product construction and a proper truncation of the resulting d-dimensional expansion result in a so-called sparse grid approximation.
Here, depending on the respective problem and the 1-dimensional multiscale basis used, a variety of algorithms for higher dimensional problems result which allow to break the curse of dimensionality, at least to some extent, and result in complexities of the order O(n-rlog(n)α(d) . In special cases even α(d) = 0 can be achieved. This is possible if the error is measured in the H1 seminorm or if the different dimensions as well as their interactions are not equally important and dimension-adaptive strategies are used.
The constant in these order estimates, however, is still dependent on d. It also reflects subtile details of the implementation of the respective numerical scheme.

We discuss such sparse grid algorithms for the numerical treatment of integration problems, partial differential equations and data analysis.


Presentation (PDF)

The electronic Schrödinger equation forms the basis of most quantum-chemical computations. Here, a high-dimensional eigenvalue problem has to be solved. Due to the Pauli principle, the eigenfunctions which corresponds to fermions have to be antisymmetric. Recently, it has been shown that these wavefunctions possess certain square integrable mixed weak derivatives, which are important for approximation methods that are based on sparse grids or hyperbolic cross spaces (Yserentant, 2002). The talk will first discuss the Fast Fourier Transformation with hierarchical bases on sparse grids or hyperbolic cross spaces. Then, we demonstrate the application of this FFT method within a scheme to calculate the minimal eigenvalue which corresponds to an antisymmetric eigenfunction of a many-body periodic Hamilton operator.



Markus Hegland, Australian National University, Canberra

Concentration and Function Approximation
Presentation (PDF)

The curse of dimensionality is a serious challenge to the practical approximation of functions of many variables. Many special techniques to deal with this challenge have been developed, including radial basis functions and neural nets, additive models and sparse grids. While the effect of smoothness on approximation has been considered, the influence of high-dimensional geometry, in particular of the concentration, has been less studied.

A simple but stunning consequence (demonstrated by Levy in 1920) of the concentration of measure is that the function values of high-dimensional continuous functions are concentrated around the median value mf. Consequently, these functions are well approximated by the constant function f(x) = mf. In our research we investigated if this observation could be generalised to other than constant function approximation. Results were found for piecewise constant approximation, radial basis function approximation and additive modelling.

This is collaborative work with Paola Pozzi, University of Freiburg, and Vladimir Pestov, University of Ottawa.



In the second part of this talk we show how a dimension reduction of the multidimensional model can be achieved. In addition, a corresponding pricing formula for performance-dependent options is derived. However, the dimension reduction of the model initially does not imply a reduction in dimension and complexity for numerical valuation. A suitable dimension reduction leads to integration problems over so-called hyperplane arrangements in a lower- dimensional space. This reduces the number of cells to be integrated substantially. However, the topology of the cells changes from orthants to general polyhedra requiring much more involved integration rules. We show that any hyperplane arrangement can be decomposed into the same number of orthants as original polyhedral cells using suitable cell combinations. Furthermore, we give an almost optimal algorithm for the construction of an optimal orthant decomposition. Numerical results show the efficiency of the overall algorithm for the valuation of performance-dependent options also for high dimensional problems.



Inge Koch, University of New South Wales

Presentation (PDF)


Statistical learning from high dimensional data often requires the reduction to lower dimensional data as a first step of a deeper analysis. The choice of method for dimension reduction affects the performance of the subsequent analysis. Furthermore there is a compromise between preserving important information in the data and reducing the complexity of the data.

Motivated by real high dimensional data sets we examine the use of principal component analysis (PCA) and independent component analysis (ICA) as means of extracting lower dimensional data. A common problem of many dimension reduction methods or feature selection methods - including PCA and ICA -- is the choice of stopping criteria or critical dimension, that is, the dimension which provides the best compromise between dimension reduction and lack of loss of information. We propose some criteria for choosing the critical dimension and show how they work on real applications.



Adam Kowalczyk, National ICT Australia, Canberra

Presentation (PDF)


In this paper we introduce and analyse examples of supervised learning tasks, where machine learning algorithms yield good performance on the training set but worse than random performance on the independent test set. We call this phenomenon anti-learning. Examples of real life, biological classification tasks, are given for motivation, but analysis concentrates on synthetic examples. For some of these datasets anti-learning is both, demonstrated empirically and formally proven. The proofs hold for broad classes of linear and non-linear machines, including linear and kernel support vector machines, ridge regression, perceptron and k-nearest neighbours.



Frances Kuo, University of New South Wales


Integrals with hundreds or even thousands of dimensions occur in many practical applications. Lattice rules are a family of equal-weight quadrature rules which can be used to approximate these high-dimensional integrals. By now it is well established that for a given n, the total number of quadrature points, good lattice rules can be constructed component-by-component for integrands belonging to certain function spaces. However, the lattice rules constructed this way are not "extensible" in sense that if you wish to change the number of points you need to carry out the construction from scratch. In this talk I will start with a review of recent advances in this research area and then introduce a new algorithm for constructing semi-extensible lattice sequences.



Garry Newsam, Defence Science and Technology Organisation, Edinburg, SA
Presentation (PDF)

Studies of a wide variety of real networks (e.g. the Internet, citation lists or networks of friends) have identified a shortlist of key features that appear to identify their overall properties: sparsity (the number of edges is O(N) where N is the number of vertices); power law distributions of vertex degrees; and the "small world" effect (diameters of networks are typically o(logN). Recently a new class of random graph models has been proposed with the aim of reproducing these properties that is based on embedding the graph in some finite (often high) dimensional metric space. Embeddings of this nature are a form of multi-dimensional scaling and are an increasingly common phenomena: the talk will analyse their strengths and weaknesses in the context of network modelling.



Andrej Nitsche, ETH Zürich

Tensor Product Approximation Spaces & Elliptic PDEs
Presentation (PDF)


We discuss approximation theory for tensor product bases. For linear approximation (uniform and non-uniform sparse grid methods), the approximation spaces are well known to be (weighted) Sobolev spaces of mixed highest derivatives. In the nonlinear case (best N term approximation, adaptive approximation), the approximation spaces turn out to be q-tensor products of suitable Besov spaces Bqs(Lq), which might be coined Besov spaces of mixed highest derivatives.

In the second part of the talk, we apply these theoretical results to the numerical solution of elliptic partial differential equations. We discuss Besov regularity of solutions to elliptic PDEs in the tensor product setting as well as in the more standard setting using isotropically supported bases. The results are quite different, showing an unlimited degree of anisotropic Besov regularity (for the tensor product setting) but only a limited degree of isotropic Besov regularity (for the setting using isotropically supported basis functions) in dimension 3 and higher.



Yvonne Pittelkow, Australian National University

Presentation (PDF)



Steve Roberts, Australian National University

Presentation (PDF)

We present a method for estimating probability density functions in high dimensions based on the sparse grid space, which scales linearly with data size and also has a weak dependence on the dimension of the space.

In particular we present an investigation of discrete histosplines using sparse grids spaces as the discretisation space. We provide a convergence analysis, which shows that for smooth enough underlying probability density functions, the complexity of the space is O(log(h)d-1h-1) and the approximation error is O(log(h)d-1h2) $O(|\log(h)^{d-1} h2)$. The use of classical sparse grids allows us to practically deal with data sets with up to 15 dimensions.

The method is demonstrated on a 10 dimensional data collection containing information on vegetation cover and various geographical features. We use a sparse grid histospline to generate a predictive model of vegetation type in terms of geographical data.



Ian Sloan, University of New South Wales

Presentation (PDF)

Nowadays integration problems with very large numbers of variables arise in many applications, including mathematical finance. While Monte Carlo methods are available for problems with any number of integration variables, increasingly attention has turned to quasi-Monte Carlo methods. These have the appearance of Monte Carlo methods, but with the integration points chosen deterministically instead of randomly. This talk will review the history of quasi-Monte Carlo methods, with an emphasis on latice methods for non-periodic functions, finishing with recent step-by-step constructions of rules with very large numbers of points and hundreds of variables.



Alex Smola, National ICT Australia

Presentation (PDF)


Tianhai Tian, University of Queensland

Stochastic models have been developed recently for describing biological networks containing species with small molecular numbers. The stochastic simulation algorithm (SSA) is an essentially exact procedure for simulating the time evolution of biological reaction systems. However, it is difficult to apply the SSA to simulate biological networks including elementary reactions differing by many orders of magnitude in the number of molecules involved due to the huge computational time. In this talk I will first present the most recent progress in designing efficient numerical methods for improving the efficiency of the SSA. The application of these methods will be also discussed for genetic regulatory networks, ecological systems and epidemic systems.


Ben Waterhouse, University of New South Wales

Presentation (PDF)

We study the problem of multivariate integration on the unit cube for unbounded integrands. Our study is motivated by problems in mathematical finance, where unbounded integrands can arise as a result of using the cumulative inverse normal transformation to map the integral to the unit cube. We define a new space of functions which possess the boundary behavior of those unbounded integrands arising from finance applications. Our new function space is a weighted tensor-product reproducing-kernel Hilbert space. We carry out a worst-case analysis in this space and show that good randomly-shifted lattice rules can be constructed component-by-component to achieve a worst-case error smaller than the worst-case Monte Carlo error. Numerical experiments indicate that our lattice rules are reasonably robust and perform well in pricing Asian options.


Ralf Wildenhues, Universität Bonn

Presentation (PDF)

Recently, algorithms have been proposed to approximate functions with low separation rank in high dimensions. These methods provide the opportunity to overcome the curse of dimensionality, provided that the involved functions and operators can be well-represented with low rank. We review some of these methods and try to extend them to a weak formulation, where the goal is to ultimately apply them to the Schrödinger equation.


Henryk Woźniakowski, Department of Computer Science, Columbia University, and Institute of Applied Mathematics, University of Warsaw

Presentation (PDF)

Multivariate linear problems for spaces of functions of many variables d occur in many applications. Examples of such problems include integration, approximation, and the solution of differential equations. The number d of variables is sometimes in the hundreds or thousands as it is the case for some problems in financial mathematics.

Tractability of multivariate problems has been intensively studied in recent years. This concept is defined in terms of the minimal number n(ε,d) of function values needed to compute an ε-approximation in the worst case or in the average case setting. Tractability means that n(ε,d) can be bounded by a polynomial in ε-1 and d. Strong tractability means that n(ε,d) is independent of d and polynomially dependent on ε-1.

For many classical spaces all variables play the same role, and n(ε,d) depends exponentially on d. This is called the curse of dimensionality, and leads to intractability. The first such an example was given by Bahkvalov in 1959 for multivariate integration of r times continuously differentiable functions. This is also the case for multivariate integration for tensor product Sobolev spaces, and this problem is strictly related to the L2-discrepancy.

To vanquish the curse of dimensionality, we need to treat variables of functions with diminishing importance. This leads to weighted spaces of functions in which the influence of each variable or a group of variables is moderated by the corresponding weight.

In many applications, although d is huge, functions can be well approximated by sums of functions that depend on groups of just a few variables up to a given order k, with the order defined as the number of variables in a group. We can model this situation by finite-order weights.

Necessary and sufficient conditions on weights to obtain tractability or strong tractability of linear multivariate problems have been obtained. In particular, for finite-order weights we have tractability or even strong tractability of many multivariate problems. This holds even in the worst case setting for multivariate integration for tensor product Sobolev spaces. Tractability bounds can be achieved by shifted lattice rules with generators computed by the component-by-component algorithm, and by well-known low discrepancy sequences such as Halton, Sobol and Niederreiter sequences. For other multivariate problems such as approximation and the solution of differential equations, tractability bounds are achieved by weighted Smolyak-type algorithms.