Work Address
School of Mathematical Sciences
Australian National University
Canberra
A.C.T. 0200
Australia

Home Address
1 Weld Street
Yarralumla
A.C.T. 2600

Work URL: http://wwwmaths.anu.edu.au/~mike/
Home URL: http://mngo.members.veritel.com.au/public/

Work email: Mike.Osborne@maths.anu.edu.au
Home email: mikeandgloria@gmail.com

Office phone: (061)-(02)-6125 3449
Home phone: (02)-6281 1648

Memberships: Australian Mathematical Society; Emeritus member, American Mathematical Society.

1957-1959

Scientific Officer, Royal Australian Navy.

1959

Research Student, Imperial College, London University

1960-1962

Assistant Lecturer and Lecturer in Mathematics, Reading University

1962-1963

Lecturer in Mathematics, Imperial College.

1963-1965

Assistant Director, Computer Unit, Edinburgh University.

1966-1976

Head, Computer Centre, Australian National University

1971

U.K. Science Research Council Senior Visiting Fellow at University of Dundee and Visiting Professor, Computer Science Department, Stanford University.

1976-1977

Director, Computer Centre, Australian National University.

1977

U.K. Science Research Council Senior Visiting Fellow, DAMTP, Cambridge University.

1977-1980

Head, Computing Research Group, Australian National University

1980

U.K. Science Research Council Senior Visiting Fellow, Mathematics Department, Manchester University

1980-1991

Professorial Fellow, Department of Statistics, Institute of Advanced Studies, Australian National University

1992-

Professor, Statistics Research Section (incorporated in Centre for Mathematics and its Applications March 31, 1993), School of Mathematical Sciences, Australian National University

1981-1982

Visiting Professor, Mathematics Department, University of Utah

1983

Visiting Professor, Mathematics Department, University of Western Australia.

1984-1985

Visiting Professor, Department of Mathematics and Statistics, University of New Mexico.

1989

Visiting Professor, Department of Statistics, Texas A&M University.

1990-1999

Convenor, Program in Advanced Computation, School of Mathematical Sciences, Australian National University

1994

Visiting Professor, Departments of Computer Science and Mathematics, Texas A&M University.

1998-1999

Acting Head, Centre for Mathematics and its Applications

2000

Visiting Fellow, Centre for Mathematics and its Applications

2002

Visiting Professor, I.W.R., Heidelberg

Research and Development Grants

Research Activities

  1. Use of wrap-around partitioning to vectorize the solution of block bidiagonal matrices

  2. Inverse iteration for the solution of eigenvalue problems

  3. Parallel numerical libraries

  1. Collocation methods for the discretization of boundary value problems

  2. Factorisation of operators, dichotomy, and the stability of computations

  3. Cyclic reduction and its relation to representation theorems

  4. Estimation of ODEs, variable selection, and the suitability of bases

  5. Kalman filters, stochastic ODEs, and smoothing problems

  6. ODE eigenvalue problems, stability, bifurcation, and inverse problems

  1. Fisher's method of scoring for maximising likelihoods

  2. Convergence rates in large sample problems.

  3. Rank regression and l1 estimation

  4. Robust methods

  5. Methods of variable selection

  6. Trust region and (piecewise linear) homotopy methods

  1. Structure functional representation

  2. Simplicial methods for minimization

  3. Resolution of degeneracy

  4. Polyhedral constrained optimization

  1. Methods for constrained and unconstrained problems

  2. Barrier and penalty methods

  3. Interior point methods

  1. Event oriented model descriptions

  2. Object oriented programming implementation

 

Major Research Contributions

The standard method involved contour integration and the assumption of constant layer properties. The spectral approach has the advantages that asymptotic properties can be used to deduce the spectral density, ambiguities in contour selection are avoided, continuous layer dependence is easily treated, and convenient formulae are available for computing associated quantities such as group velocities. The discrete spectrum explains sound channelling, while the continuous spectrum can be interpreted by geometric optics as an outgoing spherical wave corresponding to rays that are not channelled [5], [7]. The approach can also be used in inverse problems [53], [64], [66], [149]. The problem that appears feasible is that of estimating the velocity depth dependence given observations of the group velocities of individual modes (eigensolutions) as functions of frequency. This raises an interesting question. The parametrization of the velocity depth curve is a parametrization of its spatial dependence, while the fitting is done in the frequency domain after this parametrization has been filtered by the differential system. Under what conditions is independence of the parameters preserved? Current investigations indicate that if p parameters are to be estimated then independent observations are required on at least p distinct modes. This condition combines with considerations of stochastic convergence, which then require the number of observations on each individual mode to grow without limit [135].

This work was stimulated by the late Professor Bickley, who derived the basic formalism used. My most important contribution was a characterization of the optimal overrelaxation parameter for the successive overrelaxation iterative method as the solution of a multiparameter eigenvalue problem. An effective method for solving this problem was also given; it is believed to be the first algorithm for this class of problem [9], [16], [17], [18], [25].
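For orientation only (the result stated here is classical, not the contribution described above), the successive overrelaxation iteration and Young's single-parameter characterisation of the optimal relaxation parameter for consistently ordered matrices can be written as

\[
(D + \omega L)\,x^{(k+1)} = \bigl((1-\omega)D - \omega U\bigr)x^{(k)} + \omega b,
\qquad
\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \mu^2}},
\]

where A = D + L + U is the splitting of the coefficient matrix into its diagonal and strictly lower and upper triangular parts, and \mu is the spectral radius of the Jacobi iteration matrix. The characterization referred to above expresses the optimal parameter instead as the solution of a multiparameter eigenvalue problem.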

This formulation of a class of iterative methods based on Newton's method was one of the first applications of Wilkinson's results on the stability of inverse iteration [7], [13], [23]. It includes within its scope just about all superlinearly convergent methods for finding or accelerating convergence to a single eigenvalue, so it provides a generalisation of estimates like the Rayleigh quotient to general (even nonlinear) eigenvalue problems. The first application considered was to discretizations of second order differential equations; this permitted uniformly good estimates to be found for relatively large numbers of eigenvalues, and it led to nonlinear problems. Another early application was to compute the neutral curve and curves of time amplification for studies of the hydrodynamic stability of flow over a flat plate [23]. My student Ralph Jordinson studied the related space amplification problem, a nonlinear eigenvalue problem, by the same methods in his Ph.D. thesis. Application to the inverse eigenvalue problem is considered in [53]. The achievable rates of convergence (often greater than 2) are explored in [84]. Methods suitable for large scale problems on parallel and vector computers are treated in [144], [163], [167], [168], [174], [175]. A method with rate 3.56 and using a deflation technique applicable with very general weighting matrices is given in [167]. This method has been implemented using wrap-around partitioning under the Area 4 contract with Fujitsu.
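As a minimal, purely illustrative sketch of the simplest member of this class of methods (inverse iteration with the eigenvalue estimate updated from the current iterate), the following applies Rayleigh quotient iteration to a small symmetric matrix; the function name and matrix are mine, and the methods of the papers cited go well beyond this case.

    import numpy as np

    def rayleigh_quotient_iteration(A, x, tol=1e-12, max_iter=50):
        """Inverse iteration with Rayleigh quotient shifts for symmetric A.

        A sketch of the simplest case only; the methods referred to above
        cover far more general (including nonlinear) eigenvalue problems.
        """
        x = x / np.linalg.norm(x)
        mu = x @ A @ x                          # Rayleigh quotient as the eigenvalue estimate
        for _ in range(max_iter):
            # One inverse iteration step with the current shift
            y = np.linalg.solve(A - mu * np.eye(len(x)), x)
            x = y / np.linalg.norm(y)
            mu_new = x @ A @ x
            if abs(mu_new - mu) < tol * abs(mu_new):
                return mu_new, x
            mu = mu_new
        return mu, x

    # Example: a small symmetric matrix
    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    lam, v = rayleigh_quotient_iteration(A, np.array([1.0, 0.0, 0.0]))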

I don't think that anything like this - the use of the computer to develop the discretization of the differential system - had been considered before. I was invited to do the development at A.E.R.E. Harwell. They even punched up the tapes for the Mercury computer [11]. The main modern codes (e.g. COLNEW) can show a direct line of development to this precursor - for example, in the choice of local basis. These codes are the principal methods now used for the solution of boundary value problems. I have been using one of the explicit original formulae (an O(h^4) accurate formula for first order systems) to redo some hydrodynamic neutral curve calculations to illustrate a vectorizing block bidiagonal eigensolver using the 3.56 algorithm developed for Fujitsu [174], [175].

The key question is how to pick the collocation points. This paper was the first to establish the fundamental result that the optimal collocation points are the Gaussian points for quadrature with respect to suitable weight functions [22], [69]. For first order systems these are the Legendre points. This choice is central to the modern development of collocation codes. The latter reference was the first to identify noncompact collocation schemes, which would appear to provide a collocation analogue to upwinding for singular perturbation problems [148].
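By way of illustration only, the Gauss-Legendre points on a mesh subinterval can be generated as in the following sketch (the function name and interval are illustrative, not taken from the codes cited).

    import numpy as np

    def gauss_collocation_points(t_left, t_right, k):
        """Map the k Gauss-Legendre nodes from (-1, 1) to the mesh
        subinterval [t_left, t_right]; for first order systems these are
        the optimal collocation points in the sense discussed above."""
        xi, _ = np.polynomial.legendre.leggauss(k)   # nodes in (-1, 1)
        return 0.5 * (t_left + t_right) + 0.5 * (t_right - t_left) * xi

    # Example: three collocation points on the subinterval [0.2, 0.3]
    pts = gauss_collocation_points(0.2, 0.3, 3)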

Shooting methods apply initial value techniques by guessing an initial condition, integrating forward to check whether the boundary conditions are satisfied, and adjusting the initial guess if needed. The main shortcoming of the method is instability in the initial value problem. The key ideas in multiple shooting are: construct a valid discrete analogue of the differential system by constructing a sequence of local fundamental bases for the solution (instability can be controlled because the calculation is local), and then use a stable method for solving the resulting linear system. The variant 'compactification' suggested by some other workers was explicitly and correctly rejected. I was able to solve satisfactorily all the problems then reported as difficult in the literature on shooting methods using only single precision (32 bits) on an IBM 360/50 [40]. The correct order of the condition number is estimated in [67], where connections with finite difference methods and collocation are established.
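The following is a minimal sketch of simple (single) shooting for a scalar second order problem, the method whose instability motivates multiple shooting as described above; the problem, names, and tolerances are illustrative only, and this is not the code of [40].

    import numpy as np

    def rk4(f, t0, t1, y0, n=200):
        """Classical fourth order Runge-Kutta integration of y' = f(t, y)."""
        h = (t1 - t0) / n
        t, y = t0, np.asarray(y0, dtype=float)
        for _ in range(n):
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h / 2 * k1)
            k3 = f(t + h / 2, y + h / 2 * k2)
            k4 = f(t + h, y + h * k3)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t += h
        return y

    def shoot(f, a, b, alpha, beta, s0=0.0, s1=1.0, tol=1e-10, max_iter=30):
        """Single shooting for y'' = f(t, y, y'), y(a) = alpha, y(b) = beta.

        Guess the missing slope s = y'(a), integrate forward, and adjust s
        by a secant iteration on the boundary mismatch y(b; s) - beta.
        """
        def mismatch(s):
            yb = rk4(lambda t, u: np.array([u[1], f(t, u[0], u[1])]),
                     a, b, [alpha, s])
            return yb[0] - beta

        F0, F1 = mismatch(s0), mismatch(s1)
        for _ in range(max_iter):
            s2 = s1 - F1 * (s1 - s0) / (F1 - F0)    # secant update of the slope
            s0, F0, s1, F1 = s1, F1, s2, mismatch(s2)
            if abs(F1) < tol:
                break
        return s1

    # Example: y'' = -y on [0, pi/2], y(0) = 0, y(pi/2) = 1 (solution y = sin t)
    slope = shoot(lambda t, y, yp: -y, 0.0, np.pi / 2, 0.0, 1.0)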

The original interest was in constructing best approximations to functions by nonlinear families with the aim of developing subroutines for evaluating special functions [20]. Linear programming can be used in formulating these problems on discrete point sets in the linear case [24]. Watson and I were the first to use it iteratively to construct nonlinear approximations [34], [35], [52], and the algorithm is often referred to under our names. Second order convergence was first demonstrated in [54]. The most complete description of the convergence properties in the l∞ norm is given in [91]. A key result is that the useful sufficient condition of strong uniqueness is not necessary. The dependence of the rate of convergence of these generalised Newton schemes on the norm used is analysed in [83], [91], [95].

This result [75] partly explains the observed good behaviour of this algorithm, forms of which provide the most widely used algorithms for nonlinear least squares problems. The rest of the story comes later [115], [135]. The published code became the basis of Allan Miller's excellent package.

Methods of convergence acceleration for barrier methods are discussed in [48], [59]. It is shown that barrier functions giving arbitrarily fast convergence in r are possible. For the log barrier function the exact rate of convergence is shown to be of order r^{1/2} in the case of non-strict complementarity [87]. In [90] it is shown how to restore order r convergence in these cases. The procedure is applied to extrapolation methods of convergence acceleration.
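In summary form (restating the statements above, with inequality constraints c_i(x) >= 0 assumed and r -> 0+ the barrier parameter), the log barrier subproblem and the rates in question are

\[
x(r) = \arg\min_x \Bigl\{ f(x) - r \sum_i \log c_i(x) \Bigr\},
\qquad
\|x(r) - x^*\| =
\begin{cases}
O(r) & \text{under strict complementarity,}\\[2pt]
O(r^{1/2}) & \text{under non-strict complementarity,}
\end{cases}
\]

with [90] showing how the O(r) rate can be restored in the non-strict case.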

Godunov suggested the stabilized march in 1943 – a most surprising algorithm to come out of wartime Russia. It became popular in computer codes in the '60s and '70s. This first stability analysis used a backward argument to show that if multiple shooting was stable then so was the stabilized march [88], [89]. The method is important because it significantly reduces the number of integrations of the differential system required in comparison with multiple shooting in the separated boundary conditions case.

This is highly original work. We were able to capture the essential geometry of the process, and in an important special case to reduce the asymptotic behaviour of the Newton iterates to the study of a simple recurrence relation. The conclusion was that methods using only first derivatives would potentially be unsuitable at irregular singular points. Higher order methods are needed if there is to be a progression from theory to practice. These methods need to embed the nonlinear system into an overdetermined but consistent system built using higher order tensors constructed from higher order derivatives of the original system. I suspect this explains in large part why Griewank was motivated to go on and develop automatic differentiation systems! [94], [99].

  1. Solution of the Huber M-estimation problem for the complete range of the scale parameter c classifying residuals as large or small [105] (the method depends only on the product sc, so strategies for variance estimation are also available);

  2. Computation of all regression quantiles as a function of the quantile parameter [136];

  3. Implementation of Tibshirani's Lasso, in which a sum of squares is minimized subject to an l1 constraint, with the constraint parameter controlling the number of variables entering the least squares problem. Here the constraint bound is the homotopy parameter [182]. Recent work at Stanford predicts that the entire range of the homotopy parameter can be covered by only a few more steps than the number of variables, so this approach is extremely competitive. Other applications have led to the suggested use of other objective functions such as the l1 norm. Turlach and I have shown that a homotopy approach is feasible, but that it has the interesting property that continuation must be applied alternately to the constraint bound and the Lagrange multiplier. The resulting algorithm, in this form, does not share the computational efficiency noted for the least squares objective and has much of the character of a small step l1 minimization algorithm. Thus it becomes of interest to seek a large step mechanism for the l1 objective homotopy analogous to the use of a line search in l1 optimisation.

  4. Piecewise linear homotopy can be applied also to Vapnik's support vector regression. Here the best strategy is probably to use an active set algorithm to start the computation.

The interesting question in the implementation of the homotopy is what happens at the join of adjacent linear pieces, with special considerations being necessary in each of the above cases to show continuity. The algebra is not unlike that of the pivoting steps in linear and quadratic programming. This approach extends to optimization of a positive definite quadratic form subject to a polyhedral constraint where now the parameter in the Lagrangian formulation becomes the homotopy parameter. The Lasso has suggested a fresh look also at trust region and Levenberg algorithms for nonlinear least squares [179].
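A sketch of the least squares case only (item 3 above), using the LARS/homotopy path implementation available in scikit-learn to traverse the full range of the constraint/penalty parameter; the data are synthetic, and the call shown is the standard scikit-learn routine, not the code described above.

    import numpy as np
    from sklearn.linear_model import lars_path

    # Synthetic data: 100 observations, 10 variables, 3 of them active
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    beta = np.zeros(10)
    beta[:3] = [3.0, -2.0, 1.5]
    y = X @ beta + 0.1 * rng.standard_normal(100)

    # Piecewise linear homotopy in the l1 constraint/penalty parameter:
    # coefs has one column per breakpoint of the path, and the number of
    # breakpoints is typically only slightly larger than the number of variables.
    alphas, active, coefs = lars_path(X, y, method="lasso")
    print(len(alphas), "breakpoints for", X.shape[1], "variables")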

Degeneracy now seems an odd problem to have bedeviled much of the theoretical and practical work on simplicial methods for linear programming! But the understanding necessary is quite recent, and is built on my resolution of the problem [103]. My idea was to connect the ability to make progress in the simplicial algorithm with the construction of a descent direction for a reduced problem. Such a direction exists if and only if the current point is not optimal and so provides a complete characterisation. It is an important step to identify the direction as a direction of recession for the reduced problem (there is a sense in which this observation goes back to Wolfe and this is noted in [120]), and such directions are independent of the constraint right hand sides. Constructive methods are suggested immediately and have minimal cost implications. The basic result generalises to the problem of resolving degeneracies in minimizing polyhedral functions [103], [137], [139], [141].

These new approaches apply convex analysis to the problem of characterizing extreme points of the epigraph of a polyhedral convex function. They have made possible practical algorithms for important problems previously considered intractable [103], [106], [187]. Previous attempts had been based on the idea of transforming the problem into a linear programming problem, but this can have exponentially large numbers of constraints (or worse) in the obvious approach. Sets of structure functionals are used to define vertices of the epigraph in a parsimonious fashion, and simplicial methods generate descending paths along edges linking adjacent vertices. Rank and signed rank regression, estimation methods of remarkably high statistical efficiency, in which discontinuities in derivative are linked to ties in a ranking set, provide the most complicated examples so far attempted. Implementation is considered in [187]. This reference considers aspects of nonconvex problems (for example, rank regression with redescending scores to increase robustness), and shows that simplicial methods can be surprisingly computationally robust in this more complex situation. An extension to develop active set methods for functions which are the sum of a smooth convex function and a polyhedral function has been made with Alistair Watson. This work is an extension of the descent algorithm for the Lasso.

Factorizing boundary value problems into reduced systems with appropriate stability properties has been a recurrent theme (for example, in the work on so called invariant imbedding methods). The use of the Riccati equation which is a key aspect of this work was known to be problematical (there is an analogy with the elimination solution of linear equations without interchanges). The key result here leads to a computational strategy that has analogies with the use of interchanges to stabilize the linear equations case [107], [122], [123].

The numerical analysis literature is significantly deficient in its description of the limitations of these methods precisely in the context in which they are most important. Small residuals are not important in data analysis practice! What is important is the right kind of cancellation, and the right tool for describing this phenomenon is the law of large numbers [113], [115], [135], [158], [202]. The results give order of magnitude better results than previously known in the case that the assumed model is correct and the amount of data is adequate. For example, the linear least squares problem has asymptotically a rounding error dependence on the condition number (not the condition number squared) as the number of observations grows without bound. Under the same conditions the convergence rate of the nonlinear algorithms is asymptotically second order.

Rescuing Prony's method in the case of trigonometric estimation comes down to picking a uniquely determined scaling condition, and this turns the algorithm into an eigenvalue problem (Pisarenko's method). A related nonlinear eigenvalue problem provides the maximum likelihood estimator. Computation of this problem in the exponential fitting case can be simplified without loss of efficiency provided a correct scaling condition is implemented. This algorithm shows promise also in trigonometric estimation problems, where not only can the scaling conditions be relaxed, but it also proves reliable in cases where maximum likelihood methods have problems in locating the consistent estimator in a cluster of many competing maxima [70], [134], [140], [153]. The pattern that emerges of the trigonometric problem being easier than the exponential one is noteworthy. The methods apply whenever the basis functions satisfy a difference equation with constant coefficients (these coefficients become the problem unknowns). However, the methods appear significantly more effective in the case of exponential and trigonometric functions than in other cases (rational functions being one example). The reason for this is still unclear.
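A minimal sketch of the Pisarenko step referred to above for the trigonometric case (frequencies read off the eigenvector belonging to the smallest eigenvalue of a sample covariance matrix); the construction and names are illustrative only and do not reproduce the scaled algorithms of the papers cited.

    import numpy as np

    def pisarenko_frequencies(x, p):
        """Estimate the frequencies of p real sinusoids present in the signal x.

        Builds the (2p+1)-dimensional sample covariance matrix, takes the
        eigenvector belonging to its smallest eigenvalue, and reads the
        frequencies off the unit-circle roots of the associated polynomial.
        """
        m = 2 * p + 1
        N = len(x)
        # Data matrix whose rows are sliding windows of length m
        Z = np.column_stack([x[i:N - m + 1 + i] for i in range(m)])
        R = Z.T @ Z / (N - m + 1)
        w, V = np.linalg.eigh(R)
        v = V[:, 0]                        # eigenvector of the smallest eigenvalue
        roots = np.roots(v[::-1])          # roots of v_0 + v_1 z + ... + v_{2p} z^{2p}
        ang = np.angle(roots)
        return np.sort(ang[ang > 1e-8])    # one frequency from each conjugate pair

    # Example: two sinusoids plus a little noise (frequencies 0.7 and 1.9 rad/sample)
    n = np.arange(2000)
    noise = 0.01 * np.random.default_rng(1).standard_normal(n.size)
    x = np.sin(0.7 * n) + 0.5 * np.sin(1.9 * n + 1.0) + noise
    freqs = pisarenko_frequencies(x, 2)    # approximately [0.7, 1.9]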

Cyclic reduction is the classic technique for applying recursive halving in the solution of tridiagonal linear systems. Up till now this has been the most widely used technique for vectorizing these systems. Restrictive conditions (no pivoting or orthogonal reduction is allowed, so usually the matrix is required to be positive definite and the system order a power of two) are imposed. Markus Hegland introduced wrap-around partitioning - a more general reordering transformation - for a one-step reduction of tridiagonal systems to improve vectorization while using stabilized factorization techniques. I looked at the corresponding transformation for block bidiagonal systems and discovered that the restrictions just melted away. In this context it contains cyclic reduction as a special case. We believe the method will become the standard method for vectorizing general, regularly structured, narrow band linear systems [172]. These can readily be transformed into block bidiagonal form. The method is already in the standard library on Fujitsu supercomputers, and routines to automatically translate band matrices into block bidiagonal form will be added this year.
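The flavour of the reduction step (shown here in the classical odd-even cyclic reduction form for a scalar tridiagonal system, not the wrap-around block bidiagonal form of [172]) can be sketched as follows; to keep the example short, the reduced system is solved directly rather than recursively, and no pivoting is done, so the usual positive definiteness assumption applies.

    import numpy as np

    def one_step_cyclic_reduction(a, b, c, d):
        """One odd-even reduction step for the tridiagonal system
            a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i],  i = 0..n-1,
        with a[0] = c[n-1] = 0.  The odd-indexed unknowns are eliminated,
        the reduced system in the even unknowns is solved directly, and
        the odd unknowns are recovered by back substitution."""
        n = len(b)
        even = np.arange(0, n, 2)
        m = len(even)
        B = np.zeros((m, m))
        r = np.zeros(m)
        for k, i in enumerate(even):
            bi, di = b[i], d[i]
            ai_new = ci_new = 0.0
            if i - 1 >= 0:                       # fold equation i-1 into equation i
                alpha = -a[i] / b[i - 1]
                bi += alpha * c[i - 1]
                di += alpha * d[i - 1]
                ai_new = alpha * a[i - 1]
            if i + 1 < n:                        # fold equation i+1 into equation i
                gamma = -c[i] / b[i + 1]
                bi += gamma * a[i + 1]
                di += gamma * d[i + 1]
                ci_new = gamma * c[i + 1]
            B[k, k] = bi
            r[k] = di
            if k > 0:
                B[k, k - 1] = ai_new
            if k < m - 1:
                B[k, k + 1] = ci_new
        x = np.zeros(n)
        x[even] = np.linalg.solve(B, r)          # in practice: recurse on the reduced system
        for i in range(1, n, 2):                 # recover the odd unknowns
            tail = c[i] * x[i + 1] if i + 1 < n else 0.0
            x[i] = (d[i] - a[i] * x[i - 1] - tail) / b[i]
        return x

    # Example: a small symmetric positive definite tridiagonal system
    n = 7
    a = np.r_[0.0, -np.ones(n - 1)]
    c = np.r_[-np.ones(n - 1), 0.0]
    b = 2.0 * np.ones(n)
    d = np.arange(1.0, n + 1)
    x = one_step_cyclic_reduction(a, b, c, d)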

This is very exciting! Why should cyclic reduction (a purely algebraic device) applied to discretized ordinary differential equations connect with a whole class of representations of solutions corresponding to boundary value problems of twice the order of the original system of differential equations? Twice the order may not sound like an advantage, but the process has the important advantages that the resulting class includes stable problems with known dichotomy structure and, of course, solutions that can be computed as a by-product of the cyclic reduction procedure [163].

The first problem with the embedding method is that of choosing boundary conditions that provide a stably posed boundary value problem. A scheme for choosing essentially optimum conditions automatically has been formulated [196], [204]. The embedding method is almost a standard maximum likelihood estimation problem. The only difference is that the likelihood values can only be computed approximately by integrating the differential equations. However, the error incurred here is small compared with that arising from stochastic sources and is readily accounted for. This permits consistency of the estimates to be demonstrated [205]. Gauss-Newton is the preferred method for solution, and fast convergence can be demonstrated on large sample problems. The embedding method has the advantage that it can make use of standard differential equation and nonlinear least squares software without major modification. It has a computational cost in that the differential system must be solved for every function value required. The simultaneous method avoids the somewhat arbitrary choices of boundary data and avoids repeated differential equation solution, but is otherwise apparently a more complex mathematical problem. However, equivalence of local optima with those of the embedding method is shown readily enough [204]. An iteration suggested by Georg Bock, which is analogous to Gauss-Newton in omitting calculation of second derivatives of the differential equation forcing functions, has been shown to have a similar large sample convergence rate to Gauss-Newton but under more restrictive conditions on the distribution governing the measurement errors. The equivalence results have been extended by deducing the necessary conditions on the embedding method from the Kuhn-Tucker conditions satisfied by the simultaneous method. This enables the consistency results for the estimation problem to be put in a more pleasing form.
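A toy illustration of the point that the embedding approach can be driven by standard differential equation and nonlinear least squares software: a single rate parameter is fitted by repeatedly solving an initial value problem inside a Gauss-Newton type least squares iteration. The example, data, and tolerances are mine; the actual work concerns boundary value embeddings and their consistency analysis.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import least_squares

    # Illustrative only: estimate theta and y0 in y' = -theta * y, y(0) = y0,
    # from noisy observations.  Every residual evaluation solves the
    # differential equation; the least squares solver supplies the
    # Gauss-Newton type iteration.
    rng = np.random.default_rng(0)
    t_obs = np.linspace(0.0, 5.0, 40)
    theta_true, y0_true = 0.8, 2.0
    y_obs = y0_true * np.exp(-theta_true * t_obs) + 0.05 * rng.standard_normal(t_obs.size)

    def residuals(params):
        theta, y0 = params
        sol = solve_ivp(lambda t, y: -theta * y, (0.0, 5.0), [y0],
                        t_eval=t_obs, rtol=1e-8, atol=1e-10)
        return sol.y[0] - y_obs

    fit = least_squares(residuals, x0=[0.5, 1.0])
    theta_hat, y0_hat = fit.x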

Research Exposition

Methods for unconstrained optimization problems [29]: Kowalik invited me to join him in authoring a text on unconstrained optimization problems. It was clearly a good time to bring together the burgeoning developments in this area (the famous Fletcher-Powell paper was the most cited mathematics reference according to the Library at the Rutherford Laboratory). At the time I had developing interests in least squares problems but no other special qualifications. In the end I finished up writing the text and contributing substantially to other areas. I was pleased with my chapter on direct search methods, and referees agreed. The book was remarkably successful.

Lecture notes on optimization [59]: One consequence of the unconstrained optimization book was an invitation from George Forsythe to give a graduate course on optimization at Stanford. The optimization area had moved on, and I had developed an interest in applications to penalty and barrier methods. Thus these notes provided an opportunity to bring up to date the material in the earlier book. It was also positioned much closer to the frontiers of current research. I was able to develop rate of convergence estimates for barrier and penalty algorithms, to address the question of the relation between the structure of barrier methods and their rate of convergence, to base the treatment of conjugate direction algorithms for unconstrained optimization on the product form updating methods, and, with Michael Saunders, make a detailed study of the use of smart updating (surgery) in the implementation of the product form methods [58]. Students of the course who have gone on to distinguished careers in the area include Margaret Wright and Michael Saunders. Forsythe requested the notes be published as a technical report. It would have been very suitable for publication on the web.

Finite Algorithms in Optimization and Data Analysis [103]: This book went through an important phase of its evolution as a graduate course at the University of Utah. The original ideas had been to develop a treatment of algorithms for minimizing polyhedral functions (linear programming is the best known of these problems) based on modern methods of convex analysis, and to exploit systematically the idea that tableau data structures could be used in implementing generic versions of stabilized algorithms such as Bartels-Golub and projected gradient methods. However, the challenge to develop a general formalism for treating the necessary conditions in these optimization problems, motivated in particular by the difficult rank based estimation problems, led me to the concept of structure functionals and the associated compact forms for the subdifferential which is where the convex analysis enters as a unifying technique. A coherent treatment of degeneracy in these polyhedral minimization problems did not exist and was clearly desirable, and one of the most important aspects of the book is that I was able to do this by way of a necessary and sufficient condition that leads to effective algorithms for the class of polyhedral problems. Finiteness is interpreted as the existence of rational algorithms. These include data analysis problems in least squares, quadratic programming, and robust estimation, and problems of this kind are also discussed. A highlight is the inclusion of work from David Clark's thesis to deal with rational methods for the Huber M-estimator (see also [105]), and an account of results obtained with Alistair Watson on a generalisation of the `errors in variables' model (see also [102]).

Simplicial Algorithms for Minimizing Polyhedral Functions [187]: Since the development of the 'Finite Algorithms' book there have been significant developments in the area. Most notably, Karen George was able to make a full implementation of the rank regression algorithm before she fell ill. This book aims, as an important component of its coverage, to report her contributions as part of the task of bringing the previous one up to date. This put the onus on me to provide a working rank regression program, which committed me to several months of intensive work. I made use of object oriented programming techniques, which I believe helped significantly - this is argued in detail in the book. Also, I had the advantage of joint work with Alistair Watson on line search methods for these problems [159], which highlights the utility of a modified secant algorithm, and I had completed implementations of degeneracy resolving codes for linear programming [141]. The work on structure functional descriptions of these problems has been revised to take account of developments (for example, [106]), and Rob Womersley's fundamental work on censored l1 estimation as an example of a nonconvex problem has been put in a more general setting. A different kind of extension applies the structure functional framework to a smooth function subject to a polyhedral function constraint to develop the analogue of active set methods. There is a corresponding development for the associated Lagrangian function.

Computational Methods in Least Squares and Maximum Likelihood: These notes have been used for a lecture course at the University of Heidelberg. They cover a number of the research areas of current interest indicated above, especially the application of scoring to maximum likelihood problems which arise in the estimation of differential equations and related problems of variable selection. They are quite regularly updated and/or corrected.

Graduate Students

W.McLewin

Lecturer in Mathematics, Manchester University (ret.)

R.Jordinson

Senior Lecturer in Mathematics, Edinburgh University (ret.)

G.A.Watson

Professor of Mathematics, University of Dundee

D.M.Ryan

Professor and Deputy Dean of Engineering, University of Auckland

L.S.Jennings

Associate Professor, University of Western Australia

F. De Hoog

Chief Scientist, CSIRO

D.H.Anderson

C.A.S.A.

G.X.Woodford

Consultant

D.C.C.Bover

Mission Critical Systems

Krisorn Jittorntrum

Director of Computer Services, Chiang Mai University

Andreas Griewank

Professor of Mathematics, Humboldt University Berlin

D.I.Clark

Senior Lecturer, University of Canberra

Linping Sun

Professor of Mathematics, University of Nanjing

G.K.Smyth

Senior Research Scientist, WEHI

Tania Prvan

Senior Lecturer in Statistics, Macquarie University

Karen George

Lecturer, University of Canberra (ret)

J.H.Taylor

Research Scientist, Department of Defence

Sergey Bakin

Suncorp Metway

Zengfeng Li

Department of Health

Publications