The Australian National University
Mathematical Sciences Institute (MSI)
A meeting to celebrate Sue Wilson's 34 years at ANU
document location: http://wwwmaths.anu.edu.au/events/Suefest/abstracts.html

Abstracts

Sue Wilson - Statistician Extraordinaire!

Kaye Basford
University of Queensland

Professor Sue Wilson is currently one of Australia's best-known applied statisticians, with a deservedly outstanding international reputation for high quality research and professional contributions to biostatistics and statistical genetics. This talk summarises the impact of her major contributions to scientific research (spanning the past three decades), the leadership she has demonstrated at an international level in statistical science (through professional and administrative activities, particularly with the International Biometric Society), and the high esteem in which she is held by her scientific collaborators and colleagues around the world. It also includes her important contributions to training new generations of statisticians and her role as mentor for junior women academics and students.



My introduction to generalised linear models

Michael Adena
Covance Pty Ltd

In 1978, Sue introduced me to the computer package GLIM, which provided an easy and unified way to fit generalised linear models. This led to our working together, including on a book on the analysis of case-control studies. I will outline my perspective on this project.

I will also show that 'fiddling around with data' in those early years provided an unexpectedly useful precedent for the design of a clinical trial.



Collaborating over the Decades

Yvonne Pittelkow
Australian National University

Throughout Sue's career she has pursued, and encouraged others to pursue, excellence and rigour in biostatistical research and application, as well as in the empirical social sciences and in archaeology. To pursue these aims she has actively collaborated with countless researchers and clinicians.

The research questions behind some of the collaborations were often interesting and occasionally topical. For example, Sue published several papers in the 1970s on methods for comparing radiocarbon dates, and these methods were used for dating the Shroud of Turin. In this talk I will give some background to this as well as to some other collaborations in which Sue and I were involved.



Use of Mixture Models in Multiple Hypothesis Testing with Applications in Bioinformatics

Geoff McLachlan and Leesa Wockner
Department of Mathematics and Institute for Molecular Bioscience
University of Queensland

There are many important problems these days where consideration has to be given to carrying out hundreds or even thousands of hypothesis tests at the same time. For example, in forming classifiers on the basis of high-dimensional data, the aim might be to select a small subset of useful variables for the prediction problem at hand. In the field of bioinformatics, there are many examples where a large number of hypotheses have to be tested simultaneously. For example, a common problem in this field is the detection of genes that are differentially expressed in a given number of classes. The problem of testing many hypotheses at the same time can be expressed in a two-component mixture framework, using an empirical Bayes approach; see, for example, Efron (2004). In this framework, we extend the results of McLachlan et al. (2006) on the adoption of normal mixture models to provide a parametric approach to the estimation of the so-called local false discovery rate. The latter can be viewed as the posterior probability that a given null hypothesis does hold. With this approach, not only can the global false discovery rate be controlled, but also the implied probability of a false negative can be assessed. The methodology is demonstrated on some problems in bioinformatics.
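
As a rough illustration of the two-component mixture idea, the sketch below computes a local false discovery rate for a z-score as the posterior probability of the null component. It is not the authors' method: the abstract describes estimating the mixture from data, whereas here the null proportion and the non-null mean and standard deviation are simply assumed known, and the names local_fdr, PI0, MEAN1 and SD1 are hypothetical.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, mean1, sd1):
    """Posterior probability that the null hypothesis holds, given z.

    Assumes a two-component normal mixture: a standard normal null
    component with weight pi0 and a N(mean1, sd1^2) non-null component.
    """
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, mean1, sd1)
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / mixture


# Hypothetical parameter values, chosen only for illustration.
PI0, MEAN1, SD1 = 0.9, 3.0, 1.0

z_scores = [0.0, 1.5, 3.0, 4.5]
fdr_values = [local_fdr(z, PI0, MEAN1, SD1) for z in z_scores]
for z, value in zip(z_scores, fdr_values):
    print(f"z = {z:4.1f}  local fdr = {value:.4f}")
```

With these assumed values, larger z-scores give smaller local false discovery rates, so genes could be declared differentially expressed when the value falls below a chosen threshold.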



Linkage analysis in the high density SNP chip era

Melanie Bahlo



Alignment-free sequence comparisons using k-word matches

Conrad Burden
Australian National University

A common problem faced by biologists is finding a close match in a database to a given DNA or protein sequence. This is used, for example, to identify homologous genes or proteins in a particular species, or to find genes related by a common ancestor in two different species. The most popular sequence-matching algorithms currently available attempt to align long sequences. This may not always be appropriate when related sequences have been rearranged or spliced, or when identifying short regulatory motifs. An alignment-free method, called k-word matches, is being developed by the Centre for Bioinformation Science to address these cases. The idea is to use as a comparison statistic the number of exact or partial short word matches of a given pre-specified length. We have found accurate representations of the statistical properties of word match counts under suitable null hypotheses, and are developing fast computer algorithms for biological applications.
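
The exact-match version of such a statistic can be sketched in a few lines. The code below counts the pairs of positions, one in each sequence, at which the same word of length k begins; partial matches and the null distributions mentioned above are not covered, and the function names are hypothetical rather than taken from the authors' software.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def exact_kword_matches(seq_a, seq_b, k):
    """Number of (i, j) pairs with seq_a[i:i+k] == seq_b[j:j+k].

    Computed from word counts, so no alignment of the two sequences
    is needed and the order of the words within each sequence is ignored.
    """
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


# The second sequence is the first with its two halves swapped;
# all but one of the 3-words survive the rearrangement.
print(exact_kword_matches("ACGTTGCA", "TGCAACGT", 3))
```

Because only word counts are compared, a rearranged sequence can still score highly, which is the situation in which alignment-based methods may struggle.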



From oligos to socio-genomics via a few simple stats

Sylvain Foret
Australian National University

After the sequencing of the honeybee genome, a microarray was designed with all the genes resulting from automated gene predictions and manual annotations.

The initial goal of this study was to assess the quality of this microarray platform. In order to estimate how many of the probes spotted onto this array can capture gene expression, we developed a method to assess the presence or absence of expression for each probe and applied it to data from a few selected body parts, tissues and developmental stages.

After validating the method against a number of genes that had previously been characterised by real-time quantitative PCR and Northern blots, the resulting gene expression bar code was used to investigate various properties of the honeybee genome.

In particular, we investigated the relations between the genes' patterns of expression and their methylation levels, as we showed recently that methylation plays a central role in the shaping of the honeybee social structure. Remarkably, we found that genes expressed only in restricted conditions tend to have a low level of methylation. This contradicts the current view that methylation functions to suppress spurious transcriptional initiation within infrequently transcribed genes.



Some uses of principal components with gene expression and microarray data

Terry Speed
WEHI

Throughout her career, Sue Wilson has been a strong advocate for and a creative contributor to visualization methods in statistics. In recent years she has championed methods related to principal components analysis (PCA) - biplots, h-plots, singular value decomposition - for visualizing microarray gene expression data. In this talk I will discuss a different use of these methods: for identifying and adjusting for artifacts, the non-genetic factors which are often a significant addition to the biological signal in microarray data. While the value of PCA in this context has long been accepted, the systematic use of the idea is just coming in through the use of programs such as EIGENSTRAT (Price et al., 2006), SVA (Leek & Storey, 2007), Bayesian PCA and VIBES (Bishop, 1999; Bishop et al., 2002). I'll illustrate these methods, and add some thoughts of my own.
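
The general idea of adjusting for an artifact with principal components can be illustrated with a deliberately simple sketch: centre each gene, find the leading principal direction by power iteration, and subtract each sample's projection onto it. This is not how EIGENSTRAT, SVA or the Bayesian methods work in detail; it assumes, purely for illustration, that the dominant component is an unwanted artifact, and all function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract each column (gene) mean; rows are samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def leading_component(rows, iterations=200):
    """Leading principal direction of a centred matrix by power iteration.

    Assumes the matrix is not all zeros and that the arbitrary start
    vector is not orthogonal to the leading direction.
    """
    p = len(rows[0])
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary deterministic start
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]  # X v
        v = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return v


def remove_leading_component(rows):
    """Return the centred data with the leading component projected out."""
    centred = center_columns(rows)
    v = leading_component(centred)
    adjusted = []
    for row in centred:
        score = sum(x * w for x, w in zip(row, v))
        adjusted.append([x - score * w for x, w in zip(row, v)])
    return adjusted, v


# Toy data: four samples, three genes, all variation along one direction.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0], [0.0, 0.0, 0.0]]
adjusted, direction = remove_leading_component(toy)
print(direction)
```

In real data the leading components mix artifact and biology, so deciding which components to remove is the substantive question; the sketch only shows the mechanics.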