Celebrating Sue Wilson's 34 years at ANU
Abstracts
Sue Wilson - Statistician Extraordinaire!
Kaye Basford
University of Queensland
Professor Sue Wilson is currently one of Australia's best-known applied
statisticians, with a deservedly outstanding international reputation
for high quality research and professional contributions to
biostatistics and statistical genetics. This talk summarises the impact
of her major contributions to scientific research (spanning the past
three decades), the leadership she has demonstrated at an international
level in statistical science (through professional and administrative
activities, particularly with the International Biometric Society), and
the high esteem in which she is held by her scientific collaborators and
colleagues around the world. It also includes her important
contributions to training new generations of statisticians and her role
as mentor for junior women academics and students.
My introduction to generalised linear models
Michael Adena
Covance Pty Ltd
In 1978, Sue introduced me to the computer package GLIM, which provided an
easy and unified way to fit generalised linear models. This led to joint
work, including a book on the analysis of case-control studies. I will
outline my perspective on this project.
I will also show that 'fiddling around with data' in those early years
provided an unexpectedly useful precedent for the design of a clinical
trial.
Collaborating over the Decades
Yvonne Pittelkow
Australian National University
Throughout Sue's career she has pursued, and encouraged others to pursue,
excellence and rigour in biostatistical research and application, as well as in
the empirical social sciences and in archeology. To further these aims she has
actively collaborated with countless researchers and clinicians.
The research questions behind some of the collaborations were often interesting and
occasionally topical. For example, Sue published several papers in the 1970s on
methods for comparing radiocarbon dates, and these methods were used for dating the
Shroud of Turin. In this talk I will give some background to this as well as to some other
collaborations in which Sue and I were involved.
Use of Mixture Models in Multiple Hypothesis Testing with
Applications in Bioinformatics
Geoff McLachlan and Leesa Wockner
Department of Mathematics and Institute for Molecular Bioscience
University of Queensland
There are many important problems these days where consideration has to be
given to carrying out hundreds or even thousands of hypothesis testing
problems at the same time. For example, in forming classifiers
on the basis of high-dimensional data, the aim might be to select a small
subset of useful variables for the prediction problem at hand. In
the field of bioinformatics, there are many examples where a large number of
hypotheses have to be tested simultaneously. For example,
a common problem in this field is the detection of genes that are
differentially expressed in a given number of classes. The problem of testing
many hypotheses at the same time can be expressed in a two-component mixture
framework, using an empirical Bayes approach; see, for example, Efron
(2004). In this framework, we extend the results of McLachlan et al.
(2006) on the adoption of normal mixture models to provide a parametric
approach to the estimation of the so-called local false discovery rate.
The latter can be viewed as the posterior probability that a given null
hypothesis does hold. With this approach, not only can the global false
discovery rate be controlled, but also the implied probability of a false
negative can be assessed. The methodology is demonstrated on some
problems in bioinformatics.
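
As a rough illustration of the two-component framework described above, the sketch below computes a local false discovery rate for a z-score as the posterior probability of the null component. It is a minimal sketch, not the authors' implementation: the function names and the parameter values (pi0, mu1, sd1) are hypothetical, the null is assumed to be standard normal, and in practice the mixture parameters would be estimated from the data rather than supplied by hand.

```python
import math


def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, mu1, sd1, mu0=0.0, sd0=1.0):
    """Posterior probability that the null hypothesis holds, given the z-score.

    pi0 is the assumed prior probability of the null; (mu0, sd0) and
    (mu1, sd1) describe the null and non-null normal components.
    All values here are illustrative assumptions, not fitted estimates.
    """
    null_part = pi0 * normal_pdf(z, mu0, sd0)
    alt_part = (1.0 - pi0) * normal_pdf(z, mu1, sd1)
    return null_part / (null_part + alt_part)


# Hypothetical mixture: 90% nulls, non-null z-scores centred at 2.
for z in (0.0, 1.0, 3.0):
    print(z, local_fdr(z, pi0=0.9, mu1=2.0, sd1=1.0))
```

Under these assumed parameters the local false discovery rate falls as the z-score moves towards the non-null mean, so genes can be ranked or thresholded on it.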
Linkage analysis in the high density SNP chip era
Melanie Bahlo
Alignment-free sequence comparisons using k-word matches
Conrad Burden
Australian National University
A common problem faced by biologists is finding a close match in a database to a
given DNA or protein sequence. This is used, for example, to identify homologous
genes or proteins in a particular species, or to find genes related by a common
ancestor in two different species. The most popular, currently available sequence
matching algorithms attempt to align long sequences. This may not always be
appropriate when related sequences have been rearranged or spliced, or when
identifying short regulatory motifs. An alignment-free method, called k-word
matches, is being developed by the Centre for Bioinformation Science to address
these cases. The idea is to use as a comparison statistic the number of exact or
partial short word matches of a given pre-specified length. We have found accurate
representations of the statistical properties of word match counts under suitable
null hypotheses, and are developing fast computer algorithms for biological
applications.
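
The comparison statistic described above can be sketched in a few lines for the exact-match case. This is a minimal sketch under stated assumptions: the function names are hypothetical, only exact matches are counted (partial matches are not handled), and no attempt is made at the fast algorithms or null distributions mentioned in the abstract.

```python
from collections import Counter


def kword_counts(seq, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kword_matches(seq_a, seq_b, k):
    """Number of exact k-word matches between two sequences.

    Each occurrence of a word in seq_a is paired with each occurrence of
    the same word in seq_b, so the total is the sum over words of the
    product of the two counts.
    """
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


# Toy sequences, chosen only for illustration.
print(kword_matches("ACGTACGT", "TTACGTTT", 3))
```

Because the count depends only on word frequencies, it is unaffected by where in each sequence the shared words occur, which is why such a statistic can remain informative when related sequences have been rearranged.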
From oligos to socio-genomics via a few simple stats
Sylvain Foret
Australian National University
After the sequencing of the honeybee genome, a microarray was designed
with all the genes resulting from automated gene predictions and manual
annotations.
The initial goal of this study was to assess the quality of this
microarray platform. In order to estimate how many of the probes
spotted onto this array can capture gene expression, we developed a
method to assess the presence or absence of expression for each probe
and applied it to data from a few selected body parts, tissues and
developmental stages.
After validating the method against a number of genes that had
previously been characterised by real-time quantitative PCR and Northern
blots, the resulting gene expression bar code was used to investigate
various properties of the honeybee genome.
In particular, we investigated the relations between the genes' patterns
of expression and their methylation levels, as we showed recently that
methylation plays a central role in the shaping of the honeybee social
structure. Remarkably, we found that genes expressed only in restricted
conditions tend to have a low level of methylation. This contradicts
the current view that methylation functions to suppress spurious
transcriptional initiation within infrequently transcribed genes.
Some uses of principal components with gene expression and microarray data
Terry Speed
WEHI
Throughout her career, Sue Wilson has been a strong advocate for and a creative
contributor to visualization methods in statistics. In recent years she has
championed methods related to principal components analysis (PCA) - biplots,
h-plots, singular value decomposition - for visualizing microarray gene expression
data. In this talk I will discuss a different use of these methods: for identifying
and adjusting for artifacts, the non-genetic factors which are often a significant
addition to the biological signal in microarray data. While the value of PCA in this
context has long been accepted, the systematic use of the idea is just coming in
through the use of programs such as EIGENSTRAT (Price et al., 2006), SVA (Leek &
Storey, 2007), Bayesian PCA and VIBES (Bishop, 1999; Bishop et al., 2002). I'll
illustrate these methods, and add some thoughts of my own.