Organization
________________________________
Section Statistical Software of The Netherlands Society for Statistics
and Operations Research
Program Committee
________________________________
Dr. Ruud Koning, University of Groningen
Prof.dr. Arno Siebes, University of Utrecht
Dr. Siem Heisterkamp, National Institute of Public Health and the
Environment (RIVM)
Prof.dr. Patrick Groenen, Erasmus University Rotterdam
Large Data Sets
________________________________
Fifteen years ago, handling large datasets, let alone analyzing them,
was a nearly impossible task for researchers. The data were often
stored on tape, and even the process of reading the dataset into the
memory of a mainframe was slow. Memory was scarce, and so it was
difficult to save intermediate results. Such datasets were analyzed
using either tailor-made statistical software, or self-written programs
using routines from numerical libraries like NAG or IMSL.
Maximum-likelihood estimation of non-linear models was non-trivial if
not impossible, and researchers often had to be satisfied with one-step
improvements over some consistent estimator.
Things have changed for the better, from a technical point of view.
Huge datasets are routinely available to researchers in different
fields, like finance, marketing, biomedical sciences, particle physics,
astronomy, life sciences, and social sciences. Datasets used to be
large in the sense of containing many observations on a small number of
variables. But nowadays, e.g. in the life sciences, we are confronted
with datasets with a small number of observations and a huge number of
variables. Data can be transported on media that can be read by most
personal computers, and the computing power on the desk of a
statistical researcher is absolutely impressive. Instead of focusing on
the mechanics of the analysis of datasets, researchers can focus on the
actual statistical analysis. Thus the question has turned into: Now
that we have a lot of data, what could we do with it?
This conference addresses the analysis of very large datasets, both
from the point of view of a statistician who works with such datasets
and from the point of view of practitioners from various fields. By
presenting several applications and tools available to a modern-day
statistical researcher, we want to show that large datasets offer
unique opportunities for researchers to answer questions that were
difficult to tackle before. The program committee is delighted to be
able to present a selection of the top researchers on this topic.
Program
________________________________
9:30 registration and coffee
10:00 opening
10:05 Yoav Benjamini
Tel-Aviv University
Multiplicity issues related to complex research questions
in microarrays analysis
10:55 Philip Hans Franses
Erasmus University, Rotterdam
More, but also better?
11:40 Paul Eilers
Leiden University Medical Centre
Low Memory, High Speed Smoothing on Large
Multidimensional Grids
12:30 Lunch
13:30 Andreas Buja
University of Pennsylvania
Hands-On Experiences with Mining Telecom Data
14:15 Jos Roerdink
University of Groningen
Visualization of large data sets with applications in
life science
15:00 coffee/tea break
15:15 Geert Wets
Limburg University, Belgium
Large data sets in traffic safety
16:30 Drinks
VVS-SSP
Nieuwpoortkade 25
1055 RX Amsterdam
The Netherlands
T +31 (0)20 5608410
F +31 (0)20 5608448
E [email protected]
U www.vvs-ssp.nl