Last updated: 26 April 2005
Photo used with permission of the WZB
Wissenschaftszentrum Berlin (WZB)
Reichpietschufer 50
D-10785 Berlin
Germany
Abstract
Regression models play a central role in epidemiology and clinical studies. In epidemiology the emphasis is typically either on determining whether a given risk factor affects the outcome of interest (adjusted for confounders), or on estimating a dose/response curve for a given factor, again adjusting for confounders. An important class of clinical studies is the so-called prognostic factor studies, in which the outcome for patients with chronic diseases such as cancer is predicted from various clinical features. In both application areas, it is almost always necessary to build a multivariable model incorporating known or suspected influential variables while eliminating those found to be unimportant.
It is commonplace for risk or prognostic factors to be measured on a continuous scale, an obvious example being a person's age. Conventionally, such factors are either modelled as linear functions or are converted into categories according to some chosen set of cut-points. However, categorisation and use of the resulting estimates is a procedure known to be fraught with difficulty. A linear function may fit the data badly and give misleading estimates of risk. Therefore, reliable approaches for representing the effects of continuous factors in multivariable models are urgently needed.
The main concern of this talk is building multivariable regression models for data from clinical and epidemiological studies, which involves both selecting influential covariates and determining the functional form of the relationship between each continuous covariate and the outcome. Systematic procedures which combine selection of influential variables with determination of functional form for continuous factors are rare. Analysts may apply their individual subjective preferences to each part of the model-building process, estimate parameters for several models and then decide on the final strategy according to the results they find. By contrast, we will present the multivariable fractional polynomial (MFP) approach as a systematic way to determine a multivariable regression model. The MFP approach was made generally available to Stata users in version 8 as the -mfp- command. Major concerns will be discussed, including robustness and possible model instability. Regarding determination of the functional form, we will also discuss some alternatives that place more emphasis on local estimation of the function (e.g. splines). The MFP procedure may be used for various types of regression models (linear regression model, logistic model, Cox model, and many more). Examples with real data will be used as illustrations.
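As a rough illustration of the function-selection idea behind fractional polynomials, the following Python sketch picks the best first-degree fractional polynomial (FP1) power for a single positive covariate by least squares. It is a minimal sketch under stated assumptions, not the -mfp- algorithm itself: it handles one covariate only, omits variable selection and significance testing, and the function names and data are hypothetical. The power set is the conventional one, with power 0 taken to mean log(x).

```python
import math
from statistics import mean

# Conventional fractional polynomial power set; 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Transform a positive value x with one fractional polynomial power."""
    return math.log(x) if power == 0 else x ** power


def rss_linear(z, y):
    """Residual sum of squares from a least-squares fit of y on z with intercept."""
    mz, my = mean(z), mean(y)
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    slope = szy / szz
    intercept = my - slope * mz
    return sum((yi - intercept - slope * zi) ** 2 for zi, yi in zip(z, y))


def best_fp1_power(x, y):
    """Return the power in FP_POWERS whose transformed covariate fits y best."""
    return min(
        FP_POWERS,
        key=lambda p: rss_linear([fp_transform(xi, p) for xi in x], y),
    )


# Hypothetical data generated exactly from y = 2 + 3*log(x).
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [2 + 3 * math.log(xi) for xi in x_values]
chosen_power = best_fp1_power(x_values, y_values)
```

Because the illustrative outcome is an exact logarithmic function of the covariate, the search selects power 0; with real data the residual sums of squares would be compared through formal tests rather than by a simple minimum.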
Additional information
royston.ppt
Abstract
The decomposition technique introduced by Blinder (1973) and Oaxaca (1973) is widely used to study outcome differences between groups. For example, the technique is commonly applied to the analysis of the gender wage gap. However, despite the procedure's frequent use, very little attention has been paid to the issue of estimating the sampling variances of the decomposition components. We therefore suggest an approach that introduces consistent variance estimators for several variants of the decomposition. The accuracy of the new estimators under ideal conditions is illustrated with the results of a Monte Carlo simulation. As a second check, the estimators are compared to bootstrap results obtained using real data. In contrast to previously proposed statistics, the new method takes into account the extra variation imposed by stochastic regressors.
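For readers unfamiliar with the decomposition, the Python sketch below computes the point estimates of a twofold Blinder-Oaxaca decomposition with a single regressor, using group B's coefficients as the reference. It illustrates only the decomposition components, not the variance estimators proposed in the talk; the function names and the small data set are hypothetical.

```python
from statistics import mean


def ols(x, y):
    """Least-squares fit of y on x with intercept; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def twofold_decomposition(xa, ya, xb, yb):
    """Split the mean outcome gap (A minus B) into explained and unexplained parts."""
    intercept_a, slope_a = ols(xa, ya)
    intercept_b, slope_b = ols(xb, yb)
    gap = mean(ya) - mean(yb)
    # Part of the gap due to different covariate means, priced at B's slope.
    explained = (mean(xa) - mean(xb)) * slope_b
    # Part of the gap due to different coefficients, evaluated at A's mean.
    unexplained = (intercept_a - intercept_b) + mean(xa) * (slope_a - slope_b)
    return gap, explained, unexplained


# Hypothetical data: years of schooling (x) and log hourly wage (y).
x_a, y_a = [10, 12, 14, 16, 18], [2.0, 2.3, 2.5, 2.9, 3.1]
x_b, y_b = [9, 11, 13, 15, 17], [1.8, 2.0, 2.3, 2.4, 2.7]
gap, explained, unexplained = twofold_decomposition(x_a, y_a, x_b, y_b)
```

Because each regression includes an intercept, the fitted line passes through the group means, so the two components add up to the raw gap exactly (up to floating-point error).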
Additional information
jann.pdf
Abstract
This presentation outlines a panel data retrieval program written for Stata/SE that makes it easier to access the German Socio-Economic Panel data set. Using a drop-down menu system, the researcher selects variables from any and all available years of the panel. The data are automatically retrieved and merged to form a rectangular "wide file". The wide file is then transposed to form a "long file", which can be used directly by the Stata panel estimators. The system implements modular data cleaning programs called plugins.
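The wide-to-long step can be pictured with a short Python sketch. This is not the program described in the talk; it is a minimal illustration that assumes each wide-file variable is named as a stub followed by a four-digit year. The function name, the identifier `persnr`, and the `income` variables are hypothetical.

```python
def wide_to_long(rows, id_key, stub):
    """Turn one row per person into one row per person and year.

    Columns named <stub><year>, e.g. income1984, become a single column
    named <stub> plus a "year" column.
    """
    long_rows = []
    for row in rows:
        for key, value in row.items():
            suffix = key[len(stub):]
            if key.startswith(stub) and suffix.isdigit():
                long_rows.append(
                    {id_key: row[id_key], "year": int(suffix), stub: value}
                )
    # Sort by person, then year, as panel estimators usually expect.
    long_rows.sort(key=lambda r: (r[id_key], r["year"]))
    return long_rows


# Hypothetical wide file with two persons and two survey years.
wide_file = [
    {"persnr": 1, "income1984": 1000, "income1985": 1100},
    {"persnr": 2, "income1984": 900, "income1985": 950},
]
long_file = wide_to_long(wide_file, "persnr", "income")
```

In the resulting long file, each person contributes one record per survey year, which is the layout that panel estimators generally require.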
Additional information
soepmenu.pdf
Abstract
Among survey statisticians, Stata is increasingly recognized as one of the more powerful statistical software packages for the analysis of complex survey data. This paper will survey Stata's capabilities for analyzing such data. We will briefly review and compare different methods of variance estimation for stratified and clustered samples, and discuss the handling of survey weights. Examples will illustrate the practical importance of Stata's survey capabilities. In addition, we will point to statistical solutions that are not yet part of the official package, and review user-written ados that currently extend Stata's survey capabilities. Among the specific topics we will cover are replication variance estimation (jackknife, balanced repeated replication, and the bootstrap), issues associated with degrees of freedom and domain estimates, quantile estimation, and some concerns related to model fitting using survey data.
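To give a flavour of replication variance estimation, the Python sketch below implements a delete-one jackknife for a generic estimator. It is a deliberately simplified sketch: it assumes an unstratified sample of equally weighted units (for instance, one summary value per primary sampling unit), whereas survey software handles strata and weights. The function name and the data are hypothetical.

```python
from statistics import mean, variance


def jackknife_variance(values, estimator=mean):
    """Delete-one jackknife variance estimate for estimator(values)."""
    n = len(values)
    # Re-estimate with each unit left out in turn.
    replicates = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    centre = mean(replicates)
    return (n - 1) / n * sum((r - centre) ** 2 for r in replicates)


# Hypothetical values, one per sampled cluster.
cluster_values = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7]
jk_var = jackknife_variance(cluster_values)
# For the sample mean, the jackknife reproduces the textbook s^2 / n.
classical_var = variance(cluster_values) / len(cluster_values)
```

For the sample mean the two variance estimates coincide, which makes a convenient check; the value of the jackknife lies in estimators, such as ratios, for which no simple closed-form variance is available.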
Additional information
kreuter.pdf
Abstract
We present a theoretical and empirical analysis of the fitness of national German (German Commercial Code, Handelsgesetzbuch, HGB) and international (IAS and US-GAAP) accounting information, as well as European patent data, to explain the market values of German manufacturing firms. For the chosen volatile period from 1997 to 2002, cautious national accounting information does not correlate with the firms' residual market values (RMV). International accounting information makes no meaningful contribution to explaining firms' RMV and seems to measure over-investment only. Finally, patents counted at the individual country level correlate with the firms' RMV. To the best of our knowledge, this is the first paper that uses a panel fixed effects estimator for a non-linear equation. We estimate the model using an algorithm programmed with Stata and Ox.
Additional information
ramb.ppt
Abstract
Thematic maps illustrate the spatial distribution of one or more variables of interest within a given geographical unit. The purpose of this talk is to present version 2.0 of the -tmap- package, a suite of Stata programs designed to draw several kinds of thematic map. The first public release of -tmap- was published in The Stata Journal in 2004. This presentation will focus on the new features of the package.
Additional information
pisati.ppt
Abstract
Berlecon Research is a German-based research company that analyzes the potential of new technologies within the IT, Internet and mobile industries in Germany and Europe. The analysis of survey data, typically delivered by market research companies, is an integral part of Berlecon's activities. In 2004, the company implemented Stata 8 in order to streamline data processing and to design high-quality graphs and tables. The presentation will discuss the specific requirements that professional research organisations place on Stata. It will then explain the main challenges Berlecon encountered in its use of Stata and the ways chosen to overcome them. Lastly, a wish list for the Stata Corporation will be presented.
Additional information
stiehler.ppt