Dear Statalist readers,
I am a new user of Stata and am having a problem with a discrete-time
hazard model. I am not sure whether I am using the correct Stata
commands or whether the data are set up correctly. If anyone can give
me some suggestions or tips, I would truly appreciate it. Here is my
question:
We want to know whether higher price predicts quitting from smoking.
We have a 3-wave survey on smokers in the US and Canada. We look at
smokers with 4 smoking patterns: SSS, SSQ, SQS, and SQQ. SSS means one
is a smoker at all 3 waves, SSQ means he or she smokes at the first two
waves but has quit by the third wave, and so on.
The data are set up this way (somebody else actually set this up; I
think her setup is correct based on my limited knowledge of this
model. Please let me know if there is anything incorrect here.):
1) Starting from a one-row-per-person dataset, create a variable to
indicate the number of waves in which a smoker is at "risk" of
quitting. So, SSS and SSQ respondents are assigned the value 3, and
SQQ and SQS respondents the value 2.
2) Based on this indicator, expand the dataset, so SSS and SSQ have 3
rows of observation per person, and SQS and SQQ 2 rows per person.
3) By uniqid, that is for each person, create a counter of rows.
4) By uniqid, create a binary variable "qtsmok" which equals 1 at the
last row for SQQ, SQS and SSQ; it takes value 0 for all rows of SSS
and other-than-last row(s) of SQQ, SQS and SSQ. This is the dependent
variable of our model.
5) A "wave" variable is created, which takes values 1, 2 and 3 to
indicate the wave; 3 dummies - wave1, wave2 and wave3 - are created as
well.
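As a minimal sketch of these five steps, assuming the row counter
coincides with the wave number: the four toy respondents, the string
variable "pattern" and the name "nrisk" are illustrative assumptions,
not the names in our actual data.

```stata
* Toy one-row-per-person data (illustrative only)
clear
input uniqid str3 pattern
1 "SSS"
2 "SSQ"
3 "SQS"
4 "SQQ"
end

* Step 1: number of waves at risk of quitting
gen nrisk = cond(inlist(pattern, "SSS", "SSQ"), 3, 2)

* Step 2: one row per person per wave at risk
expand nrisk

* Step 3: row counter within person (assumed here to equal the wave)
bysort uniqid: gen wave = _n

* Step 4: quit indicator, 1 only on the last row of anyone who quits
bysort uniqid (wave): gen byte qtsmok = (_n == _N) & (pattern != "SSS")

* Step 5: wave dummies wave1, wave2 and wave3
quietly tabulate wave, generate(wave)

list uniqid pattern wave qtsmok, sepby(uniqid)
```

In this sketch SSS and SSQ contribute 3 rows each and SQS and SQQ 2
rows each, so the toy dataset ends up with 10 person-wave rows.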
Then I set up the survey design with the strata and weights. I use
-svy: logit- command. The explanatory variables include the
conventional demographic and socioeconomic variables, price, a dummy
variable for Canada, wave2 and wave3. Since everybody is a smoker in
wave 1, no "quitting" event happens there, so I do not include the
"wave1" indicator in the equation. In addition, I use the "noconstant"
option. All of my model setup is based on my reading of the on-line
lecture notes by Prof. Stephen Jenkins in the UK.
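To make the specification concrete, here is a rough sketch on simulated
toy data. The names "stratum", "wt", "canada" and "quit2", the use of
uniqid as the sampling unit, and all generated values are assumptions
for illustration only; the demographic variables are left out, and the
simulated quitting is unrelated to price, so the estimates say nothing
about our real data.

```stata
* Simulated toy data only; stratum, wt, canada and quit2 are
* hypothetical stand-ins for the real design and covariates
clear
set seed 12345
set obs 400
gen uniqid  = _n
gen stratum = mod(_n, 4) + 1
gen wt      = 1 + runiform()
gen byte canada = runiform() < 0.5
gen price   = 4 + canada + rnormal(0, 0.5)

* Person-wave rows: nobody quits at wave 1, and the wave 3 row is
* dropped for anyone who has already quit at wave 2
expand 3
bysort uniqid: gen wave = _n
gen byte qtsmok = (wave > 1) & (runiform() < 0.2)
bysort uniqid (wave): gen byte quit2 = qtsmok[2]
drop if wave == 3 & quit2 == 1
quietly tabulate wave, generate(wave)

* Declare the survey design, then fit the logit model with wave2 and
* wave3 but without wave1 and without a constant
svyset uniqid [pweight = wt], strata(stratum)
svy: logit qtsmok price canada wave2 wave3, noconstant
```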
The problem is that the coefficient on our price variable is negative
(small in magnitude, though) and significant at 1%! This is not what
we expected. I tried several things to explore this:
1) removing "wave2" and "wave3"
2) removing survey setting
3) regression with only US or Canada sample
4) regression with the "wave" variable which has 3 values
I got similar results each time. Then I tried more:
5) ignoring the fact that the dependent variable is binary and using
"svy: reg" instead; now the coefficient on price is positive and
significant at 10%!
6) there is a categorical variable in the data set which defines
smoking cessation stages: precontemplation, contemplation,
preparation, action, and maintenance. A higher value means higher
motivation to quit smoking, or that quitting has already happened.
This variable is positively correlated with the binary quitting
dependent variable in this model. I cross-tabulated the two, and they
are consistent with each other, so again it seems that our dependent
variable is correct.
I ran OLS and ordered logit models for this cessation stage variable.
In both models, the coefficients on price are positive and
significant.
Based on this, I really do not know how to explain the negative and
significant price coefficient in the hazard model (the logit model).
I have never estimated a hazard model before, and I am still new to
Stata. I am not sure whether my problem is in the data setup or in the
modeling. Any suggestions will be greatly appreciated!
Thanks for your time reading my question.
Best,
Lili