Thanks a lot for your prompt response, Tom!
On 10/18/07, Steichen, Thomas J. <[email protected]> wrote:
> I see nothing wrong with the data generation steps you performed,
> so the question is whether this model makes sense.
>
> First, I will speculate that you have brand-specific prices at
> the time of each wave. Since cigarette prices tend to rise
> fairly uniformly across brands over time, whether from
> manufacturer increases tracking inflation or from government tax
> increases, there is almost certainly a meaningful correlation
> between wave and price. Thus, having both a "price" variable and
> one or more "wave" variables will lead to confusion in the
> coefficients.
>
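One rough way to see how closely price tracks wave is to summarize the price variable within each wave and then regress it on the wave dummies. This is only a sketch. It assumes the person-period data set built in the setup code quoted in this thread is in memory, with wave, wave2, wave3 and rPSPPPi as defined there, and it is written for the default delimiter (no trailing semicolons).

```stata
* Assumption: the expanded person-period data from this thread are in memory.
* Price level by wave: do the means climb from wave 1 to wave 3?
tabstat rPSPPPi, by(wave) statistics(n mean sd min max)

* Pairwise correlations between price and the wave dummies
pwcorr rPSPPPi wave2 wave3, sig

* R-squared here is the share of price variation tied to wave alone
regress rPSPPPi wave2 wave3
```

If the R-squared is high, the wave dummies are already absorbing most of the price movement, and what is left for the price coefficient is mainly variation within a wave.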
> In this model, the "wave2" variable can be thought of as estimating
> the average quit rate differential from the missing wave (wave 1)...
> and this includes an average price differential effect. Likewise,
> "wave3" estimates the average quit rate differential of wave 3 from
> wave 1.
>
> So what does "price" itself estimate in this model? I'd speculate
> it really only estimates how specific brands affect quitting.
> In your logit model, I'd guess that it indicates that subjects
> who smoke higher-than-average-priced brands quit at a lower rate.
> Said differently, those who smoke low-priced brands are more likely
> to quit due to a price increase. However, without knowing exactly
> what your variables represent, I can't go beyond speculation.
>
> I'm less clear why it remains negative when you take the wave
> variables out. If real, it implies that price differential (if
> it truly has a positive effect on quitting) wasn't great enough to
> overcome other, competing but correlated issues (not explained by
> any other variable in the model) that caused smokers to continue
> smoking during this time period. If so, price represents the
> increase in ALL of these issues and the ones for continued smoking
> dominated the result.
>
> On a different issue, using or not using the svy: prefix should
> change the estimated coefficients, so no particular importance
> should be placed on the fact that a coefficient changed signs
> between these two. Without the prefix, you are estimating what
> happened for the specific group of subjects surveyed in this study.
> When you add the weighting via the svy: prefix, you change the
> importance of those individual subjects based on their sampling
> weights.
>
> For example, you may have surveyed specific subjects who quit
> but represent only a very, very small part of the overall population.
> If you don't use the survey weights, their behavior may have
> a large effect on the sample results but little effect on the
> population results, even to the point of sign reversal.
>
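To see how much the weights move each coefficient, the two fits can be stored and tabulated side by side. This is a sketch under assumptions: the data are already svyset (that command is not shown in this thread), the variable names are the ones from the model quoted in this thread, and the code is meant for a do-file with the default delimiter.

```stata
* Assumption: data already svyset; variable names as in the thread's model.
local rhs male age married white mdrt_educ high_educ incm_mdrt incm_high ///
    canada rPSPPPi wave2 wave3

* Unweighted fit: describes the surveyed subjects themselves
logit qtsmok `rhs', noconstant
estimates store unweighted

* Design-based fit: reweights subjects toward the target population
svy: logit qtsmok `rhs', noconstant
estimates store weighted

* Coefficients and standard errors side by side
estimates table unweighted weighted, b(%9.4f) se(%9.4f)
```

Coefficients that shift a lot between the two columns point to variables whose effect differs between heavily and lightly weighted subjects.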
> On yet another issue, marking pattern SQS as a successful "quit"
> seems possibly misleading. Clearly, if price continued to rise
> over the time period between waves (which seems likely to me),
> prices were higher in wave 3 than wave 2, yet these individuals
> started smoking again. This seems to suggest that price was not
> the most important motivating factor for quitting in wave 2 (or
> restarting in wave 3). One can argue that you should code these
> subjects as at "risk" for all three waves and as failing to quit.
>
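The alternative coding suggested above can be sketched on a toy data set with one hypothetical subject per pattern. The four subjects are made up for illustration. The smok_stat coding (1 = SSS, 2 = SSQ, 3 = SQS, 4 = SQQ) and the variable names follow the setup code quoted in this thread.

```stata
* Toy data: one hypothetical subject per smoking pattern (not survey data).
clear
input uniqid smok_stat
1 1
2 2
3 3
4 4
end

* Only SQQ leaves the risk set after wave 2; SQS now stays at risk for 3 waves.
gen smk_time = cond(smok_stat == 4, 2, 3)
expand smk_time
bysort uniqid: gen wave = _n

* A quit is recorded only for SSQ and SQQ, on the subject's last record.
bysort uniqid: gen qtsmok = inlist(smok_stat, 2, 4) & _n == _N

list uniqid smok_stat wave qtsmok, sepby(uniqid)
```

Under this coding the SQS subject contributes three records with qtsmok equal to 0 on each, so relapsers count as continuing smokers rather than as successful quitters.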
> Tom
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Lili Yan
> Sent: Thursday, October 18, 2007 2:25 PM
> To: [email protected]
> Subject: Re: st: help needed on discrete-time hazard model
>
> Hi Thomas,
>
> Thank you very much for helping out!
> I know little about this model, so I thought the two zeros indicated
> something wrong in the data. I am sure that e(N) is correct.
>
> Here is some of the code for setting up the data. First, I need to explain that
> smok_stat = 1 for SSS, 2 for SSQ, 3 for SQS and 4 for SQQ.
> ................code starts here................
>
> gen smk_time=3 if smok_stat==1 | smok_stat==2;
> replace smk_time=2 if smok_stat==3 | smok_stat==4;
>
> gen cessyear=2004 if smok_stat==1;
> replace cessyear=2004 if smok_stat==2;
> replace cessyear=2003 if (smok_stat==3 | smok_stat==4);
>
> expand smk_time;
> bysort uniqid: gen seqvar=_n;
> bysort uniqid: gen qtsmok=smok_stat>1 & _n==_N;
>
> bysort uniqid: gen evntyear=cessyear;
> replace evntyear=2002 if seqvar==1;
> replace evntyear=2003 if seqvar==2;
> drop cessyear;
> rename evntyear cessyear;
>
> gen wave=1 if cessyear==2002;
> replace wave=2 if cessyear==2003;
> replace wave=3 if cessyear==2004;
>
> gen wave1=wave==1;
> gen wave2=wave==2;
> gen wave3=wave==3;
>
> svy: logit qtsmok male age married white mdrt_educ high_educ incm_mdrt
> incm_high canada rPSPPPi wave2 wave3, noconstant;
> ...............code ends here..........
>
> Here is the output:
>
> ..............output starts here................
> Survey: Logistic regression
>
> Number of strata = 26 Number of obs = 5642
> Number of PSUs = 5642 Population size = 5773.9291
> Design df = 5616
> F( 12, 5605) = 166.35
> Prob > F = 0.0000
>
> ------------------------------------------------------------------------------
> | Linearized
> qtsmok | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> male | -.1715913 .1273081 -1.35 0.178 -.4211643 .0779817
> age | -.0326805 .0053098 -6.15 0.000 -.0430898 -.0222713
> married | .0156776 .1427494 0.11 0.913 -.2641663 .2955215
> white | -.5607068 .1443603 -3.88 0.000 -.8437088 -.2777048
> mdrt_educ | -.0291425 .1441877 -0.20 0.840 -.3118061 .2535212
> high_educ | .5113156 .1800797 2.84 0.005 .1582899 .8643414
> incm_mdrt | -.0339146 .1557743 -0.22 0.828 -.3392925 .2714632
> incm_high | .1405313 .1766122 0.80 0.426 -.2056968 .4867595
> canada | 1.802811 .2552666 7.06 0.000 1.30239 2.303233
> rPSPPPi | -.0083975 .000842 -9.97 0.000 -.0100481 -.0067468
> wave2 | 2.111112 .1326945 15.91 0.000 1.850979 2.371244
> wave3 | 2.411039 .1389374 17.35 0.000 2.138668 2.68341
> ------------------------------------------------------------------------------
> ....................output ends here..............
>
> rPSPPPi is our price variable. We have more price variables, but
> the logit results with them are similar to what is reported here.
>
> Thank you very much!
>
> Lili
>
> On 10/18/07, Steichen, Thomas J. <[email protected]> wrote:
> > Why do you consider this an indication of something wrong?
> >
> > Having zero completely determined successes e(N_cds) and failures
> > e(N_cdf) is what you prefer.
> >
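These stored results can be inspected directly after any logit fit. The sketch below uses the auto data set that ships with Stata as a stand-in, since the survey data are not available here. The model itself is arbitrary and only serves to show where the three numbers come from.

```stata
* Stand-in example on Stata's shipped auto data (not the survey data).
sysuse auto, clear
quietly logit foreign weight mpg

* e(N): records used; e(N_cds), e(N_cdf): completely determined
* successes and failures
display "records used:                    " e(N)
display "completely determined successes: " e(N_cds)
display "completely determined failures:  " e(N_cdf)
```

Nonzero counts in the last two lines would mean that some observations are predicted perfectly by the covariates, which is the situation that deserves a closer look.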
> > Is your overall # of records e(N) wrong?
> >
> > Show us some sample commands and output so we can see what you are doing.
> >
> >
> > -----Original Message-----
> >
> > I checked the data just now. After running the logit model with our
> > dependent variable, the stored results show:
> >
> > e(N) = 5463
> > e(N_cds) = 0
> > e(N_cdf) = 0
> >
> > So it seems there is something wrong in the data setup. Could anyone
> > please give me some help?
> >
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/support/faqs/res/findit.html
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/