Thank you, Austin!
I wasn't paying attention to the weight (i.e., pw vs. fw); I just
copied verbatim without thinking (rarely a good idea...). This makes
a lot more sense--and allows me to use -inteff-.
Misha
On Tue, Sep 22, 2009 at 5:25 PM, Austin Nichols <[email protected]> wrote:
> Misha Spisok <[email protected]> :
> If the variable f records the true number of observations with that
> covariate pattern, then
> logit union t south txsouth [fw=f]
> would be the right code (see -help weight-).
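>
> For a quick look at what the choice of weight changes (a sketch,
> using the variables from the example further down the thread):
>
> logit union t south txsouth [fw=f]  // f taken as exact replication counts
> logit union t south txsouth [pw=f]  // same coefficients, linearized (robust) SEs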
>
> You can also use the -svy- commands I outlined, starting with the command
> svyset, srs
> or
> svyset _n
> to declare the data as not coming from a complex survey.
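>
> For example (a sketch; the pweight is carried over from the -svyset-
> line in the code further down):
>
> svyset _n [pw=f]
> svy: logit union t south txsouth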
>
> On Tue, Sep 22, 2009 at 5:49 PM, Misha Spisok <[email protected]> wrote:
>> Many Thanks, Austin and Jeph!
>>
>> The Norton, Wang, and Ai SJ article was very informative. Also, the
>> code examples clarified some things and, of course, raised more
>> questions.
>>
>> If it is not kosher to post follow-up questions on the same thread,
>> please let me know and I will re-post as new questions. Otherwise, my
>> follow-up questions are below.
>>
>> The short version is, what's the difference between -blogit- and
>> -logit-? Or, more accurately, in the context of grouped data, which
>> standard error estimate is correct?
>>
>> If, after using Austin's example, I run the following:
>>
>> logit union t south txsouth [pw=f]
>>
>> and
>>
>> blogit y pop t south txsouth
>>
>> I get, as expected (or hoped, in my case), the same coefficients. The
>> standard errors are smaller in -blogit- because, as I understand it,
>> -blogit- treats pop as the number of observations in each row, so the
>> "effective" number of observations is the sum of pop.
>>
>> I think this explains the difference in the standard errors.
>> Specifically, with some minor adjustment for the "robustified" -logit-
>> standard errors, the relationship between -logit- and -blogit-
>> standard errors is something like the following:
>>
>> s_blogit = sqrt(s_logit^2*(n_logit - k)/(n_blogit - k))
>>
>> where s_blogit is the se from -blogit-, s_logit is the se from
>> -logit-, n_logit is the number of observations from -logit-, n_blogit
>> is the number of observations from -blogit-, and k is the number of
>> estimated coefficients, including the constant.
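>>
>> A quick numerical check of that relationship (a sketch; e(N) and
>> _se[] are taken from each fit, k = 4 here for the three regressors
>> plus the constant, and the match is only approximate because of the
>> robust -logit- standard errors noted above):
>>
>> logit union t south txsouth [pw=f]
>> scalar s_logit = _se[txsouth]
>> scalar n_logit = e(N)
>> blogit y pop t south txsouth
>> scalar n_blogit = e(N)
>> scalar k = 4
>> di sqrt(s_logit^2*(n_logit - k)/(n_blogit - k)) "  vs  " _se[txsouth]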
>>
>> It strikes me that the standard errors from -blogit- are more
>> reasonable, given the actual number of observations that lie behind
>> the summarized data. Thus, it seems that the standard errors from
>> using -inteff- will be as incorrect as those from -logit- for
>> summarized data. While I could use the formula from Ai and Norton
>> (2003) to calculate the standard error for the interaction term using
>> the variance-covariance matrix returned after -blogit-, would this be
>> making a mistake?
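>>
>> (The post-estimation machinery is available after -blogit- in any
>> case: e(V) is posted, and -nlcom- will delta-method a function of the
>> coefficients. A sketch, purely for illustration, pretending t and
>> south are 0/1 indicators so the cross-difference of predicted
>> probabilities is easy to write out:)
>>
>> blogit y pop t south txsouth
>> matrix list e(V)
>> nlcom invlogit(_b[_cons]+_b[t]+_b[south]+_b[txsouth]) ///
>>     - invlogit(_b[_cons]+_b[south])                   ///
>>     - invlogit(_b[_cons]+_b[t]) + invlogit(_b[_cons])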
>>
>> My data are not survey data. They are "actual" data, in the sense
>> that f is the true number of people with the condition and pop is the
>> true population.
>>
>> Thanks again,
>>
>> Misha
>> (Using Stata 10.1)
>>
>>
>> On Fri, Sep 18, 2009 at 7:00 AM, Jeph Herrin <[email protected]> wrote:
>>>
>>> Thanks Austin,
>>>
>>> Yes, I should have specified the -rd- option, I meant
>>> the linear link function. I've become a fan of using
>>> binary (and binomial) linear regression for testing
>>> hypotheses.
>>>
>>> cheers,
>>> Jeph
>>>
>>>
>>> Austin Nichols wrote:
>>>>
>>>> Jeph--
>>>> Doesn't the interaction problem discussed in
>>>> http://www.stata-journal.com/sjpdf.html?articlenum=st0063
>>>> also rear its ugly head here?
>>>>
>>>> Probably also have to be careful of SEs--if the total populations are
>>>> summed weights from a survey, significance will likely be overstated.
>>>>
>>>> I'd probably go to -svy:tab- first in that case...
>>>>
>>>> webuse psidextract, clear
>>>> keep if t>5
>>>> set seed 1
>>>> g f=ceil(uniform()*1000)
>>>> egen pop=total(f), by(south t)
>>>> svyset [pw=f], strata(t)
>>>> egen gp=group(t south), lab
>>>> svy:tab gp union if t>5, row ci
>>>> lincom _b[p42]-_b[p22]-(_b[p32]-_b[p12])
>>>> g txsouth=t*south
>>>> egen y=total(union*f), by(gp)
>>>> bys gp: replace y=. if _n<_N
>>>> li y t south pop if y<.
>>>> binreg y t south txsouth, n(pop)
>>>> binreg union t south txsouth [pw=f]
>>>> logit union t south txsouth [pw=f]
>>>> findit inteff
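>>>>
>>>> (A sketch of the -inteff- step once the package is installed; the
>>>> call is assumed here to repeat the -logit- varlist and weight, so
>>>> confirm the exact syntax with -help inteff-. The fweight follows
>>>> the fix discussed at the top of the thread:)
>>>>
>>>> logit union t south txsouth [fw=f]
>>>> inteff union t south txsouth [fw=f]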
>>>>
>>>> On Thu, Sep 17, 2009 at 4:53 PM, Jeph Herrin <[email protected]> wrote:
>>>>>
>>>>> Not sure whether this helps you, but I would normally test this
>>>>> with an interaction term in a model. For instance
>>>>>
>>>>> gen txsouth=t*south
>>>>> binreg f t south txsouth, n(pop)
>>>>>
>>>>> Then testing the coefficient on -txsouth- is the same as
>>>>> testing whether there is a significant difference in differences.
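>>>>>
>>>>> For example (a sketch; the -rd- option gives the identity/linear
>>>>> link):
>>>>>
>>>>> binreg f t south txsouth, n(pop) rd
>>>>> test txsouth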
>>>>>
>>>>> hth,
>>>>> Jeph
>>>>>
>>>>> Misha Spisok wrote:
>>>>>>
>>>>>> Hello, Statalist,
>>>>>>
>>>>>> In brief, how does one test a difference in difference of proportions?
>>>>>> My question is re-stated briefly at the end with reference to the
>>>>>> variables I present. A formula and/or reference would be appreciated
>>>>>> if no command exists.
>>>>>>
>>>>
>>>>
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/