Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Convergence never achieved with MI impute chained
From: Maarten Buis <[email protected]>
To: [email protected]
Subject: Re: st: Convergence never achieved with MI impute chained
Date: Thu, 21 Jun 2012 14:22:56 +0200
I have just seen that you used only the first digit of the ISIC
classification. However, it contains lots of sparse categories(*).
These will cause problems. Also inspect education for sparse
categories. You'll need to combine those sparse categories with
"adjacent" categories in order to get sufficiently filled cells. Also
look at a cross tabulation of industry and education, and see if there
aren't any cells that are too empty. That will probably mean a second
round of merging categories.
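As a rough sketch of what I mean: the variable names industry, edulevel
and lstatus are taken from your messages, but the new variable industry2
and the particular groupings below are made up for illustration. I am
also assuming that the codes 1 to 10 follow the order of the categories
in your -mlogit- output, so check that first and choose combinations
that make substantive sense in your data.

```stata
* look for sparse categories in each variable, among the employed only
tabulate industry if lstatus == 1, missing
tabulate edulevel if lstatus == 1, missing

* hypothetical merge: fold the small categories into larger neighbours
* (codes and groupings are assumptions, not taken from the data)
recode industry (1     = 1 "Agriculture")                     ///
                (2 3 5 = 2 "Mining, manuf., construction")    ///
                (4 9   = 3 "Public and communal services")    ///
                (6 7 8 = 4 "Retail, transport, finance")      ///
                (10    = 5 "Other"),                          ///
                generate(industry2)

* then check the joint distribution for cells that are (nearly) empty
tabulate industry2 edulevel if lstatus == 1, missing
```

Missing values of industry stay missing in industry2, so the collapsed
variable can take the place of the original in the imputation model.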
I would not use ordered models or mvn for imputing industry. Industry
is an unordered categorical variable, so neither an ordering nor a
normal distribution makes sense for it.
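For an unordered variable I would stay with -mlogit- inside -mi impute
chained-, applied to a version of industry with the sparse categories
merged. A minimal sketch, assuming such a collapsed variable called
industry2 (a hypothetical name), your edulevel and mwage, and made-up
choices for the number of imputations, the number of nearest neighbours
and the seed. The -augment- option is there in case some perfect
prediction remains after merging:

```stata
* industry2 is a hypothetical collapsed version of industry
mi register imputed industry2

* add(5), knn(5) and rseed(12345) are illustrative values only
mi impute chained (mlogit, augment) industry2 ///
                  (ologit, augment) edulevel  ///
                  (pmm, knn(5)) mwage         ///
                  if lstatus == 1, add(5) rseed(12345)
```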
Hope this helps,
Maarten
(*) If this is recent data from a western country then you have made a
coding error. In that case there are way, way, way too many farmers.
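To see where that footnote comes from: in a constant-only -mlogit- each
constant is the log of the ratio of that category's frequency to the
frequency of the base category, so the share of the base category is
1/(1 + the sum of exp(_cons)). Plugging in the constants from the output
you posted suggests that about half of the sample is in agriculture and
that finance contains only a handful of observations. This is just
arithmetic on your numbers, and the 3410 below is my assumption that
the model used the 3,614 employed minus the 204 with missing industry.

```stata
* sum of exp(_cons) over the nine non-base categories
scalar s = exp(-4.982464) + exp(-2.671581) + exp(-3.42432)  ///
         + exp(-3.204691) + exp(-1.714798) + exp(-4.759321) ///
         + exp(-6.368759) + exp(-0.830113) + exp(-1.753638)

* implied share of the base category (agriculture)
display 1/(1 + s)

* implied share and approximate count for finance, assuming N = 3410
display exp(-6.368759)/(1 + s)
display 3410*exp(-6.368759)/(1 + s)
```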
On Thu, Jun 21, 2012 at 1:46 PM, Lena Lindbjerg Sperling
<[email protected]> wrote:
>>
>> Thank you for your answer!
>>
>> It does seem though that all occupations are represented in both private and public sectors.
>> And I also have another data set where I only impute educational level, industry (ISIC 3 category) and wage, and I still do not get convergence, even though that's just one mlogit, one ologit and one pmm... so that doesn't seem to be the problem.
>>
>> I got a result out for the mi xeq 0: mlogit for industry however and it looks like this:
>> -> mlogit industry
>>
>> Iteration 0:   log likelihood = -4875.9554
>> Iteration 1:   log likelihood = -4875.9554
>>
>> Multinomial logistic regression                   Number of obs   =
>>                                                   LR chi2(0)      =          0
>>                                                   Prob > chi2     =          .
>> Log likelihood = -4875.9554                       Pseudo R2       =
>>
>> industry                           |      Coef.  Std. Err.       z   P>z    [95% Conf. Interval]
>> -----------------------------------+------------------------------------------------------------
>> Agriculture__Hunting__etc_         |  (base outcome)
>> Mining                             |
>>                              _cons |  -4.982464  0.2896632   -17.2     0   -5.550194   -4.414735
>> Manufacturing                      |
>>                              _cons |  -2.671581  0.0939994  -28.42     0   -2.855816   -2.487345
>> Public_services                    |
>>                              _cons |   -3.42432   0.134593  -25.44     0   -3.688117   -3.160522
>> Construction                       |
>>                              _cons |  -3.204691  0.1210617  -26.47     0   -3.441968   -2.967415
>> Retail__Hotels                     |
>>                              _cons |  -1.714798  0.0612048  -28.02     0   -1.834758   -1.594839
>> Transport_and_telecomnunications   |
>>                              _cons |  -4.759321  0.2593031  -18.35     0   -5.267546   -4.251096
>> Finance_and_business_serv_         |
>>                              _cons |  -6.368759  0.5778449  -11.02     0   -7.501314   -5.236204
>> Communal_services                  |
>>                              _cons |  -0.830113  0.0433825  -19.13     0  -0.9151412  -0.7450848
>> Others_not_well_specified          |
>>                              _cons |  -1.753638  0.0622235  -28.18     0   -1.875594   -1.631683
>> -----------------------------------+------------------------------------------------------------
>>
>> Should I use something else to impute this? It runs from 1 to 10 so maybe ordered is better? I get convergence if I use ordered logit for industry and occupation. They really shouldn't be ordered, but how important is that choice?
>>
>>
>> I can get results out if I use mvn, but is that a very bad idea? Seems like the literature disagrees quite a bit on how severe it is to assume normality?
>>
>> Best,
>> Lena
>>
>> On Jun 21, 2012, at 10:48 AM, Maarten Buis wrote:
>>
>>> On Thu, Jun 21, 2012 at 10:15 AM, Lena Lindbjerg Sperling wrote:
>>>> I just looked at the mail again, and the data is not as bad as it looks, as I'm only imputing on the employed population (lstatus==1) and when we only look at them mi describe shows:
>>>> mi describe
>>>>
>>>> Style: wide
>>>> last mi update 21jun2012 10:03:51, 18 seconds ago
>>>>
>>>> Obs.: complete 2,702
>>>> incomplete 912 (M = 0 imputations)
>>>> ---------------------
>>>> total 3,614
>>>>
>>>> Vars.: imputed: 7; occup(126) ocusec(144) whours(167) edulevel(171) ocu(228) industry(204) mwage(598)
>>>
>>> Just looking at the variable names I suspect that this is an extremely
>>> hard model to estimate. How many categories do the variables occup,
>>> ocusec, ocu, and industry have? Are there combinations of three or
>>> fewer of these that for some observations perfectly predict one or more
>>> remaining variables? For example, if we know that someone is a mayor
>>> then we also know that (s)he is working in the public sector.
>>>
>>> <snip>
>>>> Iteration 14: log pseudolikelihood = -2454486.7 (not concave)
>>>> Not completely sure what this means. Can you see where things are wrong from this?
>>>
>>> It means that this sub-model did not converge, probably because of the
>>> problems indicated above.
>>>
>>>> When I use -mi xeq 0: mlogit - the result is:
>>>> m=0 data:
>>>> -> mlogit
>>>> last estimates not found
>>>> r(301);
>>>>
>>>> But I thought it was the observed data...which should be there?
>>>
>>> What you asked for was for Stata to replay the last -mlogit- command,
>>> and it replied that the last command wasn't -mlogit-. You probably
>>> pressed break before the model finished estimating, which makes sense
>>> if it did not converge.
>>>
>>> Hope this helps,
>>> Maarten
>>>
>>> --------------------------
>>> Maarten L. Buis
>>> Institut fuer Soziologie
>>> Universitaet Tuebingen
>>> Wilhelmstrasse 36
>>> 72074 Tuebingen
>>> Germany
>>>
>>>
>>> http://www.maartenbuis.nl
>>> --------------------------
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
>>