Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Phil Clayton <philclayton@internode.on.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: replace missing values of a variable with the median of that variable based on particular criterion
Date: Sun, 12 Aug 2012 14:09:18 +1000

It is probably simpler to obtain the medians using -summarize-:

sum Inc if Inc>=0 & Inc<200, d
replace Inc=r(p50) if Range==1 & missing(Inc)

sum Inc if Inc>=200 & Inc<600, d
replace Inc=r(p50) if Range==2 & missing(Inc)

Having said that, you need to be careful about imputing a median like this. For example, this is from the introductory section of Stata's multiple imputation manual:

"... single-imputation methods do not discard missing values. They treat the imputed values as known in the analysis. This underestimates the variance of the estimates and so overstates precision and results in confidence intervals and significance tests that are too optimistic."

Phil

On 12/08/2012, at 1:23 PM, Airey, David C wrote:

> .
> Here is my try.
>
> I'm curious, what did you try, and what were your problems?
> For me, the difficulty was noting the save option to tabstat
> and getting the statistic into a local.
>
> -Dave
>
>
> clear
> input PersonID Inc Range
> 1 100 .
> 2 . 1
> 3 200 .
> 4 . 2
> 5 232 .
> 6 500 .
> 7 150 .
> 8 340 .
> 9 . 1
> 10 55 .
> end
>
> tabstat Inc if Inc < 200 & !missing(Inc), s(q) save
> matrix mymatrix = r(StatTotal)
> local mymedian1 = mymatrix[2,1]
> replace Inc = `mymedian1' if Range == 1
>
> tabstat Inc if Inc < 600 & Inc >= 200 & !missing(Inc), s(q) save
> matrix mymatrix = r(StatTotal)
> local mymedian2 = mymatrix[2,1]
> replace Inc = `mymedian2' if Range == 2
>
> list
>
>      PersonID   Inc   Range
>  1.         1   100       .
>  2.         2   100       1
>  3.         3   200       .
>  4.         4   286       2
>  5.         5   232       .
>  6.         6   500       .
>  7.         7   150       .
>  8.         8   340       .
>  9.         9   100       1
> 10.        10    55       .
>
>
>
>> Dear all,
>>
>> I have two variables Inc (labeled as income) and Range on respondent-level.
>>
>> For each respondent, if Inc<., then Range=.;
>> if Inc=., then Range is nonmissing, and provides the range of income for
>> that respondent (Range=1 means 0<=Inc<200, Range=2 means 200<=Inc<600 ).
>>
>> The data structure is like:
>>
>> PersonID   Inc   Range
>>        1   100       .
>>        2     .       1
>>        3   200       .
>>        4     .       2
>>        5   232       .
>>        6   500       .
>>        7   150       .
>>        8   340       .
>>        9     .       1
>>       10    55       .
>>
>> I want to realize the following procedure to replace the missing values of
>> Inc:
>> When Inc=. & Range=1, replace the missing value of Inc with the median
>> income of those respondents who provide nonmissing values for Inc and
>> 0<=Inc<200;
>> When Inc=. & Range=2, replace the missing value of Inc with the median
>> income of those respondents who provide nonmissing values for Inc and
>> 200<=Inc<600.
>>
>> How to realize this purpose?
>>
>> Thank you very much for your help.
>>
>> Best,
>> Sharon
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
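
If there were more than two Range codes, the procedure described in the question could perhaps be written as a loop rather than one -summarize-/-replace- pair per range. What follows is only a sketch under stated assumptions: the local macro names lo1, hi1, lo2 and hi2 are made up for illustration, and the bounds are the ones given in the question.

```stata
* Hypothetical macros holding the bounds for each Range code
* (Range==1: 0<=Inc<200, Range==2: 200<=Inc<600, as in the question)
local lo1 0
local hi1 200
local lo2 200
local hi2 600

forvalues r = 1/2 {
    * Median of observed incomes in this range; missing Inc is
    * excluded automatically because missing is larger than any number
    quietly summarize Inc if Inc >= `lo`r'' & Inc < `hi`r'', detail
    * Fill in only respondents who reported this range but no income
    replace Inc = r(p50) if Range == `r' & missing(Inc)
}

list
```

Run on the example data from the thread, this should fill in 100 for Range==1 and 286 for Range==2, the same values as in the listing above. The caution about single imputation applies equally to this version.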