Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Using Multiple Imputation in a very large dataset
From: Raquel Rangel de Meireles Guimarães <[email protected]>
To: [email protected]
Subject: st: Using Multiple Imputation in a very large dataset
Date: Fri, 03 Jun 2011 12:36:31 -0300
Dear Stata users,
I am using Stata/MP (dual core, 64-bit) on Windows 7. I have 4 GB of
RAM, but I have allocated 5 GB of memory to Stata to hold my data.
I am interested in modeling the determinants of school performance. I
have data for 1,939,147 students. My dependent variable is reading
proficiency (fully observed). I also have the students' individual
characteristics (gender, race, and age, all fully observed) and scores
for four socioeconomic constructs (socioeconomic level, student
motivation, parental involvement, and cultural capital), which were
estimated via Item Response Theory.
I would like to impute the missing values of the socioeconomic
constructs conditional on the student's proficiency, gender, race, and age.
My data can be found at the following website:
http://sites.google.com/site/raquelscurriculumsite/data/data-school-achievement.rar
I want to impute these values because listwise deletion in my
regressions would drop a lot of students from the study.
Here are descriptive statistics for my fully observed variables:
                     Variable |       Obs        Mean   Std. Dev.        Min        Max
------------------------------+--------------------------------------------------------
                       cod_uf |   1939147    32.73305    9.495694         11         53
                       região |   1939147    2.898205    1.031377          1          5
                    qn1 (sex) |   1939147    1.499852    .5000001          1          2
                   qn2 (race) |   1939147    1.941047    .9668508          1          5
             qn4 (age groups) |   1939147    3.752611    1.195527          1          8
------------------------------+--------------------------------------------------------
leitura (reading proficiency) |   1939147    175.8849    41.19135          0     347.36
Here is the misstable summary of my missing values:
. misstable sum capitalcultural envolvimento motivacao nse
                                                             Obs<.
                                                +-----------------------------
                |                               | Unique
       Variable |     Obs=.     Obs>.     Obs<. | values        Min        Max
----------------+-------------------------------+-----------------------------
capitalcultural |    42,986             1896161 |    371     -1.662      1.662
   envolvimento |    20,302             1918845 |     19     -1.178      1.178
      motivacao |    37,507             1901640 |     15     -1.014       .672
            nse |     6,092             1933055 |   >500      -2.02       2.02
------------------------------------------------------------------------------
Here is the procedure I used for multiple imputation:
mi set mlong
mi register imputed capitalcultural envolvimento motivacao nse
mi register regular leitura qn1 qn2 qn4
tab qn1, g(sexo)
tab qn2, g(raca)
tab região, g(regiao)
xi: mi impute reg capitalcultural = leitura sexo1 raca1 regiao1 qn4, add(1000)
I got the following error message: insufficient disk space r(699)
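A rough count of how large add(1000) makes the data may help explain the error. Under the mlong style, mi keeps the original observations (m = 0) and, for each added imputation, one extra copy of every incomplete observation. The Stata sketch below bounds the resulting number of rows from the misstable counts above; the scalar names are illustrative, and it assumes that "incomplete" means missing on at least one of the four registered imputed constructs.

```stata
* Rough row counts for add(1000) under the mlong style, using the
* misstable summary counts (scalar names are illustrative)
scalar mi_N = 1939147        // observations in the original data (m = 0)
scalar mi_M = 1000           // imputations requested with add(1000)

* Fewest incomplete rows: every other missing value falls within the
* 42,986 observations where capitalcultural is missing
scalar inc_low  = 42986
* Most incomplete rows: no observation is missing on more than one construct
scalar inc_high = 42986 + 20302 + 37507 + 6092

* mlong stores m = 0 plus one copy of each incomplete row per imputation
scalar rows_low  = scalar(mi_N) + scalar(mi_M)*scalar(inc_low)
scalar rows_high = scalar(mi_N) + scalar(mi_M)*scalar(inc_high)

display "lower bound: " %15.0fc scalar(rows_low)
display "upper bound: " %15.0fc scalar(rows_high)
```

On these assumptions the imputed dataset would have between about 44.9 million and 108.8 million rows, roughly 23 to 56 times the original 1.9 million, each row carrying every variable plus mi's own marker variables.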
Could anyone please help me? Is there another imputation technique I
could use? Hot-deck imputation would not be useful, since the variables
to be imputed are continuous scores rather than categories.
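On the question of an alternative technique: predictive mean matching (mi impute pmm, available in Stata 11) works much like a hot deck for continuous variables, because each imputed value is an observed value taken from one of the k donors whose predicted means are closest. The sketch below runs it on simulated toy data, not on the real file; the variable values, the 5% missingness, knn(5), add(5), and the seeds are all illustrative assumptions. It also checks that the mlong data grow by exactly add() copies of the incomplete rows.

```stata
* Toy illustration of predictive mean matching under mlong.
* All data are simulated; names, sizes, and seeds are illustrative only.
clear
set seed 12345
set obs 1000
generate leitura = 175 + 41*rnormal()      // stand-in for reading proficiency
generate sexo2   = runiform() < .5          // stand-in 0/1 dummy
generate nse     = .02*(leitura - 175) + rnormal()
replace  nse     = . if runiform() < .05    // roughly 5% missing at random

* Number of incomplete observations before imputation
quietly count if missing(nse)
scalar n_inc = r(N)

mi set mlong
mi register imputed nse
mi register regular leitura sexo2

* Each imputed value is an observed nse value drawn at random from the
* 5 donors with the closest predicted means (knn(5) is arbitrary here)
mi impute pmm nse leitura sexo2, add(5) knn(5) rseed(2011)

* mlong keeps the 1,000 original rows plus one copy of each incomplete
* row for each of the 5 imputations
assert _N == 1000 + 5*scalar(n_inc)
```

pmm imputes one variable per command; for imputing the four constructs jointly, mi impute mvn is the multivariate-normal alternative in Stata 11. In either case, a much smaller add() value keeps the mlong data proportionally smaller.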
Kind regards,
Raquel
--
Raquel Rangel de Meireles Guimarães
Substitute Professor, Department of Demography, UFMG
PhD candidate in Demography
http://ufmg.academia.edu/RaquelGuimaraes
Cedeplar - Centro de Desenvolvimento e Planejamento Regional