Friedrich,
1.) The program "select" was suggested by DHS in the FAQ section. When I wanted
to upload the individual recode file, I couldn't because there were too many
variables. As a result, I used this program. You can find more info on their
website.
2.) I did as you suggested. I uploaded the whole household member data, merged
it with the weight file, and did NOT use the keep command, only the drop
command to remove the _merge variable. Otherwise I cannot merge it with the
individual file: when I tried, I got the error message "_merge already defined".
So I dropped the _merge variable from the resulting file (uppsats.dta). Then I
wrote the following command:
. merge clnr hhnr lnr using ir
variables clnr hhnr lnr do not uniquely identify observations in the master data
caseid was str12 now str15

. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     23,199       78.11       78.11
          2 |      3,100       10.44       88.55
          3 |      3,402       11.45      100.00
------------+-----------------------------------
      Total |     29,701      100.00
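
For reference, a minimal sketch of how the note about the master data could be
checked. The toy values below are hypothetical; only the variable names (clnr,
hhnr, lnr, linenr) come from this thread.

```stata
* Toy master data (hypothetical values): two children share mother line 2,
* and two adults have no mother line number (missing lnr).
clear
input clnr hhnr linenr lnr
1 1 1 .
1 1 2 .
1 1 3 2
1 1 4 2
end

* Count how often each key combination occurs; copies > 1 means the
* combination clnr hhnr lnr is not unique in this file.
duplicates report clnr hhnr lnr

* The member-level key (linenr, i.e. the renamed hvidx) should be unique;
* isid exits with an error if it is not.
isid clnr hhnr linenr
```

If the repeated keys are only siblings with the same mother and members with a
missing lnr, the note is probably expected for this kind of many-to-one match
and not a sign of a broken merge.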
Now comes the tricky part for me. The following commands do not give me the
desired result:
keep if _merge==3
drop _merge
Just as in the former case, tabulating hv105 (= age of household member) in
this file gives exactly the same answer: only children's ages, 0-5 years, are
included.
But if I don't use the keep or drop commands, I have the ages of ALL household
members.
My question is: should I keep the "_merge" variable? From what I have been
reading, I thought the purpose of merge was to keep only observations with
_merge == 3.
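
A minimal sketch, on hypothetical toy data, of what the _merge codes mean in
this kind of merge. It uses the old-style merge syntax seen in this thread and
assumes that v012 (respondent's age) and v133 (education in single years) are
part of the women's extract. The file names members_toy and women_toy are made
up for the illustration.

```stata
* Using data: one interviewed woman (line number 2 in household 1).
clear
input clnr hhnr lnr v012 v133
1 1 2 30 9
end
sort clnr hhnr lnr
save women_toy, replace

* Master data: two adults (no mother line number) and one child whose
* mother is line 2. hv105 is each member's own age.
clear
input clnr hhnr linenr hv105 lnr
1 1 1 34 .
1 1 2 30 .
1 1 3  2 2
end
sort clnr hhnr lnr
save members_toy, replace

* Many children can point to one woman, so only the using data
* has to be unique on the key.
merge clnr hhnr lnr using women_toy, uniqusing
tab _merge
list clnr hhnr linenr hv105 v012 v133 _merge
```

In this sketch the two adults get _merge == 1 because their lnr is missing, and
the child gets _merge == 3 with the mother's v012 and v133 attached. On that
row hv105 is still the child's own age, so after "keep if _merge == 3" a
tabulation of hv105 can only show children's ages; the mother's age sits in
v012.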
3.) In your former email you say that I drop all children without height and
weight data and all adults, including parents. In my analysis, the dependent
variable is child health, measured by the height-for-age Z-score and the
weight-for-age Z-score. For the children who have these Z-scores, I need to
match them with their respective parents' education, age, household
characteristics, etc., to see if mothers and fathers with higher education have
children with better health as measured by the Z-scores. Therefore, shouldn't
the way I was doing it be correct? Or have I misunderstood completely?
Thanks
Tharshini
On 2009-07-28, at 15:43, Friedrich Huebler wrote:
> Tharshini,
>
> In step 3 you -drop- all children without height and weight data and
> all adults, including all parents.
>
> You write "The household member data includes to many variables to
> directly upload in stata." The flat household member recode file from
> the Ghana DHS 2003 has 245 variables. The only version of Stata that
> cannot hold 245 variables is Small Stata. Your -tab- output indicates
> that you do not have Small Stata because you were able to work with
> more than 26000 observations (see -help limits-). You should therefore
> be able to open the complete household member file with Stata. I don't
> know a program called "select" but it does not seem to be necessary.
>
> Friedrich
>
> On Tue, Jul 28, 2009 at 2:54 AM, Tharshini
> Thangavelu<[email protected]> wrote:
>> Hi Friedrich,
>>
>> When I downloaded the dataset for Ghana 2003, there was a doc file in the
>> folder for height and weight: a description of how to proceed when merging
>> and which identifying variables to choose in each file. I followed this doc
>> file and merged the files in the following way:
>>
>> 1.) The height and weight file for children up to 5 years old.
>> rename HWHHID caseid
>> rename HWLINE linenr
>> sort caseid linenr
>> save weight, replace
>> clear exit
>>
>> 2.) The household member data includes to many variables to directly upload in
>> stata, so I used the program "select", where I selected my variables of
>> interest. Then I uploaded in stata;
>>
>> use hmr1
>> rename hhid caseid
>> rename hvidx linenr
>> sort caseid linenr
>> save hmr1, replace
>>
>> 3.) These two files were then merged together (master data = hmr1)
>>
>> merge caseid linenr using weight
>>
>> tab _merge
>>
>>      _merge |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>           1 |     22,673       85.23       85.23
>>           3 |      3,928       14.77      100.00
>> ------------+-----------------------------------
>>       Total |     26,601      100.00
>>
>> . keep if _merge ==3
>> (22673 observations deleted)
>>
>> . drop _merge
>>
>> Error message : linenr was byte now int
>>
>> My own conclusion: Since _merge == 3 has 3928 observations, which is exactly
>> the same number of obs. as in the weight file, I concluded the merging was
>> done correctly. I also tried the inverse case, i.e. having hmr as my master
>> data.
>>
>> 4.) I then merged this resulting file with the individual recode file
>> (= women's file), using cluster number (clnr, hv001), household number
>> (hhnr, hv002) and mother's line number (lnr, hc60).
>>
>> In the resulting file, I again renamed the identifying variables
>> rename HV001 clnr
>> rename HV002 hhnr
>> rename hc60 lnr
>> sort clnr hhnr lnr
>> save thesis
>> clear exit
>>
>> 5.) In the individual recode file, just as in the household member recode
>> file, I used the program "select" to choose the variables, and the following
>> identifying variables were renamed: cluster number (clnr, v001), household
>> number (hhnr, v002) and respondent's line number (lnr, v003).
>>
>> use ir1
>> rename V001 clnr
>> rename V002 hhnr
>> rename V003 lnr
>> sort clnr hhnr lnr
>> save ir1, replace
>>
>> 6.) Now, I merge the ir1.dta with the thesis.dta
>>
>> merge clnr hhnr lnr using thesis
>> tab _merge
>>
>>      _merge |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>           1 |        526        7.48        7.48
>>           2 |      3,100       44.11       51.59
>>           3 |      3,402       48.41      100.00
>> ------------+-----------------------------------
>>       Total |      7,028      100.00
>>
>> . keep if _merge == 3
>> (3626 observations deleted)
>>
>> . drop _merge
>>
>> Error message: variables clnr hhnr lnr do not uniquely identify observations in
>> the master data. I hope this will help to solve the problem.
>>
>> / Tharshini
>>
>>
>>
>>
>> On 2009-07-28, at 06:28, Friedrich Huebler wrote:
>>> Tharshini,
>>>
>>> On June 11 you wrote that you wanted to merge the household member
>>> file with the height and weight file. In response to your message you
>>> received advice on how you can merge the data. The table in your
>>> message of today makes clear that you did not merge the files
>>> correctly because you only have persons up to 5 years of age. If you
>>> want more help with this and the other problems you described you have
>>> to show us your code, as explained in the Statalist FAQ.
>>>
>>> http://www.stata.com/support/faqs/res/statalist.html#advice
>>>
>>> Friedrich
>>>
>>> On Mon, Jul 27, 2009 at 9:42 AM, Tharshini
>>> Thangavelu<[email protected]> wrote:
>>>>
>>>> . tab hv105
>>>>
>>>>      Age of |
>>>>   household |
>>>>     members |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>           0 |        772       22.69       22.69
>>>>           1 |        706       20.75       43.45
>>>>           2 |        655       19.25       62.70
>>>>           3 |        689       20.25       82.95
>>>>           4 |        553       16.26       99.21
>>>>           5 |         27        0.79      100.00
>>>> ------------+-----------------------------------
>>>>       Total |      3,402      100.00
>>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
--
Tharshini THANGAVELU
Forskarbacken 8 / 101
114 16 Stockholm
Sweden
Phone +46 (0)735 53 43 90
E-mail [email protected]