Tharshini,
Please read the documentation for -merge- to understand how it works.
Do not -drop- anything after -merge- besides the _merge variable. You
have to keep all household members if you want to assign the parents'
ages and other characteristics to a child. How to do that was
explained in a previous post.
http://www.stata.com/statalist/archive/2009-06/msg00793.html
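
A minimal sketch of that idea, using the old-style -merge- syntax that
appears in this thread. It is not necessarily the method from the linked
post. The file names hmr1 and weight and the variables caseid, linenr and
hv105 come from the messages below; hv112 (mother's line number) and hv106
(education level) are assumed DHS variable names, and mothers.dta,
mother_age, mother_educ, _m_hw and _m_mother are hypothetical names chosen
for illustration.

```stata
* Sketch only, under the assumptions stated above.

* 1. Build a lookup file with one row per household member, keyed so
*    that it can be matched on a child's "mother's line number".
use caseid linenr hv105 hv106 using hmr1, clear
rename linenr hv112          // assumed: hv112 = mother's line number
rename hv105 mother_age
rename hv106 mother_educ
sort caseid hv112
save mothers, replace

* 2. Add the height and weight data while keeping ALL household members.
*    A named merge indicator avoids "_merge already defined" later on.
use hmr1, clear
sort caseid linenr
merge caseid linenr using weight, unique _merge(_m_hw)

* 3. Attach each member's mother's age and education (many members can
*    share one mother, so only the using file must be unique).
sort caseid hv112
merge caseid hv112 using mothers, uniqusing nokeep _merge(_m_mother)

* 4. Only now restrict the sample to children with measurements.
keep if _m_hw == 3
```

The point of the ordering is that the restriction to measured children
comes last; dropping the adults earlier removes the very rows that carry
the parents' characteristics.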
Friedrich
On Tue, Jul 28, 2009 at 4:30 PM, Tharshini
Thangavelu<[email protected]> wrote:
>
> Friedrich,
>
> 1.) The program "select" was suggested by DHS in their FAQ section. When I wanted
> to load the individual recode file, I couldn't because there were too many
> variables. As a result, I used this program. You can find more info on their
> website.
>
> 2.) I did as you suggested. I loaded the whole household member data, merged
> it with the weight file, and did NOT use the keep command, only the drop command
> to remove the _merge variable. Otherwise I cannot merge it with the individual
> file: when I tried, I got the error message "_merge already defined".
>
> So I dropped the _merge variable from the resulting file (uppsats.dta). Then I
> ran the following command:
>
> merge clnr hhnr lnr using ir
>
> variables clnr hhnr lnr do not uniquely identify observations in the master data
> caseid was str12 now str15
>
>
> tab _merge
>
>      _merge |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |     23,199       78.11       78.11
>           2 |      3,100       10.44       88.55
>           3 |      3,402       11.45      100.00
> ------------+-----------------------------------
>       Total |     29,701      100.00
>
>
> Now comes the tricky part for me. Using the following commands doesn't give me
> the desired results.
> keep if _merge==3
> drop _merge
>
> Just as in the former case, tabulating hv105 (= age of household member) in
> this file gives exactly the same answer: only children's ages (0-5 years)
> are included.
>
> But if I don't use the keep or drop commands, I have the ages of ALL household
> members.
>
> My question is: should I keep the "_merge" variable? From what I have
> been reading, I thought the point of merge was to keep only _merge == 3.
>
> 3.) In your former email you say that I drop all children without height and
> weight data and all adults, including parents. In my analysis, the dependent
> variable is child health, measured by height-for-age and weight-for-age
> Z-scores. For the children who have these Z-scores, I need to match them
> with their respective parents' education, age, household characteristics,
> etc., to see whether mothers and fathers with higher education have children
> with better health as measured by Z-score. Therefore, shouldn't the way I was
> doing it be correct? Or have I misunderstood completely?
>
>
> Thanks
> Tharshini
>
>
>
>
>
>
>
> On 2009-07-28, at 15:43, Friedrich Huebler wrote:
>> Tharshini,
>>
>> In step 3 you -drop- all children without height and weight data and
>> all adults, including all parents.
>>
>> You write "The household member data includes too many variables to
>> directly upload in stata." The flat household member recode file from
>> the Ghana DHS 2003 has 245 variables. The only version of Stata that
>> cannot hold 245 variables is Small Stata. Your -tab- output indicates
>> that you do not have Small Stata because you were able to work with
>> more than 26000 observations (see -help limits-). You should therefore
>> be able to open the complete household member file with Stata. I don't
>> know a program called "select" but it does not seem to be necessary.
>>
>> Friedrich
>>
>> On Tue, Jul 28, 2009 at 2:54 AM, Tharshini
>> Thangavelu<[email protected]> wrote:
>>> Hi Friedrich,
>>>
>>> When I downloaded the dataset for Ghana 2003, it included a doc file
>>> for height and weight, with a description of how to proceed when merging and
>>> which identifying variables to choose in each file. I followed this doc file
>>> and merged the files as follows:
>>>
>>> 1.) The height and weight file for children up to 5 years old.
>>> rename HWHHID caseid
>>> rename HWLINE linenr
>>> sort caseid linenr
>>> save weight, replace
>>> clear
>>>
>>> 2.) The household member data includes too many variables to directly upload in
>>> stata, so I used the program "select" to select my variables of
>>> interest. Then I loaded the result in Stata:
>>>
>>> use hmr1
>>> rename hhid caseid
>>> rename hvidx linenr
>>> sort caseid linenr
>>> save hmr1, replace
>>>
>>> 3.) These two files were then merged together (master data = hmr1)
>>>
>>> merge caseid linenr using weight
>>>
>>> tab _merge
>>>
>>>      _merge |      Freq.     Percent        Cum.
>>> ------------+-----------------------------------
>>>           1 |     22,673       85.23       85.23
>>>           3 |      3,928       14.77      100.00
>>> ------------+-----------------------------------
>>>       Total |     26,601      100.00
>>>
>>> . keep if _merge ==3
>>> (22673 observations deleted)
>>>
>>> . drop _merge
>>>
>>> Error message : linenr was byte now int
>>>
>>> My own conclusion: since _merge == 3 has 3928 observations, which is exactly the
>>> same number of observations as in the weight file, I concluded the merge was
>>> done correctly. I also tried the inverse case, i.e. having hmr as my master data.
>>>
>>> 4.) I then merged this resulting file with the individual recode file
>>> (= women's file), using cluster number (clnr hv001), household number
>>> (hhnr hv002) and mother's line number (lnr hc60).
>>>
>>> In the resulting file, I again renamed the identifying variables
>>> rename HV001 clnr
>>> rename HV002 hhnr
>>> rename hc60 lnr
>>> sort clnr hhnr lnr
>>> save thesis
>>> clear
>>>
>>> 5.) In the individual recode file, just as in the household member recode file, I
>>> used the program "select" to choose the variables, and the following identifying
>>> variables were renamed: cluster number (clnr v001), household number (hhnr v002)
>>> and respondent's line number (lnr v003).
>>>
>>> use ir1
>>> rename V001 clnr
>>> rename V002 hhnr
>>> rename V003 lnr
>>> sort clnr hhnr lnr
>>> save ir1, replace
>>>
>>> 6.)Now, I merge the ir1.dta with the thesis.dta
>>>
>>> merge clnr hhnr lnr using thesis
>>> tab _merge
>>>
>>>      _merge |      Freq.     Percent        Cum.
>>> ------------+-----------------------------------
>>>           1 |        526        7.48        7.48
>>>           2 |      3,100       44.11       51.59
>>>           3 |      3,402       48.41      100.00
>>> ------------+-----------------------------------
>>>       Total |      7,028      100.00
>>>
>>> . keep if _merge == 3
>>> (3626 observations deleted)
>>>
>>> . drop _merge
>>>
>>> Error message: variables clnr hhnr lnr do not uniquely identify observations in
>>> the master data. I hope this will help to solve the problem.
>>>
>>> / Tharshini
>>>
>>>
>>>
>>>
>>> On 2009-07-28, at 06:28, Friedrich Huebler wrote:
>>>> Tharshini,
>>>>
>>>> On June 11 you wrote that you wanted to merge the household member
>>>> file with the height and weight file. In response to your message you
>>>> received advice on how you can merge the data. The table in your
>>>> message of today makes clear that you did not merge the files
>>>> correctly because you only have persons up to 5 years of age. If you
>>>> want more help with this and the other problems you described you have
>>>> to show us your code, as explained in the Statalist FAQ.
>>>>
>>>> http://www.stata.com/support/faqs/res/statalist.html#advice
>>>>
>>>> Friedrich
>>>>
>>>> On Mon, Jul 27, 2009 at 9:42 AM, Tharshini
>>>> Thangavelu<[email protected]> wrote:
>>>>>
>>>>> . tab hv105
>>>>>      Age of |
>>>>>   household |
>>>>>     members |      Freq.     Percent        Cum.
>>>>> ------------+-----------------------------------
>>>>>           0 |        772       22.69       22.69
>>>>>           1 |        706       20.75       43.45
>>>>>           2 |        655       19.25       62.70
>>>>>           3 |        689       20.25       82.95
>>>>>           4 |        553       16.26       99.21
>>>>>           5 |         27        0.79      100.00
>>>>> ------------+-----------------------------------
>>>>>       Total |      3,402      100.00