Friedrich,
Thanks! I followed the new approach, which actually gave the same results as one
of the previous cases. I did it in the original dataset, i.e. the household
member report. It seems strange that mothers' age has a minimum value of 5. When
tabulating, only one observation had the value 5. I assumed that it is a missing
value and replaced it.
. sum mage fage
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mage |      9411    35.21177     8.643875         5         76
        fage |      7265    44.92953     12.64342        19         99
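For reference, a minimal sketch of how such a value might be inspected and set to missing. It assumes that an age of 5 for a mother is a data-entry error rather than a genuine value; the exact command used for the replacement is not shown in this message.

```stata
* Look at the affected record(s) before changing anything
list hhid hvidx hv112 mage if mage == 5
* Assumption: an age of 5 for a mother is an error, so recode it to missing
replace mage = . if mage == 5
* Confirm the new minimum
summarize mage
```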
________________________________________________________________________
The following output shows the difference between the two variables, which
normally should be the same. I still have not figured out why this is not the
case.
. sum v730 fage
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        v730 |      4463    40.52028     11.91459        18         99
        fage |      7265    44.92953     12.64342        19         99
. sum v447a mage
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       v447a |      6502    29.45709     9.297519        15         49
        mage |      9409     35.2182     8.633561        15         76
Until now I have used the variable v730 (partner's age), which I assumed to be
the father's age, and v447a (women's age in years from household report) as the
mother's age. For education I used hc62 and v702 respectively. The method of
finding the mother's and father's age by using hhid and hvidx is new to me and
confusing.
How do I now find the mothers' and fathers' education levels?
Does this mean that I don't have to merge with the individual recode file once I
have merged the anthropometric and household member data?
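By analogy with the age variables, parents' education could in principle be looked up the same way. This is only a sketch: it assumes that the household member file carries hv106 (highest educational level attained) and that hvidx equals each member's position within the household once sorted. The names medu and fedu are hypothetical.

```stata
* Assumption: hv106 holds each household member's education level
bysort hhid (hvidx): gen medu = hv106[hv112]
bysort hhid (hvidx): gen fedu = hv106[hv114]
* Check the resulting distributions, including missing values
tab medu, missing
tab fedu, missing
```

Variables created this way come from the household member file itself, so they do not depend on a merge with the individual recode file.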
I think I am getting rather confused about how to work with microlevel data. I
actually produced some regression output, but I was working with the dataset
that had 3402 observations. That is, I had kept only _merge == 3 (observations
in both using and master data) and then dropped the _merge variable.
Tharshini
On 2009-07-29, at 15:12, Friedrich Huebler wrote:
> Tharshini,
>
> Your excerpt from the data shows that you changed the sort order
> before you created the variables mage and fage. Try this:
>
> bysort hhid (hvidx): gen mage = hv105[hv112]
> bysort hhid (hvidx): gen fage = hv105[hv114]
>
> Friedrich
>
> On Wed, Jul 29, 2009 at 8:21 AM, Tharshini
> Thangavelu<[email protected]> wrote:
>> Hi,
>>
>>
>> I have been working to figure out how to produce the correct mothers' and
>> fathers' age. I came across the following advice on Statalist.
>>
>> http://www.stata.com/statalist/archive/2006-06/msg00323.html
>>
>> However, my dataset seems a bit stranger. The following variables are used to
>> create mothers' and fathers' age. I still haven't produced satisfactory
>> results.
>>
>> hhid hv104 hv105 hv112 hv114 mage fage
>>
>> 1 1 2 4 2 1 10 4
>> 1 1 1 10 2 1 10 4
>> 1 1 1 42 . .
>> 1 1 2 36 . .
>> 1 1 2 2 2 1 10 4
>> 1 2 1 28 . .
>> 1 4 1 33 . .
>> 1 5 2 24 . . . .
>> 1 6 1 12 0 0 . .
>>
>> The 0's in hv112 and hv114 denote mother not in HH and father not in HH,
>> respectively.
>>
>> Here are several things that I don't understand:
>> 1.) The hhid is the case identification, where hh denotes the household number
>> and id the household member. I have only described household no. 1. There are
>> 5 ids for the first member in household no. 1, followed by member no. 2 and
>> members no. 4, 5 and 6. We don't have any id = 3. Why are there 5 values in
>> HHID for household member 1?
>>
>> 2.) I use the following commands to produce mage and fage
>>
>> . by hhid: gen mage = hv105[hv112]
>> . by hhid: gen fage = hv105[hv114]
>>
>> I created the mothers' and fathers' age variables in the household member data
>> directly after loading it into Stata. I have therefore not merged this dataset
>> yet, which I will do at a later stage. I thought that creating the parents' ages
>> in the original data would be more advantageous than doing it in the merged
>> dataset, although something tells me that it should produce the same results.
>>
>> You can see from the table above that these cannot be satisfactory results.
>> The mother's age cannot be 10 years, or the father's 4 years, for the first hhid.
>>
>> I used the solution proposed at the above link, but the assert command did not
>> work. I don't understand what can be wrong! If anyone has come across this
>> problem when working with microlevel data, any help would be valuable!
>>
>> Thanks a lot!
>>
>>
>>
>>
>> ------>why are there 4 id for the first household member?
>>
>>
>>
>>
>> On 2009-07-29, at 01:45, Friedrich Huebler wrote:
>>> Tharshini,
>>>
>>> Please read the documentation for -merge- to understand how it works.
>>> Do not -drop- anything after -merge- besides the _merge variable. You
>>> have to keep all household members if you want to assign the parents'
>>> ages and other characteristics to a child. How to do that was
>>> explained in a previous post.
>>>
>>> http://www.stata.com/statalist/archive/2009-06/msg00793.html
>>>
>>> Friedrich
>>>
>>> On Tue, Jul 28, 2009 at 4:30 PM, Tharshini
>>> Thangavelu<[email protected]> wrote:
>>>>
>>>> Friedrich,
>>>>
>>>> 1.) The program "select" was suggested by DHS in their FAQ section. When I wanted
>>>> to load the individual recode file, I couldn't because there were too many
>>>> variables. As a result, I used this program. You can find more info on their
>>>> website.
>>>>
>>>> 2.) I did as you suggested. I loaded the whole household member data, merged
>>>> it with the weight file, and did NOT use the keep command, only the drop command
>>>> to remove the _merge variable. Otherwise I cannot merge it with the individual
>>>> file. I tried, and it gave me an error message: _merge already defined.
>>>>
>>>> So I dropped the _merge variable from the resulting file (uppsats.dta). Then I
>>>> wrote the following command:
>>>>
>>>> merge clnr hhnr lnr using ir
>>>>
>>>> variables clnr hhnr lnr do not uniquely identify observations in the master
>>>> data
>>>> caseid was str12 now str15
>>>>
>>>>
>>>> tab _merge
>>>>
>>>>      _merge |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>           1 |     23,199       78.11       78.11
>>>>           2 |      3,100       10.44       88.55
>>>>           3 |      3,402       11.45      100.00
>>>> ------------+-----------------------------------
>>>>       Total |     29,701      100.00
>>>>
>>>>
>>>> Now comes a tricky part for me. Using the following commands doesn't give me
>>>> the desired results.
>>>> keep if _merge==3
>>>> drop _merge
>>>>
>>>> Just as in the former case, tabulating hv105 (= age of household member) in
>>>> this file gives exactly the same answer, that is, only children's ages (0-5
>>>> years) are included.
>>>>
>>>> But if I don't use the keep or drop commands, I have the age of ALL household
>>>> members.
>>>>
>>>> My question is: should I keep the "_merge" variable? According to what I have
>>>> been reading, I thought the function of merge is to only keep if _merge == 3.
>>>>
>>>> 3.) In your former email you say that I drop all children without height and
>>>> weight data and all adults, including parents. In my analysis, I use as the
>>>> dependent variable child health, measured by height-for-age Z-score and
>>>> weight-for-age Z-score. For those children having these Z-scores, I need to
>>>> match them with their respective parents' education, age, household
>>>> characteristics, etc., to see if mothers and fathers with higher education have
>>>> children with better child health as measured by Z-score. Therefore, shouldn't
>>>> the way I was doing it be correct? Or have I misunderstood completely?
>>>>
>>>>
>>>> Thanks
>>>> Tharshini
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2009-07-28, at 15:43, Friedrich Huebler wrote:
>>>>> Tharshini,
>>>>>
>>>>> In step 3 you -drop- all children without height and weight data and
>>>>> all adults, including all parents.
>>>>>
>>>>> You write "The household member data includes to many variables to
>>>>> directly upload in stata." The flat household member recode file from
>>>>> the Ghana DHS 2003 has 245 variables. The only version of Stata that
>>>>> cannot hold 245 variables is Small Stata. Your -tab- output indicates
>>>>> that you do not have Small Stata because you were able to work with
>>>>> more than 26000 observations (see -help limits-). You should therefore
>>>>> be able to open the complete household member file with Stata. I don't
>>>>> know a program called "select" but it does not seem to be necessary.
>>>>>
>>>>> Friedrich
>>>>>
>>>>> On Tue, Jul 28, 2009 at 2:54 AM, Tharshini
>>>>> Thangavelu<[email protected]> wrote:
>>>>>> Hi Friedrich,
>>>>>>
>>>>>> When I downloaded the dataset for Ghana 2003, there was a doc file in the
>>>>>> file for height and weight: a description of how to proceed when merging and
>>>>>> which identifying variables to choose in each file. I followed this doc file
>>>>>> and merged the files in the following way:
>>>>>>
>>>>>> 1.) The height and weight file for children up to 5 years old.
>>>>>> rename HWHHID caseid
>>>>>> rename HWLINE linenr
>>>>>> sort caseid linenr
>>>>>> save weight, replace
>>>>>> clear exit
>>>>>>
>>>>>> 2.) The household member data includes too many variables to directly upload
>>>>>> in Stata, so I used the program "select", where I selected my variables of
>>>>>> interest. Then I loaded it into Stata:
>>>>>>
>>>>>> use hmr1
>>>>>> rename hhid caseid
>>>>>> rename hvidx linenr
>>>>>> sort caseid linenr
>>>>>> save hmr1, replace
>>>>>>
>>>>>> 3.) These two files were then merged together (master data = hmr1)
>>>>>>
>>>>>> merge caseid linenr using weight
>>>>>>
>>>>>> tab _merge
>>>>>>
>>>>>>      _merge |      Freq.     Percent        Cum.
>>>>>> ------------+-----------------------------------
>>>>>>           1 |     22,673       85.23       85.23
>>>>>>           3 |      3,928       14.77      100.00
>>>>>> ------------+-----------------------------------
>>>>>>       Total |     26,601      100.00
>>>>>>
>>>>>> . keep if _merge ==3
>>>>>> (22673 observations deleted)
>>>>>>
>>>>>> . drop _merge
>>>>>>
>>>>>> Error message : linenr was byte now int
>>>>>>
>>>>>> My own conclusion: since _merge == 3 has 3928 observations, which is exactly
>>>>>> the same number of observations as in the weight file, I concluded that the
>>>>>> merge was done correctly. I also tried the inverse case, i.e. having hmr as
>>>>>> my master data.
>>>>>>
>>>>>> 4.) I then merged this resulting file with the individual recode file
>>>>>> (= women's file), using cluster number (clnr hv001), household number (hhnr
>>>>>> hv002) and mother's line number (lnr hc60).
>>>>>>
>>>>>> In the resulting file, I again renamed the identifying variables
>>>>>> rename HV001 clnr
>>>>>> rename HV002 hhnr
>>>>>> rename hc60 lnr
>>>>>> sort clnr hhnr lnr
>>>>>> save thesis
>>>>>> clear exit
>>>>>>
>>>>>> 5.) In the individual recode file, just as in the household member recode
>>>>>> file, I used the program "select" to choose the variables, and the following
>>>>>> identifying variables were renamed: cluster number (clnr v001), household
>>>>>> number (hhnr v002) and respondent's line number (lnr v003).
>>>>>>
>>>>>> use ir1
>>>>>> rename V001 clnr
>>>>>> rename V002 hhnr
>>>>>> rename V003 lnr
>>>>>> sort clnr hhnr lnr
>>>>>> save ir1, replace
>>>>>>
>>>>>> 6.)Now, I merge the ir1.dta with the thesis.dta
>>>>>>
>>>>>> merge clnr hhnr lnr using thesis
>>>>>> tab _merge
>>>>>>
>>>>>>      _merge |      Freq.     Percent        Cum.
>>>>>> ------------+-----------------------------------
>>>>>>           1 |        526        7.48        7.48
>>>>>>           2 |      3,100       44.11       51.59
>>>>>>           3 |      3,402       48.41      100.00
>>>>>> ------------+-----------------------------------
>>>>>>       Total |      7,028      100.00
>>>>>>
>>>>>> . keep if _merge == 3
>>>>>> (3626 observations deleted)
>>>>>>
>>>>>> . drop _merge
>>>>>>
>>>>>> Error message: variables clnr hhnr lnr do not uniquely identify
>>>>>> observations in the master data. I hope this will help to solve the problem.
>>>>>>
>>>>>> / Tharshini
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2009-07-28, at 06:28, Friedrich Huebler wrote:
>>>>>>> Tharshini,
>>>>>>>
>>>>>>> On June 11 you wrote that you wanted to merge the household member
>>>>>>> file with the height and weight file. In response to your message you
>>>>>>> received advice on how you can merge the data. The table in your
>>>>>>> message of today makes clear that you did not merge the files
>>>>>>> correctly because you only have persons up to 5 years of age. If you
>>>>>>> want more help with this and the other problems you described you have
>>>>>>> to show us your code, as explained in the Statalist FAQ.
>>>>>>>
>>>>>>> http://www.stata.com/support/faqs/res/statalist.html#advice
>>>>>>>
>>>>>>> Friedrich
>>>>>>>
>>>>>>> On Mon, Jul 27, 2009 at 9:42 AM, Tharshini
>>>>>>> Thangavelu<[email protected]> wrote:
>>>>>>>>
>>>>>>>> .tab hv105
>>>>>>>>      Age of |
>>>>>>>>   household |
>>>>>>>>     members |      Freq.     Percent        Cum.
>>>>>>>> ------------+-----------------------------------
>>>>>>>>           0 |        772       22.69       22.69
>>>>>>>>           1 |        706       20.75       43.45
>>>>>>>>           2 |        655       19.25       62.70
>>>>>>>>           3 |        689       20.25       82.95
>>>>>>>>           4 |        553       16.26       99.21
>>>>>>>>           5 |         27        0.79      100.00
>>>>>>>> ------------+-----------------------------------
>>>>>>>>       Total |      3,402      100.00
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
--
Tharshini THANGAVELU
Forskarbacken 8 / 101
114 16 Stockholm
Sweden
Phone +46 (0)735 53 43 90
E-mail [email protected]