Re: st: one data management question
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: one data management question
Date: Sun, 6 Mar 2011 09:34:33 +0000
I just needed to see enough real data to be clearer on the rules.
Your example only makes sense to me if your one variable is string.
Run this example to see some technique. Insert lots of -list- or
-edit- commands so that you can see what each step does.
clear
input str5 var
areaA
1
hot
3
4
2
cold
4
areaB
1
warm
6
2
rainy
42
end
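* carry each area label down to the observations that follow it
gen area = var if substr(var,1,4) == "area"
replace area = area[_n-1] if missing(area)
drop if var == area
* non-numeric values are weather words; the value just before each is an id
gen weather = var if missing(real(var))
gen id = real(var) if !missing(weather[_n+1])
* spread weather and id over the rest of each portfolio
replace weather = weather[_n-1] if missing(id) & missing(weather)
replace id = id[_n-1] if missing(id)
* keep only the x and y observations (a portfolio with neither vanishes here)
drop if missing(weather)
drop if missing(real(var))
* number the values within each portfolio and put them side by side
sort area id, stable
by area id : gen j = _n
reshape wide var , i(area id) j(j)
destring var*, replace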
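The code above drops any portfolio that has an id and a weather word but no x or y, such as "2 rainy" in the original posting. What follows is only a sketch of one possible variant that keeps such portfolios. It assumes that every portfolio is an id immediately followed by its weather word, and that ids, x and y are numeric while weather words are not. The names isweather, pf and obsno are invented here for illustration.

```stata
clear
input str5 var
areaA
1
hot
3
4
2
cold
4
areaB
1
warm
6
2
rainy
end

* area letter, carried down; then drop the area rows themselves
gen area = substr(var, 5, .) if substr(var, 1, 4) == "area"
replace area = area[_n-1] if missing(area)
drop if substr(var, 1, 4) == "area"

* assumption: whatever is not numeric is a weather word
gen byte isweather = missing(real(var))

* a new portfolio starts on the observation just before each weather word
gen long pf = sum(isweather[_n+1] == 1)

* remember the original order so that sorting cannot scramble it
gen long obsno = _n

* within a portfolio: 1 = id, 2 = weather, 3 = x, 4 = y
* subscripts beyond the end of a by-group return missing
bysort pf (obsno) : gen id = real(var[1])
by pf : gen weather = var[2]
by pf : gen x = real(var[3])
by pf : gen y = real(var[4])

* one observation per portfolio
by pf : keep if _n == 1
keep area id weather x y
list, noobs sepby(area)
```

With the data as originally posted this should leave four observations, one per portfolio, with x and y both missing for the last. Insert -list- after any step to check that the portfolio counter pf changes where you expect.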
2011/3/6 Grace Jessie <[email protected]>:
> Nick, thanks for the reminder.
> Indeed, I entered made-up data just to show my problem. My real data set is too large to show you.
> To add: each area marker "area?" in var begins with the four characters "area", and within each portfolio only the indicator string that distinguishes it (one of "hot", "cold", "warm", "rainy") contains alphabetic characters.
> That is all.
Nick Cox
>> This looks like fake data intended to show the flavour of your problem.
>>
>> Thanks, but I think a solution would be easier if you showed real data.
Grace Jessie <[email protected]>:
>> > I have a data like the following with only one variable---var.
>> >
>> > +-------+
>> > | var |
>> > |-------|
>> > 1. | areaA |
>> > 2. | 1 |
>> > 3. | hot |
>> > 4. | 3 |
>> > 5. | 4 |
>> > |-------|
>> > 6. | 2 |
>> > 7. | cold |
>> > 8. | 4 |
>> > 9. | areaB |
>> > 10. | 1 |
>> > |-------|
>> > 11. | warm |
>> > 12. | 6 |
>> > 13. | 2 |
>> > 14. | rainy |
>> > +-------+
>> > There are several areas in the data and several portfolios within each area. The number of observations in each portfolio varies, with a maximum of 4. If a portfolio has 4 observations, they represent id, weather, x and y, in that order. If it has fewer than 4, say 3, the last value, y, is missing. The indicator string (one of "hot", "cold", "warm", "rainy") distinguishes each portfolio. Each portfolio begins at the observation just before its indicator string and ends just before the next portfolio begins. For example, there are two portfolios in areaA: one is "1 hot 3 4" and the other is "2 cold 4".
>> > Now I want to restructure the data into a form suitable for regression, as follows.
>> >
>> >      +-----------------------------+
>> >      | area   id   weather   x   y |
>> >      |-----------------------------|
>> >   1. | A      1    hot       3   4 |
>> >   2. | A      2    cold      4     |
>> >   3. | B      1    warm      6     |
>> >   4. | B      2    rainy           |
>> >      +-----------------------------+
>> > How can I do this?