Re: st: creating a variable name from observations of a string variable
From: Nick Cox <[email protected]>
To: [email protected]
Date: Fri, 21 Sep 2012 10:46:37 +0100
This code shows some technique. As you evidently have panel data it is
important that a solution extends to that case.
clear
input str2 state X date
AK 70 1990
AL 35 1990
CA 110 1990
KY 56 1990
AK 80 2000
AL 50 2000
CA 130 2000
KY 70 2000
end
* # of panels: set according to data
local J = 4
* # of times: ditto
local T = 2
* balanced panel assumed, sorted by date and then by state
* one observation for each ordered pair of states at each date
set obs `= `J' * `J' * `T' '
gen state1 = state
gen state2 = state
gen date12 = date
gen ratio = .
* k indexes the observation being filled in
local k = 1
qui forval t = 1/`T' {
    * observations for time t are start + 1, ..., start + J
    local start = `J' * (`t' - 1)
    forval i = 1/`J' {
        forval j = 1/`J' {
            replace state1 = state[`start' + `i'] in `k'
            replace state2 = state[`start' + `j'] in `k'
            replace date12 = date[`start' + `j'] in `k'
            replace ratio = X[`start' + `i'] / X[`start' + `j'] in `k'
            local ++k
        }
    }
}
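The values of J and T above are typed in by hand. As a minimal sketch (not part of the original code, and assuming it is run after -input- and before -set obs-), they could instead be counted from the data, with a check that the panel really is balanced and sorted the way the loop expects. The local names states and dates are just illustrative.

```stata
* Sketch only: run after -input- and before -set obs-.
* Count distinct states and dates instead of typing them in.
quietly levelsof state, local(states)
local J : word count `states'
quietly levelsof date, local(dates)
local T : word count `dates'

* The loop indexes observations by position, so each date must hold
* exactly J distinct states, stored in the same order within each date.
isid date state
sort date state
by date: assert _N == `J'
display "J = `J', T = `T'"
```

If either -isid- or -assert- fails, the panel is not balanced and the positional indexing in the loop would pair the wrong observations.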
. list

     +----------------------------------------------------------+
     | state     X   date   state1   state2   date12      ratio |
     |----------------------------------------------------------|
  1. |    AK    70   1990       AK       AK     1990          1 |
  2. |    AL    35   1990       AK       AL     1990          2 |
  3. |    CA   110   1990       AK       CA     1990   .6363636 |
  4. |    KY    56   1990       AK       KY     1990       1.25 |
  5. |    AK    80   2000       AL       AK     1990         .5 |
     |----------------------------------------------------------|
  6. |    AL    50   2000       AL       AL     1990          1 |
  7. |    CA   130   2000       AL       CA     1990   .3181818 |
  8. |    KY    70   2000       AL       KY     1990       .625 |
  9. |           .      .       CA       AK     1990   1.571429 |
 10. |           .      .       CA       AL     1990   3.142857 |
     |----------------------------------------------------------|
 11. |           .      .       CA       CA     1990          1 |
 12. |           .      .       CA       KY     1990   1.964286 |
 13. |           .      .       KY       AK     1990         .8 |
 14. |           .      .       KY       AL     1990        1.6 |
 15. |           .      .       KY       CA     1990   .5090909 |
     |----------------------------------------------------------|
 16. |           .      .       KY       KY     1990          1 |
 17. |           .      .       AK       AK     2000          1 |
 18. |           .      .       AK       AL     2000        1.6 |
 19. |           .      .       AK       CA     2000   .6153846 |
 20. |           .      .       AK       KY     2000   1.142857 |
     |----------------------------------------------------------|
 21. |           .      .       AL       AK     2000       .625 |
 22. |           .      .       AL       AL     2000          1 |
 23. |           .      .       AL       CA     2000   .3846154 |
 24. |           .      .       AL       KY     2000   .7142857 |
 25. |           .      .       CA       AK     2000      1.625 |
     |----------------------------------------------------------|
 26. |           .      .       CA       AL     2000        2.6 |
 27. |           .      .       CA       CA     2000          1 |
 28. |           .      .       CA       KY     2000   1.857143 |
 29. |           .      .       KY       AK     2000       .875 |
 30. |           .      .       KY       AL     2000        1.4 |
     |----------------------------------------------------------|
 31. |           .      .       KY       CA     2000   .5384616 |
 32. |           .      .       KY       KY     2000          1 |
     +----------------------------------------------------------+
With this data structure
1. There is no explosion in the number of result variables.
2. Which result is which is labelled jointly by -state1 state2- (use
-egen-'s -concat()- function if a single identifier is essential).
3. You can proceed fairly easily to further analyses.
4. You can -drop if state1 == state2- should the ratios identically 1
be redundant for further analyses.
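Points 2 and 4 could be sketched as follows, assuming the dataset is as listed above. The variable name pair and the "/" separator are illustrative choices only. Note that the original state, X and date values sit in the first J * T observations, so this sketch drops those three variables before dropping any observations.

```stata
* Sketch only: assumes state1, state2, date12 and ratio exist as above.
* The source columns are no longer needed once the ratios are stored.
drop state X date

* Single identifier such as "AK/AL"; punct() sets the separator.
egen pair = concat(state1 state2), punct("/")

* Ratios of a state to itself are identically 1.
drop if state1 == state2

* Optional: keep each unordered pair once, as in the original request.
* keep if state1 < state2

list pair date12 ratio, sepby(date12)
```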
(I thought about various cunning -merge-s, -reshape-s, -cross-es, etc,
without seeing a clear way ahead. Neater code is certainly not ruled
out.)
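One possibly neater route, offered only as a sketch and not as the method used above, is -joinby-, which forms all pairwise combinations within groups. Joining the data to a renamed copy of itself by date yields every ordered pair of states at each date without loops and without assuming a balanced panel. The tempfile name copy and the variable names X1 and X2 are illustrative, and the group form of -rename- assumes Stata 12 or later.

```stata
clear
input str2 state X date
AK 70 1990
AL 35 1990
CA 110 1990
KY 56 1990
AK 80 2000
AL 50 2000
CA 130 2000
KY 70 2000
end

* Save a copy in which the state and its value play the denominator role.
tempfile copy
preserve
rename (state X) (state2 X2)
save `copy'
restore

* In the data in memory they play the numerator role.
rename (state X) (state1 X1)

* All pairs of observations sharing the same date.
joinby date using `copy'
gen ratio = X1 / X2

* 4 states x 4 states x 2 dates for this example.
assert _N == 32
sort date state1 state2
list date state1 state2 ratio, sepby(date state1)
```

The result is in the same long layout as the listing above, so the same follow-up steps apply.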
Nick
On Fri, Sep 21, 2012 at 1:23 AM, Nick Cox <[email protected]> wrote:
> Your first post did appear (see
> http://www.stata.com/statalist/archive/2012-09/msg00740.html). If in
> doubt, check the archives first.
>
> Your example indicates use of 50 U.S. states. But you appear to want
> to -generate- one new variable for each pair of states, which would be
> 2500 variables, as the ratios a/b and b/a are different in general. If
> your example is just notional, and your panels are something else, the
> problem is, for all we can see, even larger. Also you evidently have
> data for several dates, so what does that imply? What are you going to
> do with all those variables?
>
> Note that each variable would just be a constant anyway.
>
> One reason why you got no reply first-time round may be that people
> couldn't believe what you were asking, or could believe it and did not
> want to spell out that it was a bad idea.
>
> One possibility for your comparisons is a matrix. Another is a single
> variable for each date in a long data structure.
>
> I suggest that you need to think through what you were asking, or
> alternatively explain what I have misunderstood.
>
> Note that in Stata discussions "variable" should mean "variable" in
> Stata's sense. For a less gnomic statement, see
> http://www.stata.com/statalist/archive/2008-08/msg01258.html
>
> Nick
>
> On Thu, Sep 20, 2012 at 11:24 PM, david parsley <[email protected]> wrote:
>> apparently my 1st attempt to post this did not work....I'm trying again.
>>
>> I have a panel with a string variable identifying the cross-sectional
>> observations, e.g.,
>>
>> state X1 date
>> AK 70 1990
>> AL 35 1990
>> CA 110 1990
>> KY 56 1990
>>
>> and I want to create a variable that compares the ith and jth
>> observations of x1 and
>> has its variable name taken from transformations of the string
>> variable 'state', e.g.,
>>
>> RAKAL = X1(1)/X1(2) 1990
>> RAKCA = X1(1)/X1(3) 1990
>> RAKKY = X1(1)/X1(4) 1990
>> RALCA = X1(2)/X1(3) 1990
>> RALKY = X1(2)/X1(4) 1990
>> RCAKY = X1(3)/X1(4) 1990
>>
>> I want to create all successive comparisons. So far, I HAVE been able to create
>> variables R12, R13, R14, R23, R24, R34 using nested forvalues loops.
>>
>> by date: gen new`i's`j'=X1[`i']/X1[`j']
>>
>>
>> But, I'd rather have the variable name more reflective of what they actually are
>> by using the values of the 'state' variable.