


Re: st: creating a variable name from observations of a string variable

From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: creating a variable name from observations of a string variable
Date   Fri, 21 Sep 2012 10:46:37 +0100

This code shows some technique. As you evidently have panel data it is
important that a solution extends to that case.

input str2 state    X    date
 AK    70    1990
 AL    35    1990
 CA    110    1990
 KY    56    1990
 AK    80    2000
 AL    50    2000
 CA    130    2000
 KY    70    2000
end
* # of panels: set according to data
local J = 4
* # of times: ditto
local T = 2
* balanced panel assumed
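
The values of J and T above are typed in by hand. As a minimal sketch (not part of the original post), they could instead be read off the data, assuming the observations just entered are the only ones in memory and the panel is balanced. -tabulate- leaves the number of distinct values in r(r). The -sort- makes explicit the ordering (date, then state within date) that the loop below relies on.

```stata
* count distinct panels and distinct times from the data
quietly tabulate state
local J = r(r)
quietly tabulate date
local T = r(r)

* the loop indexes observations as J * (t - 1) + i,
* so the data must be ordered by date, then state
sort date state
```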

set obs `= `J' * `J' * `T' '

gen state1 = state
gen state2 = state
gen date12 = date
gen ratio = .

local k = 1
qui forval t = 1/`T' {
	* offset of the first observation for time t
	local start = `J' * (`t' - 1)
	forval i = 1/`J' {
		forval j = 1/`J' {
			replace state1 = state[`start' + `i'] in `k'
			replace state2 = state[`start' + `j'] in `k'
			replace date12 = date[`start' + `j'] in `k'
			replace ratio = X[`start' + `i'] / X[`start' + `j'] in `k'
			local ++k
		}
	}
}

. list

     | state     X   date   state1   state2   date12      ratio |
  1. |    AK    70   1990       AK       AK     1990          1 |
  2. |    AL    35   1990       AK       AL     1990          2 |
  3. |    CA   110   1990       AK       CA     1990   .6363636 |
  4. |    KY    56   1990       AK       KY     1990       1.25 |
  5. |    AK    80   2000       AL       AK     1990         .5 |
  6. |    AL    50   2000       AL       AL     1990          1 |
  7. |    CA   130   2000       AL       CA     1990   .3181818 |
  8. |    KY    70   2000       AL       KY     1990       .625 |
  9. |           .      .       CA       AK     1990   1.571429 |
 10. |           .      .       CA       AL     1990   3.142857 |
 11. |           .      .       CA       CA     1990          1 |
 12. |           .      .       CA       KY     1990   1.964286 |
 13. |           .      .       KY       AK     1990         .8 |
 14. |           .      .       KY       AL     1990        1.6 |
 15. |           .      .       KY       CA     1990   .5090909 |
 16. |           .      .       KY       KY     1990          1 |
 17. |           .      .       AK       AK     2000          1 |
 18. |           .      .       AK       AL     2000        1.6 |
 19. |           .      .       AK       CA     2000   .6153846 |
 20. |           .      .       AK       KY     2000   1.142857 |
 21. |           .      .       AL       AK     2000       .625 |
 22. |           .      .       AL       AL     2000          1 |
 23. |           .      .       AL       CA     2000   .3846154 |
 24. |           .      .       AL       KY     2000   .7142857 |
 25. |           .      .       CA       AK     2000      1.625 |
 26. |           .      .       CA       AL     2000        2.6 |
 27. |           .      .       CA       CA     2000          1 |
 28. |           .      .       CA       KY     2000   1.857143 |
 29. |           .      .       KY       AK     2000       .875 |
 30. |           .      .       KY       AL     2000        1.4 |
 31. |           .      .       KY       CA     2000   .5384616 |
 32. |           .      .       KY       KY     2000          1 |

With this data structure

1. There is no explosion in the number of result variables.

2. Which result is which is labelled jointly by -state1 state2-  (use
-egen-'s -concat()- function if a single identifier is essential).

3. You can proceed fairly easily to further analyses.

4. You can -drop if state1 == state2- should the ratios identically 1
be redundant for further analyses.
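
Points 2 and 4 can be sketched as follows. The variable name pair and the "/" separator are illustrative choices, not from the original post. The commented-out line is a further assumption about what may be wanted: keeping each unordered pair once, as in the original question.

```stata
* one string identifier per pair, e.g. "AK/AL"
egen pair = concat(state1 state2), punct("/")

* drop self-comparisons, whose ratio is identically 1
drop if state1 == state2

* alternatively, keep each pair once (AK/AL but not AL/AK)
* drop if state1 >= state2
```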

(I thought about various cunning -merge-s, -reshape-s, -cross-es, etc,
without seeing a clear way ahead. Neater code is certainly not ruled
out.)
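
One possible neater route, offered as a sketch and not as part of the original post, is -joinby-, which forms all pairwise combinations of observations within groups. It assumes you start from the original data in memory (state, X, date), before any -set obs-, and that the names state1, state2, X1 and X2 are free to use.

```stata
* save a copy of the data with the variables renamed for the "second" state
preserve
rename (state X) (state2 X2)
tempfile other
save `other'
restore

* rename for the "first" state, then pair every state with every state by date
rename (state X) (state1 X1)
joinby date using `other'

gen ratio = X1 / X2
sort date state1 state2
```

This needs neither J nor T and does not assume a balanced panel, since the pairing is done within each date.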


On Fri, Sep 21, 2012 at 1:23 AM, Nick Cox <[email protected]> wrote:
> Your first post did appear (see
> If in
> doubt, check the archives first.
> Your example indicates use of 50 U.S. states. But you appear to want
> to -generate- one new variable for each pair of states, which would be
> 2500 variables, as the ratios a/b and b/a are different in general. If
> your example is just notional, and your panels are something else, the
> problem is, for all we can see, even larger. Also you evidently have
> data for several dates, so what does that imply? What are you going to
> do with all those variables?
> Note that each variable would just be a constant anyway.
> One reason why you got no reply first-time round may be that people
> couldn't believe what you were asking, or could believe it and did not
> want to spell out that it was a bad idea.
> One possibility for your comparisons is a matrix. Another is a single
> variable for each date in a long data structure.
> I suggest that you need to think through what you were asking, or
> alternatively explain what I have misunderstood.
> Note that in Stata discussions "variable" should mean "variable" in
> Stata's sense. For a less gnomic statement, see
> Nick
> On Thu, Sep 20, 2012 at 11:24 PM, david parsley <[email protected]> wrote:
>> apparently my 1st attempt to post this did not work....I'm trying again.
>> I have a panel with a string variable identifying the cross-sectional
>> observations, e.g.,
>> state    X1    date
>> AK    70    1990
>> AL    35    1990
>> CA    110    1990
>> KY    56    1990
>> and I want to create a variable that compares the ith and jth
>> observations of x1 and
>> has its variable name taken from transformations of the string
>> variable 'state', e.g.,
>> RAKAL = X1(1)/X1(2)    1990
>> RAKCA = X1(1)/X1(3)    1990
>> RAKKY = X1(1)/X1(4)    1990
>> RALCA = X1(2)/X1(3)    1990
>> RALKY = X1(2)/X1(4)    1990
>> RCAKY = X1(3)/X1(4)    1990
>> I want to create all successive comparisons.  So far, I HAVE been able to create
>> variables R12, R13, R14, R23, R24, R34 using nested forvalues loops.
>>   by date: gen new`i's`j'=X1[`i']/X1[`j']
>> But, I'd rather have the variable name more reflective of what they actually are
>> by using the values of the 'state' variable.
