st: RE: using information from value label to generate new variables

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	st: RE: using information from value label to generate new variables
Date	Thu, 7 Jun 2012 19:11:42 +0100

My main reaction is that the -egen- function -anymatch()- is there to create these indicator variables. 

This example creates a sandpit for me to play in and shows what I mean. 

clear
set obs 10
gen id = _n
label def country 1 Albania 2 Belgium 3 "Czech Republic" 4 Denmark 5 Estonia 6 Finland 7 Greece 8 Haiti 9 Iceland 10 Japan
forval j = 1/5 {
	gen v`j' = ceil(10 * runiform())
	label val v`j' country
}

l 

forval k = 1/10 { 
	egen is`k' = anymatch(v*), val(`k') 
	label var is`k' "`: label country `k''"
} 

d is* 
l is*

This is what one run looks like (yours may differ because of different random numbers): 

. clear

. set obs 10
obs was 0, now 10

. gen id = _n

. label def country 1 Albania 2 Belgium 3 "Czech Republic" 4 Denmark 5 Estonia 6 Finland 7 Greece 8 Haiti 9 Iceland 10 Japan

. forval j = 1/5 {
  2.         gen v`j' = ceil(10 * runiform())
  3.         label val v`j' country
  4. }

. 
. l 

     +-------------------------------------------------------------+
     | id               v1        v2        v3        v4        v5 |
     |-------------------------------------------------------------|
  1. |  1            Haiti   Finland    Greece     Haiti   Denmark |
  2. |  2           Greece    Greece    Greece     Haiti   Finland |
  3. |  3          Denmark   Belgium   Estonia   Finland    Greece |
  4. |  4          Albania   Belgium   Belgium   Denmark    Greece |
  5. |  5          Iceland    Greece   Finland   Belgium   Finland |
     |-------------------------------------------------------------|
  6. |  6           Greece   Finland     Japan   Finland   Estonia |
  7. |  7   Czech Republic   Iceland   Iceland   Albania     Japan |
  8. |  8           Greece   Albania    Greece   Denmark     Japan |
  9. |  9          Denmark   Estonia   Albania   Estonia     Japan |
 10. | 10          Iceland   Iceland   Finland   Iceland     Japan |
     +-------------------------------------------------------------+

. 
. forval k = 1/10 { 
  2.         egen is`k' = anymatch(v*), val(`k') 
  3.         label var is`k' "`: label country `k''"
  4. } 

. 
. d is*


              storage  display     value
variable name   type   format      label      variable label
---------------------------------------------------------------------------------------------------------------------------------------------
is1             byte   %8.0g                  Albania
is2             byte   %8.0g                  Belgium
is3             byte   %8.0g                  Czech Republic
is4             byte   %8.0g                  Denmark
is5             byte   %8.0g                  Estonia
is6             byte   %8.0g                  Finland
is7             byte   %8.0g                  Greece
is8             byte   %8.0g                  Haiti
is9             byte   %8.0g                  Iceland
is10            byte   %8.0g                  Japan

. l is* 

     +------------------------------------------------------------+
     | is1   is2   is3   is4   is5   is6   is7   is8   is9   is10 |
     |------------------------------------------------------------|
  1. |   0     0     0     1     0     1     1     1     0      0 |
  2. |   0     0     0     0     0     1     1     1     0      0 |
  3. |   0     1     0     1     1     1     1     0     0      0 |
  4. |   1     1     0     1     0     0     1     0     0      0 |
  5. |   0     1     0     0     0     1     1     0     1      0 |
     |------------------------------------------------------------|
  6. |   0     0     0     0     1     1     1     0     0      1 |
  7. |   1     0     1     0     0     0     0     0     1      1 |
  8. |   1     0     0     1     0     0     1     0     0      1 |
  9. |   1     0     0     1     1     0     0     0     0      1 |
 10. |   0     0     0     0     0     1     0     0     1      1 |
     +------------------------------------------------------------+
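Applied to the survey data described in the question below, the same idea could be sketched roughly as follows. This is only a sketch under stated assumptions: a1-a9 are stored contiguously and share the value label locations, country codes are positive integers below 1000, and every country label turns into a distinct, unused variable name once passed through -strtoname()-. The temporary variable nreplies is introduced here purely for illustration.

```stata
* Sketch only: assumes a1-a9 share the value label "locations" and that
* country codes are positive integers below 1000 (regional codes are longer).

* collect every country code that occurs in any of a1-a9, not just a1
local seen
foreach v of varlist a1-a9 {
	quietly levelsof `v' if `v' < 1000, local(lev)
	local seen : list seen | lev
}

* count non-missing replies so that non-respondents can be set to missing
tempvar nreplies
egen `nreplies' = rownonmiss(a1-a9)

foreach k of local seen {
	local lab : label locations `k'
	local name = strtoname("`lab'")		// e.g. "Czech Republic" -> Czech_Republic
	egen `name' = anymatch(a1-a9), values(`k')
	replace `name' = . if `nreplies' == 0
	label var `name' "`lab'"
}
```

Looping over the values that actually occur avoids creating indicators for countries nobody named. If two labels map to the same name, or a name is already in use, -egen- will stop with an error, so the names need checking first.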

Nick 
[email protected] 

Evelyn Ersanilli

I have a cross-sectional survey dataset. 
For question "a", people were asked to list up to 9 countries.
All variables a1-a9 (numeric, up to 3 digits) have the same value label, "locations".
Because it is also attached to other variables, the value label locations holds not only the 3-digit country codes but also 5-digit regional codes.

For each country that was mentioned (e.g. France, Germany, Zimbabwe), I would like to generate a variable that is 0 if the respondent did not name that country in any of the 9 replies (many people gave fewer than 9 replies), 1 if the respondent named it in any of the up to 9 replies, and missing if the respondent didn't answer a1-a9 at all.
These variables should be named after the countries.

Building on online examples I've gotten close to what I want, but I have a problem correctly and efficiently delimiting the list of newly generated variables.
I first tried to get the values and labels from the first answer (a1). However, this risks omitting countries that have only been named in a2, a3, etc.
In my second attempt I therefore tried to extract the values and labels from the value label 'locations' using labellist and r().
The problem with Attempt 2 is that r() only saves up to (244?) characters, which is fewer than all the values together need, and I haven't found out how to increase the storage capacity.
Ideally I would also limit the extraction of labels and values to only the 1-3 digit country codes, leaving out the 5-digit regional codes.

Any alternative suggestions would be welcome.


Here is my syntax:


*-------------------Attempt I-----------
//Step 1: extract labels
levelsof a1, local(a1_levels)
foreach val of local a1_levels {
	local c`val' : label locations `val'
}
macro list

//Step 2: generate dummies
foreach X of local a1_levels {
	egen var`X' = anymatch(a1 a2 a3 a4 a5 a6 a7 a8 a9), values(`X')
}

//Step 3: label and rename
local variablelist "var"
foreach variable of local variablelist {
	foreach value of local a1_levels {
		label variable `variable'`value' "`c`value''"
		local stringy = strtoname("`c`value''")	//needed because some country names contain spaces or other illegal characters
		rename `variable'`value' `stringy'
	}
}
*-----------------------------------------

*-------------------Attempt II-----------
//Step 1: extract labels
quietly: labellist locations
local loc_levels = r(locations_values)
foreach val of local loc_levels {	/* loop over all values in local list loc_levels */
	local c`val' : label locations `val'	/* create macro that contains label for each value */
}
macro list
//etc
*-----------------------------------------
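
A note on the truncation in Attempt II, offered as a likely explanation rather than a confirmed diagnosis: putting = after the macro name makes Stata evaluate r(locations_values) as a string expression, and in Stata versions of that time string expressions were cut off at 244 characters. Copying the contents without = avoids that evaluation. The sketch below assumes the user-written -labellist- is installed and returns the values in r(locations_values), as in the attempt above, and that all codes are positive integers.

```stata
* Sketch only: assumes -labellist- leaves the values in r(locations_values)
quietly labellist locations
local loc_levels `r(locations_values)'	// copied as text, not evaluated as an expression
foreach val of local loc_levels {
	if `val' < 1000 {			// keep only the 1-3 digit country codes
		local c`val' : label locations `val'
	}
}
macro list
```

The test on `val' is one simple way to leave out the 5-digit regional codes while looping.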





For step 2 I've also tried:
*-----------------------------------------
foreach X of numlist 2/935 {
	egen var`X' = anymatch(a1 a2 a3 a4 a5 a6 a7 a8 a9), values(`X')
}
*-----------------------------------------
But that generates far too many variables, as many of the values between 2 and 935 do not have a country label associated with them.
I could of course just look up all the values that were assigned a label in locations, but where's the fun in that?
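
One way to keep the brute-force loop over 2/935 without creating unwanted variables is to ask whether each value has a label before calling -egen-. This is a minimal sketch, assuming the value label locations is defined in memory. The strict option of the label extended macro function returns nothing when a value has no label.

```stata
* Sketch only: skip any code in 2/935 that has no label in "locations"
forval X = 2/935 {
	local lab : label locations `X', strict
	if `"`lab'"' != "" {
		egen var`X' = anymatch(a1 a2 a3 a4 a5 a6 a7 a8 a9), values(`X')
		label var var`X' `"`lab'"'
	}
}
```

This still creates an indicator for every labeled country, including those no respondent named, so it may produce more variables than a loop over the observed values would.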



Kind regards


Evelyn

Departmental Lecturer
Oxford Department of International Development (QEH)
International Migration Institute
University of Oxford
3 Mansfield Road
Oxford OX1 3TB
United Kingdom
Tel: +44 (0)1865 281717

http://www.eumagine.org
http://www.migration.ox.ac.uk


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
