Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: using information from value label to generate new variables
From: Nick Cox <[email protected]>
To: Nick Cox <[email protected]>, "'[email protected]'" <[email protected]>
Subject: st: RE: using information from value label to generate new variables
Date: Thu, 7 Jun 2012 19:12:46 +0100
Sorry; please ignore this. It's based on reading only part of Evelyn's question. I will send a better answer soon.
Nick
[email protected]
-----Original Message-----
From: Nick Cox
Sent: 07 June 2012 19:12
To: '[email protected]'
Subject: RE: using information from value label to generate new variables
My main reaction is that the -egen- function -anymatch()- is there to create these indicator variables.
This example creates a sandpit for me to play in and shows what I mean.
clear
set obs 10
gen id = _n
label def country 1 Albania 2 Belgium 3 "Czech Republic" 4 Denmark 5 Estonia 6 Finland 7 Greece 8 Haiti 9 Iceland 10 Japan
forval j = 1/5 {
    gen v`j' = ceil(10 * runiform())
    label val v`j' country
}
l
forval k = 1/10 {
    egen is`k' = anymatch(v*), val(`k')
    label var is`k' "`: label country `k''"
}
d is*
l is*
This is what one run looks like (yours may differ because of different random numbers):
. clear
. set obs 10
obs was 0, now 10
. gen id = _n
. label def country 1 Albania 2 Belgium 3 "Czech Republic" 4 Denmark 5 Estonia 6 Finland 7 Greece 8 Haiti 9 Iceland 10 Japan
. forval j = 1/5 {
2. gen v`j' = ceil(10 * runiform())
3. label val v`j' country
4. }
.
. l
     +-------------------------------------------------------------+
     | id               v1        v2        v3        v4        v5 |
     |-------------------------------------------------------------|
  1. |  1            Haiti   Finland    Greece     Haiti   Denmark |
  2. |  2           Greece    Greece    Greece     Haiti   Finland |
  3. |  3          Denmark   Belgium   Estonia   Finland    Greece |
  4. |  4          Albania   Belgium   Belgium   Denmark    Greece |
  5. |  5          Iceland    Greece   Finland   Belgium   Finland |
     |-------------------------------------------------------------|
  6. |  6           Greece   Finland     Japan   Finland   Estonia |
  7. |  7   Czech Republic   Iceland   Iceland   Albania     Japan |
  8. |  8           Greece   Albania    Greece   Denmark     Japan |
  9. |  9          Denmark   Estonia   Albania   Estonia     Japan |
 10. | 10          Iceland   Iceland   Finland   Iceland     Japan |
     +-------------------------------------------------------------+
.
. forval k = 1/10 {
2. egen is`k' = anymatch(v*), val(`k')
3. label var is`k' "`: label country `k''"
4. }
.
. d is*
              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------
is1             byte    %8.0g                 Albania
is2             byte    %8.0g                 Belgium
is3             byte    %8.0g                 Czech Republic
is4             byte    %8.0g                 Denmark
is5             byte    %8.0g                 Estonia
is6             byte    %8.0g                 Finland
is7             byte    %8.0g                 Greece
is8             byte    %8.0g                 Haiti
is9             byte    %8.0g                 Iceland
is10            byte    %8.0g                 Japan
. l is*
+------------------------------------------------------------+
| is1 is2 is3 is4 is5 is6 is7 is8 is9 is10 |
|------------------------------------------------------------|
1. | 0 0 0 1 0 1 1 1 0 0 |
2. | 0 0 0 0 0 1 1 1 0 0 |
3. | 0 1 0 1 1 1 1 0 0 0 |
4. | 1 1 0 1 0 0 1 0 0 0 |
5. | 0 1 0 0 0 1 1 0 1 0 |
|------------------------------------------------------------|
6. | 0 0 0 0 1 1 1 0 0 1 |
7. | 1 0 1 0 0 0 0 0 1 1 |
8. | 1 0 0 1 0 0 1 0 0 1 |
9. | 1 0 0 1 1 0 0 0 0 1 |
10. | 0 0 0 0 0 1 0 0 1 1 |
+------------------------------------------------------------+
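The original question also asked for variables named after the countries and for missing values when a respondent named no country at all. A minimal sketch of how the loop above might be extended, assuming a smaller sandpit (three answer variables, five countries) and a hypothetical non-respondent in observation 10; the variable name nnamed is made up for this illustration:

```stata
clear
set obs 10
set seed 12345
label def country 1 Albania 2 Belgium 3 "Czech Republic" 4 Denmark 5 Estonia
forval j = 1/3 {
    gen v`j' = ceil(5 * runiform())
    label val v`j' country
}

* hypothetical non-respondent: no country named at all
replace v1 = . in 10
replace v2 = . in 10
replace v3 = . in 10

* number of answers actually given by each respondent
egen nnamed = rownonmiss(v1-v3)

forval k = 1/5 {
    local lbl : label country `k'
    * strtoname() turns "Czech Republic" into a legal name, Czech_Republic
    local name = strtoname("`lbl'")
    egen `name' = anymatch(v1-v3), val(`k')
    * anymatch() returns 0 when all answers are missing, so reset those rows
    replace `name' = . if nnamed == 0
    label var `name' "`lbl'"
}
l
```

The -replace- step is needed because -anymatch()- by itself cannot tell "did not name this country" from "did not answer".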
Nick
[email protected]
Evelyn Ersanilli
I have a cross-sectional survey dataset.
For question "a", people were asked to list up to 9 countries.
All variables a1-a9 (numeric, up to 3 digits) have the same value label, "locations".
Because it is also attached to other variables, the value label locations holds not only the 3-digit country codes but also 5-digit regional codes.
For each country (e.g. France, Germany, Zimbabwe) that was mentioned, I would like to generate a variable that is 1 if the respondent named that country in any of the up to 9 replies, 0 if not (many people gave fewer than 9 replies), and missing if the respondent didn't answer a1-a9 at all.
These variables should have the names of the country.
Building on online examples I've gotten close to what I want, but I have a problem correctly and efficiently delimiting the list of newly generated variables.
I first tried to get the values and labels from the first answer (a1). However this risks omitting countries that have only been named in a2,a3 etc.
In my second attempt I therefore tried to extract the values and labels from the value label 'locations' using labellist and r().
The problem with Attempt 2 is that r() only saves up to (244?) characters, which is fewer than all the values together, and I haven't found out how to increase the storage capacity.
Ideally I would also limit the extraction of labels and values to the 1-3 digit country codes, leaving out the 5-digit regional codes.
Any alternative suggestions would be welcome
Here is my syntax:
*-------------------Attempt I-----------
//Step 1: abstract labels
levelsof a1, local(a1_levels)
foreach val of local a1_levels {
local c`val' : label locations `val'
}
macro list
//Step 2: generate dummies
foreach X of local a1_levels {
egen var`X'=anymatch(a1 a2 a3 a4 a5 a6 a7 a8 a9), values(`X')
}
//Step 3: label and rename
local variablelist "var"
foreach variable of local variablelist {
    foreach value of local a1_levels {
        label variable `variable'`value' "`c`value''"
        local stringy = strtoname("`c`value''") //needed because some country names contain spaces or other illegitimate characters
        rename `variable'`value' `stringy'
    }
}
*-----------------------------------------
*-------------------Attempt II-----------
//Step 1: abstract labels
quietly: labellist locations
local loc_levels= r(locations_values)
foreach val of local loc_levels { /* loop over all values in local list `var'_levels */
local c`val' : label locations `val' /* create macro that contains label for each value */
}
macro list
//etc
*-----------------------------------------
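One way to sidestep the length limit met in Attempt II might be to read the value label directly with the official Mata function st_vlload() and pass the values back as a local macro. This is a sketch only, assuming toy data in place of the survey (three answer variables, invented codes, and a hypothetical 5-digit regional code). It is also possible, though not checked here, that the truncation comes from the `=` in `local loc_levels= r(...)`, which evaluates a string expression, rather than from r() itself.

```stata
* Toy data standing in for the survey (hypothetical codes and labels)
clear
input a1 a2 a3
 33  49   .
 49   .   .
  .   .   .
263  33  49
end
label define locations 33 "France" 49 "Germany" 263 "Zimbabwe" 10001 "Hypothetical region"
label values a1 a2 a3 locations

mata:
// load the values and text stored in the value label
st_vlload("locations", values = ., text = "")
// keep the 1-3 digit country codes, drop the 5-digit regional codes
values = select(values, values :< 1000)
// hand the values back to Stata as a space-separated local macro
st_local("loc_levels", invtokens(strofreal(values')))
end

foreach val of local loc_levels {
    local lbl : label locations `val'
    local name = strtoname("`lbl'")
    egen `name' = anymatch(a1-a3), values(`val')
    label variable `name' "`lbl'"
}
describe
```

Because the loop runs over everything defined in the label, it also creates all-zero variables for countries that nobody named.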
For step 2 I've also tried:
*-----------------------------------------
foreach X of numlist 2/935 {
egen var`X'=anymatch(a1 a2 a3 a4 a5 a6 a7 a8 a9), values(`X')
}
*-----------------------------------------
But that generates far too many variables, as many of the values between 2 and 935 do not have a country code associated with them.
I could of course just look up all the values that were assigned a label in locations, but where's the fun in that..
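A possible middle road between Attempt I and the 2/935 loop is to collect the values that actually occur in any of the answer variables and loop over that list only. The sketch below assumes toy data with three answer variables and invented codes; the macro names used and lv are made up for this illustration.

```stata
* Toy data standing in for the survey (hypothetical codes and labels)
clear
input a1 a2 a3
 33  49   .
 49   .   .
  .   .   .
 49  33 263
end
label define locations 33 "France" 49 "Germany" 263 "Zimbabwe" 840 "Never mentioned"
label values a1 a2 a3 locations

* union of the distinct values found in a1, a2 and a3
local used
foreach v of varlist a1-a3 {
    quietly levelsof `v', local(lv)
    local used : list used | lv
}

* one indicator per value that was named at least once
foreach val of local used {
    egen var`val' = anymatch(a1-a3), values(`val')
    label variable var`val' "`: label locations `val''"
}
list
```

Here code 263 appears only in a3, so it would be missed by -levelsof a1- alone, while code 840 is never named and gets no variable.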
Kind regards
Evelyn
Departmental Lecturer
Oxford Department of International Development (QEH)
International Migration Institute
University of Oxford
3 Mansfield Road
Oxford OX1 3TB
United Kingdom
Tel: +44 (0)1865 281717
http://www.eumagine.org
http://www.migration.ox.ac.uk
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/