Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE : Disaggregating the values taken by a Stata variable
From: <[email protected]>
To: <[email protected]>
Subject: st: RE : Disaggregating the values taken by a Stata variable
Date: Sun, 2 Jun 2013 15:14:58 +0000
Thank you so much to Nick, Maarten and Sergiy. All your suggestions will help me address this issue. Sergiy's answer with an example makes it even more concrete.
Regards,
Ruolz Ariste.
On 31 May 2013 12:21, Ruolz <[email protected]> wrote:
> I would like to know if it is possible to assign more than one value to a Stata variable under the same condition. Here is my problem:
>
> In my dataset, I have a variable called “economic_region”. I would like to:
>
> 1) assign sub-regions to each of these economic regions
> 2) assign Postal Codes (PC) or Forward Sortation Areas (FSA: the first three characters of the PC) to each of these sub-regions (in order to be able to do a "many-to-one" match later).
>
> In other words, I would like to disaggregate these economic regions. Note that I know the list of the sub-regions for a given economic region and I also know the list of the FSAs for a given sub-region, but this information is not in my dataset and I would like to include it. If I had the reverse problem (aggregation), it would be easy for me to solve. For example, suppose that I wanted to create the economic region named Ottawa from three sub-regions (Nepean, Kanata and Rockland); it would be:
>
> gen str10 economic_region = "Ottawa" if sub_region == "Nepean" | sub_region == "Kanata" | sub_region == "Rockland"
>
> But I want to disaggregate instead. I have the economic region named Ottawa and I want to create three sub-regions from it: Nepean, Kanata, Rockland. I thought I could do that by assigning multiple values to the variable called sub_region. I know how to assign more than one value to a Stata variable under different conditions, but I don't know how to assign more than one value to a Stata variable under the same condition (or if this is even possible!). Can you tell me if it is possible to do that in Stata and how? I would appreciate your help on this matter.
>
Date: Fri, 31 May 2013 12:36:45 +0100
From: Nick Cox <[email protected]>
Subject: Re: st: Disaggregating the values taken by a Stata variable
We ask list members to use full real names.
I am not clear what you are seeking here. A given observation can hold
one and only one value for a given variable. In the case of a string
variable, nothing stops that value being any string you like, subject
to the limits on string variable length.
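Because each observation holds one value per variable, one workaround for the original question is to give each sub-region its own observation. The sketch below does that with -expand-; it was not proposed in the thread, the data are made up, and the sub-region names are hard-coded purely for illustration. A lookup file combined with -merge- scales better when there are many regions.

```stata
* Made-up example data: one observation per economic region
clear
input str10 economic_region
"Ottawa"
end

* Replace each Ottawa observation with 3 copies, one per sub-region
expand 3 if economic_region == "Ottawa"

* Number the copies within each region (copies are identical, so order is irrelevant)
bysort economic_region: generate byte copy = _n

* Hard-coded, illustrative assignment of one sub-region per copy
generate str10 sub_region = ""
replace sub_region = "Nepean"   if economic_region == "Ottawa" & copy == 1
replace sub_region = "Kanata"   if economic_region == "Ottawa" & copy == 2
replace sub_region = "Rockland" if economic_region == "Ottawa" & copy == 3
drop copy
list
```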
In your situation, data on region and sub-region would naturally be
held in two variables. To add more detail, I suspect that you are
reaching for a technique such as that described in
FAQ: Defining group characteristics to create subsets (C. F. Baum, 7/11)
How do you efficiently define group characteristics in your data in
order to create subsets?
http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/
One solution lies in adding more variables, but you can also set up a
translation through -merge-. Either way, to add detail, you need to add
detail; there is no way to escape that.
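A minimal sketch of such a -merge- translation, assuming a hypothetical lookup table with one row per sub-region and a master file with one observation per economic region (all names and values are made up). With -merge 1:m-, each region in the master picks up one observation per matching sub-region, which is exactly the added detail. The 1:m syntax needs Stata 11 or later, and the block should be run as a single do-file so the -tempfile- macro stays defined.

```stata
* Hypothetical lookup table: one row per sub-region, keyed by economic region
clear
input str10 economic_region str10 sub_region
"Ottawa" "Nepean"
"Ottawa" "Kanata"
"Ottawa" "Rockland"
end
tempfile lookup
save `lookup'

* Hypothetical master data: one observation per economic region
clear
input str10 economic_region int region_total
"Ottawa" 100
end

* Each master observation is paired with every sub-region of its region
merge 1:m economic_region using `lookup'
list
```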
Nick
[email protected]
---------------------------------------------------------------------
Date: Fri, 31 May 2013 13:39:35 +0200
From: Maarten Buis <[email protected]>
Subject: Re: st: Disaggregating the values taken by a Stata variable
The easiest solution is to merge on multiple variables, in this case
region, sub-region, and PC. That way you don't have to create new
variables.
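Merging on several key variables at once might look like the sketch below, assuming a made-up postal-code file (many codes per sub-region) and a made-up sub-region file with a hypothetical attribute -sub_id-. The key is the pair economic_region and sub_region; -m:1- fits because each pair repeats in the master but is unique in the using file.

```stata
* Hypothetical sub-region file: one row per region/sub-region pair
clear
input str10 economic_region str10 sub_region byte sub_id
"Ottawa" "Kanata"   1
"Ottawa" "Rockland" 2
end
tempfile subregions
save `subregions'

* Hypothetical postal-code file: several postal codes per sub-region
clear
input str6 pc str10 economic_region str10 sub_region
"K2K0A9" "Ottawa" "Kanata"
"K2K1A2" "Ottawa" "Kanata"
"K4K0A2" "Ottawa" "Rockland"
end

* Match on both key variables together
merge m:1 economic_region sub_region using `subregions'
list
```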
If you want to store this data in one variable, then that is typically
done in the form of a string variable. All regions and subregions are
given short codes (often numeric, but stored as strings), and these
are pasted together such that the first x characters identify the
region, the next y characters the subregion, etc.
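The single-variable approach could be sketched as follows, assuming made-up fixed-width codes: two characters for the region, two for the sub-region and three for the FSA. Fixed width is what keeps each position meaningful; a code accidentally stored as "3" instead of "03" would shift every later part, which is the kind of bug that makes the multiple-variable merge safer.

```stata
* Hypothetical fixed-width codes, stored as strings to keep leading zeros
clear
input str2 region_code str2 subregion_code str3 fsa
"01" "03" "K2K"
"01" "04" "K4K"
end

* Paste the parts together: chars 1-2 region, 3-4 sub-region, 5-7 FSA
generate str7 geo_id = region_code + subregion_code + fsa

* Recover the parts by position
generate str2 region_part    = substr(geo_id, 1, 2)
generate str2 subregion_part = substr(geo_id, 3, 2)
list
```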
The second method does work, but it is easy to introduce a bug that
way. So I recommend simply merging on multiple variables.
Hope this helps,
Maarten
---------------------------------------------------------------------------------------
Date: Fri, 31 May 2013 19:08:52 -0400
From: Sergiy Radyakin <[email protected]>
Subject: Re: st: Disaggregating the values taken by a Stata variable
You write that you know how to do aggregation, so just do it at the
level of postal codes (extending the lists at the ellipses with the
remaining codes), as shown below. Or, if you don't have postal codes,
then how do you identify the location? Merging is, of course, better.
Sergiy.
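clear
input str6 pc str6 region
K2K0A9 Ottawa
K4K0A2 Ottawa
K4K1A4 Ottawa
K2K1A2 Ottawa
K4K1W2 Ottawa
K2W1J3 Ottawa
end
list
* Start with an empty string variable for the sub-region
generate sub_reg=""
* Assign sub-regions by postal code; /// continues a command onto the
* next line in a do-file, and /*...*/ marks where more codes would go
replace sub_reg="kanata" if pc=="K2K0A9" | pc=="K2K0B2" | pc=="K2K1A2" ///
    | /*... |*/ pc=="K2W1J3"
replace sub_reg="rockland" if pc=="K4K0A2" | pc=="K4K1A4" | ///
    pc=="K4K1A5" | /*... |*/ pc=="K4K1W2"
list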
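Since an FSA is the first three characters of a postal code, the same idea can work at the FSA level, which needs far fewer codes. A minimal sketch reusing a few of the postal codes above, assuming (for illustration only) that each FSA falls entirely within one sub-region, which may not hold in practice. -inlist()- replaces the chain of | conditions, though it accepts only a limited number of string arguments (see -help inlist()-).

```stata
* A few of the postal codes from the example above
clear
input str6 pc str6 region
K2K0A9 Ottawa
K4K0A2 Ottawa
K2W1J3 Ottawa
end

* FSA = first three characters of the postal code
generate str3 fsa = substr(pc, 1, 3)

* Assumed FSA-to-sub-region mapping, consistent with the example above
generate str10 sub_reg = ""
replace sub_reg = "kanata"   if inlist(fsa, "K2K", "K2W")
replace sub_reg = "rockland" if fsa == "K4K"
list
```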