From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: RE: RE: Creating a group variable based on values in observations
Date: Sat, 21 May 2011 09:33:32 +0100

Going for the interpretation that 1 ... 7 are days of the week, then
another way to do it would have been
gen signature = ""
bysort id (market) : replace signature = signature[_n-1] + string(market)
by id : replace signature = signature[_N]
which creates signatures such as "135", which as Bert says can be
mapped to integers 1 up with -egen, group()-. Strictly, -group()- is
an -egen- function, not an option.
Use the -label- option and they remain intelligible.
Use -tabulate- and you get indicator variables.
Nick
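
A minimal sketch of the signature approach, run on the toy data from the original question. It assumes the grouping variable is called day (the post above writes id) and that each market appears at most once per day; groupid and the grp stub are hypothetical names chosen for illustration.

```stata
* Toy data from the original question (variable names are assumptions).
clear
input day market
1 1
1 2
1 3
1 4
1 5
2 2
2 3
2 5
3 1
3 3
3 4
3 5
4 2
4 3
4 5
end

* Build the signature cumulatively within day, then copy the complete
* signature (held by the last observation of each day) to every row.
gen signature = ""
bysort day (market) : replace signature = signature[_n-1] + string(market)
by day : replace signature = signature[_N]

* Map distinct signatures to integers 1 up; -label- keeps them readable.
egen groupid = group(signature), label

* One indicator variable per distinct signature (grp1, grp2, ...).
tabulate groupid, generate(grp)
```

The integer codes follow the sort order of the signature strings, so they need not match the groupid numbering in the question. With market numbers of two or more digits, a separator such as "-" between the pieces would be needed to keep signatures unambiguous.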
On Sat, May 21, 2011 at 2:32 AM, Bert Jung <[email protected]> wrote:
> Hi Chris,
>
> I am not sure if I understand your problem but maybe this helps:
>
> If you would like group IDs for all unique values within your
> "openmarkets" variable, you could use the "group" option of -egen-. I
> suspect that requires that the values in the openmarkets variable must
> always have the same order since -egen- would consider "2-3-5" and
> "5-3-2" as two different groups but to you and me they're the same.
>
> If "openmarkets" is a string variable you could also remove the "-"
> with -subinstr- or with -destring openmarkets, ignore("-") gen(new)-.
> Since the values 1 to 5 seem to uniquely identify the week days (?)
> that would be similar to Sarah's suggestion.
>
> Cheers,
> Bert
>
>
> On Fri, May 20, 2011 at 7:49 PM, Sarah Edgington <[email protected]> wrote:
>> Chris,
>> I think there are a number of different ways to solve this problem.
>> How many markets are you dealing with? If it's 16 or fewer here's a
>> solution that gets you around the reshaping issue.
>> First, create a new market id where market 1=1, market 2=10, market 3=100,
>> etc. Then sum this id within days. That will give you a group variable
>> where each place represents a particular market (starting with market 1 on
>> the right) and a 1 or 0 tells you if the market was open or not. Your day
>> one group id would be 11111. Day two's would be 10110.
>>
>> gen double mid=10^(market-1)
>> bysort day: egen double margroup=total(mid)
>>
>> This is only exact up to 16 markets, because a double holds integers
>> exactly only up to 2^53 (about 9.0e15). In
>> principle, though, you could do it in any base and have everything add up to
>> create a unique group id. So if you used 2 as your base instead of 10 (that
>> is, change the first line to gen double mid=2^(market-1) ) you'd be able to
>> accommodate more markets. Doing that you lose the ability to easily look at
>> it and read which markets are open straight from the group variable. That
>> doesn't really matter for analytical purposes, though.
>>
>> -Sarah
>>
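
A rough sketch of the base-2 variant described above, assuming variables named day and market, with market coded 1, 2, 3, ... and each market appearing at most once per day. The names mid2, margroup2 and open3 are hypothetical.

```stata
* Each market contributes one binary digit to the group code.
gen double mid2 = 2^(market - 1)
bysort day : egen double margroup2 = total(mid2)

* The code is less readable than the base-10 version, but a single
* market's status can still be recovered; here, market 3.
gen byte open3 = mod(floor(margroup2 / 2^(3 - 1)), 2)
```

Because a double stores integers exactly up to 2^53, this coding stays exact for up to 53 markets.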
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Chris Parker
>> Sent: Friday, May 20, 2011 3:05 PM
>> To: [email protected]
>> Subject: st: RE: Creating a group variable based on values in observations
>>
>> Hi,
>>
>> I think I have a solution. My data is a bit too big to do this all at once
>> (reshape gives a return code telling me productmarket takes on too many
>> values) but here is what works in case anyone runs into a similar
>> problem:
>>
>> . gen marketdup = market
>> . reshape wide market, i(date) j(marketdup)
>> . egen openmarkets = concat(market*), punc(_)
>> . encode openmarkets, gen(groupid)
>> . drop openmarkets
>> . reshape long
>> . drop marketdup
>>
>> Chris
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Chris Parker
>> Sent: Friday, May 20, 2011 9:36 PM
>> To: [email protected]
>> Subject: st: Creating a group variable based on values in observations
>>
>> Hi Statalist,
>>
>> I have a problem that's been troubling me for a while now. I have daily
>> prices for several products in several markets over time. I use the data to
>> measure price dispersion as the coefficient of variation of prices on a day
>> for a product. However, not every market is open on every day.
>> Systematic differences between the markets that are open (such as average
>> distance between markets, percent of markets of type A, etc.) could impact
>> price dispersion, so I need to control for this. For each product I would
>> like to create a variable that lists which group of markets was open on each
>> day (openmarkets in the example below). I could then encode this variable
>> and include i.groupid which controls for these differences.
>>
>> Example data for one of the products:
>>
>> day  market  openmarkets  groupid
>>   1       1  1-2-3-4-5          1
>>   1       2  1-2-3-4-5          1
>>   1       3  1-2-3-4-5          1
>>   1       4  1-2-3-4-5          1
>>   1       5  1-2-3-4-5          1
>>   2       2  2-3-5              2
>>   2       3  2-3-5              2
>>   2       5  2-3-5              2
>>   3       1  1-3-4-5            3
>>   3       3  1-3-4-5            3
>>   3       4  1-3-4-5            3
>>   3       5  1-3-4-5            3
>>   4       2  2-3-5              2
>>   4       3  2-3-5              2
>>   4       5  2-3-5              2
>>