Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: RE: Creating a group variable based on values in observations
Date: Sat, 21 May 2011 09:33:32 +0100

Going for the interpretation that 1 ... 7 are days of the week, then
another way to do it would have been

gen signature = ""
bysort id (market) : replace signature = signature[_n-1] + string(market)
by id : replace signature = signature[_N]

which creates signatures such as "135" (the last line copies the complete
signature, held by the last observation in each group, to every
observation in that group), which as Bert says can be mapped to integers
1 up with -egen, group()-. Strictly, -group()- is an -egen- function, not
an option. Use the -label- option and they remain intelligible. Use
-tabulate- and you get indicator variables.

Nick

On Sat, May 21, 2011 at 2:32 AM, Bert Jung <bjung59@gmail.com> wrote:
> Hi Chris,
>
> I am not sure if I understand your problem but maybe this helps:
>
> If you would like group IDs for all unique values within your
> "openmarkets" variable, you could use the "group" option of -egen-. I
> suspect that requires that the values in the openmarkets variable must
> always have the same order since -egen- would consider "2-3-5" and
> "5-3-2" as two different groups but to you and me they're the same.
>
> If "openmarkets" is a string variable you could also remove the "-"
> with -subinstr- or with -destring openmarkets, ignore("-") gen(new)-.
> Since the values 1 to 5 seem to uniquely identify the week days (?)
> that would be similar to Sarah's suggestion.
>
> Cheers,
> Bert
>
>
> On Fri, May 20, 2011 at 7:49 PM, Sarah Edgington <sedging@ucla.edu> wrote:
>> Chris,
>> I think there are a number of different ways to solve this problem.
>> How many markets are you dealing with? If it's no more than 16 here's a
>> solution that gets you around the reshaping issue.
>> First, create a new market id where market 1=1, market 2=10, market 3=100,
>> etc. Then sum this id within days. That will give you a group variable
>> where each place represents a particular market (starting with market 1 on
>> the right) and a 1 or 0 tells you if the market was open or not. Your day
>> one group id would be 11111. Day two's would be 10110.
>>
>> gen double mid=10^(market-1)
>> bysort day: egen double margroup=total(mid)
>>
>> This only works well up to 16 markets because of precision issues: a
>> double holds integers exactly only up to 2^53, which is about 9.0e15. In
>> principle, though, you could do it in any base and have everything add up to
>> create a unique group id. So if you used 2 as your base instead of 10 (that
>> is, change the first line to gen double mid=2^(market-1) ) you'd be able to
>> accommodate more markets. Doing that you lose the ability to easily look at
>> it and read which markets are open straight from the group variable. That
>> doesn't really matter for analytical purposes, though.
>>
>> -Sarah
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chris Parker
>> Sent: Friday, May 20, 2011 3:05 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: RE: Creating a group variable based on values in observations
>>
>> Hi,
>>
>> I think I have a solution. My data is a bit too big to do this all at once
>> (reshape gives a return code telling me productmarket takes on too many
>> values) but here is what works in case anyone runs into a similar
>> problem:
>>
>> . gen marketdup = market
>> . reshape wide market, i(date) j(marketdup)
>> . egen openmarkets = concat(market*), punc(_)
>> . encode openmarkets, gen(groupid)
>> . drop openmarkets
>> . reshape long
>> . drop marketdup
>>
>> Chris
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Chris Parker
>> Sent: Friday, May 20, 2011 9:36 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: Creating a group variable based on values in observations
>>
>> Hi Statalist,
>>
>> I have a problem that's been troubling me for a while now. I have daily
>> prices for several products in several markets over time. I use the data to
>> measure price dispersion as the coefficient of variation of prices on a day
>> for a product.
>> However, not every market is open on every day. Systematic differences
>> between the markets that are open (such as average distance between
>> markets, percent of markets of type A, etc.) could impact price
>> dispersion, so I need to control for this. For each product I would
>> like to create a variable that lists which group of markets was open on
>> each day (openmarkets in the example below). I could then encode this
>> variable and include i.groupid which controls for these differences.
>>
>> Example data for one of the products:
>>
>> day   market   openmarkets   groupid
>> 1     1        1-2-3-4-5     1
>> 1     2        1-2-3-4-5     1
>> 1     3        1-2-3-4-5     1
>> 1     4        1-2-3-4-5     1
>> 1     5        1-2-3-4-5     1
>> 2     2        2-3-5         2
>> 2     3        2-3-5         2
>> 2     5        2-3-5         2
>> 3     1        1-3-4-5       3
>> 3     3        1-3-4-5       3
>> 3     4        1-3-4-5       3
>> 3     5        1-3-4-5       3
>> 4     2        2-3-5         2
>> 4     3        2-3-5         2
>> 4     5        2-3-5         2

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
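
For anyone trying the signature approach from this thread on the example data in the original question, here is a minimal sketch. It assumes that -day- plays the role of the grouping variable and that there is a single product. The "-" delimiter added with -cond()- is not in the posted code; it is an assumption here, so that multi-digit market numbers cannot run together and the result matches the openmarkets column. Note that -egen, group()- numbers the signatures in sorted order, so the groupid values need not match the hand-numbered example.

```stata
* Example data from the original question (one product only)
clear
input day market
1 1
1 2
1 3
1 4
1 5
2 2
2 3
2 5
3 1
3 3
3 4
3 5
4 2
4 3
4 5
end

* Running signature within each day, markets in ascending order;
* the "-" delimiter is an addition for illustration, not from the thread
gen openmarkets = ""
bysort day (market) : replace openmarkets = ///
    openmarkets[_n-1] + cond(_n == 1, "", "-") + string(market)

* The last observation in each day holds the complete signature;
* copy it to every observation of that day
by day : replace openmarkets = openmarkets[_N]

* Map distinct signatures to integers 1 up; -label- keeps them readable
egen groupid = group(openmarkets), label

list, sepby(day) noobs
```

With several products in the data, the same idea should carry over by sorting on the product identifier as well, for example -bysort product day (market)-, where -product- is a hypothetical variable name.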