Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: identify unique string values within lists of elements over chosen time windows
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: identify unique string values within lists of elements over chosen time windows
Date: Fri, 22 Mar 2013 11:55:23 +0000
is the Stack Overflow thread alluded to.
http://www.stata.com/support/faqs/data-management/problems-with-reshape/
is the FAQ alluded to.
On Fri, Mar 22, 2013 at 11:46 AM, Denisa Mindruta <[email protected]> wrote:
> Dear Nick, this has been a very helpful conversation! What follows is for
> anyone else potentially interested in this posting.
>
>
> Another solution, proposed by Dimitriy on Stack Overflow, was to create an
> indicator n marking the first occurrence of each string value and then use
>     collapse (sum) new=n, by(obs year)
> But Dimitriy's solution requires the additional step of merging the new
> variable back into the original dataset.
> I also asked Nick whether reshaping is the most "efficient" way of approaching
> the issue, and here is what he said. I quote Nick:
>
> "(MORE) Further comments focused largely on efficiency, meaning here speed
> rather than space. (Storage space could be biting the poster.)
>
>
> Without a restructure, here using reshape, the problem is a triple loop: over
> identifiers, over observations for each identifier and over variables. Possibly
> the two outer loops can be collapsed to one. But an explicit loop over
> observations is usually slow in Stata.
>
>
> With the restructuring solutions proposed by Dimitriy and myself, by: operations
> go straight to compiled code and are relatively fast: reshape is interpreted
> code and entails file manipulations, so can be slow. On the other hand reshape
> can be fast to write down with some experience, and it really is worth acquiring
> the fluency with reshape which comes with experience. In addition to the help
> for reshape and the manual entry, see the FAQ on reshape I wrote on
> www.stata.com.
>
> Another consideration is what else you want to do with this kind of dataset. If
> there are going to be other problems of similar character, they will usually be
> easier with a long structure as produced by reshape, so keeping that structure
> will be a good idea."
>
>
>
>
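The collapse-and-merge route described above might be sketched as follows, using the sample data from this thread. This is a minimal sketch under stated assumptions, not Dimitriy's exact code: the indicator name n and the result name new come from the quoted collapse command, while the reshape step, the tempfile name wide, and the use of yr (the variable name in the sample data, where the quoted command writes year) are assumptions made for illustration.

```stata
* Sample data from this thread; "" marks an empty string slot
clear
input obs yr str4 var1 str4 var2 str4 var3
1 90 str1 str2 str3
1 91 str1 str4 str5
2 90 str3 str4 ""
2 91 str4 str5 ""
2 93 str3 str5 ""
2 94 str7 ""   ""
end

* Keep a copy of the wide data to merge the counts back into (assumed name: wide)
tempfile wide
save `wide'

* Long layout, then flag the first occurrence of each non-missing string
reshape long var, i(obs yr) j(which)
bysort obs var (yr): gen byte n = (_n == 1) & !missing(var)

* One row per identifier-year holding the number of new strings
collapse (sum) new = n, by(obs yr)

* The extra step mentioned above: bring the counts back to the original data
merge 1:1 obs yr using `wide', nogenerate
order obs yr var1 var2 var3 new
sort obs yr
list, sepby(obs)
```

Because the tempfile name lives in a local macro, the block should be run in one go (for example as a single do-file) rather than line by line from the do-file editor.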
> ----- Original Message ----
> From: Nick Cox <[email protected]>
> To: [email protected]
> Sent: Fri, March 22, 2013 4:27:35 AM
> Subject: Re: st: identify unique string values within lists of elements over
> chosen time windows
>
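> * Sample data: up to three string values per identifier (obs) and year (yr)
> clear
> input obs yr str4 var1 str4 var2 str4 var3
> 1 90 str1 str2 str3
> 1 91 str1 str4 str5
> 2 90 str3 str4
> 2 91 str4 str5
> 2 93 str3 str5
> 2 94 str7
> end
> * Go long: one row per identifier, year, and string slot
> reshape long var , i(obs yr) j(which)
> * Flag the earliest occurrence of each non-missing string within an identifier
> bysort obs var (yr) : gen new = _n == 1 & !missing(var)
> * Running count of first occurrences within each identifier-year ...
> bysort obs yr : replace new = sum(new)
> * ... then copy the final total to every row of that identifier-year
> by obs yr : replace new = new[_N]
> * Back to the original wide layout, now with the count in new
> reshape wide var, i(obs yr) j(which)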
>
> Nick
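Whether reshape is a bottleneck depends on the dataset, so it can be worth timing the stages on one's own data. The following is a minimal sketch that wraps each stage of the code above in Stata's timer command. The timer numbers 1 to 3 are arbitrary choices, and on six observations the timings are far too small to mean anything, so real data should be substituted before drawing any conclusion.

```stata
* Sample data from this thread; replace with the real dataset for useful timings
clear
input obs yr str4 var1 str4 var2 str4 var3
1 90 str1 str2 str3
1 91 str1 str4 str5
2 90 str3 str4 ""
2 91 str4 str5 ""
2 93 str3 str5 ""
2 94 str7 ""   ""
end

timer clear

* Stage 1: wide to long
timer on 1
reshape long var, i(obs yr) j(which)
timer off 1

* Stage 2: the by: operations that do the counting
timer on 2
bysort obs var (yr): gen new = _n == 1 & !missing(var)
bysort obs yr: replace new = sum(new)
by obs yr: replace new = new[_N]
timer off 2

* Stage 3: long back to wide
timer on 3
reshape wide var, i(obs yr) j(which)
timer off 3

* Elapsed seconds per stage
timer list
```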
>
> On Thu, Mar 21, 2013 at 11:22 PM, Denisa Mindruta <[email protected]> wrote:
>> Hi everyone. I have an unbalanced, large panel dataset, where each observation
>> can take multiple string values (each string is stored in a separate variable).
>> At each point in time, I need to count how many of the string values taken by
>> an observation are "new", meaning that they do not show up among the values
>> taken by the same observation in previous years. How should I approach this
>> problem?
>> Thanks! Below is a description of the data. I need to calculate newval.
>>
>> obs  yr  var1  var2  var3  newval
>> 1    90  str1  str2  str3  3
>> 1    91  str1  str4  str5  2
>> 2    90  str3  str4        2
>> 2    91  str4  str5        1
>> 2    93  str3  str5        0
>> 2    94  str7              1
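As a check, the count produced by the reshape approach in this thread can be compared with the newval column in the table above. A minimal sketch, assuming the sample data are entered with newval as an extra numeric column and empty slots typed as ""; assert stops with an error if any row disagrees.

```stata
* Sample data with the poster's target column newval
clear
input obs yr str4 var1 str4 var2 str4 var3 newval
1 90 str1 str2 str3 3
1 91 str1 str4 str5 2
2 90 str3 str4 ""   2
2 91 str4 str5 ""   1
2 93 str3 str5 ""   0
2 94 str7 ""   ""   1
end

* The reshape approach from this thread
reshape long var, i(obs yr) j(which)
bysort obs var (yr): gen new = _n == 1 & !missing(var)
bysort obs yr: replace new = sum(new)
by obs yr: replace new = new[_N]
reshape wide var, i(obs yr) j(which)

* Computed count must equal the target in every row
assert new == newval
list obs yr var1 var2 var3 newval new, sepby(obs)
```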