Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: identify unique string values within lists of elements over chosen time windows
From: Denisa Mindruta <[email protected]>
To: [email protected]
Subject: Re: st: identify unique string values within lists of elements over chosen time windows
Date: Fri, 22 Mar 2013 04:46:40 -0700 (PDT)
Dear Nick, this has been a very helpful conversation! For anyone else who may
be interested in this posting:
Another solution, proposed by Dimitriy on Stack Overflow, is to create an
indicator that marks the first occurrence of each string value and then run
collapse (sum) new=n, by(obs year)
But Dimitriy's solution requires the additional step of merging the new
variable back into the original dataset.
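A minimal sketch of that route, assuming the wide layout used elsewhere in this thread (obs, yr, var1-var3, so the year variable is yr here rather than year). The indicator name n follows the command above; the tempfile name original and the use of reshape to get one string per row are assumptions for illustration, not necessarily what Dimitriy posted.

```stata
* Sketch only: example data in the wide layout from this thread
clear
input obs yr str4 var1 str4 var2 str4 var3
1 90 "str1" "str2" "str3"
1 91 "str1" "str4" "str5"
2 90 "str3" "str4" ""
2 91 "str4" "str5" ""
2 93 "str3" "str5" ""
2 94 "str7" ""     ""
end

* Keep a copy of the original data to merge back into (hypothetical name)
tempfile original
save `original'

* One row per obs, yr and slot
reshape long var, i(obs yr) j(which)

* n = 1 the first time a non-missing string appears for this obs
bysort obs var (yr which): gen byte n = _n == 1 & !missing(var)

* One row per obs-year holding the number of first occurrences
collapse (sum) new=n, by(obs yr)

* The additional step: bring the count back to the original wide data
merge 1:1 obs yr using `original', nogenerate
sort obs yr
list obs yr var1 var2 var3 new, sepby(obs)
```

Because collapse discards everything except the by() variables and the sums, the merge is what restores var1-var3 next to the count.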
I also asked Nick whether reshaping is the most "efficient" way of approaching
the issue, and here is what he said. I quote Nick:
"(MORE) Further comments focused largely on efficiency, meaning here speed
rather than space. (Storage space could be biting the poster.)
Without a restructure, here using reshape, the problem is a triple loop: over
identifiers, over observations for each identifier and over variables. Possibly
the two outer loops can be collapsed to one. But an explicit loop over
observations is usually slow in Stata.
With the restructuring solutions proposed by Dimitriy and myself, by: operations
go straight to compiled code and are relatively fast: reshape is interpreted
code and entails file manipulations, so can be slow. On the other hand reshape
can be fast to write down with some experience, and it really is worth acquiring
the fluency with reshape which comes with experience. In addition to the help
for reshape and the manual entry, see the FAQ on reshape I wrote on
www.stata.com.
Another consideration is what else you want to do with this kind of dataset. If
there are going to be other problems of similar character, they will usually be
easier with a long structure as produced by reshape, so keeping that structure
will be a good idea."
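For comparison, the loop-based route that Nick describes (no restructuring) might look roughly like the sketch below. It is an illustration written for this summary, not code from the thread. It assumes the wide layout with var1-var3, assumes a string never appears twice within the same row, and replaces the innermost scan over earlier observations with count, so it still visits every observation and every variable explicitly.

```stata
* Sketch only: example data in the wide layout from this thread
clear
input obs yr str4 var1 str4 var2 str4 var3
1 90 "str1" "str2" "str3"
1 91 "str1" "str4" "str5"
2 90 "str3" "str4" ""
2 91 "str4" "str5" ""
2 93 "str3" "str5" ""
2 94 "str7" ""     ""
end

sort obs yr
gen byte newval = 0

* Outer loop over observations, inner loop over the string variables
forvalues i = 1/`=_N' {
    foreach v of varlist var1-var3 {
        local s = `v'[`i']
        if `"`s'"' != "" {
            * Has this string occurred for the same obs in an earlier year?
            quietly count if obs == obs[`i'] & yr < yr[`i'] & ///
                (var1 == `"`s'"' | var2 == `"`s'"' | var3 == `"`s'"')
            if r(N) == 0 {
                quietly replace newval = newval + 1 in `i'
            }
        }
    }
}
list obs yr var1 var2 var3 newval, sepby(obs)
```

Each count scans the whole dataset, so the work grows roughly with the square of the number of observations, which is consistent with Nick's remark that explicit loops over observations are usually slow in Stata.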
----- Original Message ----
From: Nick Cox <[email protected]>
To: [email protected]
Sent: Fri, March 22, 2013 4:27:35 AM
Subject: Re: st: identify unique string values within lists of elements over
chosen time windows
clear
input obs yr str4 var1 str4 var2 str4 var3
1 90 str1 str2 str3
1 91 str1 str4 str5
2 90 str3 str4
2 91 str4 str5
2 93 str3 str5
2 94 str7
end
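* Long layout: one row per obs, yr and slot (which = 1, 2, 3)
reshape long var , i(obs yr) j(which)
* Flag the first year in which each non-missing string appears for an obs
bysort obs var (yr) : gen new = _n == 1 & !missing(var)
* Running sum of the flags within each obs-year ...
bysort obs yr : replace new = sum(new)
* ... then copy the total to every row of that obs-year
by obs yr : replace new = new[_N]
* Back to the original wide layout, now with new as the count
reshape wide var, i(obs yr) j(which)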
Nick
On Thu, Mar 21, 2013 at 11:22 PM, Denisa Mindruta <[email protected]> wrote:
> Hi everyone. I have an unbalanced, large panel dataset, where each observation
> can take multiple string values (each string is stored in a separate variable).
> At each point in time, I need to count whether the string value(s) taken by an
> observation are "new", meaning that they do not show up among the values taken
> by the same observation in previous years. How should I approach this problem?
> Thanks! Below is a description of the data. I need to calculate newval
>
> obs  yr  var1  var2  var3  newval
> 1    90  str1  str2  str3  3
> 1    91  str1  str4  str5  2
> 2    90  str3  str4        2
> 2    91  str4  str5        1
> 2    93  str3  str5        0
> 2    94  str7              1
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/