RE: st: copying a string variable to all rows within a group
From
Nick Cox <[email protected]>
To
"'[email protected]'" <[email protected]>
Subject
RE: st: copying a string variable to all rows within a group
Date
Wed, 29 Feb 2012 11:54:33 +0000
Yes, my second post corrected the first. Sorry about that.
You need a systematic way of referring to the non-missing values. Sorting them to the end or the beginning of each block gives you -oldvar[_N]- or -oldvar[1]- respectively as the non-missing value.
However, your information here on using -gsort- doesn't explain why the non-missing strings weren't always in the first observation of each block.
Incidentally, one thing that could go wrong here is two or more distinct non-missing values in the same block. A careful test of that would be
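* keep a copy so that the original values can be compared afterwards
clonevar safecopy = oldvar
bysort groupvar (oldvar) : replace oldvar = oldvar[_N]
* list any observation whose original non-missing value was overwritten
list if safecopy != oldvar & !missing(safecopy)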
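A complementary check can be run before any -replace-, so that nothing is overwritten. The following is only a sketch on made-up data: the toy values and the variable names -first- and -ndistinct- are illustrative assumptions, not part of the original problem. It counts the distinct non-missing values of -oldvar- in each block and lists the blocks with more than one.

```stata
* Toy data (made up): group 1 has one identifier, group 2 has two different ones
clear
input groupvar str5 oldvar
1 ""
1 "abc"
1 ""
2 "x"
2 ""
2 "y"
end

* Mark the first observation of each distinct non-missing value within a group
bysort groupvar oldvar : gen byte first = (_n == 1) & !missing(oldvar)

* Count the distinct non-missing values per group
bysort groupvar : egen ndistinct = total(first)

* Groups listed here have conflicting values and need attention first
list groupvar oldvar if ndistinct > 1, sepby(groupvar)
```

With these toy data only the observations of group 2 should be listed, as that block holds both "x" and "y".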
Nick
[email protected]
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: 29 February 2012 11:39
To: [email protected]
Subject: Re: st: copying a string variable to all rows within a group
Hi Again,
Your explanation below may go some way towards explaining my previous reply to your response to the original query.
However, I did do a gsort on the string variable ("gsort groupvar -oldvar", using my original variable names) to address the problem of strings being sorted with missing values first....
>>> Nick Cox <[email protected]> 2/29/2012 10:40 AM >>>
Seyi did say "not necessarily on the first row". Only the first of
these is a good solution given that warning. The logic is that -sort-
sorts all non-empty strings to the end of any block of observations.
(Perhaps Seyi started out with the identifiers in the first
observation of each block, but they got scrambled a bit by some later
-sort-.) -sort groupvar- makes no guarantees about the order of other
variables within each block unless you specify a stable sort.
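To make the point about sort order concrete, here is a small sketch on made-up data (the values are illustrative assumptions). It shows the -stable- option of -sort-, which keeps the existing relative order of observations that tie on -groupvar-, and then the approach that does not depend on that order at all.

```stata
* Toy data (made up): the identifier comes first within each group
clear
input groupvar str5 oldvar
2 "x"
2 ""
1 "abc"
1 ""
end

* -stable- preserves the current within-group order, so the identifier
* is still the first observation of each block after sorting
sort groupvar, stable
list, sepby(groupvar)

* Safer: name oldvar in parentheses so that the within-group order is
* explicit; empty strings sort first, so the identifier is in oldvar[_N]
bysort groupvar (oldvar) : replace oldvar = oldvar[_N]
list, sepby(groupvar)
```

Relying on -stable- only helps if the data were in the wanted order to begin with, so the explicit -bysort groupvar (oldvar)- form is usually the more defensible choice.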
Nick
On Wed, Feb 29, 2012 at 9:01 AM, Nick Cox <[email protected]> wrote:
> For "row" read "observation". We know what you mean, but that is the
> correct Stata terminology.
>
> bysort groupvar (oldvar) : replace oldvar = oldvar[_N]
>
> Alternatively,
>
> bysort groupvar : replace oldvar = oldvar[1]
>
> will probably work too. Alternatively,
>
> bysort groupvar : replace oldvar = oldvar[_n-1] if missing(oldvar) & _n > 1
>
> will probably work too. This last is an FAQ
>
> FAQ . . . . . . . . . . . . . . . . . . . . . . . Replacing missing values
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> 2/03 How can I replace missing values with previous or
> following nonmissing values?
> http://www.stata.com/support/faqs/data/missing.html
>
> On Wed, Feb 29, 2012 at 8:28 AM, <[email protected]> wrote:
>> If I have a numeric variable "oldvar" which appears on only the first row of a set of rows defined by a group variable "groupvar" (rest of oldvar rows are blank), it is easy enough to copy oldvar down all rows within groupvar:
>>
>> bysort groupvar: egen newvar=max(oldvar) // can equally use =min(oldvar) as there is only 1 value of oldvar.
>>
>> How might I do the same to copy a string oldvar to all rows? Again, it appears only once within the group, but not necessarily on the first row.
>>