Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: How can I get the second last non-missing value?
From
Robert Picard <[email protected]>
To
[email protected]
Subject
Re: st: How can I get the second last non-missing value?
Date
Thu, 13 Jun 2013 10:07:31 -0400
This can be done by looping over the desired varlist and storing, for
each observation, the two most recent non-missing values. Because you
are looping over variables and not observations, Stata is faster than
Mata (about 10 times faster on my machine compared to Sergiy's code).
* --------- fake data setup -------------------------------
clear
input id v1 v2 v3 v4 v5
1 1.1 2.2 3.3 . .
2 1.1 2.2 . . .
3 1.1 2.2 3.3 4.4 .
4 1.1 2.2 . . .
5 1.1 . . . .
end
list
local alot 100000
expand `alot'
tempfile testdata
qui save "`testdata'"
* --------- loop over variables ---------------------------
set rmsg on
qui {
gen prelast = .
gen last = .
foreach v of varlist v* {
qui replace prelast = last if !mi(`v')
qui replace last = `v' if !mi(`v')
}
}
set rmsg off
sum prelast
* --------------- using mata ------------------------------
clear all
use "`testdata'"
qui {
mata
void prelast() {
V=.
st_view(V,.,st_local("varlist"))
R=.
st_view(R,.,st_local("result"))
for(i=1;i<=rows(V);i++) {
for(j=0;j<cols(V);j++) {
if (missing(V[i,cols(V)-j])==0) {
// found last non-missing
if (cols(V)-j-1<1) break; //nothing before
for(k=cols(V)-j-1;k>=1;k--) {
if (missing(V[i,k])==0)
R[i,1]=V[i,k]
break;
}
break;
}
}
}
}
end
}
program define sergiy
syntax varlist, result(string)
quietly generate double `result'=.
mata prelast()
end
set rmsg on
sergiy v1 v2 v3 v4 v5, result(r)
set rmsg off
sum r
* --------------- end example -----------------------------
On Wed, Jun 12, 2013 at 6:31 PM, Sergiy Radyakin <[email protected]> wrote:
> Besides loosing a variable it is also quite slow because of 2 reshapes
> (and perhaps you have other variables in the data as well). I'd go to
> Mata with this one, it is about 10 times faster and can probably still
> be optimized:
> r; t=1.72 18:25:22 vs r; t=16.88 18:25:39
> Best, Sergiy
>
> *** begin example ***
> clear all
>
> input id v1 v2 v3 v4 v5
> 1 1 2 3 . .
> 2 1 2 . . .
> 3 1 2 3 4 .
> end
>
> local exp=300000
>
> expand `exp'
>
> mata
>
> void prelast() {
> V=.
> st_view(V,.,st_local("varlist"))
> R=.
> st_view(R,.,st_local("result"))
>
> for(i=1;i<=rows(V);i++) {
>
> for(j=0;j<cols(V);j++) {
> if (missing(V[i,cols(V)-j])==0) {
> // found last non-missing
>
> if (cols(V)-j-1<1) break; //nothing before
>
> for(k=cols(V)-j-1;k>=1;k--) {
> if (missing(V[i,k])==0)
> R[i,1]=V[i,k]
> break;
> }
>
> break;
> }
> }
> }
> }
>
> end
>
> program define sergiy
> syntax varlist, result(string)
>
> quietly generate double `result'=.
> mata prelast()
> end
>
> program define rebecca
> quietly reshape long v, i(id) j(num)
> keep if ! missing(v)
> bys id (num): gen v6 = v[_N-1]
> quietly reshape wide v, i(id) j(num)
> end
>
>
>
> set rmsg on
> sergiy v1 v2 v3 v4 v5, result(r)
> set rmsg off
>
>
> clear
>
> input id v1 v2 v3 v4 v5
> 1 1 2 3 . .
> 2 1 2 . . .
> 3 1 2 3 4 .
> end
>
> expand `exp'
> replace id=_n
>
> set rmsg on
> rebecca
> set rmsg off
>
> On Wed, Jun 12, 2013 at 5:01 PM, Rebecca Pope <[email protected]> wrote:
>> Duygu,
>> I think this will be easier with your data in long form.
>>
>> *** begin example ***
>> clear
>> input id v1 v2 v3 v4 v5
>> 1 1 2 3 . .
>> 2 1 2 . . .
>> 3 1 2 3 4 .
>> end
>>
>> reshape long v, i(id) j(num)
>> keep if ! missing(v)
>> bys id (num): gen v6 = v[_N-1]
>> list, noobs clean
>> reshape wide v, i(id) j(num)
>> list, noobs clean
>>
>> *** end ***
>>
>> In this case, you lose v5, but presumably in your real data that
>> variable is not universally missing. If you don't have a constant
>> prefix for the variables in your dataset, you might want to check out
>> -rename- and look at the methods for renaming groups. Standing advice:
>> test this on a copy of your data rather than the original data.
>>
>> Regards,
>> Rebecca
>>
>> On Wed, Jun 12, 2013 at 3:27 PM, duygu yıldırım <[email protected]> wrote:
>>> Hello everyone,
>>>
>>> I use a cross-sectional individual level data set. I want to ask if there is a way that I can create a variable which consists the second last non-missing value within observations?
>>>
>>> for example;
>>>
>>> id v1 v2 v3 v4 v5
>>> 1 1 2 3 . .
>>> 2 1 2 . . .
>>> 3 1 2 3 4 .
>>>
>>> So the variable that I want to create is;
>>>
>>> id v6
>>> 1 2
>>> 2 1
>>> 3 3
>>>
>>> I can get the last non-missing value within observation by using the rowlast command, but I also need the one just before the last one.
>>>
>>> Thank you all very much for any help,
>>>
>>> Duygu.
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/