Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: How can I get the second last non-missing value?

From	Robert Picard <[email protected]>
To	[email protected]
Subject	Re: st: How can I get the second last non-missing value?
Date	Thu, 13 Jun 2013 10:07:31 -0400

This can be done by looping over the desired varlist and storing, for
each observation, the two most recent non-missing values. Because you
are looping over variables and not observations, Stata is faster than
Mata (about 10 times faster on my machine compared to Sergiy's code).

* --------- fake data setup -------------------------------
clear
input id v1 v2 v3 v4 v5
  1 1.1  2.2  3.3    .   .
  2 1.1  2.2    .    .   .
  3 1.1  2.2  3.3  4.4   .
  4 1.1  2.2    .    .   .
  5 1.1    .    .    .   .
end

list
local alot 100000
expand `alot'

tempfile testdata
qui save "`testdata'"

* --------- loop over variables ---------------------------
set rmsg on
qui {
    gen prelast = .
    gen last = .
    foreach v of varlist v* {
        qui replace prelast = last if !mi(`v')
        qui replace last = `v' if !mi(`v')
    }
}
set rmsg off
sum prelast

* --------------- using mata ------------------------------

clear all
use "`testdata'"

qui {
 mata

   void prelast() {
       V=.
       st_view(V,.,st_local("varlist"))
       R=.
       st_view(R,.,st_local("result"))

       for(i=1;i<=rows(V);i++) {

         for(j=0;j<cols(V);j++) {
           if (missing(V[i,cols(V)-j])==0) {
             // found last non-missing

             if (cols(V)-j-1<1) break;  //nothing before

             for(k=cols(V)-j-1;k>=1;k--) {
               if (missing(V[i,k])==0)
                 R[i,1]=V[i,k]
                 break;
             }

             break;
           }
         }
       }
   }

 end
}

 program define sergiy
   syntax varlist, result(string)

   quietly generate double `result'=.
   mata prelast()
 end


 set rmsg on
   sergiy v1 v2 v3 v4 v5, result(r)
 set rmsg off

sum r

* --------------- end example -----------------------------


On Wed, Jun 12, 2013 at 6:31 PM, Sergiy Radyakin <[email protected]> wrote:
> Besides loosing a variable it is also quite slow because of 2 reshapes
> (and perhaps you have other variables in the data as well). I'd go to
> Mata with this one, it is about 10 times faster and can probably still
> be optimized:
> r; t=1.72 18:25:22 vs r; t=16.88 18:25:39
> Best, Sergiy
>
> *** begin example ***
> clear all
>
> input id v1 v2 v3 v4 v5
>   1 1  2  3  .  .
>   2 1  2  .  .  .
>   3 1  2  3  4  .
> end
>
> local exp=300000
>
> expand `exp'
>
>  mata
>
>    void prelast() {
>        V=.
>        st_view(V,.,st_local("varlist"))
>        R=.
>        st_view(R,.,st_local("result"))
>
>        for(i=1;i<=rows(V);i++) {
>
>          for(j=0;j<cols(V);j++) {
>            if (missing(V[i,cols(V)-j])==0) {
>              // found last non-missing
>
>              if (cols(V)-j-1<1) break;  //nothing before
>
>              for(k=cols(V)-j-1;k>=1;k--) {
>                if (missing(V[i,k])==0)
>                  R[i,1]=V[i,k]
>                  break;
>              }
>
>              break;
>            }
>          }
>        }
>    }
>
>  end
>
>  program define sergiy
>    syntax varlist, result(string)
>
>    quietly generate double `result'=.
>    mata prelast()
>  end
>
>  program define rebecca
>     quietly reshape long v, i(id) j(num)
>     keep if ! missing(v)
>     bys id (num): gen v6 = v[_N-1]
>     quietly reshape wide v, i(id) j(num)
>  end
>
>
>
>  set rmsg on
>    sergiy v1 v2 v3 v4 v5, result(r)
>  set rmsg off
>
>
>  clear
>
> input id v1 v2 v3 v4 v5
>   1 1  2  3  .  .
>   2 1  2  .  .  .
>   3 1  2  3  4  .
> end
>
> expand `exp'
> replace id=_n
>
> set rmsg on
>   rebecca
> set rmsg off
>
> On Wed, Jun 12, 2013 at 5:01 PM, Rebecca Pope <[email protected]> wrote:
>> Duygu,
>> I think this will be easier with your data in long form.
>>
>> *** begin example ***
>> clear
>> input id v1 v2 v3 v4 v5
>>   1 1  2  3  .  .
>>   2 1  2  .  .  .
>>   3 1  2  3  4  .
>>  end
>>
>> reshape long v, i(id) j(num)
>> keep if ! missing(v)
>> bys id (num): gen v6 = v[_N-1]
>> list, noobs clean
>> reshape wide v, i(id) j(num)
>> list, noobs clean
>>
>> *** end ***
>>
>> In this case, you lose v5, but presumably in your real data that
>> variable is not universally missing. If you don't have a constant
>> prefix for the variables in your dataset, you might want to check out
>> -rename- and look at the methods for renaming groups. Standing advice:
>> test this on a copy of your data rather than the original data.
>>
>> Regards,
>> Rebecca
>>
>> On Wed, Jun 12, 2013 at 3:27 PM, duygu yıldırım <[email protected]> wrote:
>>> Hello everyone,
>>>
>>> I use a cross-sectional individual level data set. I want to ask if there is a way that I can create a variable which consists the second last non-missing value within observations?
>>>
>>> for example;
>>>
>>> id v1 v2 v3 v4 v5
>>>  1 1  2  3  .  .
>>>  2 1  2  .  .  .
>>>  3 1  2  3  4  .
>>>
>>> So the variable that I want to create is;
>>>
>>> id  v6
>>> 1   2
>>> 2   1
>>> 3   3
>>>
>>> I can get the last non-missing value within observation by using the rowlast command, but I also need the one just before the last one.
>>>
>>> Thank you all very much for any help,
>>>
>>> Duygu.
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: How can I get the second last non-missing value?
  - From: duygu yıldırım <[email protected]>
- Re: st: How can I get the second last non-missing value?
  - From: Rebecca Pope <[email protected]>
- Re: st: How can I get the second last non-missing value?
  - From: Sergiy Radyakin <[email protected]>

Prev by Date: st: Load data using macro
Next by Date: Re: st: How can I get the second last non-missing value?
Previous by thread: Re: st: How can I get the second last non-missing value?
Next by thread: Re: st: How can I get the second last non-missing value?
Index(es):
- Date
- Thread