Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: drop variables in panel data with loop
From: Lisa Wang <[email protected]>
To: [email protected]
Subject: Re: st: RE: drop variables in panel data with loop
Date: Mon, 23 Jul 2012 11:46:31 +1000
Hi all,
Neither piece of code does what I need: each either drops no
observations at all or drops all of them.
@Nick: I also tried yours, but it doesn't seem to work either. I need
to summarise the data by i, as i identifies each individual entity;
each entity has multiple values of r (of differing amounts). t is just
a variable I created as an event timeline (e.g. -694, -693, ..., -1, 0,
+1, ..., +2093), and the range of days in the timeline can vary from
entity to entity.
In case it helps: after I run -tabulate i t if window==1 & r==.- I get
this output from Stata:
           |          Event Timeline
         i |        -1          0          1 |     Total
-----------+---------------------------------+----------
      Amy1 |         0          0          1 |         1
    Colin1 |         1          1          1 |         3
    Chris1 |         0          0          1 |         1
      Cat2 |         0          1          1 |         2
      Ian1 |         1          1          0 |         2
  Queenie1 |         1          1          1 |         3
      Sam1 |         0          1          1 |         2
    Uncle1 |         1          1          0 |         2
-----------+---------------------------------+----------
     Total |         4          6          6 |        16
. levelsof i if window==1 & r==., local(entities)
2 4 6 7 9 14 21 25
(For example, Amy1 is the second entity in my dataset. I want to remove
ALL observations of Amy1, not only the days (t) with missing
observations, because I want to omit these people from any further
analysis.)
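A minimal sketch of one way to do this, assuming the variable names used in this thread (i, t, window, r) and a small made-up dataset standing in for window_students: flag each row that is inside the window with r missing, spread that flag to every row of the same entity with -egen, max()-, and then drop all rows of flagged entities. The helper names bad and anybad are my own, not from the thread.

```stata
* Made-up toy panel standing in for window_students:
* Amy1 has r missing inside the window, Mary1 only outside it,
* Tom1 is complete
clear
input str6 name t window r
"Amy1"  -2 0 0.10
"Amy1"  -1 1 0.20
"Amy1"   0 1 .
"Amy1"   1 1 0.30
"Mary1" -1 1 0.70
"Mary1"  0 1 0.80
"Mary1"  1 1 0.90
"Mary1"  2 0 .
"Tom1"  -1 1 0.40
"Tom1"   0 1 0.50
"Tom1"   1 1 0.60
end
encode name, generate(i)

* bad = 1 on rows inside the window where r is missing, else 0
generate byte bad = (window == 1) & missing(r)

* anybad = 1 on every row of an entity with at least one bad row
egen byte anybad = max(bad), by(i)

* drop ALL rows of flagged entities, not just the bad rows
drop if anybad == 1
drop bad anybad
```

Here only Amy1 is removed; Mary1 stays because her missing r falls outside the window.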
I also want i to run up to 22 (the 30 entities minus the 8 I want
dropped), as I will be looping over entities for regressions later on.
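Dropping observations does not renumber -i-: the codes created by -encode- keep their gaps, so -summarize i- still reports a maximum of 30 as long as code 30 survives. A sketch of one way to get consecutive codes, on made-up toy data, is to build a new identifier with -egen, group()-; the name id is my own choice, and the -label- option carries the names over as value labels.

```stata
* Made-up example: three entities, one of which is then removed
clear
input str6 name
"Amy1"
"Cat2"
"Cat2"
"Tom1"
end
encode name, generate(i)     // Amy1 = 1, Cat2 = 2, Tom1 = 3 (alphabetical)

drop if name == "Amy1"       // stands in for dropping the flagged entities
summarize i                  // max is still 3: remaining codes are 2 and 3

* id numbers the remaining entities 1, 2, ... with no gaps
egen id = group(i), label
summarize id                 // max is now 2
```

With the real data, -summarize id- would then give r(max) equal to the number of entities left, which is what the later -forvalues- loop expects.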
Thank you everyone for your kind help so far.
Kind regards,
Lisa
On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <[email protected]> wrote:
> Djalal's code can be simplified to
>
> drop if t==.
>
> as whether t is missing does not depend on how the data are grouped
> by -i- or on any other variable. So it just drops observations that
> are missing on -t-, which is not your problem.
>
> However, Lisa overlooks my earlier posting
>
> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>
> I got a bit lost in Lisa's explanation (for example, further variables
> -twindow- and -holidaywindow- appear without any explanation), but my
> solution should still be relevant. Another solution might be
>
> bysort i (window) : drop if window[_N] == 1
>
> Nick
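As written, -bysort i (window)- puts each entity's largest value of -window- last, so window[_N] == 1 holds for every entity with at least one observation inside the window. If every entity has such observations, the command drops everything, which may be what Lisa saw. A sketch of the same idiom with a flag for "inside the window and r missing" as the sort key instead, on made-up toy data (the flag name bad is hypothetical):

```stata
* Made-up toy data: entity 1 has r missing inside the window,
* entity 2 has r missing only outside it
clear
input i t window r
1 -1 1 0.2
1  0 1 .
1  1 0 0.3
2 -1 1 0.4
2  0 1 0.5
2  1 0 .
end

* flag rows that are inside the window and have r missing
generate byte bad = (window == 1) & missing(r)

* within each i, sorting on bad puts any 1 last, so bad[_N] == 1
* means the entity has at least one bad row and all its rows go
bysort i (bad) : drop if bad[_N] == 1
drop bad
```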
>
> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <[email protected]> wrote:
>> Hi Djalal,
>>
>> Thank you for your help.
>>
>> I have tried your recommendation but it does not delete any
>> observations from my data set at all.
>>
>> Maybe I didn't specify my query well enough. If a person has missing
>> observations within a particular period, denoted by the dummy
>> variable 'window', I want to drop ALL the observations for that
>> person, not only the rows that have missing observations.
>>
>> Would you have any other suggestions?
>>
>> Kind regards,
>> Lisa
>>
>>
>>
>> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <[email protected]> wrote:
>>> Hi Lisa,
>>> Have you tried the following syntax?
>>>
>>> by i, sort : drop if t==.
>>>
>>> This will leave you with a t variable without any missing observations.
>>> As you have already identified which people/rows are concerned, you can
>>> drop them manually in the Data Editor.
>>>
>>> Hope this helps.
>>>
>>>
>>> Djalal Arinloye
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Lisa Wang
>>> Sent: Sunday, July 22, 2012 12:51 PM
>>> To: [email protected]
>>> Subject: st: drop variables in panel data with loop
>>>
>>> I am having trouble with Stata and would like some guidance on what I
>>> am doing incorrectly. I am new to Stata (only one month in), so I am
>>> still learning and sometimes still think in Excel terms.
>>>
>>> I will try to be as detailed as possible, so you can understand my question.
>>>
>>> To describe my data set: I have panel data and a variable i, which
>>> holds the names (e.g. Mary, Tom, ...) encoded as numeric with
>>> -encode symbol1, generate(i)-. There are 59732 rows and 30 distinct
>>> values of i.
>>>
>>> What I would like to achieve is to drop the observations of anyone
>>> who has missing values for a variable within a specific period
>>> (variable window). E.g. if there is no data for "Mary" on day 102,
>>> then drop all the rows pertaining to "Mary" from day 1...T, not
>>> only the observation for Mary on day 102.
>>>
>>> This is my code to try to achieve this:
>>>
>>> version 12.1
>>> clear all
>>> set more off
>>>
>>> cd "C:\Users\Admin\Desktop"
>>>
>>> use window_students, clear
>>>
>>> xtset i t
>>> //check panel structure is correct
>>>
>>>
>>> summ i // this tells me that the max of variable i is 30, which is
>>> // correct as I have 30 people I need to analyse
>>>
>>> tabulate i t if window==1 & r==.
>>> // r is another variable stored in another column, which represents
>>> // their rates. There are 8 people that don't have any rates within my
>>> // window.
>>> // I would like to remove all the observations pertaining to these people
>>>
>>> levelsof i if window==1 & r==., local(entities)
>>> // tried to store the people with missing values in a local macro;
>>> // these are i = 2 4 6 7 9 14 21 25
>>>
>>>
>>>
>>> Then I tried this:
>>>
>>> * Method 1: the Results window shows return code 198 and invalid
>>> * '4' in red text
>>>
>>> foreach i of local entities{
>>> drop if i==`entities'
>>> }
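In Method 1 the loop body uses `entities' (the whole list) instead of the loop's own macro, so the first pass runs -drop if i==2 4 6 7 9 14 21 25-, which is why Stata complains about an invalid '4' (r(198)). Method 2 (and Method 3 as quoted) runs -drop i-, which drops the variable i itself rather than any observations, so the second pass can no longer find i (r(111)). A sketch of the corrected loop on made-up toy data; the loop macro is renamed e here only to avoid confusion with the variable i.

```stata
* Made-up toy data: entities 2 and 4 have r missing inside the window
clear
input i window r
1 1 0.1
2 1 .
2 0 0.2
3 1 0.3
4 1 .
4 1 0.4
end

levelsof i if window == 1 & missing(r), local(entities)

foreach e of local entities {
    drop if i == `e'     // one code per pass, not the whole list
}
```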
>>>
>>>
>>> * Method 2: the Results window shows return code 111 and variable i
>>> * not found
>>>
>>> foreach i of local entities{
>>> drop i
>>> }
>>>
>>> * Method 3: but this deleted all of my observations
>>>
>>> foreach i of local entities{
>>> drop i
>>> }
>>>
>>> * Method 4: after Stata told me that it was persons 2, 4, 6, 7, 9, etc.
>>> * that were missing observations, I wrote out each line
>>>
>>> drop if i==2
>>> drop if i==4 //etc.....
>>>
>>> summ i // I still get 30 in the summary, but Stata told me it had
>>> // deleted observations for each -drop if- line I used. Shouldn't it
>>> // be 22 now that I have removed the 8 people?
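-summarize- reports the largest code still present, not the number of distinct entities; code 30 is not among those dropped, so the maximum stays 30. A sketch on made-up toy data with codes 1 to 30, using -inlist()- in place of the eight separate -drop if- lines and -tabulate-'s stored result r(r) to count the distinct values that remain:

```stata
* Made-up example: 30 entities with codes 1 to 30, two rows each
clear
set obs 60
generate i = ceil(_n / 2)

* one command instead of eight separate -drop if- lines
drop if inlist(i, 2, 4, 6, 7, 9, 14, 21, 25)

summarize i          // max is still 30: entity 30 was not dropped
quietly tabulate i
display r(r)         // number of distinct entities left: 22
```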
>>>
>>>
>>>
>>> I am stuck now, as I need i to be correct: I will be running some
>>> regressions by i later, which is why I have to drop the people who
>>> don't have observations before I do further analysis.
>>>
>>> E.g.:
>>> summarize i
>>> local m = r(max)
>>> // create a local macro storing the max
>>> // number of distinct entities from an r-scalar
>>>
>>> generate ar = .
>>>
>>>
>>>
>>> forvalues x = 1/`m' {
>>> //run regression for every entity in data set
>>> regress r ind if i==`x' & twindow
>>>
>>> predict res if i==`x', residuals
>>> // predict residuals both
>>> // in-sample and out-of-sample
>>> replace ar=res if i==`x' & holidaywindow
>>> // replace ar=. with these
>>> // estimated residuals
>>> drop res
>>> }
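If the codes of i have gaps after the drops, -forvalues x = 1/`m'- also visits codes that no longer exist, and -regress- would then stop with a "no observations" error. A sketch that loops over the codes actually present via -levelsof- instead. The toy data are made up, and twindow and holidaywindow are assumed here to be 0/1 dummies (their meaning is not given in the thread); the sketch writes twindow == 1 because a bare -if twindow- is also true where twindow is missing.

```stata
* Made-up toy panel: three entities with gapped codes 2, 4, 6
* (as after dropping others); t runs from -10 to 9 for each
clear
set seed 12345
set obs 60
generate i = ceil(_n / 20) * 2
generate t = mod(_n - 1, 20) - 10
generate twindow = t < 0            // hypothetical estimation window
generate holidaywindow = t >= 0     // hypothetical event window
generate ind = rnormal()
generate r = 0.5 * ind + rnormal()

generate ar = .
levelsof i, local(ids)              // only the codes still in the data
foreach x of local ids {
    * estimate on this entity's estimation window only
    quietly regress r ind if i == `x' & twindow == 1
    * residuals for all of this entity's rows, in and out of sample
    quietly predict res if i == `x', residuals
    * keep them only inside the event window
    quietly replace ar = res if i == `x' & holidaywindow == 1
    drop res
}
```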
>>>
>>>
>>>
>>> Sorry for the long email. This is my first post, so I wanted everyone
>>> to be clear about what I have done so far and what I want to do next.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/