Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Observations that keep a feature... an additional problem
From
"Sarah Edgington" <[email protected]>
To
<[email protected]>
Subject
RE: st: Observations that keep a feature... an additional problem
Date
Wed, 22 May 2013 13:50:56 -0700
Miguel,
This discussion would be clearer if your examples showed exactly what your data look like.
Your example below suggests that your data are in wide form, but the solution Nick suggested is for data in long form. It's easy enough to move between the two with -reshape-, but it's hard to make concrete suggestions about how to proceed when we don't know what the actual data look like.
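For reference, here is a minimal sketch of moving Miguel's first example from wide to long form. The wide variable names score1-score6 (one per period) are hypothetical, because the real names weren't posted. Note that -reshape long- creates rows with missing scores for agent 5's periods 1-5, so they are dropped so that agent 5 simply has no records before period 6.

```stata
clear
* Miguel's first example in wide form (hypothetical names score1-score6)
input agent score1 score2 score3 score4 score5 score6
1 1   1   1   1   1 1
2 0.8 1   1   1   1 1
3 0.8 0.8 0.8 1   1 1
4 0.8 0.8 0.8 0.8 1 1
5 .   .   .   .   . 1
end

* Wide -> long: one row per agent-period, period number stored in -period-
reshape long score, i(agent) j(period)

* Remove the placeholder rows -reshape- creates for periods with no data
drop if missing(score)
list, sepby(agent)
```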
I'll start by assuming, as Nick does, that your data are actually in long form and you have three variables: agent, period, score. I'll further assume that for agent 5 you simply have no records for periods 1-5 (that is, you do not have records for those periods with missing values for score). If that's true, you can calculate the first period in which each agent appears in the data and use it as part of your inclusion criteria. Something like the following will keep only those agents who first appear in the data no later than period 4:
egen firstperiod=min(period), by(agent)
drop if firstperiod>4
Or maybe you only want to include agents who start in period 1? Your question doesn't make that clear. In that case you'd use -drop if firstperiod>1- instead.
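Putting this together with Nick's division trick, here is a minimal sketch for Miguel's first example, assuming long-form variables agent, period and score, a threshold of 0.9, and that only agents present from period 1 should be kept. The four-period window and the names inwin, max_window and min_rest are illustrative assumptions: with Nick's -_n < 4- (the first three periods), agent 3 would be dropped, because its first three scores are all 0.8, whereas Miguel wants to keep it.

```stata
clear
input agent score1 score2 score3 score4 score5 score6
1 1   1   1   1   1 1
2 0.8 1   1   1   1 1
3 0.8 0.8 0.8 1   1 1
4 0.8 0.8 0.8 0.8 1 1
5 .   .   .   .   . 1
end
reshape long score, i(agent) j(period)
drop if missing(score)

* First period in which each agent appears; drop late entrants (agent 5)
egen firstperiod = min(period), by(agent)
drop if firstperiod > 1

* Division trick: score/inwin is missing outside the window and
* score/!inwin is missing inside it; egen max()/min() ignore missings
bysort agent (period) : gen inwin = _n <= 4
egen max_window = max(score / inwin), by(agent)
egen min_rest = min(score / !inwin), by(agent)
keep if max_window > 0.9 & min_rest > 0.9
list agent period score, sepby(agent)
```

One caveat: if an agent has no observations after the window, min_rest is missing, and because Stata treats missing as larger than any number, -min_rest > 0.9- is then true. Add -& !missing(min_rest)- if such agents should be excluded.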
For your second example, which looks at the last periods, I think you need to clarify what your actual criterion is. You say "I would like to select those agents that overpass the threshold of 0.9 in any the last two periods and are over the threshold until the end of the sample period (ie, agents 4 and 5)." To my eye, that criterion includes all agents except agent 6. You're unlikely to get the results you hope for unless you are precise about the criteria you're using.
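Purely as an illustration of how much the wording matters, one reading that does reproduce Miguel's stated answer (agents 4 and 5) is: the agent's final unbroken run above 0.9 begins in one of the last two periods of the sample and lasts until the end. A minimal sketch under that assumption follows; the six-period data, the wide names score1-score6, and the variables above, runstart and sel are all hypothetical.

```stata
clear
input agent score1 score2 score3 score4 score5 score6
1 1   1   1   1   1 1
2 0.8 1   1   1   1 1
3 0.8 0.8 0.8 1   1 1
4 0.8 0.8 0.8 0.8 1 1
5 .   .   .   .   . 1
6 0.8 0.8 0.8 0.8 1 0.8
end
reshape long score, i(agent) j(period)
drop if missing(score)

* Last period in the sample
summarize period, meanonly
local T = r(max)

* Flag observations above the threshold
bysort agent (period) : gen above = score > 0.9

* Period in which each run of above-threshold values starts,
* carried forward through the run (missing when not above)
by agent : gen runstart = period if above & (_n == 1 | !above[_n-1])
by agent : replace runstart = runstart[_n-1] if above & missing(runstart)

* Keep agents whose final run reaches the last sample period
* and started in one of the last two periods
by agent : gen sel = above[_N] & period[_N] == `T' & runstart[_N] >= `T' - 1
keep if sel
list agent period score, sepby(agent)
```

Other readings (for example, requiring only that the last two scores exceed 0.9) need different conditions in the final -gen-, which is why the rule has to be pinned down first.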
Hope that helps.
-Sarah
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Miguel Angel Duran Munoz
Sent: Wednesday, May 22, 2013 11:00 AM
To: [email protected]
Subject: Re: st: Observations that keep a feature... an additional problem
I use the same example as in a previous message, but I add a fifth agent that joins in period six:
Agent 1: 1 1 1 1 1 1...
Agent 2: 0.8 1 1 1 1 1...
Agent 3: 0.8 0.8 0.8 1 1 1...
Agent 4: 0.8 0.8 0.8 0.8 1 1...
Agent 5: . . . . . 1...
I want to keep just the first three agents.
If you don't mind, Nick, I would also like to ask you the following. I take the same example, but I focus on the last periods.
Agent 1: ...1 1 1 1 1 1
Agent 2: ...0.8 1 1 1 1 1
Agent 3: ...0.8 0.8 0.8 1 1 1
Agent 4: ...0.8 0.8 0.8 0.8 1 1
Agent 5: ... . . . . . 1
Agent 6: ...0.8 0.8 0.8 0.8 1 0.8
I would like to select those agents that overpass the threshold of 0.9 in any the last two periods and are over the threshold until the end of the sample period (ie, agents 4 and 5).
I have tried to modify the commands that you have suggested me before, but I have not been able to get the right selection. Would you mind helping me with this? Thank you very much.
> I can't follow this. I see only "the rules select too many agents".
>
> You tell me your precise rules and I will try to think of code to
> implement them.
>
> Nick
> [email protected]
>
>
> On 22 May 2013 18:16, Miguel Angel Duran Munoz <[email protected]> wrote:
>> Nick, after reducing the sample using your suggestion, I checked
>> the number of agents per period, and it increases over time. I
>> guess this is because agents who join the sample later and satisfy
>> the requirement of being above the threshold are not excluded. Is
>> there any trick to exclude them? Thanks again.
>>
>>> Assuming variable names
>>>
>>> agent period score
>>>
>>> it seems that you want something like
>>>
>>> bysort agent (period) : gen first3 = _n < 4
>>>
>>> egen max_first3 = max(score / first3), by(agent)
>>>
>>> egen min_rest = min(score / !first3), by(agent)
>>>
>>> keep if max_first3 > 0.9 & min_rest > 0.9
>>>
>>> For the division trick in the -egen- call see e.g.
>>>
>>> http://www.stata.com/statalist/archive/2013-03/msg00917.html
>>>
>>> (reference included therein).
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 22 May 2013 15:03, Miguel Angel Duran Munoz <[email protected]> wrote:
>>>> Nick, thanks for your help. I hope you can help me with another
>>>> question. For a similar analysis to that of my first message, assume
>>>> I want to keep those agents that have exceeded the threshold before a
>>>> certain period and then have stayed above it for the rest of the
>>>> sample period.
>>>>
>>>> To illustrate the idea, consider the following (data refer to
>>>> consecutive periods and the threshold is, eg, 0.9):
>>>>
>>>> Agent 1: 1 1 1 1 1...
>>>> Agent 2: 0.8 1 1 1 1...
>>>> Agent 3: 0.8 0.8 0.8 1 1...
>>>> Agent 4: 0.8 0.8 0.8 0.8 1...
>>>>
>>>> I want to keep the first three agents because they have exceeded
>>>> the threshold before period 4 and then have stayed above the threshold
>>>> for the rest of the sample period, but I do not want to keep agent 4.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Miguel.
>>>>
>>>>
>>>>
>>>>> Correct on -keep-. Sorry about that.
>>>>>
>>>>> The -sort- order
>>>>>
>>>>> bysort entity (const_a) :
>>>>>
>>>>> ensures that -const_a[1]- is the lowest for each agent, not the
>>>>> first.
>>>>> If the lowest value for each agent is above the threshold, then
>>>>> all the observations for that agent are above.
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 21 May 2013 23:16, Miguel Angel Duran Munoz <[email protected]>
>>>>> wrote:
>>>>>> Thanks, Nick. I guess you mean -keep- instead of -drop-.
>>>>>> Nevertheless, the command that you suggest would not guarantee
>>>>>> that I keep the agents that have been above the threshold for the
>>>>>> whole sample period (ie, I would be including agents that were
>>>>>> above the threshold in the first period and then might have been
>>>>>> above or below it).
>>>>>>
>>>>>>> Sounds like
>>>>>>>
>>>>>>> bysort entity (const_a) : drop if const_a[1] > 0.097116
>>>>>>>
>>>>>>> Nick
>>>>>>> [email protected]
>>>>>>>
>>>>>>> On 21 May 2013 23:01, Miguel Angel Duran Munoz <[email protected]>
>>>>>>> wrote:
>>>>>>>> Hi, Statalisters. I want to focus on agents in my dataset that
>>>>>>>> have a particular feature; specifically, for those agents, and
>>>>>>>> for each and every period (out of 64), the value of a variable
>>>>>>>> (const_a) is larger than a particular threshold (0.097116). I
>>>>>>>> have done what I show below.
>>>>>>>> However, I have realized that some of my agents have not been in
>>>>>>>> the sample since the first period, so what I am doing would
>>>>>>>> mistakenly eliminate them. Can anyone help me solve this
>>>>>>>> problem? Thanks in advance.
>>>>>>>>
>>>>>>>> bysort entity (date2): gen obs=_n
>>>>>>>> drop if const_a<0.097116
>>>>>>>> by entity: drop if obs[_N]<64
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>