Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Observations that keep a feature... an additional problem
From: "Miguel Angel Duran Munoz" <[email protected]>
To: [email protected]
Subject: RE: st: Observations that keep a feature... an additional problem
Date: Thu, 23 May 2013 00:05:52 +0200 (CEST)
Sarah, thank you for your help. I am very sorry for not having stated my
questions clearly enough. Given what you say about the way the data are
stored, I have realized that there may be other problems as well.
I will try to be as clear as possible.
My data are in panel data form. I write the example down again in the way
my data are stored. Compared with the example in my previous messages, I add
two agents (6 and 7). Please note also that agent 5 is not observed in some
periods, and there are no lines for those periods (this is what I had not
taken into account so far):
time agent score
t1 1 0.8
t2 1 1
t3 1 1
t4 1 1
t5 1 1
t6 1 1
t1 2 0.8
t2 2 0.8
t3 2 1
t4 2 1
t5 2 1
t6 2 1
t1 3 0.8
t2 3 0.8
t3 3 0.8
t4 3 1
t5 3 1
t6 3 1
t1 4 0.8
t2 4 0.8
t3 4 0.8
t4 4 0.8
t5 4 1
t6 4 1
t6 5 1
t1 6 0.8
t2 6 0.8
t3 6 0.8
t4 6 0.8
t5 6 1
t6 6 1
t1 7 0.8
t2 7 1
t3 7 1
t4 7 0.8
t5 7 0.8
t6 7 1
Having said that, I want to split the sample in different ways. First, I
want to focus on agents that exceed a threshold (eg, 0.9) from the first
period and are always above the threshold (ie, agent 1). Second, I want to
take agents that exceed the threshold at or before a particular period (eg,
t3) and stay above it from then on (ie, agents 1-4). Third, agents that
exceed the threshold at or after a particular period (eg, t5) and stay above
it from then on (ie, agents 5 and 6). Please note that agent 7 is not
included in any of the previous subsamples.
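
A minimal sketch of how the three subsamples might be flagged, not a tested solution. It assumes long-form data with numeric variables agent, period (1 to 6) and score. All other names (above, firstabove, lastbelow, stays, sub1-sub3) and the local macros are hypothetical. It reads "exceeds the threshold and stays above it" as: the agent's first period above the threshold comes after its last period at or below it. Which agents end up in each subsample depends on that reading and on how the cutoff periods are counted.

```stata
* Sketch only: assumes numeric variables agent, period and score.
* thr is the threshold; early and late are the two cutoff periods.
local thr 0.9
local early 3
local late 5

* 1 if the score is observed and above the threshold.
* The !missing() guard is needed because Stata treats missing as larger
* than any number, so a missing score would otherwise count as "above".
gen byte above = score > `thr' & !missing(score)

* First period above the threshold and last period not above it.
egen firstabove = min(cond(above, period, .)), by(agent)
egen lastbelow = max(cond(!above, period, .)), by(agent)

* 1 if the agent crosses the threshold once and never falls back.
gen byte stays = !missing(firstabove) & (missing(lastbelow) | lastbelow < firstabove)

* First period of the whole sample, so that late joiners are not
* counted as being above the threshold "from the first period".
summarize period, meanonly
local t0 = r(min)

gen byte sub1 = stays & firstabove == `t0'
gen byte sub2 = stays & firstabove <= `early'
gen byte sub3 = stays & firstabove >= `late'
```

Each subsample could then be selected with, for example, -keep if sub2-. An agent such as agent 7, which rises above the threshold and later falls back, gets stays == 0 and so is in none of the three. Periods with no record at all are ignored here; if a gap in the middle of an agent's history should disqualify it, an extra check would be needed.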
Thank you very much for your help. And once again, I am sorry for not
having been clear enough.
Miguel.
As I mentioned in the message that crossed with yours, I think that I
found a solution for one of the problems I have, but your proposal is
better. Thank you very much.
> Miguel,
> This discussion would be clearer if your examples actually made it clear
> exactly what your data looks like.
>
> Your example below looks like you have data in wide form. The solution
> that Nick suggested is for data in long form. It's easy enough to move
> between the two, but it's hard to make concrete suggestions about how to
> proceed when we don't know what the actual data looks like.
>
> I'll start by assuming, as Nick does, that your data is actually in long
> form and you have three variables: agent, period, score. I'll further
> assume that for agent 5 you simply have no records for periods 1-5 (that
> is, you do not have records for those periods with missing values for
> score). If that's true, you can simply calculate the first period that
> appears in the data and use that as part of your inclusion criteria.
> Something like the following will keep only those agents who first appear
> in the data before period 4:
> egen firstperiod=min(period), by(agent)
> drop if firstperiod>4
>
> Or maybe you only want to include agents who start in period 1? It's
> unclear from your question. In that case you'd -drop if firstperiod>1-
>
> For your second example, trying to look at the last time periods, I think
> you need to clarify what your actual criteria are. You say "I would like
> to select those agents that overpass the threshold of 0.9 in any the last
> two periods and are over the threshold until the end of the sample period
> (ie, agents 4 and 5)." To my eye, those criteria include all agents
> except agent 6. You're unlikely to get the results you hope for unless
> you are precise in the criteria you're using.
>
> Hope that helps.
>
> -Sarah
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Miguel Angel
> Duran Munoz
> Sent: Wednesday, May 22, 2013 11:00 AM
> To: [email protected]
> Subject: Re: st: Observations that keep a feature... an additional problem
>
> I use the same example as in a previous message, but I add a fifth agent
> that joins in period six:
>
>
> Agent 1: 1 1 1 1 1 1...
> Agent 2: 0.8 1 1 1 1 1...
> Agent 3: 0.8 0.8 0.8 1 1 1...
> Agent 4: 0.8 0.8 0.8 0.8 1 1...
> Agent 5: . . . . . 1...
>
> I want to keep just the first three agents.
>
>
> If you don't mind, Nick, I would also like to ask you the following. I
> take the same example, but I focus on the last periods.
>
> Agent 1: ...1 1 1 1 1 1
> Agent 2: ...0.8 1 1 1 1 1
> Agent 3: ...0.8 0.8 0.8 1 1 1
> Agent 4: ...0.8 0.8 0.8 0.8 1 1
> Agent 5: ... . . . . . 1
> Agent 6: ...0.8 0.8 0.8 0.8 1 0.8
>
> I would like to select those agents that overpass the threshold of 0.9 in
> any the last two periods and are over the threshold until the end of the
> sample period (ie, agents 4 and 5).
> I have tried to modify the commands that you suggested earlier, but
> I have not been able to get the right selection. Would you mind helping me
> with this? Thank you very much.
>
>> I can't follow this. I see only "the rules select too many agents".
>>
>> You tell me your precise rules and I will try to think of code to
>> implement them.
>>
>> Nick
>> [email protected]
>>
>>
>> On 22 May 2013 18:16, Miguel Angel Duran Munoz <[email protected]> wrote:
>>> Nick, after reducing the sample using your suggestion, I have checked
>>> the number of agents that there are per period. And the number is
>>> increasing in time. I guess this is due to the fact that agents
>>> joining the sample as time goes by and satisfying the requirement of
>>> being above the threshold are not excluded. Is there any trick to
>>> avoid including them? Thanks again.
>>>
>>>> Assuming variable names
>>>>
>>>> agent period score
>>>>
>>>> it seems that you want something like
>>>>
>>>> bysort agent (period) : gen first3 = _n < 4
>>>>
>>>> egen max_first3 = max(score / first3), by(agent)
>>>>
>>>> egen min_rest = min(score / !first3), by(agent)
>>>>
>>>> keep if max_first3 > 0.9 & min_rest > 0.9
>>>>
>>>> For the division trick in the -egen- call see e.g.
>>>>
>>>> http://www.stata.com/statalist/archive/2013-03/msg00917.html
>>>>
>>>> (reference included therein).
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 22 May 2013 15:03, Miguel Angel Duran Munoz <[email protected]> wrote:
>>>>> Nick, thanks for your help. I hope you can help me with another
>>>>> question. For a similar analysis to that of my first message, assume
>>>>> I want to keep those agents that have exceeded the threshold before a
>>>>> certain period and then have been over it in the rest of the sample
>>>>> period.
>>>>>
>>>>> To illustrate the idea, consider the following (data refer to
>>>>> consecutive periods and the threshold is, eg, 0.9):
>>>>>
>>>>> Agent 1: 1 1 1 1 1...
>>>>> Agent 2: 0.8 1 1 1 1...
>>>>> Agent 3: 0.8 0.8 0.8 1 1...
>>>>> Agent 4: 0.8 0.8 0.8 0.8 1...
>>>>>
>>>>> I want to keep the first three agents because they have exceeded
>>>>> the threshold before period 4 and then have been over the threshold
>>>>> in the rest of the sample period, but I do not want to keep agent 4.
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Miguel.
>>>>>
>>>>>
>>>>>
>>>>>> Correct on -keep-. Sorry about that.
>>>>>>
>>>>>> The -sort- order
>>>>>>
>>>>>> bysort entity (const_a) :
>>>>>>
>>>>>> ensures that -const_a[1]- is the lowest for each agent, not the
>>>>>> first.
>>>>>> If the lowest value for each agent is above the threshold, then
>>>>>> all the observations for that agent are above.
>>>>>> Nick
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> On 21 May 2013 23:16, Miguel Angel Duran Munoz <[email protected]>
>>>>>> wrote:
>>>>>>> Thanks, Nick. I guess you mean -keep- instead of -drop-.
>>>>>>> Nevertheless, the command that you suggest would not guarantee
>>>>>>> that I keep the agents that have been above the threshold for the
>>>>>>> whole sample period (ie, I would be including agents that were
>>>>>>> above the threshold in the first period and then might have been
>>>>>>> above or below it).
>>>>>>>
>>>>>>>> Sounds like
>>>>>>>>
>>>>>>>> bysort entity (const_a) : drop if const_a[1] > 0.097116
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> On 21 May 2013 23:01, Miguel Angel Duran Munoz <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Hi, Statalisters. I want to focus on agents in my dataset that
>>>>>>>>> have a particular feature; specifically, for those agents, and
>>>>>>>>> for each and every period (out of 64), the value of a variable
>>>>>>>>> (const_a) is larger than a particular threshold (0.097116). I
>>>>>>>>> have done what I show below.
>>>>>>>>> Nevertheless, I have realized that some of my agents are not in
>>>>>>>>> the sample since the first period, so what I am doing would
>>>>>>>>> mistakenly eliminate them. Will anyone help to solve this
>>>>>>>>> problem? Thanks in advance.
>>>>>>>>>
>>>>>>>>> bysort entity (date2): gen obs=_n
>>>>>>>>> drop if const_a<0.097116
>>>>>>>>> by entity: drop if obs[_N]<64
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/