Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Observations that keep a feature...
From: "Miguel Angel Duran Munoz" <[email protected]>
To: [email protected]
Subject: Re: st: Observations that keep a feature...
Date: Thu, 23 May 2013 22:03:17 +0200 (CEST)

Nick, this is what I am doing. I have a large sample, with about 7,000
agents per period and about 60 periods. I am analyzing whether agents
imitate each other. Once I have (statistically) confirmed that there is an
imitation process going on, the next step is to analyze the differences
between types of agents, in particular between innovators (those who
start following a rule of behavior right at the beginning of the process),
nonadopters (those who never adopt the rule) and laggards (those who adopt
the rule at a late period).
This is why I need to split the sample the way I have described. I hope
this helps to make it clear what I am doing. Thanks in advance.
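
A minimal sketch of one way to make this split, not a tested solution. It
assumes long-form data with numeric variables agent, period and score, a
threshold of 0.9 as in the example quoted below, and no missing scores. The
names last_below, first_above, adopt, stays and type, and the period cutoffs
2 and 5, are made up for illustration. An agent is treated as adopting in
the first period after its last below-threshold score, and as staying above
only if that is also the first period in which it was ever at or above the
threshold, which rules out agents that go above, fall back and recover.

```stata
* Assumed variables: agent, period (numeric), score. Threshold 0.9 is illustrative.
* Last period with a score below the threshold (missing if never below).
egen last_below = max(period / (score < 0.9)), by(agent)
* First period with a score at or above the threshold (missing if never above).
egen first_above = min(period / (score >= 0.9)), by(agent)
* First period after the last below-threshold score
* (missing if the agent's final score is below the threshold).
egen adopt = min(period / (missing(last_below) | period > last_below)), by(agent)
* 1 if the agent never fell back below after first reaching the threshold.
gen byte stays = !missing(adopt) & adopt == first_above
* Illustrative cutoffs: adoption by period 2 = innovator, from period 5 = laggard.
gen str10 type = "other"
replace type = "innovator" if stays & adopt <= 2
replace type = "laggard" if stays & adopt >= 5
replace type = "nonadopter" if missing(first_above)
```

Because adopt is a calendar period rather than a within-agent observation
count, agents that enter the sample late are classified by the period in
which they actually reached the threshold. Whether a late entrant that is
already above the threshold should count as a laggard is a separate choice
that this sketch does not settle.
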
> This is getting very intricate to follow.
>
> As Sarah posted yesterday, more or less, we need examples.
>
> I worry on your behalf that you will have to explain your rules to
> somebody reviewing your thesis/dissertation/report/paper and they are
> going to ask you why you couldn't use much simpler rules.
>
> Nick
> [email protected]
>
>
> On 23 May 2013 18:43, Miguel Angel Duran Munoz <[email protected]> wrote:
>> Nick and Sarah, thanks to your help I've been able to solve all but one of
>> my problems. To select agents that are above the threshold after period 2,
>> I've finally used:
>>
>> egen firstperiod = min(period), by(agent)
>> drop if firstperiod > 2
>> bysort agent (period): gen first2 = _n < 3
>> egen min_rest = min(score / !first2), by(agent)
>> keep if min_rest >= 0.9
>>
>> (the max condition that Nick suggested to me is, I think, unnecessary)
>>
>> Nevertheless, I am not sure how to select agents that overpass the
>> threshold in the final periods (say at or after t3) and stay above it.
>> In principle, based on your suggestions, I thought of this:
>>
>> bysort agent (period): gen last=score[_N]
>> bysort agent (period): gen first2 = _n < 3
>> egen min_rest = min(score / !first2), by(agent)
>> keep if last>=0.9 & min_rest<=0.9
>>
>> Nevertheless, this implies that I am excluding agents that satisfy the
>> criterion (overpassing the threshold at or after t3) but appear in the
>> sample at an intermediate period.
>>
>> Will someone please help to solve this? Thanks in advance.
>>
>> Miguel.
>>
>>> Sarah, thank you for your help. I am very sorry for not having put my
>>> doubts in a sufficiently clear way. And given what you say about the way
>>> data is stored I have realized that there might be other problems around.
>>> I will try to be as clear as possible.
>>>
>>> My data is in panel data form. I write the example down again in the way
>>> my data is stored. Compared with the example in my previous messages, I
>>> add two agents (6 and 7). Please note also that data for the fifth agent
>>> is missing in some periods, but there is no line corresponding to those
>>> periods (this is what I had not taken into account so far):
>>>
>>> time agent score
>>> t1 1 0.8
>>> t2 1 1
>>> t3 1 1
>>> t4 1 1
>>> t5 1 1
>>> t6 1 1
>>>
>>> t1 2 0.8
>>> t2 2 0.8
>>> t3 2 1
>>> t4 2 1
>>> t5 2 1
>>> t6 2 1
>>>
>>> t1 3 0.8
>>> t2 3 0.8
>>> t3 3 0.8
>>> t4 3 1
>>> t5 3 1
>>> t6 3 1
>>>
>>> t1 4 0.8
>>> t2 4 0.8
>>> t3 4 0.8
>>> t4 4 0.8
>>> t5 4 1
>>> t6 4 1
>>>
>>> t6 5 1
>>>
>>> t1 6 0.8
>>> t2 6 0.8
>>> t3 6 0.8
>>> t4 6 0.8
>>> t5 6 1
>>> t6 6 1
>>>
>>> t1 7 0.8
>>> t2 7 1
>>> t3 7 1
>>> t4 7 0.8
>>> t5 7 0.8
>>> t6 7 1
>>>
>>> Having said that, I want to split the sample in different ways. First, I
>>> want to focus on agents that overpass a threshold (eg, 0.9) from the
>>> first period and are always above the threshold (ie, agent 1). Second, I
>>> want to take agents that overpass the threshold at or before a particular
>>> period (eg, t3) and from then on stay above the threshold (ie, agents
>>> 1-4). Third, agents that overpass the threshold at or after a particular
>>> period (eg, t5) and from then on stay above the threshold (ie, agents 5
>>> and 6). Please note that agent 7 is not included in any of the previous
>>> subsamples.
>>>
>>> Thank you very much for your help. And once again, I am sorry for not
>>> having been clear enough.
>>>
>>> Miguel.
>>>
>>>
>>>
>>>
>>>> Miguel,
>>>>
>>>> This discussion would be clearer if your examples actually made it clear
>>>> exactly what your data looks like.
>>>>
>>>> Your example below looks like you have data in wide form. The solution
>>>> that Nick suggested is for data in long form. It's easy enough to move
>>>> between the two, but it's hard to make concrete suggestions about how to
>>>> proceed when we don't know what the actual data looks like.
>>>>
>>>> I'll start by assuming, as Nick does, that your data is actually in long
>>>> form and you have three variables: agent, period, score. I'll further
>>>> assume that for agent 5 you simply have no records for periods 1-5 (that
>>>> is, you do not have records for those periods with missing values for
>>>> score). If that's true, you can simply calculate the first period that
>>>> appears in the data and use that as part of your inclusion criteria.
>>>> Something like the following will keep only those agents who first appear
>>>> in the data in period 4 or earlier:
>>>>
>>>> egen firstperiod=min(period), by(agent)
>>>> drop if firstperiod>4
>>>>
>>>> Or maybe you only want to include agents who start in period 1? It's
>>>> unclear from your question. In that case you'd -drop if firstperiod>1-
>>>>
>>>> For your second example, trying to look at the last time periods, I think
>>>> you need to clarify what your actual criteria are. You say "I would like
>>>> to select those agents that overpass the threshold of 0.9 in any the last
>>>> two periods and are over the threshold until the end of the sample period
>>>> (ie, agents 4 and 5)." To my eye, those criteria include all agents
>>>> except agent 6. You're unlikely to get the results you hope for unless
>>>> you are precise in the criteria you're using.
>>>>
>>>> Hope that helps.
>>>>
>>>> -Sarah
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Miguel Angel Duran Munoz
>>>> Sent: Wednesday, May 22, 2013 11:00 AM
>>>> To: [email protected]
>>>> Subject: Re: st: Observations that keep a feature... an additional problem
>>>>
>>>> I use the same example as in a previous message, but I add a fifth agent
>>>> that joins in period six:
>>>>
>>>>
>>>> Agent 1: 1 1 1 1 1 1...
>>>> Agent 2: 0.8 1 1 1 1 1...
>>>> Agent 3: 0.8 0.8 0.8 1 1 1...
>>>> Agent 4: 0.8 0.8 0.8 0.8 1 1...
>>>> Agent 5: . . . . . 1...
>>>>
>>>> I want to keep just the first three agents.
>>>>
>>>>
>>>> If you don't mind, Nick, I would also like to ask you the following. I
>>>> take the same example, but I focus on the last periods.
>>>>
>>>> Agent 1: ...1 1 1 1 1 1
>>>> Agent 2: ...0.8 1 1 1 1 1
>>>> Agent 3: ...0.8 0.8 0.8 1 1 1
>>>> Agent 4: ...0.8 0.8 0.8 0.8 1 1
>>>> Agent 5: ... . . . . . 1
>>>> Agent 6: ...0.8 0.8 0.8 0.8 1 0.8
>>>>
>>>> I would like to select those agents that overpass the threshold of 0.9 in
>>>> any the last two periods and are over the threshold until the end of the
>>>> sample period (ie, agents 4 and 5).
>>>>
>>>> I have tried to modify the commands that you have suggested to me before,
>>>> but I have not been able to get the right selection. Would you mind
>>>> helping me with this? Thank you very much.
>>>>
>>>>> I can't follow this. I see only "the rules select too many agents".
>>>>>
>>>>> You tell me your precise rules and I will try to think of code to
>>>>> implement them.
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 22 May 2013 18:16, Miguel Angel Duran Munoz <[email protected]> wrote:
>>>>>> Nick, after reducing the sample using your suggestion, I have checked
>>>>>> the number of agents that there are per period. And the number is
>>>>>> increasing in time. I guess this is due to the fact that agents
>>>>>> joining the sample as time goes by and satisfying the requirement of
>>>>>> being above the threshold are not excluded. Is there any trick to
>>>>>> avoid including them? Thanks again.
>>>>>>
>>>>>>> Assuming variable names
>>>>>>>
>>>>>>> agent period score
>>>>>>>
>>>>>>> it seems that you want something like
>>>>>>>
>>>>>>> bysort agent (period) : gen first3 = _n < 4
>>>>>>>
>>>>>>> egen max_first3 = max(score / first3), by(agent)
>>>>>>>
>>>>>>> egen min_rest = min(score / !first3), by(agent)
>>>>>>>
>>>>>>> keep if max_first3 > 0.9 & min_rest > 0.9
>>>>>>>
>>>>>>> For the division trick in the -egen- call see e.g.
>>>>>>>
>>>>>>> http://www.stata.com/statalist/archive/2013-03/msg00917.html
>>>>>>>
>>>>>>> (reference included therein).
>>>>>>>
>>>>>>> Nick
>>>>>>> [email protected]
>>>>>>>
>>>>>>>
>>>>>>> On 22 May 2013 15:03, Miguel Angel Duran Munoz <[email protected]> wrote:
>>>>>>>> Nick, thanks for your help. I hope you can help me with another
>>>>>>>> doubt. For a similar analysis to that of my first message, assume I
>>>>>>>> want to keep those agents that have overpassed the threshold before
>>>>>>>> a certain period and then have been over it in the rest of the
>>>>>>>> sample period.
>>>>>>>>
>>>>>>>> To illustrate the idea, consider the following (data refer to
>>>>>>>> consecutive periods and the threshold is, eg, 0.9):
>>>>>>>>
>>>>>>>> Agent 1: 1 1 1 1 1...
>>>>>>>> Agent 2: 0.8 1 1 1 1...
>>>>>>>> Agent 3: 0.8 0.8 0.8 1 1...
>>>>>>>> Agent 4: 0.8 0.8 0.8 0.8 1...
>>>>>>>>
>>>>>>>> I want to keep the first three agents because they have overpassed
>>>>>>>> the threshold before period 4 and then have been over the threshold
>>>>>>>> in the rest of the sample period, but I do not want to keep agent 4.
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> Miguel.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Correct on -keep-. Sorry about that.
>>>>>>>>>
>>>>>>>>> The -sort- order
>>>>>>>>>
>>>>>>>>> bysort entity (const_a) :
>>>>>>>>>
>>>>>>>>> ensures that -const_a[1]- is the lowest for each agent, not the
>>>>>>>>> first. If the lowest value for each agent is above the threshold,
>>>>>>>>> then all the observations for that agent are above.
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21 May 2013 23:16, Miguel Angel Duran Munoz <[email protected]> wrote:
>>>>>>>>>> Thanks, Nick. I guess you mean -keep- instead of -drop-.
>>>>>>>>>> Nevertheless, the command that you suggest would not guarantee
>>>>>>>>>> that I keep the agents that have been above the threshold for the
>>>>>>>>>> whole sample period (ie, I would be including agents that were
>>>>>>>>>> above the threshold in the first period and then might have been
>>>>>>>>>> above or below it).
>>>>>>>>>>
>>>>>>>>>>> Sounds like
>>>>>>>>>>>
>>>>>>>>>>> bysort entity (const_a) : drop if const_a[1] > 0.097116
>>>>>>>>>>>
>>>>>>>>>>> Nick
>>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>> On 21 May 2013 23:01, Miguel Angel Duran Munoz <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi, Statalisters. I want to focus on agents in my dataset that
>>>>>>>>>>>> have a particular feature; specifically, for those agents, and
>>>>>>>>>>>> for each and every period (out of 64), the value of a variable
>>>>>>>>>>>> (const_a) is larger than a particular threshold (0.097116). I
>>>>>>>>>>>> have done what I show below.
>>>>>>>>>>>> Nevertheless, I have realized that some of my agents are not in
>>>>>>>>>>>> the sample since the first period, so what I am doing would
>>>>>>>>>>>> mistakenly eliminate them. Will anyone help to solve this
>>>>>>>>>>>> problem? Thanks in advance.
>>>>>>>>>>>>
>>>>>>>>>>>> bysort entity (date2): gen obs=_n
>>>>>>>>>>>> drop if const_a<0.097116
>>>>>>>>>>>> by entity: drop if obs[_N]<64
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/