Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Observations that keep a feature...
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Observations that keep a feature...
Date: Fri, 24 May 2013 02:32:04 +0100

The idea of a spell in modest generality may be of help to you.

Setting aside what I learned of spells at a moderately well known
school in northern Britain, the program -tsspell- (SSC) and the 2007
article

SJ-7-2  dm0029  . . . . . . . . . . . .  Speaking Stata: Identifying spells
        . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/07   SJ 7(2):249--265                          (no commands)
        shows how to handle spells with complete control over
        spell specification

http://www.stata-journal.com/sjpdf.html?articlenum=dm0029

provide independent accounts of some ideas for identifying spells.

Two kinds of spells seem relevant to your problem:

1. A spell initiated by a particular event (e.g. a change of government).

2. A spell defined by some condition being true throughout its length
(e.g. a rainy spell is one in which it rained every day).

I would look at -tsspell- first and then the SJ article.
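
One way to sketch the spell idea from first principles (this is not
-tsspell- syntax and not code from the SJ article): assume long data with
a numeric period and the variable names agent, period, score used
elsewhere in this thread, the example data quoted below typed in by hand,
and a threshold of 0.9. The names above, start, spellstart, lastcross,
nspells, tag, early and late are made up for illustration.

```stata
* Assumption: the example data from the thread, with period coded 1 to 6
clear
input period agent score
1 1 0.8
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
1 2 0.8
2 2 0.8
3 2 1
4 2 1
5 2 1
6 2 1
1 3 0.8
2 3 0.8
3 3 0.8
4 3 1
5 3 1
6 3 1
1 4 0.8
2 4 0.8
3 4 0.8
4 4 0.8
5 4 1
6 4 1
6 5 1
1 6 0.8
2 6 0.8
3 6 0.8
4 6 0.8
5 6 1
6 6 1
1 7 0.8
2 7 1
3 7 1
4 7 0.8
5 7 0.8
6 7 1
end

* 1 while the condition holds; missing scores count as not above
gen byte above = score >= 0.9 & !missing(score)

* a spell starts at an agent's first record or right after a record
* that was not above the threshold
bysort agent (period): gen byte start = above & (_n == 1 | !above[_n-1])

* period in which each spell started, carried forward within the spell
bysort agent (period): gen spellstart = period if start
bysort agent (period): replace spellstart = spellstart[_n-1] if above & !start

* start of the spell still running at the agent's last record
* (missing if the last record is not above the threshold)
bysort agent (period): gen lastcross = spellstart[_N]

* number of spells per agent, to screen out agents that fell back
egen byte nspells = total(start), by(agent)

* hypothetical grouping: one spell only, started by t3 or not before t5
gen byte early = nspells == 1 & lastcross <= 3
gen byte late  = nspells == 1 & lastcross >= 5 & !missing(lastcross)

* one line per agent
egen byte tag = tag(agent)
list agent lastcross nspells early late if tag, noobs
```

Here lastcross should come out as 2, 3, 4, 5, 6, 5, 6 for agents 1 to 7
and nspells as 2 for agent 7 only, so early picks agents 1 and 2 and late
picks agents 4, 5 and 6. This is only one possible reading of the rules.
An agent that first appears already above the threshold (agent 5) is
treated as crossing at its first observed period, and gaps between an
agent's records are ignored; whether either is appropriate depends on
your rules.
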
Nick
On 23 May 2013 21:03, Miguel Angel Duran Munoz <[email protected]> wrote:
> Nick, this is what I am doing. I have a large sample, with about 7,000
> agents per period and about 60 periods. I am analyzing whether agents
> imitate each other. Once I have (statistically) confirmed that there is an
> imitation process going on, the next step is to analyze the differences
> between types of agents. In particular, between innovators (those who
> start following a rule of behavior right at the beginning of the process),
> nonadopters (those who never adopt the rule) and laggards (those who adopt
> the rule at a late period).
>
> This is why I need to split the sample the way I have described. I hope
> this helps to make it clear what I am doing. Thanks in advance.
>
>> This is getting very intricate to follow.
>>
>> As Sarah posted yesterday, more or less, we need examples.
>>
>> I worry on your behalf that you will have to explain your rules to
>> somebody reviewing your thesis/dissertation/report/paper and they are
>> going to ask you why you couldn't use much simpler rules.
>>
>> Nick
>>
>>
>> On 23 May 2013 18:43, Miguel Angel Duran Munoz <[email protected]> wrote:
>>> Nick and Sarah, thanks to your help I've been able to solve all but one
>>> of my problems. To select agents that are above the threshold after
>>> period 2, I've finally used:
>>>
>>> egen firstperiod = min(period), by(agent)
>>> drop if firstperiod > 2
>>> bysort agent (period): gen first2 = _n < 3
>>> egen min_rest = min(score / !first2), by(agent)
>>> keep if min_rest >= 0.9
>>>
>>> (the max condition that Nick suggested is, I think, unnecessary)
>>>
>>> Nevertheless, I am not sure how to select agents that overpass the
>>> threshold in the final periods (say at or after t3) and stay above it.
>>> In principle, based on your suggestions, I thought of this:
>>>
>>> bysort agent (period): gen last=score[_N]
>>> bysort agent (period): gen first2 = _n < 3
>>> egen min_rest = min(score / !first2), by(agent)
>>> keep if last>=0.9 & min_rest<=0.9
>>>
>>> Nevertheless, this implies that I am excluding agents that satisfy the
>>> criterion (overpassing the threshold at or after t3) but appear in the
>>> sample at an intermediate period.
>>>
>>> Will someone please help to solve this? Thanks in advance.
>>>
>>> Miguel.
>>>
>>>> Sarah, thank you for your help. I am very sorry for not having put my
>>>> doubts in a sufficiently clear way. And given what you say about the
>>>> way data is stored, I have realized that there might be other problems
>>>> around. I will try to be as clear as possible.
>>>>
>>>> My data is in panel data form. I write the example down again in the
>>>> way my data is stored. As regards the example in my previous messages,
>>>> I add two agents (6 and 7). Please note also that data referring to
>>>> agent 5 is missing in some periods, but there is no line corresponding
>>>> to those periods (this is what I had not taken into account so far):
>>>>
>>>> time agent score
>>>> t1 1 0.8
>>>> t2 1 1
>>>> t3 1 1
>>>> t4 1 1
>>>> t5 1 1
>>>> t6 1 1
>>>>
>>>> t1 2 0.8
>>>> t2 2 0.8
>>>> t3 2 1
>>>> t4 2 1
>>>> t5 2 1
>>>> t6 2 1
>>>>
>>>> t1 3 0.8
>>>> t2 3 0.8
>>>> t3 3 0.8
>>>> t4 3 1
>>>> t5 3 1
>>>> t6 3 1
>>>>
>>>> t1 4 0.8
>>>> t2 4 0.8
>>>> t3 4 0.8
>>>> t4 4 0.8
>>>> t5 4 1
>>>> t6 4 1
>>>>
>>>> t6 5 1
>>>>
>>>> t1 6 0.8
>>>> t2 6 0.8
>>>> t3 6 0.8
>>>> t4 6 0.8
>>>> t5 6 1
>>>> t6 6 1
>>>>
>>>> t1 7 0.8
>>>> t2 7 1
>>>> t3 7 1
>>>> t4 7 0.8
>>>> t5 7 0.8
>>>> t6 7 1
>>>>
>>>> Having said that, I want to split the sample in different ways. First,
>>>> I want to focus on agents that overpass a threshold (eg, 0.9) since the
>>>> first period and are always above the threshold (ie, agent 1). Second,
>>>> I want to take agents that overpass the threshold at or before a
>>>> particular period (eg, t3) and since then they are above the threshold
>>>> (ie, agents 1-4). Third, agents that overpass the threshold at or after
>>>> a particular period (eg, t5) and since then they are above the
>>>> threshold (ie, agents 5 and 6). Please note that agent 7 is not
>>>> included in any of the previous subsamples.
>>>>
>>>> Thank you very much for your help. And once again, I am sorry for not
>>>> having been clear enough.
>>>>
>>>> Miguel.
>>>>
>>>>
>>>>
>>>>
>>>>> Miguel,
>>>>> This discussion would be clearer if your examples actually made it
>>>>> clear exactly what your data looks like.
>>>>>
>>>>> Your example below looks like you have data in wide form. The
>>>>> solution that Nick suggested is for data in long form. It's easy
>>>>> enough to move between the two, but it's hard to make concrete
>>>>> suggestions about how to proceed when we don't know what the actual
>>>>> data looks like.
>>>>>
>>>>> I'll start by assuming, as Nick does, that your data is actually in
>>>>> long form and you have three variables: agent, period, score. I'll
>>>>> further assume that for agent 5 you simply have no records for
>>>>> periods 1-5 (that is, you do not have records for those periods with
>>>>> missing values for score). If that's true, you can simply calculate
>>>>> the first period that appears in the data and use that as part of
>>>>> your inclusion criteria. Something like the following will keep only
>>>>> those agents who first appear in the data no later than period 4:
>>>>> egen firstperiod=min(period), by(agent)
>>>>> drop if firstperiod>4
>>>>>
>>>>> Or maybe you only want to include agents who start in period 1? It's
>>>>> unclear from your question. In that case you'd -drop if firstperiod>1-
>>>>>
>>>>> For your second example, trying to look at the last time periods, I
>>>>> think you need to clarify what your actual criteria are. You say "I
>>>>> would like to select those agents that overpass the threshold of 0.9
>>>>> in any of the last two periods and are over the threshold until the
>>>>> end of the sample period (ie, agents 4 and 5)." To my eye, those
>>>>> criteria include all agents except agent 6. You're unlikely to get
>>>>> the results you hope for unless you are precise in the criteria
>>>>> you're using.
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>> -Sarah
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Miguel
>>>>> Angel
>>>>> Duran Munoz
>>>>> Sent: Wednesday, May 22, 2013 11:00 AM
>>>>> To: [email protected]
>>>>> Subject: Re: st: Observations that keep a feature... an additional
>>>>> problem
>>>>>
>>>>> I use the same example as in a previous message, but I add a fifth
>>>>> agent that joins in period six:
>>>>>
>>>>>
>>>>> Agent 1: 1 1 1 1 1 1...
>>>>> Agent 2: 0.8 1 1 1 1 1...
>>>>> Agent 3: 0.8 0.8 0.8 1 1 1...
>>>>> Agent 4: 0.8 0.8 0.8 0.8 1 1...
>>>>> Agent 5: . . . . . 1...
>>>>>
>>>>> I want to keep just the first three agents.
>>>>>
>>>>>
>>>>> If you don't mind, Nick, I would also like to ask you the following. I
>>>>> take the same example, but I focus on the last periods.
>>>>>
>>>>> Agent 1: ...1 1 1 1 1 1
>>>>> Agent 2: ...0.8 1 1 1 1 1
>>>>> Agent 3: ...0.8 0.8 0.8 1 1 1
>>>>> Agent 4: ...0.8 0.8 0.8 0.8 1 1
>>>>> Agent 5: ... . . . . . 1
>>>>> Agent 6: ...0.8 0.8 0.8 0.8 1 0.8
>>>>>
>>>>> I would like to select those agents that overpass the threshold of 0.9
>>>>> in any of the last two periods and are over the threshold until the
>>>>> end of the sample period (ie, agents 4 and 5).
>>>>> I have tried to modify the commands that you suggested to me before,
>>>>> but I have not been able to get the right selection. Would you mind
>>>>> helping me with this? Thank you very much.
>>>>>
>>>>>> I can't follow this. I see only "the rules select too many agents".
>>>>>>
>>>>>> You tell me your precise rules and I will try to think of code to
>>>>>> implement them.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>>
>>>>>> On 22 May 2013 18:16, Miguel Angel Duran Munoz <[email protected]>
>>>>>> wrote:
>>>>>>> Nick, after reducing the sample using your suggestion, I have
>>>>>>> checked the number of agents that there are per period. And the
>>>>>>> number is increasing in time. I guess this is due to the fact that
>>>>>>> agents joining the sample as time goes by and satisfying the
>>>>>>> requirement of being above the threshold are not excluded. Is there
>>>>>>> any trick to avoid including them? Thanks again.
>>>>>>>
>>>>>>>> Assuming variable names
>>>>>>>>
>>>>>>>> agent period score
>>>>>>>>
>>>>>>>> it seems that you want something like
>>>>>>>>
>>>>>>>> bysort agent (period) : gen first3 = _n < 4
>>>>>>>>
>>>>>>>> egen max_first3 = max(score / first3), by(agent)
>>>>>>>>
>>>>>>>> egen min_rest = min(score / !first3), by(agent)
>>>>>>>>
>>>>>>>> keep if max_first3 > 0.9 & min_rest > 0.9
>>>>>>>>
>>>>>>>> For the division trick in the -egen- call see e.g.
>>>>>>>>
>>>>>>>> http://www.stata.com/statalist/archive/2013-03/msg00917.html
>>>>>>>>
>>>>>>>> (reference included therein).
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22 May 2013 15:03, Miguel Angel Duran Munoz <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Nick, thanks for your help. I hope you can help me with another
>>>>>>>>> doubt. For a similar analysis to that of my first message, assume
>>>>>>>>> I want to keep those agents that have overpassed the threshold
>>>>>>>>> before a certain period and then have been over it in the rest of
>>>>>>>>> the sample period.
>>>>>>>>>
>>>>>>>>> To illustrate the idea, consider the following (data refer to
>>>>>>>>> consecutive periods and the threshold is, eg, 0.9):
>>>>>>>>>
>>>>>>>>> Agent 1: 1 1 1 1 1...
>>>>>>>>> Agent 2: 0.8 1 1 1 1...
>>>>>>>>> Agent 3: 0.8 0.8 0.8 1 1...
>>>>>>>>> Agent 4: 0.8 0.8 0.8 0.8 1...
>>>>>>>>>
>>>>>>>>> I want to keep the first three agents because they have overpassed
>>>>>>>>> the threshold before period 4 and then have been over the
>>>>>>>>> threshold in the rest of the sample period, but I do not want to
>>>>>>>>> keep agent 4.
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> Miguel.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Correct on -keep-. Sorry about that.
>>>>>>>>>>
>>>>>>>>>> The -sort- order
>>>>>>>>>>
>>>>>>>>>> bysort entity (const_a) :
>>>>>>>>>>
>>>>>>>>>> ensures that -const_a[1]- is the lowest for each agent, not the
>>>>>>>>>> first. If the lowest value for each agent is above the threshold,
>>>>>>>>>> then all the observations for that agent are above.
>>>>>>>>>> Nick
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 21 May 2013 23:16, Miguel Angel Duran Munoz <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>> Thanks, Nick. I guess you mean -keep- instead of -drop-.
>>>>>>>>>>> Nevertheless, the command that you suggest would not guarantee
>>>>>>>>>>> that I keep the agents that have been above the threshold for
>>>>>>>>>>> the whole sample period (ie, I would be including agents that
>>>>>>>>>>> were above the threshold in the first period and then might have
>>>>>>>>>>> been above or below it).
>>>>>>>>>>>
>>>>>>>>>>>> Sounds like
>>>>>>>>>>>>
>>>>>>>>>>>> bysort entity (const_a) : drop if const_a[1] > 0.097116
>>>>>>>>>>>>
>>>>>>>>>>>> Nick
>>>>>>>>>>>>
>>>>>>>>>>>> On 21 May 2013 23:01, Miguel Angel Duran Munoz <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi, Statalisters. I want to focus on agents in my dataset that
>>>>>>>>>>>>> have a particular feature; specifically, for those agents, and
>>>>>>>>>>>>> for each and every period (out of 64), the value of a variable
>>>>>>>>>>>>> (const_a) is larger than a particular threshold (0.097116). I
>>>>>>>>>>>>> have done what I show below.
>>>>>>>>>>>>> Nevertheless, I have realized that some of my agents are not in
>>>>>>>>>>>>> the sample since the first period, so what I am doing would
>>>>>>>>>>>>> mistakenly eliminate them. Will anyone help to solve this
>>>>>>>>>>>>> problem? Thanks in advance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> bysort entity (date2): gen obs=_n
>>>>>>>>>>>>> drop if const_a<0.097116
>>>>>>>>>>>>> by entity: drop if obs[_N]<64
>>>>>> *
>>>>>> * For searches and help try:
>>>>>> * http://www.stata.com/help.cgi?search
>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> * http://www.ats.ucla.edu/stat/stata/