Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: doing the comparison for pairs of years

From	Navid Asgari <[email protected]>
To	[email protected]
Subject	Re: st: RE: doing the comparison for pairs of years
Date	Sun, 13 May 2012 14:46:58 +0800

It works well, Nick. Thanks for your quick and helpful advice.

On Sun, May 13, 2012 at 2:07 PM, Navid Asgari <[email protected]> wrote:
> Thanks Nick,
>
> Yes, I missed your posting... there was a problem with my
> subscription to Statalist...
>
> I am running the code... Thanks a lot!
>
> Navid
>
> On Sat, May 12, 2012 at 11:48 PM, Nick Cox <[email protected]> wrote:
>> You missed my correction at
>>
>> http://www.stata.com/statalist/archive/2012-05/msg00484.html
>>
>> from which the suggested code follows as
>>
>> contract company Year P , zero
>> bysort company P (Y) : gen new =  _freq > 0 & (_n == 1 |  _freq[_n-1]== 0)
>> tab company Y if new
>>
>> Do note that in any case "doesn't work" is difficult to respond to
>> without seeing any details of what that means.
>>
>> Nick
>>
>> On Sat, May 12, 2012 at 1:04 PM, Navid Asgari <[email protected]> wrote:
>>> Hi Nick,
>>>
>>> Thanks,
>>>
>>> Yes, I made a mistake... after the change it worked.
>>>
>>> Now, I am facing another problem. If I want to do the same thing
>>> (comparing values of "P" across years) for each group of rows (grouped
>>> by a variable called, say, "Company"), the following code doesn't
>>> work:
>>>
>>> contract company Year P , zero
>>> bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1]== 0)
>>> tab Company Y if new
>>>
>>>
>>> Sorry for the frequent questions. I am a Stata newbie.
>>>
>>>      +---------------------+
>>>      |  company   Year   P |
>>>      |---------------------|
>>>   1. | Company1   1995   A |
>>>   2. | Company1   1995   A |
>>>   3. | Company1   1995   A |
>>>   4. | Company1   1995   A |
>>>   5. | Company1   1995   B |
>>>      |---------------------|
>>>   6. | Company1   1995   C |
>>>   7. | Company1   1995   D |
>>>   8. | Company1   1995   E |
>>>   9. | Company1   1996   A |
>>>  10. | Company1   1996   A |
>>>      |---------------------|
>>>  11. | Company1   1996   A |
>>>  12. | Company1   1996   A |
>>>  13. | Company1   1996   B |
>>>  14. | Company1   1996   C |
>>>  15. | Company1   1996   H |
>>>      |---------------------|
>>>  16. | Company1   1996   M |
>>>  17. | Company2   1993   A |
>>>  18. | Company2   1993   B |
>>>  19. | Company2   1993   G |
>>>  20. | Company2   1993   G |
>>>      |---------------------|
>>>  21. | Company2   1993   K |
>>>  22. | Company2   1993   M |
>>>  23. | Company2   1998   C |
>>>  24. | Company2   1998   K |
>>>  25. | Company2   1998   L |
>>>      |---------------------|
>>>  26. | Company2   1998   M |
>>>      +---------------------+
>>
>>
>> On Sat, May 12, 2012 at 4:53 PM, Nick Cox <[email protected]> wrote:
>>>> My code compares each year with the previous, which is I think exactly what
>>>> you ask, so I don't see any sense in which the logic fails.
>>>>
>>>> I think you need to substantiate your criticism.
>>
>>
>> On 12 May 2012, at 09:27, Navid Asgari <[email protected]> wrote:
>>>>
>>>>> Hi Nick,
>>>>>
>>>>> Thanks for your quick and helpful response,
>>>>>
>>>>> The logic that you suggested works fine for comparison across only two
>>>>> years. However, if I want to compare new "P" values in, say, 1995 with
>>>>> values of "P" in 1994, and then do the same comparing only 1996
>>>>> with 1995 and then 1997 with 1996, the logic fails.
>>>>>
>>>>> I was thinking that a "foreach" loop over "Year" could work, but it does
>>>>> not...
>>>>>
>>>>> What other ways are possible?
>>>>>
>>>>> Thanks,
>>>>> Navid
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> I can't make sense of your -reshape-. Your structure is already -long-
>>>>> and there is just one variable that is -P*-. As it is, the -reshape-
>>>>> command is illegal in the context you give. It seems quite unneeded,
>>>>> so I start afresh.
>>>>>
>>>>> I first read in your dataset.
>>>>>
>>>>> . input      Year str1  P
>>>>>
>>>>>         Year          P
>>>>>  1.  1995   A
>>>>>  2.  1995   B
>>>>>  3.  1995   A
>>>>>  4.  1995   C
>>>>>  5.  1995   D
>>>>>  6.  1995   A
>>>>>  7.  1995   E
>>>>>  8.  1995   A
>>>>>  9.  1996   B
>>>>> 10.  1996   A
>>>>> 11.  1996   A
>>>>> 12.  1996   M
>>>>> 13.  1996   A
>>>>> 14.  1996   H
>>>>> 15.  1996   A
>>>>> 16.  1996   C
>>>>> 17. end
>>>>>
>>>>> Then we reduce the dataset to a set of counts.
>>>>>
>>>>> . contract Year P , zero
>>>>>
>>>>> . l
>>>>>
>>>>>     +------------------+
>>>>>     | Year   P   _freq |
>>>>>     |------------------|
>>>>>  1. | 1995   A       4 |
>>>>>  2. | 1995   B       1 |
>>>>>  3. | 1995   C       1 |
>>>>>  4. | 1995   D       1 |
>>>>>  5. | 1995   E       1 |
>>>>>     |------------------|
>>>>>  6. | 1995   H       0 |
>>>>>  7. | 1995   M       0 |
>>>>>  8. | 1996   A       4 |
>>>>>  9. | 1996   B       1 |
>>>>> 10. | 1996   C       1 |
>>>>>     |------------------|
>>>>> 11. | 1996   D       0 |
>>>>> 12. | 1996   E       0 |
>>>>> 13. | 1996   H       1 |
>>>>> 14. | 1996   M       1 |
>>>>>     +------------------+
>>>>>
>>>>> Then a -P- is new if it wasn't observed the previous year. Notice that
>>>>> I define "new" as including the first time any value of -P- is
>>>>> observed.
>>>>>
>>>>> . bysort P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>>>>
>>>>> . l
>>>>>
>>>>>     +------------------------+
>>>>>     | Year   P   _freq   new |
>>>>>     |------------------------|
>>>>>  1. | 1995   A       4     1 |
>>>>>  2. | 1996   A       4     0 |
>>>>>  3. | 1995   B       1     1 |
>>>>>  4. | 1996   B       1     0 |
>>>>>  5. | 1995   C       1     1 |
>>>>>     |------------------------|
>>>>>  6. | 1996   C       1     0 |
>>>>>  7. | 1995   D       1     1 |
>>>>>  8. | 1996   D       0     0 |
>>>>>  9. | 1995   E       1     1 |
>>>>> 10. | 1996   E       0     0 |
>>>>>     |------------------------|
>>>>> 11. | 1995   H       0     1 |
>>>>> 12. | 1996   H       1     1 |
>>>>> 13. | 1995   M       0     1 |
>>>>> 14. | 1996   M       1     1 |
>>>>>     +------------------------+
>>>>>
>>>>> Then we count how many new categories there are each year.
>>>>>
>>>>> . tab Y if new
>>>>>
>>>>>        Year |      Freq.     Percent        Cum.
>>>>> ------------+-----------------------------------
>>>>>        1995 |          7       77.78       77.78
>>>>>        1996 |          2       22.22      100.00
>>>>> ------------+-----------------------------------
>>>>>       Total |          9      100.00
>>>>>
>>>>> The generalization to include -Company- should be something like this,
>>>>> but I didn't test it.
>>>>>
>>>>> contract Company Year P , zero
>>>>> bysort Company P (Y) : gen new =  _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>>>> tab Company Y if new
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>> Navid Asgari
>>>>>
>>>>> I have a dataset which looks like this:
>>>>>
>>>>>
>>>>>     +----------+
>>>>>     | Year   P |
>>>>>     |----------|
>>>>>  1. | 1995   A |
>>>>>  2. | 1995   B |
>>>>>  3. | 1995   A |
>>>>>  4. | 1995   C |
>>>>>  5. | 1995   D |
>>>>>     |----------|
>>>>>  6. | 1995   A |
>>>>>  7. | 1995   E |
>>>>>  8. | 1995   A |
>>>>>  9. | 1996   B |
>>>>> 10. | 1996   A |
>>>>>     |----------|
>>>>> 11. | 1996   A |
>>>>> 12. | 1996   M |
>>>>> 13. | 1996   A |
>>>>> 14. | 1996   H |
>>>>> 15. | 1996   A |
>>>>>     |----------|
>>>>> 16. | 1996   C |
>>>>>     +----------+
>>>>>
>>>>> I use the following to count the number of new values of variable "P"
>>>>> that exist in the year 1996 but not in 1995:
>>>>>
>>>>> gen id = _n
>>>>> reshape long P , i(id)
>>>>> bysort P (Year id) : gen seq = _n
>>>>>
>>>>> count if Year==1996 & seq==1
>>>>>
>>>>> Now I want to do the same thing for more than 2 successive years (e.g.
>>>>> 1993, 1994, 1995, 1996). So, the values of variable "P" in every year will be
>>>>> compared with the values in the previous year (1994 to 1993, then 1995
>>>>> to 1994, and so forth).
>>>>>
>>>>> The complexity of this lies in the fact that this comparison has to be
>>>>> done for each unique value of another variable, and the starting year
>>>>> and ending year vary in each group. In fact, this is what the
>>>>> structure of the real data looks like:
>>>>>
>>>>>
>>>>>     +---------------------+
>>>>>     | Year   P    company |
>>>>>     |---------------------|
>>>>>  1. | 1995   A   Company1 |
>>>>>  2. | 1995   B   Company1 |
>>>>>  3. | 1995   A   Company1 |
>>>>>  4. | 1995   C   Company1 |
>>>>>  5. | 1995   D   Company1 |
>>>>>     |---------------------|
>>>>>  6. | 1995   A   Company1 |
>>>>>  7. | 1995   E   Company1 |
>>>>>  8. | 1995   A   Company1 |
>>>>>  9. | 1996   B   Company1 |
>>>>> 10. | 1996   A   Company1 |
>>>>>     |---------------------|
>>>>> 11. | 1996   A   Company1 |
>>>>> 12. | 1996   M   Company1 |
>>>>> 13. | 1996   A   Company1 |
>>>>> 14. | 1996   H   Company1 |
>>>>> 15. | 1996   A   Company1 |
>>>>>     |---------------------|
>>>>> 16. | 1996   C   Company1 |
>>>>> 17. | 1993   G   Company2 |
>>>>> 18. | 1993   G   Company2 |
>>>>> 19. | 1993   M   Company2 |
>>>>> 20. | 1993   K   Company2 |
>>>>>     |---------------------|
>>>>> 21. | 1993   A   Company2 |
>>>>> 22. | 1993   B   Company2 |
>>>>> 23. | 1994   C   Company2 |
>>>>> 24. | 1994   M   Company2 |
>>>>> 25. | 1994   K   Company2 |
>>>>>     |---------------------|
>>>>> 26. | 1994   L   Company2 |
>>>>>     +---------------------+
>>>>>
>>>>> So for every group under variable company the code should count the number
>>>>> of new values of variable "P" in every year that did not exist the year
>>>>> before...
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
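
For anyone reading this thread later, the corrected rule can be checked against the two-year example posted earlier. What follows is only a minimal sketch, assuming the variables are named Year and P as in that example. The variable new_strict and its calendar-year check are illustrative additions that do not appear anywhere in the thread.

```stata
* Recreate the two-year example dataset from the thread
clear
input int Year str1 P
1995 A
1995 B
1995 A
1995 C
1995 D
1995 A
1995 E
1995 A
1996 B
1996 A
1996 A
1996 M
1996 A
1996 H
1996 A
1996 C
end

* One row per Year-P combination; -zero- keeps unobserved
* combinations as rows with _freq equal to 0
contract Year P, zero

* Corrected rule: a value is new only if it is observed this year
* (_freq > 0) and it is either the first year in the data or the
* value was absent in the previous row for the same P
bysort P (Year) : gen byte new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)

* Hypothetical stricter variant (not from the thread): also treat a
* value as new when the previous row is not the preceding calendar year
bysort P (Year) : gen byte new_strict = _freq > 0 & (_n == 1 | _freq[_n-1] == 0 | Year != Year[_n-1] + 1)

tab Year if new
```

With these data the table should report 5 new values for 1995 (A to E) and 2 for 1996 (H and M), rather than the 7 and 2 produced by the first version, which also flagged the zero-frequency rows for H and M in 1995. Note that "previous" here means the previous year present anywhere in the dataset, because -contract, zero- only fills in years that occur at least once. If the years have gaps, something like new_strict may be closer to "not present the year before", but that is an assumption about what is wanted. For the by-company case, the same lines apply with the company variable added to the -contract- and -bysort- varlists.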

