Re: st: RE: doing the comparison for pairs of years
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: RE: doing the comparison for pairs of years
Date
Sat, 12 May 2012 16:48:21 +0100
You missed my correction at
http://www.stata.com/statalist/archive/2012-05/msg00484.html
from which the suggested code follows as
contract company Year P , zero
bysort company P (Year) : gen new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)
tab company Year if new
Do note that a report that code "doesn't work" is difficult to respond to
without seeing any details of what that means.
Nick
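A minimal sketch of what the correction changes, using the two-year example data quoted further down this thread. The variable names new_old and new_fix are hypothetical, introduced here only to hold the two versions side by side. After -contract, zero-, the earlier expression flags the first row of every -P- as new even when its _freq is 0; the corrected expression requires _freq > 0 first.

```stata
* Two-year example data from the earlier message in this thread
clear
input Year str1 P
1995 A
1995 B
1995 A
1995 C
1995 D
1995 A
1995 E
1995 A
1996 B
1996 A
1996 A
1996 M
1996 A
1996 H
1996 A
1996 C
end

* One row per Year-P combination; -zero- adds unobserved combinations with _freq 0
contract Year P, zero

* Earlier expression: the first row of each P is always flagged, even if _freq is 0
bysort P (Year) : gen byte new_old = _n == 1 | (_freq > 0 & _freq[_n-1] == 0)

* Corrected expression: a P must be observed (_freq > 0) to count as new
bysort P (Year) : gen byte new_fix = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)

* Rows on which the two definitions disagree
list Year P _freq new_old new_fix if new_old != new_fix

* Counts of new categories per year under the corrected definition
tabulate Year if new_fix
```

On these data the two versions should disagree only for H and M in 1995, the combinations that -contract, zero- filled in with a zero frequency.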
On Sat, May 12, 2012 at 1:04 PM, Navid Asgari <[email protected]> wrote:
> Hi Nick,
>
> Thanks,
>
> Yes, I made a mistake... after the change it worked.
>
> Now I am facing another problem. If I want to do the same thing
> (comparing values of "P" across years) for each group of rows (grouped
> by a variable called, say, "Company"), the following code doesn't
> work:
>
> contract company Year P , zero
> bysort Company P (Y) : gen new = _n == 1 | (_freq > 0 & _freq[_n-1]== 0)
> tab Company Y if new
>
>
> Sorry for the frequent questions. I am a Stata newbie.
>
> +---------------------+
> | company Year P |
> |---------------------|
> 1. | Company1 1995 A |
> 2. | Company1 1995 A |
> 3. | Company1 1995 A |
> 4. | Company1 1995 A |
> 5. | Company1 1995 B |
> |---------------------|
> 6. | Company1 1995 C |
> 7. | Company1 1995 D |
> 8. | Company1 1995 E |
> 9. | Company1 1996 A |
> 10. | Company1 1996 A |
> |---------------------|
> 11. | Company1 1996 A |
> 12. | Company1 1996 A |
> 13. | Company1 1996 B |
> 14. | Company1 1996 C |
> 15. | Company1 1996 H |
> |---------------------|
> 16. | Company1 1996 M |
> 17. | Company2 1993 A |
> 18. | Company2 1993 B |
> 19. | Company2 1993 G |
> 20. | Company2 1993 G |
> |---------------------|
> 21. | Company2 1993 K |
> 22. | Company2 1993 M |
> 23. | Company2 1998 C |
> 24. | Company2 1998 K |
> 25. | Company2 1998 L |
> |---------------------|
> 26. | Company2 1998 M |
> +---------------------+
On Sat, May 12, 2012 at 4:53 PM, Nick Cox <[email protected]> wrote:
>> My code compares each year with the previous, which is I think exactly what
>> you ask, so I don't see any sense in which the logic fails.
>>
>> I think you need to substantiate your criticism.
On 12 May 2012, at 09:27, Navid Asgari <[email protected]> wrote:
>>
>>> Hi Nick,
>>>
>>> Thanks for your quick and helpful response,
>>>
>>> The logic that you suggested works fine for comparison across only two
>>> years. However, if I want to compare new "P" values in, say, 1995 with
>>> values of "P" in 1994, and then do the same comparing only 1996
>>> with 1995 and then 1997 with 1996, the logic fails.
>>>
>>> I was thinking that a "foreach" loop over "Year" could work. But it does
>>> not...
>>>
>>> What other ways are possible?
>>>
>>> Thanks,
>>> Navid
>>>
>>>
>>> ---------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>> I can't make sense of your -reshape-. Your structure is already -long-
>>> and there is just one variable that is -P*-. As it is, the -reshape-
>>> command is illegal in the context you give. It seems quite unneeded,
>>> so I start afresh.
>>>
>>> I first read in your dataset.
>>>
>>> . input Year str1 P
>>>
>>> Year P
>>> 1. 1995 A
>>> 2. 1995 B
>>> 3. 1995 A
>>> 4. 1995 C
>>> 5. 1995 D
>>> 6. 1995 A
>>> 7. 1995 E
>>> 8. 1995 A
>>> 9. 1996 B
>>> 10. 1996 A
>>> 11. 1996 A
>>> 12. 1996 M
>>> 13. 1996 A
>>> 14. 1996 H
>>> 15. 1996 A
>>> 16. 1996 C
>>> 17. end
>>>
>>> Then we reduce the dataset to a set of counts.
>>>
>>> . contract Year P , zero
>>>
>>> . l
>>>
>>> +------------------+
>>> | Year P _freq |
>>> |------------------|
>>> 1. | 1995 A 4 |
>>> 2. | 1995 B 1 |
>>> 3. | 1995 C 1 |
>>> 4. | 1995 D 1 |
>>> 5. | 1995 E 1 |
>>> |------------------|
>>> 6. | 1995 H 0 |
>>> 7. | 1995 M 0 |
>>> 8. | 1996 A 4 |
>>> 9. | 1996 B 1 |
>>> 10. | 1996 C 1 |
>>> |------------------|
>>> 11. | 1996 D 0 |
>>> 12. | 1996 E 0 |
>>> 13. | 1996 H 1 |
>>> 14. | 1996 M 1 |
>>> +------------------+
>>>
>>> Then a -P- is new if it wasn't observed the previous year. Notice that
>>> I define "new" as including the first time any value of -P- is
>>> observed.
>>>
>>> . bysort P (Y) : gen new = _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>>
>>> . l
>>>
>>> +------------------------+
>>> | Year P _freq new |
>>> |------------------------|
>>> 1. | 1995 A 4 1 |
>>> 2. | 1996 A 4 0 |
>>> 3. | 1995 B 1 1 |
>>> 4. | 1996 B 1 0 |
>>> 5. | 1995 C 1 1 |
>>> |------------------------|
>>> 6. | 1996 C 1 0 |
>>> 7. | 1995 D 1 1 |
>>> 8. | 1996 D 0 0 |
>>> 9. | 1995 E 1 1 |
>>> 10. | 1996 E 0 0 |
>>> |------------------------|
>>> 11. | 1995 H 0 1 |
>>> 12. | 1996 H 1 1 |
>>> 13. | 1995 M 0 1 |
>>> 14. | 1996 M 1 1 |
>>> +------------------------+
>>>
>>> Then we count how many new categories there are each year.
>>>
>>> . tab Y if new
>>>
>>> Year | Freq. Percent Cum.
>>> ------------+-----------------------------------
>>> 1995 | 7 77.78 77.78
>>> 1996 | 2 22.22 100.00
>>> ------------+-----------------------------------
>>> Total | 9 100.00
>>>
>>> The generalization to include -Company- should be something like this,
>>> but I didn't test it.
>>>
>>> contract Company Year P , zero
>>> bysort Company P (Y) : gen new = _n == 1 | (_freq > 0 & _freq[_n-1] == 0)
>>> tab Company Y if new
>>>
>>> Nick
>>> [email protected]
>>>
>>> Navid Asgari
>>>
>>> I have a dataset which looks like this:
>>>
>>>
>>> +----------+
>>> | Year P |
>>> |----------|
>>> 1. | 1995 A |
>>> 2. | 1995 B |
>>> 3. | 1995 A |
>>> 4. | 1995 C |
>>> 5. | 1995 D |
>>> |----------|
>>> 6. | 1995 A |
>>> 7. | 1995 E |
>>> 8. | 1995 A |
>>> 9. | 1996 B |
>>> 10. | 1996 A |
>>> |----------|
>>> 11. | 1996 A |
>>> 12. | 1996 M |
>>> 13. | 1996 A |
>>> 14. | 1996 H |
>>> 15. | 1996 A |
>>> |----------|
>>> 16. | 1996 C |
>>> +----------+
>>>
>>> I use the following to count number of new values under variable "P"
>>> that exists in the year 1996, but not 1995:
>>>
>>> gen id = _n
>>> reshape long P , i(id)
>>> bysort P (Year id) : gen seq = _n
>>> count if Year==1996 & seq==1
>>>
>>> Now I want to do the same thing for more than 2 successive years (e.g.
>>> 1993, 1994, 1995, 1996). So, values of variable "P" in every year will be
>>> compared with the values in the previous year (1994 to 1993, then 1995
>>> to 1994, and so forth).
>>>
>>> The complexity of this lies in the fact that this comparison has to be
>>> done by each unique value of another variable, and the starting year
>>> and ending year vary in each group. In fact this is what the
>>> structure of the real data looks like:
>>>
>>>
>>> +---------------------+
>>> | Year P company |
>>> |---------------------|
>>> 1. | 1995 A Company1 |
>>> 2. | 1995 B Company1 |
>>> 3. | 1995 A Company1 |
>>> 4. | 1995 C Company1 |
>>> 5. | 1995 D Company1 |
>>> |---------------------|
>>> 6. | 1995 A Company1 |
>>> 7. | 1995 E Company1 |
>>> 8. | 1995 A Company1 |
>>> 9. | 1996 B Company1 |
>>> 10. | 1996 A Company1 |
>>> |---------------------|
>>> 11. | 1996 A Company1 |
>>> 12. | 1996 M Company1 |
>>> 13. | 1996 A Company1 |
>>> 14. | 1996 H Company1 |
>>> 15. | 1996 A Company1 |
>>> |---------------------|
>>> 16. | 1996 C Company1 |
>>> 17. | 1993 G Company2 |
>>> 18. | 1993 G Company2 |
>>> 19. | 1993 M Company2 |
>>> 20. | 1993 K Company2 |
>>> |---------------------|
>>> 21. | 1993 A Company2 |
>>> 22. | 1993 B Company2 |
>>> 23. | 1994 C Company2 |
>>> 24. | 1994 M Company2 |
>>> 25. | 1994 K Company2 |
>>> |---------------------|
>>> 26. | 1994 L Company2 |
>>> +---------------------+
>>>
>>> So for every group under variable company the code will count number
>>> of new values of variable "P" in every year that did not exist a year
>>> before...
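
The by-company version discussed in this thread can be tried on the example data from the original post. This is a sketch under assumptions, not a tested solution for the real dataset: it assumes the variable is named company in lowercase, as in the listing (Stata variable names are case-sensitive), and it uses the corrected expression for new.

```stata
* Example data from the original post: two companies, differing year ranges
clear
input str8 company Year str1 P
Company1 1995 A
Company1 1995 B
Company1 1995 A
Company1 1995 C
Company1 1995 D
Company1 1995 A
Company1 1995 E
Company1 1995 A
Company1 1996 B
Company1 1996 A
Company1 1996 A
Company1 1996 M
Company1 1996 A
Company1 1996 H
Company1 1996 A
Company1 1996 C
Company2 1993 G
Company2 1993 G
Company2 1993 M
Company2 1993 K
Company2 1993 A
Company2 1993 B
Company2 1994 C
Company2 1994 M
Company2 1994 K
Company2 1994 L
end

* One row per company-Year-P combination, zero-filled for unobserved combinations
contract company Year P, zero

* Within each company and P, sorted by Year: new if observed now and either
* this is the first row or the previous row had zero frequency
bysort company P (Year) : gen byte new = _freq > 0 & (_n == 1 | _freq[_n-1] == 0)

* Number of new categories by company and year
tabulate company Year if new
```

Note that -contract, zero- fills in every combination of company, Year and P that occurs anywhere in the data, so each company also gets zero-frequency rows for years in which only the other company was observed. The comparison in _freq[_n-1] is therefore with the previous year present in the contracted data, which need not be the calendar year minus one if the data have gaps.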