Oleksandr Shepotylo asked:
> I have data on soccer game results. I want to generate a variable
> that will reflect how a team played in the last 2 games:
> win_streak = sum of points in the last 2 games.
>
> Simple example, with 4 teams and 3 rounds:
>
> Round  Home_team  Away_team  Points_home_team  Points_away_team
> 1      A          B          3                 0
> 1      C          D          1                 1
> 2      B          D          0                 3
> 2      A          C          1                 1
> 3      D          A          3                 0
> 3      B          C          1                 1
>
> Then I want to create the variable:
>
> Round Team Win_streak
> 3 A 4
> 3 B 0
> 3 C 2
> 3 D 4
>
>
> The problem is that in the data a team could be in column 2 or column 3
> depending on playing home or away. Also, when I add points I should check
> if I look at 4th or 5th column. Therefore, I can not just use:
> by sort team (round): egen win_streak= points[_n-1]+points[_n-2].
Oleksandr Talavera posted a solution:
> Try the following:
> ****************
> * soccer.dta has your data
> use soccer, clear
> list
> keep round h_team h_points
> rename h_team team
> rename h_points points
> * additional dataset soccerH that contains home team data
> save soccerH, replace
> use soccer, clear
> keep round a_team a_points
> rename a_team team
> rename a_points points
> append using soccerH
> egen teams=group(team)
> tsset teams round
> g win_streak = L.points + L2.points
> sort round team
> list
Here is a similar approach, but one which
is done in place, without any file manipulation:
expand 2
bysort Round Home : gen Team = cond(_n==1, Home, Away)
bysort Round Home : gen points = cond(_n==1, Points_home, Points_away)
encode Team, gen(team)
tsset team Round
gen win_streak = L.points + L2.points
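
To check this against the example in the question, the same steps can be run on the six matches typed in directly. This is only a sketch: it assumes the variable names shown in the question's table header (written out in full rather than abbreviated) and that every team plays exactly once per round, so that L. and L2. pick up the previous two games. The extra variable win_streak2 is added here purely as a cross-check using by-group subscripts.

```stata
clear
* the six matches from the question (names taken from its table header)
input Round str1 Home_team str1 Away_team Points_home_team Points_away_team
1 "A" "B" 3 0
1 "C" "D" 1 1
2 "B" "D" 0 3
2 "A" "C" 1 1
3 "D" "A" 3 0
3 "B" "C" 1 1
end

* two observations per match: the first for the home team, the second for the away team
expand 2
bysort Round Home_team : gen Team = cond(_n == 1, Home_team, Away_team)
bysort Round Home_team : gen points = cond(_n == 1, Points_home_team, Points_away_team)

* declare a panel of teams observed over rounds, then sum the two previous rounds
encode Team, gen(team)
tsset team Round
gen win_streak = L.points + L2.points

* cross-check with subscripts; agrees only because there are no gaps in Round
bysort team (Round) : gen win_streak2 = points[_n-1] + points[_n-2]
assert win_streak == win_streak2

list Round Team win_streak if Round == 3, noobs
```

For round 3 this should list 4, 0, 2 and 4 for teams A, B, C and D, matching the table in the question. If a team skipped a round, the time-series operators would give missing for win_streak while the subscript version would reach back to that team's previous two observations, so which of the two is wanted depends on how "last 2 games" is meant.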
Nick