Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: Summary statistics for panel data
From: Joe Canner <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: RE: Summary statistics for panel data
Date: Thu, 8 Aug 2013 16:31:18 +0000
John,
If your calculations are primarily based on looking forward or back within a given race day, one way to do this would be to use -reshape wide- to put all of the data for each race into the same observation:
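. reshape wide time odds fav_win overround, i(meeting date) j(race_no)
(Any variable that changes from race to race, such as time, has to be listed before the comma or dropped first; otherwise -reshape- stops because the variable is not constant within meeting and date.)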
Then you can do any number of calculations very easily:
. summ overround4 if fav_win1==1, det
. summ overround4 overround5 overround6 if fav_win1==1 & fav_win2==1, det
etc.
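As a minimal sketch of the steps above, the following runs the reshape on the twelve sample rows from the original question. Dropping time before reshaping, the string storage types in -input-, and listing the three overround variables one by one (rather than as a range, which follows the variable order in the dataset) are assumptions made for this illustration, not part of the original data set.

```stata
clear
input str10 meeting str9 date str5 time odds fav_win overround race_no
"Aintree" "24-Oct-04" "13:45" 5    0 108.330 1
"Aintree" "24-Oct-04" "14:15" 3    0 106.053 2
"Aintree" "24-Oct-04" "14:50" 14   0 107.303 3
"Aintree" "24-Oct-04" "15:20" 1.5  1 106.933 4
"Aintree" "24-Oct-04" "15:55" 9    0 112.435 5
"Aintree" "24-Oct-04" "16:30" 1.88 0 116.008 6
"Aintree" "20-Nov-04" "12:45" 0.57 1 107.706 1
"Aintree" "20-Nov-04" "13:20" 2    1 107.996 2
"Aintree" "20-Nov-04" "13:50" 10   0 107.218 3
"Aintree" "20-Nov-04" "14:20" 7    1 119.689 4
"Aintree" "20-Nov-04" "14:55" 1.5  1 106.324 5
"Aintree" "20-Nov-04" "15:25" 0.33 1 105.149 6
end

* time differs from race to race; it is dropped here so that every
* remaining non-reshaped variable is constant within meeting and date
drop time

* one observation per meeting-day, one set of columns per race
reshape wide odds fav_win overround, i(meeting date) j(race_no)

* overround of race 4 on days when the favourite won race 1
summarize overround4 if fav_win1 == 1, detail

* overround of races 4, 5 and 6 on days when the favourite won races 1 and 2
summarize overround4 overround5 overround6 if fav_win1 == 1 & fav_win2 == 1, detail
```

With only these twelve rows, the 20-Nov-04 meeting is the single day that satisfies either condition, so the standard deviations are missing; on the full data set each -summarize- would run over all qualifying meeting-days.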
There are probably more elegant ways to do this using -bysort- and -egen-, but personally I think this is more flexible and understandable (as a relatively new user of Stata myself).
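For comparison, here is a rough sketch of what the -bysort- and -egen- route might look like with the data left in long format. The reason the original attempt fails is that an -if- condition is evaluated one observation at a time, so race_no==1 & fav_win==1 can only ever select race-1 rows. The sketch instead copies the race-1 and race-2 outcomes onto every row of the same meeting-day. The flag names won1 and won2 are hypothetical, and the sample rows are the ones from the original question with the time and odds columns left out.

```stata
clear
input str10 meeting str9 date fav_win overround race_no
"Aintree" "24-Oct-04" 0 108.330 1
"Aintree" "24-Oct-04" 0 106.053 2
"Aintree" "24-Oct-04" 0 107.303 3
"Aintree" "24-Oct-04" 1 106.933 4
"Aintree" "24-Oct-04" 0 112.435 5
"Aintree" "24-Oct-04" 0 116.008 6
"Aintree" "20-Nov-04" 1 107.706 1
"Aintree" "20-Nov-04" 1 107.996 2
"Aintree" "20-Nov-04" 0 107.218 3
"Aintree" "20-Nov-04" 1 119.689 4
"Aintree" "20-Nov-04" 1 106.324 5
"Aintree" "20-Nov-04" 1 105.149 6
end

* won1 is 1 on every row of a meeting-day if the favourite won race 1 that day
bysort meeting date: egen byte won1 = max(fav_win == 1 & race_no == 1)

* won2 is the same flag for race 2
bysort meeting date: egen byte won2 = max(fav_win == 1 & race_no == 2)

* overround of race 4 given that the favourite won race 1
summarize overround if race_no == 4 & won1 == 1, detail

* overround of races 4 to 6, race by race, given wins in races 1 and 2
tabstat overround if inrange(race_no, 4, 6) & won1 == 1 & won2 == 1, by(race_no) statistics(n mean sd)
```

This keeps one row per race, which may be more convenient if later steps need the long layout; the wide layout above is arguably easier to read when the conditions involve many different races.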
Regards,
Joe Canner
Johns Hopkins University School of Medicine
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of John Kenny
Sent: Thursday, August 08, 2013 12:14 PM
To: [email protected]
Subject: st: Summary statistics for panel data
Dear Statalist
I'm relatively new to Stata. I cannot find a standard way to solve my problem, and I may need to write a .do file. I'm dealing with a very large data set of horse racing results, with about 20 variables and 711,000 observations.
These are some of the variables in the data set.
For each race there is a variable that says where the race is held ['meeting'], the date and time ['date', 'time'], the odds given for the winning horse ['odds'], the race number at that meeting for a given day ['race_no'], whether the favourite won that race ['fav_win'], and the overround, which signifies the bookmaker's profit ['overround']. These variables are listed as follows:
meeting   date        time    odds   fav_win   overround   race_no
Aintree   24-Oct-04   13:45   5      0         108.330     1
Aintree   24-Oct-04   14:15   3      0         106.053     2
Aintree   24-Oct-04   14:50   14     0         107.303     3
Aintree   24-Oct-04   15:20   1.5    1         106.933     4
Aintree   24-Oct-04   15:55   9      0         112.435     5
Aintree   24-Oct-04   16:30   1.88   0         116.008     6
Aintree   20-Nov-04   12:45   0.57   1         107.706     1
Aintree   20-Nov-04   13:20   2      1         107.996     2
Aintree   20-Nov-04   13:50   10     0         107.218     3
Aintree   20-Nov-04   14:20   7      1         119.689     4
Aintree   20-Nov-04   14:55   1.5    1         106.324     5
Aintree   20-Nov-04   15:25   0.33   1         105.149     6
This list is sorted by meeting, date and race_no. There are numerous meetings over a large time period. What I am trying to analyse is the overround, by getting its mean and standard deviation depending on the outcomes of previous races at that meeting on a certain day. To be more precise, I would like to get the mean and standard deviation of the overround for each race depending on whether the favourite won some of the previous races at that meeting on that date.
Examples of this would include getting the mean and standard deviation of the overround for the fourth race (race_no==4) if the favourite won (i.e. fav_win==1) the first race (race_no==1) at that same meeting on that day. Another example would be: at a given race meeting on a certain date, if the favourite wins the first and the second race, what is the mean and standard deviation of the overround for the 4th, 5th or 6th race?
What I have tried is using the summarize command to get the mean and standard deviation of 'overround' if 'race_no'==1 & 'fav_win'==1. However, every combination of variables I tried just gave the mean of the 'overround' for race 1 if the favourite won that race, and not the mean for the second or third race if the favourite won the first race.
Any help would be greatly appreciated as I have been stuck on this for a while.
Thanks in advance.
John
Any further help on this would be greatly appreciated.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/