Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: Summary statistics for panel data

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: RE: Summary statistics for panel data
Date	Thu, 8 Aug 2013 18:02:14 +0100

Joe's suggestion is a good one here.

It's also important to understand why John's approach isn't working,
developing a little on Joe's remarks and on some in
http://www.stata.com/statalist/archive/2013-08/msg00225.html in
response to a previous version of this question.

A qualifier such as

if race_no == 1 & favorite == 1

obliges Stata to test the condition within each observation separately
(and only there), whereas John's question hinges on looking backwards
(and possibly forwards too) to other observations.
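To make that concrete, here is a small sketch using the twelve observations John posted. It assumes the variable names from his listing (with -fav_win- standing in for 'favorite') and reads -date- as a plain string purely for illustration.

```stata
clear
input str10 meeting str9 date race_no fav_win overround
"Aintree" "24-Oct-04" 1 0 108.330
"Aintree" "24-Oct-04" 2 0 106.053
"Aintree" "24-Oct-04" 3 0 107.303
"Aintree" "24-Oct-04" 4 1 106.933
"Aintree" "24-Oct-04" 5 0 112.435
"Aintree" "24-Oct-04" 6 0 116.008
"Aintree" "20-Nov-04" 1 1 107.706
"Aintree" "20-Nov-04" 2 1 107.996
"Aintree" "20-Nov-04" 3 0 107.218
"Aintree" "20-Nov-04" 4 1 119.689
"Aintree" "20-Nov-04" 5 1 106.324
"Aintree" "20-Nov-04" 6 1 105.149
end

* The qualifier is tested one observation at a time, so this selects
* race 1 observations only: it summarizes the overround OF race 1
summarize overround if race_no == 1 & fav_win == 1

* No single observation is both race 4 and race 1, so nothing is selected
count if race_no == 4 & race_no == 1 & fav_win == 1
```

The first command can only ever describe race 1, which is what John reported seeing.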

If the qualifier had been say

if race_no[1] == 1 & favorite[1] == 1

then you would be overriding that obligation: you would be spelling
out that the relevant value is the one in the first observation of the
dataset. That's just an example, and -by:- could be used to make such
subscripts refer to the first observation within each block of
observations. Many -egen- functions do something similar.
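A minimal sketch of the -by:- and -egen- routes, again on John's twelve posted observations. The new variable names (fav1, fav1_alt, nfav12) are made up for this example, and -date- is read as a string only for convenience; the real data may well hold it differently.

```stata
clear
input str10 meeting str9 date race_no fav_win overround
"Aintree" "24-Oct-04" 1 0 108.330
"Aintree" "24-Oct-04" 2 0 106.053
"Aintree" "24-Oct-04" 3 0 107.303
"Aintree" "24-Oct-04" 4 1 106.933
"Aintree" "24-Oct-04" 5 0 112.435
"Aintree" "24-Oct-04" 6 0 116.008
"Aintree" "20-Nov-04" 1 1 107.706
"Aintree" "20-Nov-04" 2 1 107.996
"Aintree" "20-Nov-04" 3 0 107.218
"Aintree" "20-Nov-04" 4 1 119.689
"Aintree" "20-Nov-04" 5 1 106.324
"Aintree" "20-Nov-04" 6 1 105.149
end

* Under by:, [1] means the first observation of each meeting-date block.
* The if qualifier guards against blocks whose first race is not race 1.
bysort meeting date (race_no): generate byte fav1 = fav_win[1] if race_no[1] == 1

* The same idea with egen, which does not depend on the sort order
egen fav1_alt = max(cond(race_no == 1, fav_win, .)), by(meeting date)

* Overround of race 4 on days when the favourite won race 1
summarize overround if race_no == 4 & fav1 == 1, detail

* Number of favourites winning in races 1 and 2 on each day
egen nfav12 = total(fav_win * inrange(race_no, 1, 2)), by(meeting date)

* Overround of races 4 to 6 on days when the favourite won races 1 and 2
summarize overround if inrange(race_no, 4, 6) & nfav12 == 2, detail
```

With only two race days the summaries are trivial, but the same commands should carry over to the full dataset unchanged, as long as meeting and date together identify a race day.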

That said, run with -reshape-, as once you have -reshape-d it will
probably seem easier to formulate your problems. A warning, however: I
imagine that a -reshape- will produce many missing values, as I know
no rule that every meeting includes the same number of races, so watch
out.
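Here is a sketch of what that warning looks like in practice. John's posted sample has six races on both days, so one observation is dropped below purely to imitate a shorter card; that step is hypothetical and not part of his data.

```stata
clear
input str10 meeting str9 date race_no fav_win overround
"Aintree" "24-Oct-04" 1 0 108.330
"Aintree" "24-Oct-04" 2 0 106.053
"Aintree" "24-Oct-04" 3 0 107.303
"Aintree" "24-Oct-04" 4 1 106.933
"Aintree" "24-Oct-04" 5 0 112.435
"Aintree" "24-Oct-04" 6 0 116.008
"Aintree" "20-Nov-04" 1 1 107.706
"Aintree" "20-Nov-04" 2 1 107.996
"Aintree" "20-Nov-04" 3 0 107.218
"Aintree" "20-Nov-04" 4 1 119.689
"Aintree" "20-Nov-04" 5 1 106.324
"Aintree" "20-Nov-04" 6 1 105.149
end

* Hypothetical: pretend the second race day had only five races
drop if date == "20-Nov-04" & race_no == 6

* One observation per race day, one set of variables per race
reshape wide fav_win overround, i(meeting date) j(race_no)

* The five-race day now has missing values for race 6
count if missing(overround6)

* summarize skips missing values silently, so check the N it reports
summarize overround6 if fav_win1 == 1, detail

* Missing counts as larger than any number, so test fav_win6 == 1
* rather than fav_win6 != 0 or fav_win6 > 0
count if fav_win6 == 1
```

Every variable that changes from race to race has to be named in the -reshape wide- list; only fav_win and overround are read in here, so nothing is left over.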

Nick
[email protected]


On 8 August 2013 17:31, Joe Canner <[email protected]> wrote:
> John,
>
> If your calculations are primarily based on looking forward or back within a given race day, one way to do this would be to use -reshape wide- to put all of the data for each race into the same observation:
>
> . reshape wide time odds fav_win overround, i(meeting date) j(race_no)
>
> Then you can do any number of calculations very easily:
>
> . summ overround4 if fav_win1==1, det
> . summ overround4 overround5 overround6 if fav_win1==1 & fav_win2==1, det
> etc.
>
> There are probably more elegant ways to do this using -bysort- and -egen-, but personally I think this is more flexible and understandable (as a relatively new user of Stata myself).

John Kenny wrote:

> I'm relatively new to Stata and I cannot find a standard way to solve my problem, so I may need to write a .do file. I'm dealing with a very large data set of horse racing results, with about 20 variables and 711,000 observations.
>
> These are some of the variables in the data set.
> For each race there is a variable that says where the race is held ['meeting'], the date and time ['date', 'time'], the odds given for the winning horse ['odds'], the race number at that meeting for a given day ['race_no'], whether the favourite won that race ['fav_win'] and the overround, which signifies the bookmaker's profit ['overround']. These variables are listed as follows:
>
> meeting  date       time   odds  fav_win  overround  race_no
> Aintree  24-Oct-04  13:45  5     0        108.330    1
> Aintree  24-Oct-04  14:15  3     0        106.053    2
> Aintree  24-Oct-04  14:50  14    0        107.303    3
> Aintree  24-Oct-04  15:20  1.5   1        106.933    4
> Aintree  24-Oct-04  15:55  9     0        112.435    5
> Aintree  24-Oct-04  16:30  1.88  0        116.008    6
> Aintree  20-Nov-04  12:45  0.57  1        107.706    1
> Aintree  20-Nov-04  13:20  2     1        107.996    2
> Aintree  20-Nov-04  13:50  10    0        107.218    3
> Aintree  20-Nov-04  14:20  7     1        119.689    4
> Aintree  20-Nov-04  14:55  1.5   1        106.324    5
> Aintree  20-Nov-04  15:25  0.33  1        105.149    6
>
>
> This list is sorted by meeting, date and race_no. There are numerous meetings over a long time period. What I am trying to analyse is the overround, by getting its mean and standard deviation depending on the outcomes of previous races at that meeting on a given day. To be more precise, I would like to get the mean and standard deviation of the overround for each race depending on whether the favourite won some of the previous races at that meeting on that date.
>
> Examples of this would include getting the mean and standard deviation of the overround for the fourth race (race_no==4) if the favourite won (i.e. fav_win==1) the first race (race_no==1) at that same meeting on that day. Another example: at a given race meeting on a certain date, if the favourite wins the first and the second race, what is the mean and standard deviation of the overround for the 4th, 5th or 6th race?
>
> What I have tried is using the summarize command to get the mean and standard deviation of 'overround' if 'race_no'==1 & 'favorite'==1. However, every combination of variables I tried only ever gave the mean of 'overround' for race 1 if the favourite won, and not the mean for the second or third race if the favourite won the first race.
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

References:
- st: Summary statistics for panel data
  - From: John Kenny <[email protected]>
- st: RE: Summary statistics for panel data
  - From: Joe Canner <[email protected]>
