Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: 5 mil obs - travel time btw 2 places
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: 5 mil obs - travel time btw 2 places
Date: Mon, 2 Dec 2013 21:51:56 +0000
Quite so. You selected observations
    if <whatever>
and then took a mean over those observations. The result is necessarily
a single number, whatever <whatever> is, so long as at least one
observation is selected. By contrast,
    egen avgtime = mean(crselapsedtime), by(origin dest)
would give a separate mean for each origin-destination pair.
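A minimal sketch of the contrast on a tiny invented dataset (the airports and times are made up; only the variable names origin, dest and crselapsedtime come from this thread; avg_if and avg_by are hypothetical names). With the if qualifier, egen takes one mean over every selected observation, so the result is constant. The condition also excludes the first observation of each origin-dest pair, so the number of missing values it reports should approximately equal the number of distinct pairs. With by(), egen computes one mean per pair. The stable option is used only to make the toy example reproducible.

```stata
* Toy data: hypothetical flights, values invented for illustration
clear
input str3 origin str3 dest crselapsedtime
"JFK" "SFO" 380
"JFK" "SFO" 400
"JFK" "SFO" 420
"JFK" "LAX" 350
"JFK" "LAX" 370
end

* stable keeps the input order within each pair, for reproducibility
sort origin dest, stable

* The if qualifier only selects observations (here: all but the first
* of each pair); egen then returns ONE mean over all of them
egen avg_if = mean(crselapsedtime) if origin == origin[_n-1] & dest == dest[_n-1]

* by() gives a separate mean for each origin-dest pair,
* copied to every observation in that pair
egen avg_by = mean(crselapsedtime), by(origin dest)

list origin dest crselapsedtime avg_if avg_by, sepby(origin dest)
```

Here avg_if takes a single value wherever it is non-missing and is missing for 2 observations (one per distinct pair), while avg_by is 360 for JFK-LAX and 400 for JFK-SFO.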
Nick
[email protected]
On 2 December 2013 19:43, Coleman, Greg <[email protected]> wrote:
> Thanks Nick. Before I saw your note, I tried this:
>
> . sort origin dest
>
> . egen avgtime=mean(crselapsedtime) if origin==origin[_n-1] & dest==dest[_n-1]
> (3535 missing values generated)
>
> But the new variable avgtime had the same value in every observation
> where it was not missing.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Monday, December 02, 2013 1:33 PM
> To: [email protected]
> Subject: Re: st: 5 mil obs - travel time btw 2 places
>
> When you say "unique", you mean "distinct". On average, these "unique"
> pairs occur about 25,000 times each, not once.
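One way to check that kind of arithmetic on the real data is sketched below, using the tag() function of egen to mark exactly one observation per distinct origin-dest pair. The toy data are invented, and the names first and npairs are hypothetical.

```stata
* Toy data: hypothetical flights, values invented for illustration
clear
input str3 origin str3 dest
"JFK" "SFO"
"JFK" "SFO"
"JFK" "LAX"
"BOS" "SFO"
"BOS" "SFO"
"BOS" "SFO"
end

* tag() sets first = 1 for exactly one observation in each distinct
* origin-dest pair and 0 for all the others
egen byte first = tag(origin dest)

* Number of distinct pairs
quietly count if first
scalar npairs = r(N)

display "distinct pairs: " scalar(npairs)
display "mean flights per pair: " _N / scalar(npairs)
```

With these 6 toy observations there are 3 distinct pairs, so each pair occurs 2 times on average.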
>
> You end with the idea of a 'collapse'. Exactly!
>
> I'd start by looking at the -contract- and -collapse- commands.
>
> As a footnote, look also at -groups- (SSC).
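A minimal sketch of the -contract- and -collapse- route, assuming the variables are named origin, dest and crselapsedtime as in this thread; the toy data and the new names nflights and mean_time are invented. -contract- reduces the data to one observation per distinct pair with a frequency count; -collapse- reduces them to one observation per pair with whatever summary statistics are requested. Both replace the data in memory, so the first is wrapped in preserve/restore; the collapsed result is left in memory, ready for graphing.

```stata
* Toy data: hypothetical flights, values invented for illustration
clear
input str3 origin str3 dest crselapsedtime
"JFK" "SFO" 380
"JFK" "SFO" 400
"JFK" "SFO" 420
"JFK" "LAX" 350
"JFK" "LAX" 370
"BOS" "SFO" 410
end

* Number of flights per origin-dest pair (one row per distinct pair)
preserve
contract origin dest, freq(nflights)
list, noobs
restore

* Count and mean elapsed time per pair; note that (count) counts
* non-missing values, so it can differ from contract's frequency
* if some times are missing
collapse (count) nflights=crselapsedtime (mean) mean_time=crselapsedtime, by(origin dest)
list, noobs
```

The collapsed data have 3 rows: JFK-SFO (3 flights, mean 400), JFK-LAX (2 flights, mean 360) and BOS-SFO (1 flight, mean 410).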
>
> Nick
> [email protected]
>
>
> On 2 December 2013 18:24, Coleman, Greg <[email protected]> wrote:
>> Hi Stata gurus -
>>
>> A pretty large data set (for me!): just over 5m obs. It's flight data, with 29 variables.
>> Two of the variables are origin and dest. I am struggling to compute various statistics for each combination of these two, meaning, for example, all the rows where origin=JFK and dest=SFO. For instance: the number of times they occurred (how many flights from JFK to SFO overall), the travel time for each of the trips that occurred, which day of the week is typically prone to delays going to SFO from JFK, etc.
>>
>> Can someone give me a hint on how to approach this? I tried foreach loops, while loops and by(), but I feel like I am not on track to an efficient method.
>> There are over 200 unique origin and dest throughout the 5m obs, so any way I can 'collapse' this data so I can make some graphs would also be great.
>>
>> Thanks!
>> Greg
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/