Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Collapse command
From: Eric Booth <[email protected]>
To: "<[email protected]>" <[email protected]>
Subject: Re: st: Collapse command
Date: Fri, 4 Mar 2011 21:12:50 +0000
I am puzzled by what you mean when you say it just ends. A few questions:
Does Stata print the "end of do-file" line after your last command? If there is no such message and you really are working with ~30G of data on a machine with at least 30G of RAM, I wonder whether Stata is still performing the -collapse-, which could take a while with that many obs. If you don't have 30G of RAM and you opened a 30G dataset, it could take a really long time. Is Stata still running the command? That is, do you see the spinning wheel in the bottom right corner of the Results window (assuming, based on the file path in your code, that you are using a Mac), or can you enter more commands in the Command window and have them run?
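One way to tell from the output (or from a log) whether the -collapse- ever finished is to print a timestamp on either side of it. A minimal sketch, assuming a small hypothetical stand-in dataset built with -generate- in place of your own -use- command:

*******
* hypothetical stand-in data: 100 campuses x 8 years, 1,000 obs
* (replace these lines with your own -use- command)
clear
set obs 1000
generate campus = ceil(_n/10)
generate year = 2003 + mod(_n, 8)
generate y_red = _n
set more off
* timestamps before and after the slow step
display "collapse started:  " c(current_date) " " c(current_time)
collapse (count) y_red_count=y_red, by(campus year)
display "collapse finished: " c(current_date) " " c(current_time)
* number of obs and variables left after -collapse-
describe, short
********

If the "collapse finished" line never shows up, Stata did not get past the -collapse-.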
I suspect you know it could take a really long time, which is why you -set virtual on- (though you should be aware of what -set virtual on- does and does not do: http://www.stata.com/statalist/archive/2007-06/msg00875.html).
Do the data -collapse- properly (that is, if you -browse- the data afterward, are they collapsed), and you just don't get the -list- output you expected?
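Instead of looking in the -browse- window, a do-file can check the result directly. A sketch, again on hypothetical stand-in data: after -collapse-, each campus-year pair should identify exactly one observation, which -isid- checks (it stops with an error if the pairs are not unique):

*******
* hypothetical stand-in data: 3 campuses, 2 years, 2 obs per campus-year
clear
set obs 12
generate campus = ceil(_n/4)
generate year = 2001 + mod(_n, 2)
generate y_red = 10*_n
collapse (count) y_red_count=y_red, by(campus year)
* one obs per campus-year after -collapse-
isid campus year
describe, short
list, sepby(campus)
********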
Try reducing the size of your dataset (-drop- or -sample- some of your obs) and then -collapse- it to see what happens.
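A sketch of that test, assuming simulated stand-in data (the 50 campuses, the years, and the 10% sampling rate are arbitrary choices, not taken from your data). The commented -use- line shows how you might instead load only the three variables -collapse- needs and the first 100,000 obs of your file:

*******
* hypothetical stand-in for the full dataset
clear
set seed 2011
set obs 100000
generate campus = floor(runiform()*50) + 1
generate year = 2003 + floor(runiform()*8)
generate y_red = runiform()
* with the real file, something like this loads a much smaller piece:
* use campus year y_red in 1/100000 using "/home/jpellerano/scores_2003-2010_cleaned.dta", clear
* keep a random 10% of the observations
sample 10
local kept = _N
collapse (count) y_red_count=y_red, by(campus year)
* sanity check: the counts should add back up to the sampled obs
quietly summarize y_red_count
assert r(sum) == `kept'
describe, short
********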
Try running the example below. If it works, what is different between this example and your actual dataset (besides the size)?
*******
clear
input str11(campus) year studentid y_red
"001903001" 2001 1 99
"001903001" 2002 1 90
"001903023" 2001 101 88
"001903023" 2001 100 55
"001903002" 2001 100 199
"001903002" 2002 100 159
end
destring campus, replace
format campus %09.0f
/* note:
assuming you're using AEIS/PEIMS data, you can replace your "format campus %20.0f"
with the -format- command above to display the leading zeros in the campus ids
(-destring- itself drops them), but I prefer to keep campus ids as string variables
*/
collapse (count) y_red_count=y_red, by(campus year)
list
********
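Since the note above mentions keeping campus ids as strings, here is a sketch of the same example with campus left as a string variable. -collapse- accepts string variables in by(), so -destring- and -format- are not needed and the leading zeros stay in the stored data:

*******
clear
input str11(campus) year studentid y_red
"001903001" 2001 1 99
"001903001" 2002 1 90
"001903023" 2001 101 88
"001903023" 2001 100 55
"001903002" 2001 100 199
"001903002" 2002 100 159
end
* campus stays a string, so no -destring- or -format- step
collapse (count) y_red_count=y_red, by(campus year)
list
********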
- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
On Mar 4, 2011, at 2:33 PM, <[email protected]> wrote:
> Hello, I have a huge dataset with student-level data.
>
> I have been trying to collapse the dataset to the school level. The first time I did it, it worked, but only that one time.
>
> Here's the code:
>
> set mem 30g
>
> set virtual on
>
> use /home/jpellerano/scores_2003-2010_cleaned.dta
>
> destring campus, replace
>
> format campus %20.0f
>
> collapse (count) y_red_count=y_red, by(campus year)
>
> list
>
>
> After reaching the collapse line, Stata ends the do-file.
>
> I'd appreciate any suggestions.
>
> Thanks,
>
> Jose A. Pellerano
> Texas A&M University
> Dpt of Economics
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/