Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: Conditional Variable means to new observation


From   Joe Canner <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: Conditional Variable means to new observation
Date   Wed, 11 Sep 2013 17:13:12 +0000

(This by itself creates a new data set, so you don't need the preserve, restore, and append if you don't want to add the summary statistics to the end of the dataset)

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nickolas Lyell
Sent: Wednesday, September 11, 2013 12:56 PM
To: [email protected]
Subject: RE: st: Conditional Variable means to new observation

Actually,  I'm now trying to do a similar process of summations and means but I would like to use a bysort command with it.

My data looks something like this:

fips_num	NAICS	sector		GDP
1001		112	agriculture	240
1001		114	agriculture 	310
...
1002		112	agriculture	105
1002		114	agriculture	112


As such, I have over 3000 counties (id = fips_num) and would like to do what Phil suggested below.  Copied here with the relevant fields:

    preserve
        ds _all, has(type numeric)
        collapse (sum) `r(varlist)' if sector=="agriculture"
        tempfile totals
        save `"`totals'"'
    restore
    append using `"`totals'"'

but I would like to do it bysort fips_num.

Again, I realize this isn't something you normally want to do to a dataset, but seeing as I'm not planning on doing any kind of regression on this data, having the sums as new observations, rather than variables, would make my life much easier.  Optimally, I would then drop all the non-sum variables, so I don't necessarily need the to append the tempfile.  I could, rather, just use a command that would create a new file with all the "sector" sums by fips_num.


Nicholas Lyell
Research Associate
National Association of Counties | NACo
[email protected] | 202.661.8820


-----Original Message-----
From: Nickolas Lyell 
Sent: Wednesday, September 04, 2013 4:28 PM
To: '[email protected]'
Subject: RE: st: Conditional Variable means to new observation

Thanks Phil


Nicholas Lyell
Research Associate
National Association of Counties | NACo
[email protected] | 202.661.8820


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Phil Schumm
Sent: Wednesday, September 04, 2013 4:26 PM
To: [email protected]
Subject: Re: st: Conditional Variable means to new observation

On Sep 4, 2013, at 2:44 PM, Nickolas Lyell <[email protected]> wrote:
> I am still stuck with this problem.  I would like to make a new observation that encompasses Large counties.  I already have a dummy for large counties and would just like to sum all of each year's observations where Large County is equal to 1.
> 
> I want to do this so I can analyze all large counties the same way I am looking at each observation.  I don't plan on doing any regressions on this dataset, merely creating new variables to help me understand the data and I am using Stata because the programmatic approach of a do file makes sense to me when dealing with these very large files.
> 
> Could anyone please help.


While Maarten is in general correct (i.e., you should not try to use Stata like you would a spreadsheet program), it sounds to me like what you want is essentially to create a (possibly large) data table with a row of totals at the bottom.  If so, Stata can certainly oblige (and if you are not already facile with another programming language -- and even if you are -- it would not be unreasonable to use Stata for this task).  Here is probably the simplest approach:

    preserve
        ds _all, has(type numeric)
        collapse (sum) `r(varlist)' if LgMdSm==2
        tempfile totals
        save `"`totals'"'
    restore
    append using `"`totals'"'

Note that the call to -ds- is merely to avoid trying to sum any string variables you might have, which will generate an error.  In addition, may wish to add to this something like

    gen byte totals = (_n==_N)

so that you can easily identify (and exclude) this row later on, if necessary.  But as Maarten said, if you are going to go on to analyze these data in Stata, it would be best to delete this additional row(s) first, to avoid it causing problems.

Finally, note that this approach is quite general, as it will accommodate any of the types of summaries that -collapse- will generate.  And you may want to use -export excel- to move the resulting data table to Excel for visual inspection, where features such as splits will allow you to hold the total row fixed while scrolling through the rows of data.


-- Phil


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index