Statalist The Stata Listserver



RE: st: RE: avoid collapse and yet get matrix of unit specific means in panel data


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: avoid collapse and yet get matrix of unit specific means in panel data
Date   Thu, 14 Sep 2006 18:02:04 +0100

Thanks for the clarification, but what I said still
holds as far as Stata is concerned. Just because 
what you want is a matrix doesn't mean that you 
are best off producing it as a Stata matrix. 

However, if you find -tabstat- convenient and want 
to grab its output as a single matrix, then you are 
not alone and -tabstatmat- on SSC purports to solve this problem. 

Alternatively, -statsmat- from SSC by Kit Baum and friend
may get you there a little more directly. 
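The sketch below shows roughly the stacking step such helpers automate. It is my own illustration, not the code of -tabstatmat- or -statsmat-. After -tabstat, save-, each unit's statistics sit in r(Stat1), r(Stat2) and so on, with rows as statistics and columns as variables, and the group values sit in r(name1), r(name2) and so on. The toy panel and the names id, t, x, y and M are hypothetical. The loop runs once per returned matrix, that is once per unit, not once per observation.

```stata
* Hypothetical toy panel: 2 units, 3 periods each
clear
input id t x y
1 1 1 10
1 2 2 20
1 3 3 30
2 1 4 40
2 2 5 50
2 3 6 60
end

* One mean per variable per unit; -save- leaves r(Stat1), r(Stat2), ...
tabstat x y, by(id) statistics(mean) save

* Stack the 1 x k matrices r(Stat#) until r(name#) is empty
capture matrix drop M
local i 1
while "`r(name`i')'" != "" {
    matrix M = nullmat(M) \ r(Stat`i')
    local ++i
}
matrix list M
```

With a single statistic, each r(Stat#) is one row, so M ends up with one row per unit and one column per variable.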

As you learn more about Stata programming, these canned
solutions from others will appeal until you learn how
to do it the way you really want yourself. 

Nick 
[email protected] 

Tom Boonen
 
> Thank you for this detailed answer Nick Cox.
> 
> > So, are there other solutions? I am going to guess that the word "matrix"
> > is a red herring here. You do want something that can be displayed as
> > a matrix, but it doesn't sound as if you want to do anything with it
> > qua matrix, as in invert it or get the eigenvalues. If that's wrong, do
> > explain why.
> 
> Unfortunately, I really meant to get the unit means in a
> "matrix"; that is why
> 
> tabstat `varlist', by(`pvar') statistics(mean) save
> 
> seemed so appealing to me. The funny thing is that this command
> actually gets me exactly the matrix that I want on the screen, but I
> cannot grab it and put it into a matrix directly (using something like
> matcell()) without again resorting to a for loop.
> 
> The reason I want a matrix is that in my program I am only
> using Stata as a preprocessor to create matrices from the data, which
> I then pass to a C plugin that implements my estimator
> (which is based on matrix operations and a nested optimization that
> requires quadratic programming, which apparently neither Stata nor Mata
> provides).
> 
> I will take a very close look at your solution. I guess this will already
> help me a lot. Still, I think that the tabstat command should be
> given a simple matcell() option that allows one to grab tables of this
> type into a matrix directly. This would be a very fast way to
> create matrices of many forms. Alternatively, one could also store the
> whole tabstat table as a matrix in r() (and not just one r() for each
> row of the table).
> 
> In any case, thank you very much for your help.
> 
> Tom
> 
> 
> 
> On 9/14/06, Nick Cox <[email protected]> wrote:
> > Thanks for the recommendation. The reference mentioned
> > was a few issues back in the Stata Journal:
> >
> > SJ-5-4  pr0018  Suggestions on Stata programming style      N. J. Cox
> >         Q4/05   SJ 5(4):560--566                       (no commands)
> >         suggestions for good Stata programming style
> >
> > You put your finger on a good point. Truth is, many Stata programmers use
> > -preserve- all the time when they know it is the best solution. With modern
> > machines and small to moderate datasets, the cost of -preserve- and -restore-
> > can be trivial. But when is it the best solution? The advice is in that list
> > because I have witnessed rather a lot of beginners' programs in which -preserve-
> > was unnecessary and indeed a nuisance. In your case, as you are evidently using
> > -collapse- several times and want to keep relating results to data, it probably
> > is unnecessary.
> >
> > So, are there other solutions? I am going to guess that the word "matrix"
> > is a red herring here. You do want something that can be displayed as
> > a matrix, but it doesn't sound as if you want to do anything with it
> > qua matrix, as in invert it or get the eigenvalues. If that's wrong, do
> > explain why.
> >
> > The most efficient way to get means is by calculating them directly:
> >
> > * data are already sorted by `pvar' (and time), by virtue of -tsset-
> >
> > local means
> > foreach v of local varlist {
> >         tempvar mean
> >         * -clonevar- copies the label and format; -recast- stops
> >         * integer variables from truncating the mean
> >         clonevar `mean' = `v'
> >         recast double `mean'
> >         local means "`means' `mean'"
> >         by `pvar' : replace `mean' = sum(`v') / sum(`v' < .)
> >         by `pvar' : replace `mean' = `mean'[_N]
> > }
> >
> > tabdisp `pvar', c(`means')
> >
> > will then work nicely for a few variables. More generally,
> >
> > tempvar tag
> > by `pvar' : gen byte `tag' = _n == 1
> > list `pvar' `means' if `tag'
> >
> > will give a basic tabulation that can be beautified.
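The following self-contained sketch runs the running-sum trick from start to finish and puts the result in an actual Stata matrix. It uses a hypothetical toy panel, and the names id, t, x, xbar, first and M are illustrative only. One value is missing, which shows that sum() treats missing as zero while sum(x < .) counts only the nonmissing observations. The sketch then uses -mkmat ... if- to keep one row per unit, so no -collapse-, -preserve- or -restore- is involved.

```stata
* Hypothetical toy panel with one missing value in x
clear
input id t x
1 1 1
1 2 .
1 3 3
2 1 4
2 2 5
2 3 6
end
sort id t

* Running sums within unit: sum() treats missing as 0,
* and sum(x < .) counts the nonmissing values seen so far
by id: generate double xbar = sum(x) / sum(x < .)
* The last observation of each unit holds the unit mean; spread it
by id: replace xbar = xbar[_N]

* Flag one row per unit and copy those means into a matrix
by id: generate byte first = _n == 1
mkmat xbar if first, matrix(M)
matrix list M
```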
> >
> > Many Stata programmers would use -egen- for convenience, despite
> > its inefficiency. Note, in addition to -egen, mean()-, the -egen,
> > tag()- function, which can be used to display just one observation from
> > each group. You can also learn a lot by looking _inside_ the
> > -egen- functions as they exemplify various basic devices. Know
> > that the code for -egen, foo()- is in _gfoo.ado and can be
> > viewed using
> >
> > . viewsource _gfoo.ado
> >
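The -egen- route can be sketched the same way. Again, the toy panel and the names mean_x, mean_y, first and M are hypothetical. -egen, tag()- marks one observation per unit, and -mkmat ... if- turns those rows into a units-by-variables matrix. The data in memory stay intact apart from the added variables.

```stata
* Hypothetical toy panel
clear
input id t x y
1 1 1 10
1 2 2 20
1 3 3 30
2 1 4 40
2 2 5 50
2 3 6 60
end

* Unit means, repeated on every row of the unit
foreach v of varlist x y {
    egen double mean_`v' = mean(`v'), by(id)
}
* 1 on one observation per unit, 0 elsewhere
egen byte first = tag(id)

* Rows = units, columns = variables
mkmat mean_x mean_y if first, matrix(M)
matrix list M
```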
> > A much longer email could be written comparing these and other
> > solutions to your problem, but I am throwing the baton in
> > the air for others to catch.
> >
> > Nick
> > [email protected]
> >
> > Tom Boonen
> >
> > > I am rather new to programming in Stata and just read Nick Cox's
> > > "Suggestions on Stata programming style" (which I can really
> > > recommend for newbies) in the last issue of the Stata Journal. He
> > > urges programmers to avoid -preserve- if possible. This suggestion
> > > makes sense to me, but I am struggling with how to avoid it.
> > >
> > > My program makes frequent use of the -collapse- command, which of
> > > course changes the user's data, so the data have to be restored each
> > > time. I wonder whether there is an elegant way that obviates using
> > > -collapse- (or similarly -statsby-, which also replaces the data in memory).
> > >
> > > Here is an example of my problem:
> > >
> > > use   http://www.stata-press.com/data/r9/invest2.dta, clear
> > > tsset company time
> > > local pvar "`r(panelvar)'"
> > >
> > > Assume my "varlist" contains variables like invest, market and stock.
> > > The task is to create a matrix that contains, for each panel unit (i.e.
> > > company), the means of the variables in `varlist' over the time
> > > periods. What my program does:
> > >
> > > qui collapse (mean) `varlist', by(`pvar') fast
> > > qui mkmat `varlist', matrix(`X')
> > >
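For comparison, here is the -collapse- route written out with the -preserve-/-restore- pair it needs, using the example dataset and variable names from the message. X is a plain matrix name here for simplicity; inside a program it would be a -tempname-. The sketch assumes the dataset is still served at that URL.

```stata
use http://www.stata-press.com/data/r9/invest2.dta, clear
tsset company time
local pvar "`r(panelvar)'"
local varlist "invest market stock"

* -collapse- replaces the data, so bracket it with preserve/restore
preserve
quietly collapse (mean) `varlist', by(`pvar') fast
mkmat `varlist', matrix(X)
restore

* Original panel is back in memory; X holds one row per company
matrix list X
```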
> > > That works well, but I then have to use -restore- to go on. The natural
> > > thing to me was to look for a command like:
> > >
> > > bysort `pvar': summarize `varlist', meanonly
> > >
> > > but afterwards r() holds only the results for the last group (and last
> > > variable), so the return list does not let me grab the means by unit.
> > > So I thought, how about:
> > >
> > > tabstat `varlist', by(`pvar') statistics(mean) save
> > >
> > > This gets me what I want on the screen, but the r() results refer
> > > only to each row of the summary table, not the table as a whole. I
> > > could of course loop through these individual rows and collect the
> > > matrix, but that is slow and seems suboptimal (esp. when I have a lot
> > > of units).
> > >
> > > Rather, what I am looking for is something like the -tabulate, matcell()-
> > > option but for tabstat, i.e. a command that grabs the table displayed
> > > on the screen and puts it in a matrix.
> > >
> > > Any suggestions? One complication may be that apart from getting
> > > the means over the time periods, in other parts of my program I need to
> > > apply several other aggregation functions (sd, quantiles, etc.). But I
> > > would be happy if I could for now just find an elegant solution to get
> > > the means.
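The same -egen- plus -mkmat- pattern extends to other per-unit summaries, such as the sd and quantiles mentioned above. Here is a minimal sketch on a hypothetical toy panel, using the median as the example quantile. The names x_mean, x_sd, x_p50, first and S are illustrative.

```stata
* Hypothetical toy panel
clear
input id t x
1 1 1
1 2 2
1 3 6
2 1 4
2 2 5
2 3 9
end

* Several per-unit summaries of x, each repeated on the unit's rows
egen double x_mean = mean(x), by(id)
egen double x_sd   = sd(x), by(id)
egen double x_p50  = pctile(x), p(50) by(id)
egen byte first = tag(id)

* One row per unit, one column per statistic
mkmat x_mean x_sd x_p50 if first, matrix(S)
matrix list S
```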

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2025 StataCorp LLC