Statalist The Stata Listserver



Re: st: RE: avoid collapse and yet get matrix of unit specific means in panel data


From   "Tom Boonen" <[email protected]>
To   [email protected]
Subject   Re: st: RE: avoid collapse and yet get matrix of unit specific means in panel data
Date   Thu, 14 Sep 2006 12:40:57 -0400

Thank you for this detailed answer, Nick Cox.

> So, are there other solutions? I am going to guess that the word "matrix"
> is a red herring here. You do want something that can be displayed as
> a matrix, but it doesn't sound as if you want to do anything with it
> qua matrix, as in invert it or get the eigenvalues. If that's wrong, do
> explain why.
Unfortunately, I really meant to get the unit means in a "matrix", that is why:

tabstat  `varlist',  by("`pvar'") statis(mean) save

seemed so appealing to me. The funny thing is that this command
actually displays exactly the matrix that I want on the screen, but I
cannot grab it and put it into a matrix directly (using something like
matcell()) without again resorting to a loop.

The reason why I want a matrix is that in my program I am only
using Stata as a preprocessor to create matrices from the data, which
I will then pass to a C plugin that implements my estimator
(which is based on matrix operations and a nested optimization that
requires quadratic programming, which apparently neither Stata nor Mata
provides).

I will take a very close look at your solution. I guess this will already
help me a lot. Still, I think that the tabstat command should be
extended with a simple matcell() option that allows one to grab these types
of tables into a matrix directly. This would be a very fast way to
create matrices of many forms. Alternatively, one could also store the
whole tabstat table as a matrix in r() (and not just one r() result for each
row of the table).

In any case, thank you very much for your help.

Tom



On 9/14/06, Nick Cox <[email protected]> wrote:
Thanks for the recommendation. The reference mentioned
was a few issues back in the Stata Journal:

SJ-5-4  pr0018  . . . . . . . . . . . . Suggestions on Stata programming style
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/05   SJ 5(4):560--566                                 (no commands)
        suggestions for good Stata programming style

You put your finger on a good point. Truth is, many Stata programmers use
-preserve- all the time when they know it is the best solution. With modern
machines and small to moderate datasets, the cost of -preserve- and -restore-
can be trivial. But when is it the best solution?  The advice is in that list
because I have witnessed rather a lot of beginners' programs in which -preserve-
was unnecessary and indeed a nuisance. In your case, as you are evidently using -collapse-
several times and want to keep relating results to data, it probably is unnecessary.
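For comparison, the -preserve-/-collapse-/-restore- pattern under discussion can be sketched as a short do-file. This is a minimal sketch, assuming the invest2.dta example data and the variables invest, market and stock mentioned in the original question; the matrix name X is a hypothetical choice. -preserve- has to keep a copy of the full dataset so that -restore- can bring it back after -collapse- has replaced it.

```stata
* assumption: example data and variables taken from the original question
use http://www.stata-press.com/data/r9/invest2.dta, clear
tsset company time
local pvar "`r(panelvar)'"
local varlist "invest market stock"

* hypothetical matrix name
tempname X
preserve
* -collapse- replaces the data in memory with one observation per panel
collapse (mean) `varlist', by(`pvar') fast
mkmat `varlist', matrix(`X')
* -restore- brings back the original panel data
restore
matrix list `X'
```

Every call of this pattern pays for copying the data and reading it back, which is what makes it worth avoiding inside a program that does it repeatedly.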

So, are there other solutions? I am going to guess that the word "matrix"
is a red herring here. You do want something that can be displayed as
a matrix, but it doesn't sound as if you want to do anything with it
qua matrix, as in invert it or get the eigenvalues. If that's wrong, do
explain why.

The most efficient way to get means is by calculating them directly:

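* data are already sorted by `pvar' (and time), by virtue of -tsset-
local means
foreach v of local varlist {
        tempvar mean
        * -gen double- rather than -clonevar-, so means of integer variables are not truncated
        gen double `mean' = .
        local means "`means' `mean'"
        * running sum over running count of nonmissing values; -sum()- treats missing as 0
        by `pvar' : replace `mean' = sum(`v') / sum(`v' < .)
        * the last observation in each panel holds the panel mean; copy it to all of them
        by `pvar' : replace `mean' = `mean'[_N]
}

tabdisp `pvar', c(`means')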

will then work nicely for a few variables. More generally,

tempvar tag
by `pvar' : gen byte `tag' = _n == 1
list `pvar' `means' if `tag'

will give a basic tabulation that can be beautified.
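Since the stated goal is a matrix, the tagged observations can be passed straight to -mkmat-, which accepts an if qualifier, so neither -collapse- nor -preserve- is needed. A minimal self-contained sketch, assuming the invest2.dta data and the variables invest, market and stock from the question; the names X and tag are hypothetical.

```stata
* assumption: example data and variables taken from the original question
use http://www.stata-press.com/data/r9/invest2.dta, clear
tsset company time
local pvar "`r(panelvar)'"
local varlist "invest market stock"

* per-panel means, computed with the running-sum device
local means
foreach v of local varlist {
        tempvar mean
        gen double `mean' = .
        by `pvar' : replace `mean' = sum(`v') / sum(`v' < .)
        by `pvar' : replace `mean' = `mean'[_N]
        local means "`means' `mean'"
}

* keep one row per panel: the first observation of each panel
tempvar tag
by `pvar' : gen byte `tag' = _n == 1
tempname X
mkmat `means' if `tag', matrix(`X')
* temporary variable names are meaningless, so relabel the columns
matrix colnames `X' = `varlist'
matrix list `X'
```

The rows of the matrix follow the sort order of the panel variable, one row per panel, and the data in memory are left unchanged apart from the temporary variables.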

Many Stata programmers would use -egen- for convenience, despite
its inefficiency. Note, in addition to -egen, mean()-, the -egen,
tag()- function, which can be used to display just one observation from
each group. You can also learn a lot by looking _inside_ the
-egen- functions as they exemplify various basic devices. Know
that the code for -egen, foo()- is in _gfoo.ado and can be
viewed using

. viewsource _gfoo.ado
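For comparison, an -egen- version of the same calculation might look like the following sketch, assuming the same example data and variables; the macro emeans and the variable first are hypothetical names. -egen, mean()- puts the panel mean in every observation of the panel, and -egen, tag()- flags exactly one observation per panel for display.

```stata
* assumption: example data and variables taken from the original question
use http://www.stata-press.com/data/r9/invest2.dta, clear
tsset company time
local pvar "`r(panelvar)'"
local varlist "invest market stock"

local emeans
foreach v of local varlist {
        tempvar m
        * fills every observation in the panel with that panel's mean
        by `pvar' : egen double `m' = mean(`v')
        local emeans "`emeans' `m'"
}

* 1 for just one observation in each panel, 0 otherwise
tempvar first
egen byte `first' = tag(`pvar')
list `pvar' `emeans' if `first', noobs
```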

A much longer email could be written comparing these and other
solutions to your problem, but I am throwing the baton in
the air for others to catch.

Nick
[email protected]

Tom Boonen

> I am rather new to programming in Stata and just read Nick Cox's
> "Suggestions on Stata programming style" (which I can really
> recommend for newbies) in the last issue of the Stata Journal. He
> urges programmers to avoid "preserve" if possible. This suggestion
> makes sense to me, but I am struggling with how to avoid it.
>
> My program makes frequent use of the "collapse" command, which of
> course changes the user's data, so it has to be restored each time. I
> wonder whether there is an elegant way that obviates using collapse()
> (or similarly statsby() which uses collapse()).
>
> Here is an example of my problem:
>
> use   http://www.stata-press.com/data/r9/invest2.dta, clear
> tsset company time
> local pvar "`r(panelvar)'"
>
> Assume my "varlist" contains variables like invest, market and stock.
> The task is to create a matrix that contains for each panel unit (i.e.
> company) the means of the variables in `varlist' over the time
> periods. What my program does:
>
> qui collapse (mean) `varlist', by("`pvar'") fast
> qui mkmat `varlist', matrix(`X')
>
> That works well but I have to use "restore" now to go on. The natural
> thing to me was to look for a command like:
>
> bys `pvar': sum `varlist', meanonly
>
> but the return list does not allow me to grab the means by unit. So I
> thought how about:
>
> tabstat  `varlist',  by("`pvar'") statis(mean) save
>
> this gets me what I want on the screen, but the return list r()
> results only refer to each row of the summary table, not the table as
> a whole. I could loop through these individual rows and collect the
> matrix, of course, but that takes a long time and seems suboptimal (esp. when
> I have a lot of units).
>
> Rather, what I am looking for is something like the -tabulate, matcell()-
> option but for tabstat, i.e. a command that grabs the table displayed
> on the screen and puts it in a matrix.
>
> Any suggestions? One complication may be that, apart from getting
> the means over the time periods, in other parts of my program I need to
> apply several other aggregation functions such as sd, quantiles, etc. But I
> would be happy if I could for now just find an elegant solution to get
> the means.
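The same tag-and-mkmat idea extends to the other statistics mentioned, since -egen- also provides sd(), median() and pctile() (with a p() option). A minimal sketch computing within-panel standard deviations and medians, assuming the same example data and variables; the matrix name S and column names such as sd_invest are hypothetical.

```stata
* assumption: example data and variables taken from the original question
use http://www.stata-press.com/data/r9/invest2.dta, clear
tsset company time
local pvar "`r(panelvar)'"
local varlist "invest market stock"

local stats
local names
foreach v of local varlist {
        tempvar sd med
        * within-panel standard deviation and median, constant within each panel
        by `pvar' : egen double `sd' = sd(`v')
        by `pvar' : egen double `med' = median(`v')
        local stats "`stats' `sd' `med'"
        * hypothetical column labels for the matrix
        local names "`names' sd_`v' p50_`v'"
}

* one row per panel
tempvar tag
by `pvar' : gen byte `tag' = _n == 1
tempname S
mkmat `stats' if `tag', matrix(`S')
matrix colnames `S' = `names'
matrix list `S'
```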

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



