st: -longshape- available from SSC
From: Nick Cox <[email protected]>
To: [email protected]
Subject: st: -longshape- available from SSC
Date: Fri, 7 Oct 2011 18:45:20 +0100
Thanks to Kit Baum, a program -longshape- may now be downloaded from
SSC. Stata 9.2 is required.
-longshape- is a wrapper for -reshape long- that fixes a side-effect of
-reshape long- which bites very occasionally. I had been a happy user of
-reshape- for many years until the problem bit me very recently with a
particular kind of data.
To make this concrete, consider ecological data that include
measurements of abundance for several taxa (usually but not
necessarily species) at several sites. This is the result of a
-describe- for a small dataset of this kind from
<http://www.cambridge.org/gb/knowledge/isbn/item5708032/>.
Contains data from dune.dta
  obs:            20
 vars:            36                          10 Aug 2011 18:17
 size:           860 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              byte   %9.0g
achmil          byte   %8.0g                  Achillea millefolium
agrsto          byte   %8.0g                  Agrostis stolonifera
airpra          byte   %8.0g                  Aira praecox
alogen          byte   %8.0g                  Alopecurus geniculatus
antodo          byte   %8.0g                  Anthoxanthum odoratum
belper          byte   %8.0g                  Bellis perennis
brohor          byte   %8.0g                  Bromus hordaceus
chealb          byte   %8.0g                  Chenopodium album
cirarv          byte   %8.0g                  Cirsium arvense
elepal          byte   %8.0g                  Eleocharis palustris
elyrep          byte   %8.0g                  Elymus repens
empnig          byte   %8.0g                  Empetrum nigrum
hyprad          byte   %8.0g                  Hypochaeris radicata
junart          byte   %8.0g                  Juncus articulatus
junbuf          byte   %8.0g                  Juncus bufonius
leoaut          byte   %8.0g                  Leontodon autumnalis
lolper          byte   %8.0g                  Lolium perenne
plalan          byte   %8.0g                  Plantago lanceolata
poapra          byte   %8.0g                  Poa pratensis
poatri          byte   %8.0g                  Poa trivialis
potpal          byte   %8.0g                  Potentilla palustris
ranfla          byte   %8.0g                  Ranunculus flammula
rumace          byte   %8.0g                  Rumex acetosa
sagpro          byte   %8.0g                  Sagina procumbens
salrep          byte   %8.0g                  Salix repens
tripra          byte   %8.0g                  Trifolium pratense
trirep          byte   %8.0g                  Trifolium repens
viclat          byte   %8.0g                  Vicia lathyroides
brarut          byte   %8.0g                  Brachythecium rutabulum
calcus          byte   %8.0g                  Calliergonella cuspidata
A1              float  %9.0g                  A1 horizon thickness (cm)
moisture        byte   %8.0g                  moisture class
management      byte   %8.0g       management
                                              management type
use             byte   %8.0g       use        grassland use
manure          byte   %8.0g                  manure class
-------------------------------------------------------------------------------
Sorted by:  id
This is a natural data structure for recording data, but some analyses
require a long structure of taxon X site. To -reshape- such data there
is first a minor problem: the species variables need a common prefix.
That is easily solved, e.g. with -renvars- (SJ) or the much improved
-rename- in Stata 12.
. renvars achmil-calcus, prefix(y)
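If -renvars- is not installed, the same prefix can be added with official commands only. What follows is a minimal sketch, assuming the variable order shown in the -describe- output above. The commented-out line uses the group syntax of -rename- in Stata 12 or later and is an alternative to the loop, not an addition to it.

```stata
* Add the prefix y to every species variable, one variable at a time.
* Works in older Statas too; assumes achmil-calcus is the block of taxa.
foreach v of varlist achmil-calcus {
    rename `v' y`v'
}

* Stata 12 or later: = in the new name stands for the old name.
* Use this instead of the loop, not as well as it.
* rename (achmil-calcus) y=
```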
However, a much bigger problem is evident when we do -reshape-: the
variable labels all disappear. They are really valuable detail, and
typing them all in again does not appeal.
. reshape long y, i(id) j(species) string
(note: j = achmil agrsto airpra alogen antodo belper brarut brohor calcus
chealb cirarv elepal elyrep empnig hyprad junart junbuf leoaut lolper plalan
poapra poatri potpal ranfla rumace sagpro salrep tripra trirep viclat)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       20   ->     600
Number of variables                  36   ->       8
j variable (30 values)                    ->   species
xij variables:
                yachmil yagrsto ... yviclat   ->   y
-----------------------------------------------------------------------------
Normally with -reshape long- that is immaterial, as the bundle of
variables to be reshaped is something like -invest1975- to
-invest2005- and the variable labels, if there are any, don't carry
any important information that is not otherwise available. The main
aim of -longshape- is to carry the variable labels over automatically.
In fact, _two_ extra variables are created to give the best of both
worlds: a new string variable with the original variable names and a
new numeric variable whose value labels are the original variable
labels. (Both can be useful for subsequent graphs and tables.)
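To see what carrying the labels involves, here is a rough by-hand sketch using official commands only. It illustrates the idea and is not the code of -longshape-. The local macro names lbl_* and the byte storage type are choices made for this example, and it assumes every species variable has a non-empty variable label and a name short enough that lbl_ plus the name is a legal macro name.

```stata
use dune, clear

* 1. Remember each variable label, then add the common prefix y
foreach v of varlist achmil-calcus {
    local lbl_`v' : variable label `v'
    rename `v' y`v'
}

* 2. Reshape; the old variable names end up in a string variable
reshape long y, i(id) j(_species) string
rename y abundance

* 3. Build a numeric variable whose value labels are the old variable labels
levelsof _species, local(names)
gen byte species = .
local k = 0
foreach s of local names {
    local ++k
    replace species = `k' if _species == "`s'"
    label define species `k' `"`lbl_`s''"', add
}
label values species species
```

The codes follow the alphabetical order of the original variable names, because -levelsof- returns the distinct values of a string variable in sorted order.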
. u dune, clear
. longshape achmil-calcus, i(id) j(species) y(abundance)
. d
Contains data
  obs:           600
 vars:             9
 size:        12,600 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              byte   %9.0g
species         byte   %24.0g      species
_species        str6   %9s
abundance       byte   %8.0g
A1              float  %9.0g                  A1 horizon thickness (cm)
moisture        byte   %8.0g                  moisture class
management      byte   %8.0g       management
                                              management type
use             byte   %8.0g       use        grassland use
manure          byte   %8.0g                  manure class
-------------------------------------------------------------------------------
Sorted by:  id  species
     Note:  dataset has changed since last saved
. tab species
                 species |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
    Achillea millefolium |         20        3.33        3.33
    Agrostis stolonifera |         20        3.33        6.67
            Aira praecox |         20        3.33       10.00
  Alopecurus geniculatus |         20        3.33       13.33
   Anthoxanthum odoratum |         20        3.33       16.67
         Bellis perennis |         20        3.33       20.00
 Brachythecium rutabulum |         20        3.33       23.33
        Bromus hordaceus |         20        3.33       26.67
Calliergonella cuspidata |         20        3.33       30.00
       Chenopodium album |         20        3.33       33.33
         Cirsium arvense |         20        3.33       36.67
    Eleocharis palustris |         20        3.33       40.00
           Elymus repens |         20        3.33       43.33
         Empetrum nigrum |         20        3.33       46.67
    Hypochaeris radicata |         20        3.33       50.00
      Juncus articulatus |         20        3.33       53.33
         Juncus bufonius |         20        3.33       56.67
    Leontodon autumnalis |         20        3.33       60.00
          Lolium perenne |         20        3.33       63.33
     Plantago lanceolata |         20        3.33       66.67
           Poa pratensis |         20        3.33       70.00
           Poa trivialis |         20        3.33       73.33
    Potentilla palustris |         20        3.33       76.67
     Ranunculus flammula |         20        3.33       80.00
           Rumex acetosa |         20        3.33       83.33
       Sagina procumbens |         20        3.33       86.67
            Salix repens |         20        3.33       90.00
      Trifolium pratense |         20        3.33       93.33
        Trifolium repens |         20        3.33       96.67
       Vicia lathyroides |         20        3.33      100.00
-------------------------+-----------------------------------
                   Total |        600      100.00
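As a small illustration of why both new variables earn their keep, the commands below use the variable names from the -describe- output above. The particular statistics and graph options are arbitrary choices for the example.

```stata
* The labelled numeric variable gives readable rows in tables and graphs
tabstat abundance, by(species) statistics(mean max)
graph hbar (mean) abundance, over(species, sort(1) descending label(labsize(vsmall)))

* The string variable with the short names is handier for selecting
list id abundance if _species == "lolper" & abundance > 0
```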
That's it really, except that there may be a question: is there, or
will there be, a -wideshape-? Yes and no. I wrote one as a test of the
reversibility of this process, but I won't be making it public. It
isn't useful independently unless you happen to use precisely the kind
of structure that -longshape- produces, which is unlikely. Also,
-longshape- won't willingly perform unless you have -save-d your data,
so unless you wilfully destroy the original dataset you should have no
need to reverse the process.
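How -longshape- makes that check is not shown here. One way a program can refuse to run on unsaved data is to look at c(changed), which is 1 if the data in memory have changed since they were last saved. A minimal sketch, with the wording of the message invented for the example:

```stata
* Refuse to go on if the data in memory have unsaved changes
if c(changed) {
    display as error "data have changed since last saved; -save- them first"
    exit 4
}
```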
Nick