Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: AW: -longshape- available from SSC
From: "Kaulisch, Marc" <[email protected]>
To: <[email protected]>
Subject: st: AW: -longshape- available from SSC
Date: Sat, 8 Oct 2011 17:00:32 +0200
Hi Nick,
Thanks for this program. I have also run into the same problem more than once.
One additional comment: I can think of a use case where -wideshape- might be a good complementary program (although I am not completely sure).
Example:
Wide data are reshaped into long data.
New variables are created in the long data, for example clusters after an optimal-matching analysis (see -sq- from SSC).
The long data are reshaped back into wide data in order to use the clusters for further analysis.
Of course, I could merge the wide and long data to add the new variables to the "old" wide data, but intuitively I would do a reshape.
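The round trip described above can be sketched with official -reshape- alone, at least when the new variable is constant within each unit. This is only an illustration: the toy data, the names state, time and cluster, and the -egen- line standing in for a real clustering step are all made up here.

```stata
* toy data standing in for a wide sequence dataset (all names hypothetical)
clear
input id state1 state2 state3
1 0 0 1
2 1 1 1
3 0 1 1
end

* wide -> long
reshape long state, i(id) j(time)

* placeholder for the real clustering step: any variable constant within id
egen cluster = total(state), by(id)

* long -> wide; variables that are constant within id are carried along
reshape wide state, i(id) j(time)
list
```

The new variable has to be constant within id, otherwise -reshape wide- will complain that it is not.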
Marc
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Friday, 7 October 2011 19:45
To: [email protected]
Subject: st: -longshape- available from SSC
Thanks to Kit Baum, a program -longshape- may now be downloaded from SSC. Stata 9.2 is required.
-longshape- is a wrapper for -reshape long- that fixes a side-effect of -reshape long- which bites very occasionally. I had been a happy user of -reshape- for many years until the problem bit me very recently with a particular kind of data.
To make this concrete, consider ecological data that include measurements of abundance for several taxa (usually but not necessarily species) at several sites. This is the result of a -describe- for a small dataset of this kind from <http://www.cambridge.org/gb/knowledge/isbn/item5708032/>.
Contains data from dune.dta
  obs:            20
 vars:            36                          10 Aug 2011 18:17
 size:           860 (99.9% of memory free)
--------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label       variable label
--------------------------------------------------------------------------------
id              byte   %9.0g
achmil          byte   %8.0g                   Achillea millefolium
agrsto          byte   %8.0g                   Agrostis stolonifera
airpra          byte   %8.0g                   Aira praecox
alogen          byte   %8.0g                   Alopecurus geniculatus
antodo          byte   %8.0g                   Anthoxanthum odoratum
belper          byte   %8.0g                   Bellis perennis
brohor          byte   %8.0g                   Bromus hordaceus
chealb          byte   %8.0g                   Chenopodium album
cirarv          byte   %8.0g                   Cirsium arvense
elepal          byte   %8.0g                   Eleocharis palustris
elyrep          byte   %8.0g                   Elymus repens
empnig          byte   %8.0g                   Empetrum nigrum
hyprad          byte   %8.0g                   Hypochaeris radicata
junart          byte   %8.0g                   Juncus articulatus
junbuf          byte   %8.0g                   Juncus bufonius
leoaut          byte   %8.0g                   Leontodon autumnalis
lolper          byte   %8.0g                   Lolium perenne
plalan          byte   %8.0g                   Plantago lanceolata
poapra          byte   %8.0g                   Poa pratensis
poatri          byte   %8.0g                   Poa trivialis
potpal          byte   %8.0g                   Potentilla palustris
ranfla          byte   %8.0g                   Ranunculus flammula
rumace          byte   %8.0g                   Rumex acetosa
sagpro          byte   %8.0g                   Sagina procumbens
salrep          byte   %8.0g                   Salix repens
tripra          byte   %8.0g                   Trifolium pratense
trirep          byte   %8.0g                   Trifolium repens
viclat          byte   %8.0g                   Vicia lathyroides
brarut          byte   %8.0g                   Brachythecium rutabulum
calcus          byte   %8.0g                   Calliergonella cuspidata
A1              float  %9.0g                   A1 horizon thickness (cm)
moisture        byte   %8.0g                   moisture class
management      byte   %8.0g       management
                                               management type
use             byte   %8.0g       use         grassland use
manure          byte   %8.0g                   manure class
--------------------------------------------------------------------------------
Sorted by:  id
This is a natural structure for recording data, but some analyses require a long structure of taxon X site. To -reshape- such data there is first a minor problem: the species variable names need a common prefix. That is soluble with, e.g., -renvars- (SJ) or the much improved -rename- in Stata 12.
. renvars achmil-calcus, prefix(y)
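If -renvars- is not installed, a plain -foreach- loop should give the same result using built-in commands only. This is a sketch of an alternative, not something -longshape- requires; it assumes the dune data are in memory, and the prefix y simply matches the example above.

```stata
* add the prefix y to every species variable, one at a time
foreach v of varlist achmil-calcus {
    rename `v' y`v'
}
```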
However, a much bigger problem is evident when we do the -reshape-: the variable labels all disappear. They are really valuable detail, and typing them all in again does not appeal.
. reshape long y, i(id) j(species) string
(note: j = achmil agrsto airpra alogen antodo belper brarut brohor calcus chealb
> cirarv elepal elyrep empnig hyprad junart junbuf leoaut lolper plalan poapra
> poatri potpal ranfla rumace sagpro salrep tripra trirep viclat)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       20   ->     600
Number of variables                  36   ->       8
j variable (30 values)                    ->   species
xij variables:
            yachmil yagrsto ... yviclat   ->   y
-----------------------------------------------------------------------------
Normally with a -reshape long- that is immaterial, as the bundle of variables to be reshaped is something like -invest1975- to -invest2005- and the variable labels, if there are any, don't carry any important information that is not otherwise available. The main aim of -longshape- is to carry the variable labels over automatically. In fact, _two_ extra variables are created to give the best of both worlds: a new string variable holding the original variable names and a new numeric variable whose value labels are the original variable labels. (Both can be useful for subsequent graphs and tables.)
. u dune, clear
. longshape achmil-calcus, i(id) j(species) y(abundance)
. d
Contains data
  obs:           600
 vars:             9
 size:        12,600 (99.9% of memory free)
--------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label       variable label
--------------------------------------------------------------------------------
id              byte   %9.0g
species         byte   %24.0g      species
_species        str6   %9s
abundance       byte   %8.0g
A1              float  %9.0g                   A1 horizon thickness (cm)
moisture        byte   %8.0g                   moisture class
management      byte   %8.0g       management
                                               management type
use             byte   %8.0g       use         grassland use
manure          byte   %8.0g                   manure class
--------------------------------------------------------------------------------
Sorted by:  id  species
     Note:  dataset has changed since last saved
. tab species
                 species |      Freq.     Percent        Cum.
-------------------------+-----------------------------------
    Achillea millefolium |         20        3.33        3.33
    Agrostis stolonifera |         20        3.33        6.67
            Aira praecox |         20        3.33       10.00
  Alopecurus geniculatus |         20        3.33       13.33
   Anthoxanthum odoratum |         20        3.33       16.67
         Bellis perennis |         20        3.33       20.00
 Brachythecium rutabulum |         20        3.33       23.33
        Bromus hordaceus |         20        3.33       26.67
Calliergonella cuspidata |         20        3.33       30.00
       Chenopodium album |         20        3.33       33.33
         Cirsium arvense |         20        3.33       36.67
    Eleocharis palustris |         20        3.33       40.00
           Elymus repens |         20        3.33       43.33
         Empetrum nigrum |         20        3.33       46.67
    Hypochaeris radicata |         20        3.33       50.00
      Juncus articulatus |         20        3.33       53.33
         Juncus bufonius |         20        3.33       56.67
    Leontodon autumnalis |         20        3.33       60.00
          Lolium perenne |         20        3.33       63.33
     Plantago lanceolata |         20        3.33       66.67
           Poa pratensis |         20        3.33       70.00
           Poa trivialis |         20        3.33       73.33
    Potentilla palustris |         20        3.33       76.67
     Ranunculus flammula |         20        3.33       80.00
           Rumex acetosa |         20        3.33       83.33
       Sagina procumbens |         20        3.33       86.67
            Salix repens |         20        3.33       90.00
      Trifolium pratense |         20        3.33       93.33
        Trifolium repens |         20        3.33       96.67
       Vicia lathyroides |         20        3.33      100.00
-------------------------+-----------------------------------
                   Total |        600      100.00
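For anyone curious about the idea behind this, a result of the same kind can be approximated by hand with official commands. What follows is only a sketch under stated assumptions, not the -longshape- source: it assumes dune.dta as described above, and the stub y and the value label name splbl are arbitrary choices made for this illustration.

```stata
* Sketch only: this is not the -longshape- source code.
use dune, clear
local i = 0
local names
foreach v of varlist achmil-calcus {
    local ++i
    * remember the variable label, falling back on the name if there is none
    local lbl : variable label `v'
    if `"`lbl'"' == "" local lbl "`v'"
    label define splbl `i' `"`lbl'"', modify
    local names `names' `v'
    * -reshape long- needs a common stub, here y, plus a numeric suffix
    rename `v' y`i'
}
reshape long y, i(id) j(species)
* numeric identifier whose value labels are the original variable labels
label values species splbl
* string identifier holding the original variable names
gen _species = word("`names'", species)
rename y abundance
```

In this sketch the numeric codes follow the order of the variables in the dataset, so a -tab- of species would list brarut and calcus last rather than in the alphabetical order shown above.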
That's it really, except that there may be a question: is there, or will there be, a -wideshape-? Yes and no. I wrote one as a test of the reversibility of this process, but I won't be making it public. It isn't useful independently unless you happen to use precisely the kind of structure that -longshape- produces, which is unlikely. Also, -longshape- won't willingly perform unless you have -save-d your data, so unless you wilfully destroy the original dataset you should have no need to reverse the process.
Nick
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/