st: dataset containing duplicate variable names
Hello,
I realise that this is not supposed to happen, but I have a dataset
in which several variables share the same name. Some students of mine
inadvertently created a dataset like this, and I have replicated it.
Does anyone know how this can happen?
The problem arises when you export data with long variable names from
EpiData to Stata (using EpiData's export function) and set Stata 6 as
the output version. Why Stata 6? Well, this is the default on the
EpiData installation used by our students. If you change the version to
7 or higher, this problem doesn't occur.
EpiData apparently knows that Stata 6 variable names should be 8
characters or fewer and truncates the names of any variables that exceed
this limit, but it doesn't then check that all names are unique.
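The truncation behaviour described above can be illustrated with a minimal Python sketch. It is not EpiData's code: the function names (`truncate_names`, `find_collisions`, `make_unique`) are hypothetical, and the counter-suffix scheme is just one assumed way an exporter could keep truncated names unique.

```python
def truncate_names(names, limit=8):
    """Cut every name to `limit` characters with no uniqueness check."""
    return [name[:limit] for name in names]


def find_collisions(names, limit=8):
    """Map each truncated name to the original names that collapse onto it."""
    groups = {}
    for name in names:
        groups.setdefault(name[:limit], []).append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


def make_unique(names, limit=8):
    """Truncate names, replacing the tail with a counter when a name is taken.

    Hypothetical repair scheme, shown only to contrast with plain truncation.
    """
    used = set()
    result = []
    for name in names:
        candidate = name[:limit]
        counter = 1
        while candidate in used:
            suffix = str(counter)
            candidate = name[:limit - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


names = ["id", "longname1", "longname2", "longname3", "longname4"]
print(truncate_names(names))   # four identical names, as in the exported file
print(find_collisions(names))  # which originals collapsed together
print(make_unique(names))      # one possible set of distinct 8-character names
```

Running `find_collisions` on the variable list before exporting would flag the problem names, so they can be shortened by hand in EpiData first.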
I can replicate the problem with a dataset that, in EpiData, has
variables called longname1, longname2, longname3 and longname4. Once
exported to Stata, all the variables are called longname yet still
contain their original data. Although I can see the contents of all 4
variables in list or browse, I can only summon the first variable for use
in a command (see output below).
What surprised me is that Stata will open the dataset. I assume that
the variable names we see and use are not what Stata uses to refer to
the variables but the mapping between my names and Stata's seems to have
gone very wrong!
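One way to picture the guess above is a toy model in which columns are stored by position and names are only a parallel list of labels. This is purely an illustration in Python, not a description of how Stata is implemented; `ToyDataset` and its methods are hypothetical. With a first-match name lookup, every duplicate name resolves to the first column, which would be consistent with the `tab longname longname` result below, where the first variable appears to be tabulated against itself.

```python
class ToyDataset:
    """Columns stored by position; names are just a parallel list of labels."""

    def __init__(self, names, columns):
        self.names = list(names)
        self.columns = list(columns)

    def by_position(self, index):
        # Positional access still reaches every column.
        return self.columns[index]

    def by_name(self, name):
        # list.index returns the first match, so later duplicates are unreachable.
        return self.columns[self.names.index(name)]


# First three observations from the listing in this post.
ds = ToyDataset(
    ["id", "longname", "longname", "longname"],
    [[1, 2, 3], [1, 0, 1], [0, 1, 0], ["a", "b", "c"]],
)
print(ds.by_name("longname"))  # always the first column called longname
print(ds.by_position(2))       # the second longname, reachable only by position
```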
Cheers,
Emma
. use dataepi_export_tests2,clear
(Data file created by EpiData based on dataepi_export_tests.rec)
. desc
Contains data from dataepi_export_tests2.dta
  obs:            10                          Data file created by EpiData
                                              based on dataepi_export_tests.rec
 vars:             5                          05 Apr 2007 11:45
 size:           190 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              int    %4.0f                  ID
longname        int    %4.0f                  LONGNAME4
longname        int    %4.0f                  LONGNAME4
longname        str1   %1s                    LONGNAME4
longname        double %16.0f                 LONGNAME4
-------------------------------------------------------------------------------
Sorted by:
. list
     +------------------------------------------------+
     | id   longname   longname   longname   longname |
     |------------------------------------------------|
  1. |  1          1          0          a          1 |
  2. |  2          0          1          b          6 |
  3. |  3          1          0          c          3 |
  4. |  4          0          1          d          2 |
  5. |  5          1          0          e          5 |
     |------------------------------------------------|
  6. |  6          0          1          f          7 |
  7. |  7          1          0          g         10 |
  8. |  8          0          1          h          4 |
  9. |  9          1          0          i         10 |
 10. | 10          0          1          j         11 |
     +------------------------------------------------+
. summ
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |        10         5.5      3.02765         1         10
    longname |        10          .5     .5270463         0          1
    longname |        10          .5     .5270463         0          1
    longname |         0
    longname |        10    5.843498     3.586145   1.22553    11.0888
. tab longname longname
           |       LONGNAME4
 LONGNAME4 |         0          1 |     Total
-----------+----------------------+----------
         0 |         5          0 |         5
         1 |         0          5 |         5
-----------+----------------------+----------
     Total |         5          5 |        10