Statalist The Stata Listserver

st: dataset containing duplicate variables names

From   "Emma Slaymaker" <[email protected]>
To   <[email protected]>
Subject   st: dataset containing duplicate variables names
Date   Thu, 05 Apr 2007 14:06:53 +0100


I realise that this is not supposed to happen but I have a dataset
which has several variables with the same name.  Some students of mine
inadvertently created a dataset like this and I have replicated it. 
Does anyone know how this can happen?

The problem arises when you export data with long variable names from
EpiData to Stata (using EpiData's export function) and set Stata 6 as
the output version.  Why Stata 6...well, this is the default on the
EpiData installation used by our students.  If you change the version to
7 or higher this problem doesn't occur.

EpiData apparently knows that Stata 6 variable names should be 8
characters or less and truncates the names of any variables that exceed
this limit but it doesn't then check that all names are unique.  
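
To illustrate the kind of truncation described above, here is a minimal
sketch in Stata. It is not EpiData's actual export code, which I have not
seen; it only assumes that names are cut to their first 8 characters with
no uniqueness check. The local macros names, short, seen and dup are
made up for the illustration.

```stata
* Hypothetical variable names, taken from the example in this post
local names longname1 longname2 longname3 longname4
local seen
foreach n of local names {
    * keep only the first 8 characters, as assumed for a Stata 6 file
    local short = substr("`n'", 1, 8)
    * dup is 1 if the truncated name has already been produced
    local dup : list short in seen
    if `dup' display "`n' -> `short'   (duplicate)"
    else     display "`n' -> `short'"
    local seen `seen' `short'
}
```

All four names shorten to longname, so every name after the first is
flagged as a duplicate.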

I can replicate the problem with a dataset that, in EpiData, has
variables called longname1 longname2 longname3 and longname4.  Once
exported to Stata all the variables are called longname yet still
contain their original data.  Although I can see the contents of all 4
variables in list or browse I can only summon the first variable for use
in a command (see output below).

What surprised me is that Stata will open the dataset.  I assume that
the variable names we see and use are not what Stata uses to refer to
the variables but the mapping between my names and Stata's seems to have
gone very wrong!  



. use dataepi_export_tests2,clear
(Data file created by EpiData based on dataepi_export_tests.rec)

. desc

Contains data from dataepi_export_tests2.dta
  obs:            10                          Data file created by
                                                based on
 vars:             5                          05 Apr 2007 11:45
 size:           190 (99.9% of memory free)
              storage  display     value
variable name   type   format      label      variable label
id              int    %4.0f                  ID
longname        int    %4.0f                  LONGNAME4
longname        int    %4.0f                  LONGNAME4
longname        str1   %1s                    LONGNAME4
longname        double %16.0f                 LONGNAME4
Sorted by:  

. list

     | id   longname   longname   longname   longname |
  1. |  1          1          0          a          1 |
  2. |  2          0          1          b          6 |
  3. |  3          1          0          c          3 |
  4. |  4          0          1          d          2 |
  5. |  5          1          0          e          5 |
  6. |  6          0          1          f          7 |
  7. |  7          1          0          g         10 |
  8. |  8          0          1          h          4 |
  9. |  9          1          0          i         10 |
 10. | 10          0          1          j         11 |

. summ

    Variable |       Obs        Mean    Std. Dev.       Min        Max
          id |        10         5.5     3.02765          1         10
    longname |        10          .5    .5270463          0          1
    longname |        10          .5    .5270463          0          1
    longname |         0
    longname |        10    5.843498    3.586145    1.22553    11.0888

. tab longname longname

           |       LONGNAME4
 LONGNAME4 |         0          1 |     Total
         0 |         5          0 |         5 
         1 |         0          5 |         5 
     Total |         5          5 |        10 
