Stata | FAQ: Dealing with reports of repeated time values within panel


How do I deal with a report of repeated time values within panel?

Title		Dealing with reports of repeated time values within panel
Author		Nicholas J. Cox, Durham University, UK; Michael Mulcahy, University of Connecticut

Question

I have panel data. I want to exploit the power of xtset (see [XT] xtset), but when I type

            . xtset id time

I get a report of

            repeated time values within panel
            r(451);

What should I do next?

Answer

Panel data are defined by an identifier variable and a time variable. Each combination of identifier and time should occur, at most, once. That is, any such combination might appear either once or not at all, as gaps are allowed in panel data. Thus, the report of "repeated time values within panel" is serious, because Stata is unable to proceed with any commands that depend upon your data being accepted as panel data.

Two common reactions to this report are to suppose that it cannot be true, as you know you have panel data, or that there must be a bug or at least a misunderstanding here. In our experience, the misunderstanding will, on closer inspection, be found embedded in the dataset. Here we discuss various methods for approaching the problem. The underlying idea is that knowing several ways of going further is much better than knowing none. All the methods discussed are also applicable to other problems.

1. Do identifier and time uniquely identify the data?

Observations in panel data are uniquely identified by the combination of identifier and time. Thus isid may be used to check for this, for example,

     . isid id time

With isid, no news is good news. However, if the variables specified do not jointly identify the data, an error message will appear.

The logic of isid may be implemented in other ways. At its heart is the operation

     . bysort id time: assert _N == 1

asserting that each combination of identifier and time is unique. Again, with assert no news is good news. If the statement asserted is not true everywhere that it is tested, an error message will ensue.
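Inside a do-file, it can be convenient to test for uniqueness without stopping execution. The following is a minimal sketch on a made-up toy dataset (the variable y and all the values are invented for illustration), using capture to trap the error and _rc to inspect the return code.

```stata
* Toy data, invented for illustration: panel 2 has time 2001 twice
clear
input id time y
1 2001 10
1 2002 11
2 2001 20
2 2001 21
end

* capture suppresses the error; _rc is nonzero if the check failed
capture isid id time
if _rc {
    display as error "id and time do not jointly identify the observations"
}

* The same check written out with assert
capture bysort id time: assert _N == 1
display "return code from assert: " _rc

* capture noisily shows the xtset error message but lets the do-file continue
capture noisily xtset id time
```

With capture, a failed check no longer halts the do-file, so this pattern suits checking scripts in which the decision about what to do next is made by testing _rc.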

2. Check for duplicates

If you have received confirmation of a problem, the next step is to track it down. With a very small dataset, a list or edit of the data may be sufficient, but, even then, a more systematic approach is preferable. Here is what we did in a specific example using the duplicates command, which is a small bundle of tools for investigating possible problems arising from duplicated observations.

The dataset consists of several variables for various cities and years, with identifier id and time variable year. The number of observations is 7,813, large enough for a visual scan of the data to be a poor solution. The subcommand duplicates report quantifies the extent of the problem: 26 observations form 13 pairs sharing the same values of id and year, so 13 observations are surplus. The subcommand duplicates list shows that all of them involve id 467. The subcommand duplicates tag is used to tag the observations to examine more closely. An edit then gives all the details.

 . duplicates report id year

 Duplicates in terms of id year

 --------------------------------------
    copies | observations       surplus
 ----------+---------------------------
         1 |         7787             0
         2 |           26            13
 --------------------------------------

 . duplicates list id year

 Duplicates in terms of id year

   +----------------------------+
   | group:   obs:    id   year |
   |----------------------------|
   |      1   6059   467   1990 |
   |      1   6060   467   1990 |
   |      2   6061   467   1991 |
   |      2   6062   467   1991 |
   |      3   6063   467   1992 |
   |----------------------------|
   |      3   6064   467   1992 |
   |      4   6065   467   1993 |
   |      4   6066   467   1993 |
   |      5   6067   467   1994 |
   |      5   6068   467   1994 |
   |----------------------------|
   |      6   6069   467   1995 |
   |      6   6070   467   1995 |
   |      7   6071   467   1996 |
   |      7   6072   467   1996 |
   |      8   6073   467   1997 |
   |----------------------------|
   |      8   6074   467   1997 |
   |      9   6075   467   1998 |
   |      9   6076   467   1998 |
   |     10   6077   467   1999 |
   |     10   6078   467   1999 |
   |----------------------------|
   |     11   6079   467   2000 |
   |     11   6080   467   2000 |
   |     12   6081   467   2001 |
   |     12   6082   467   2001 |
   |     13   6083   467   2002 |
   |----------------------------|
   |     13   6084   467   2002 |
   +----------------------------+

 . duplicates tag id year, gen(isdup) 

 Duplicates in terms of id year

 . edit if isdup

 . drop isdup

The edit command reveals the precise problem: two cities, Royal Oak, MI, and Bristol, CT, have been assigned the same identifier. We need to fix that by changing the identifier of one city to a value not used by any other city.
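One way to make such a change is sketched here on a small toy dataset. The sketch rests on assumptions that are not taken from the dataset discussed above: that a string variable named city holds the city name, and that one more than the largest existing identifier is an acceptable new identifier. The toy values, including the third city, are invented for illustration.

```stata
* Toy data, invented for illustration; city is a hypothetical variable name
clear
input id year str20 city
467 1990 "Royal Oak, MI"
467 1990 "Bristol, CT"
467 1991 "Royal Oak, MI"
467 1991 "Bristol, CT"
468 1990 "Some other city"
end

* Choose an identifier guaranteed not to be in use already
summarize id, meanonly
local newid = r(max) + 1

* Reassign one of the two cities that share id 467
replace id = `newid' if id == 467 & city == "Bristol, CT"

* Confirm that the problem is solved before declaring the panel structure
isid id year
xtset id year
```

Taking the new identifier from r(max) avoids creating a fresh clash with an identifier that is already in use. If the identifiers carry meaning, such as official codes, the correct code should be looked up instead.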

Not all these steps are essential. Some users omit the report. On the other hand, in a large dataset, the list could be lengthy. Either way, duplicates offers various handles for the problem.
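When a full listing would be too long, more compact summaries are possible. The following sketch, again on invented toy data, shows two of them: duplicates examples, which lists one example observation for each group of duplicates, and a count of observations per combination produced with by:. The variable name ncopies is our own choice.

```stata
* Toy data, invented for illustration: panel 2 is duplicated in both years
clear
input id year
1 1990
1 1991
2 1990
2 1990
2 1991
2 1991
end

* One example observation per group of duplicates, not every observation
duplicates examples id year

* Count the observations in each id-year combination
bysort id year: generate byte ncopies = _N

* Show which panels are affected and how many observations are involved
tabulate id if ncopies > 1

drop ncopies
```

Tabulating the identifier for the affected observations shows at a glance whether the problem is confined to a few panels, as in the example above, or is spread throughout the dataset.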
