
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: R: st: problem of data management


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: R: st: problem of data management
Date   Mon, 15 Apr 2013 10:04:24 +0100

Same answer from me. From what you tell us, a wide structure would not
make most conceivable analyses easier.
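If you did want the side-by-side layout of Example 2, -reshape wide- produces it from the long data. Here is a minimal sketch. It assumes the rows for each individual are already in chronological order. It also renames TIME to a hypothetical -contract- variable, because it records contract type, not time. -obsno- and -spell- are likewise illustrative names.

```stata
* Toy data copied from Example 1; "contract" is a hypothetical name for TIME
clear
input int id str5 gender str5 age byte contract
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Assumption: current row order is chronological within each id
generate long obsno = _n
bysort id (obsno): generate spell = _n
drop obsno

* gender and age are constant within id, so they remain single columns;
* contract becomes contract1, contract2, contract3
reshape wide contract, i(id) j(spell)
list
```

-reshape long contract, i(id) j(spell)- takes you back. However, it creates a row for every spell number, so 848 gets rows with missing -contract- that you would then need to drop. Also, -reshape wide- will refuse if gender or age vary within id.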

Your example included

ID     GENDER    AGE       TIME
861    woman     50-54     1
861    woman     50-54     1

so that, on that information alone, duplicates could be inferred.
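A quick way to see this in Stata is -duplicates-. Below is a sketch on the rows of Example 1, with TIME renamed to a hypothetical -contract- variable. The new variable -dup- counts the surplus copies of each row.

```stata
* Toy data copied from Example 1; "contract" is a hypothetical name for TIME
clear
input int id str5 gender str5 age byte contract
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Summary table of how many rows are repeated on all four variables
duplicates report id gender age contract

* dup = 0 for a unique row, k for a row with k other identical copies
duplicates tag id gender age contract, generate(dup)
list id gender age contract dup, sepby(id)
```

Here the rows for 861 and 820 are flagged, as are the two contract-1 rows of 860. If these really are distinct spells, then some variable not shown (such as a start date) must tell them apart.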

I don't think your example included enough detail for us to advise
well on how to create a time (meaning date) variable. Presumably that
depends on other variables that you don't show us. Much depends on
whether you want the date variable to make sense across identifiers.
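Suppose, as a working assumption, that the current row order within each identifier is chronological. Then a spell counter that increases each time an individual reappears can be built with -bysort- and -_n-, and then used with -xtset-. Below is a sketch on Example 1. -contract-, -obsno- and -spell- are hypothetical names. With a real date variable (say, a hypothetical -startdate-), you would sort on that instead of -obsno-.

```stata
* Toy data copied from Example 1; "contract" is a hypothetical name for TIME
clear
input int id str5 gender str5 age byte contract
861 "woman" "50-54" 1
848 "woman" "25-29" 1
820 "man"   "50-54" 1
861 "woman" "50-54" 1
820 "man"   "50-54" 1
860 "woman" "50-54" 1
860 "woman" "50-54" 2
860 "woman" "50-54" 1
end

* Record the present row order, assumed here to be chronological
generate long obsno = _n

* Number each individual's observations 1, 2, 3, ... in that order
bysort id (obsno): generate spell = _n

* id is the panel variable, spell the (occurrence) time variable
xtset id spell
```

Note that -spell- counts occurrences, not elapsed time. Spell 2 for one person and spell 2 for another need not be comparable in calendar time.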

Nick
[email protected]


On 15 April 2013 09:33, Iodice Federico <[email protected]> wrote:
> Sorry, I was probably not clear in describing my data management problem.
>
> The TIME column in the example below is not a time variable; instead, it
> identifies the type of employment contract an individual has: 1 is assigned
> to part-time contracts and 2 to full-time contracts.
>
> The ID column does not contain duplicates, but different observations of the
> same ID at different moments.
>
> In the ID variable, individuals occur several times, but the moments at which
> an individual is observed again have no regularity and are not defined a
> priori. We observe individuals whenever they find a new job. Sometimes they
> have many short employment spells. Quite often they find a permanent job, and
> then we no longer observe them.
>
> For these reasons, I thought it might be better to transform the dataset into
> wide format. This would mean treating it not as a panel data set, but as a
> cross-section with a longitudinal dimension.
>
> The alternative would be to keep the long format that the data set naturally
> has. However, since the ID variable is repeated only for some cases, I need to
> create a time variable that assigns a sequence to successive observations of
> the same individual. This is needed for the xtset command. In other words, to
> keep the long format and use panel data analysis, I need a command that
> automatically assigns an increasing numerical value to the new time variable
> every time the same individual is observed again. We have several thousand
> observations.

>> > Date: Tue, 9 Apr 2013 11:59:24 +0100
>> > From: [email protected]
>> >
>> > You can -xtset- your data using
>> >
>> > xtset id
>> >
>> > but your example indicates that you have duplicates on -id time-, so
>> >
>> > xtset id time
>> >
>> > would fail. I don't think you are telling us enough for it to be clear
>> > whether you just have -duplicates- (see the command of that name) that
>> > should be removed.
>> >
>> > I can't see any advantages to your wide data structure (Example 2).
>> > (Many people do call this a format.)
>> >
>> > Nick
>> > [email protected]
>> >
>> >
>> > On 9 April 2013 11:51, Iodice Federico <[email protected]> wrote:
>> >
>> >> I have a data management problem that I would like to submit to your
>> >> attention. I have an unbalanced panel dataset. As you can see from the
>> >> example below, the variable in column 1 (the ID variable) is repeated only
>> >> for some cases. In other words, the same individual appears several
>> >> times.
>> >>
>> >> Example 1
>> >>
>> >> ID   GENDER  AGE    TIME
>> >> 861  woman   50-54  1
>> >> 848  woman   25-29  1
>> >> 820  man     50-54  1
>> >> 861  woman   50-54  1
>> >> 820  man     50-54  1
>> >> 860  woman   50-54  1
>> >> 860  woman   50-54  2
>> >> 860  woman   50-54  1
>> >>
>> >> This happens only for some, but not all, individuals in the sample. It
>> >> means that probably the best way of dealing with this dataset is to use it
>> >> not as a panel, but as a longitudinal data set with repeated observations.
>> >> The observations that are not repeated I can treat as staying in the same
>> >> status.
>> >> In order to use this information, my impression is that I need either:
>> >>
>> >> a) to tell Stata that this is a panel and treat the data as being in “long
>> >> format”. If case a) is the best one, the data are already in long format
>> >> and I only need to tell Stata that the same individual is observed in
>> >> different periods. However, these periods are not fixed: they can be of any
>> >> length, from one week to several years. How do I tell Stata that the same
>> >> individual is observed several times? How do I define the time dimension?
>> >>
>> >> b) or to treat the data as a cross-section with repeated observations. In
>> >> this case, I need to automatically shift each entire repeated row to the
>> >> right of the first row in which that ID appears. An example of case b) is
>> >> below. My question is: how do I move an entire row beside the row where
>> >> that individual first appears?
>> >>
>> >> Example 2
>> >>
>> >> ID   GENDER  AGE    TIME  ID   GENDER  AGE    TIME  ID   GENDER  AGE    TIME
>> >> 861  woman   50-54  1     861  woman   50-54  1
>> >> 848  woman   25-29  1
>> >> 820  man     50-54  1     820  man     50-54  1
>> >> 860  woman   50-54  1     860  woman   50-54  2     860  woman   50-54  1



© Copyright 1996–2018 StataCorp LLC