st: re: complex data cleaning issue (well, complex for me)
Stephen said
For each employee, I have employee number (e.g., 109123), the start
date (e.g., 07Aug07), start day (e.g., Monday), end date (e.g.,
09Aug07), end day (e.g., Wednesday), number of hours (e.g., 21), and
number of days (e.g., 3). There is other info as well. Each case of
absenteeism is on one line, with some employees being represented on
one line only (only one recorded absenteeism entry), and other
employees with multiple records of absenteeism, for example being
absent on several different days across the year.
Unfortunately, for many records, there are then 'corrections'. (A
correction could occur for several reasons, such as that entry no
longer being regarded as absenteeism but sick leave). A correction
shows up as a re-entry of that data with negative values for hours
and days. Below is an example of the original entry, plus a correction.
(snip)
I don't see anything complex here. If you -collapse- on employee#,
start date and end date, computing sums of the hours and days fields,
everything should work properly.
In Example A, you will get hours = days = 0, and you can screen out
records that have hours = days = 0.
In Example B, the first two records will negate each other, and you
will be left with the third.
In Example C, you will again be left with hours = days = 0.
I don't think you need any -duplicates- logic to handle this.
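A minimal sketch of that first -collapse-, in case it helps. The variable names (empno, startdate, enddate, hours, days) are my assumptions, and the three records are made up for illustration: an original entry, its correction, and one unrelated entry. The date strings are assumed to be DDMonYY with years in the 2000s.

```stata
* Sketch only: variable names and the sample records are assumptions.
clear
input long empno str7 start_s str7 end_s hours days
109123 "07Aug07" "09Aug07"  21  3
109123 "07Aug07" "09Aug07" -21 -3
109123 "13Aug07" "13Aug07"   7  1
end

* Convert the DDMonYY strings to Stata daily dates (century 20 assumed)
gen startdate = date(start_s, "DM20Y")
gen enddate   = date(end_s, "DM20Y")
format startdate enddate %td

* Sum hours and days within each employee/start/end combination
collapse (sum) hours days, by(empno startdate enddate)

* An entry fully cancelled by its correction now sums to zero
drop if hours == 0 & days == 0
list
```

With these records, the entry and its correction net to zero and are dropped, leaving only the 13Aug07 record.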
You also don't want to do any reshaping. You can use mofd() to
generate a month variable, and then just -collapse- on that variable,
generating sums, to make what you have into a monthly time series per
employee. Just two applications of -collapse-. In fact you could
probably get by with one.
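Here is roughly what the one-step version might look like. Because sums within a month already net out a correction against its original entry, a single -collapse- by employee and month may be enough. The variable names and sample records are assumptions, and classifying each record by its start date is only one possible choice.

```stata
* Sketch only: variable names and the sample records are assumptions.
clear
input long empno str7 start_s hours days
109123 "07Aug07"  21  3
109123 "07Aug07" -21 -3
109123 "13Aug07"   7  1
109123 "03Sep07"  14  2
end

* Daily date from the DDMonYY string, then the month it falls in
gen startdate = date(start_s, "DM20Y")
gen month = mofd(startdate)
format month %tm

* One observation per employee per month, with net hours and days
collapse (sum) hours days, by(empno month)
list
```

Note that months with no absenteeism at all will simply be missing from the result rather than appearing with zeros.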
One possible wrinkle: the start and end dates might span a month-end,
so you might need to be somewhat canny about that in terms of
classifying each record as belonging to a particular month.
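One possible way to handle a record that spans a month-end, sketched under strong assumptions: expand each record to one observation per calendar day and spread its hours and days evenly over those days before computing the month. The even split, the variable names, and the sample record are all assumptions. Weekends and the actual hours worked per day are ignored here, so treat this as a starting point only.

```stata
* Sketch only: names, data and the even-per-calendar-day split are assumptions.
clear
input long empno str7 start_s str7 end_s hours days
109123 "30Jul07" "02Aug07" 28 4
end

gen startdate = date(start_s, "DM20Y")
gen enddate   = date(end_s, "DM20Y")

* One observation per calendar day covered by the record
gen span = enddate - startdate + 1
gen long recid = _n
expand span
bysort recid: gen day = startdate + _n - 1
format day %td

* Allocate the record's totals evenly across its days
replace hours = hours / span
replace days  = days / span

* Classify each day, not each record, by month
gen month = mofd(day)
format month %tm
collapse (sum) hours days, by(empno month)
list
```

Since span is always positive, a correction with negative hours and days is split the same way as its original entry, so the two still cancel within each month.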
Kit
Kit Baum, Boston College Economics and DIW Berlin
http://ideas.repec.org/e/pba1.html
An Introduction to Modern Econometrics Using Stata:
http://www.stata-press.com/books/imeus.html