Hi,
I am new to Statalist and hope someone can help me!
I am working with a large dataset and have discovered that some observations
have missing or erroneous values. The data are panel data, with
observations for each individual over a 5-year period. For example:
ID Year Cost
1 1 100
1 2 200
1 3 500
1 4 150
1 5 x
2 1 100
2 2 200
2 3 500
2 4 600
2 5 100
The problem is this: if an individual has a missing or erroneous value in
any year, I want to exclude ALL of that individual's observations from the
dataset. In the example, patient 1 (whose year 5 cost is x) would be removed
from the dataset entirely. How can this be done in an automated way?
Essentially, I need code or a method that looks for the anomalous data,
identifies the patient, and then removes all of that patient's observations
from the dataset.
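
To make the logic concrete, here is a rough sketch of the steps described
above, written in Python purely for illustration (it is not Stata code). The
names is_bad and drop_ids_with_bad_values are made up for this example, and
the rule that a value counts as bad when it is blank or non-numeric is only
my assumption; the real check for erroneous values may need to be different.

```python
import math

# Rows copied from the example table: (ID, Year, Cost).
# Cost is kept as text so that the bad entry "x" can be represented.
rows = [
    (1, 1, "100"), (1, 2, "200"), (1, 3, "500"), (1, 4, "150"), (1, 5, "x"),
    (2, 1, "100"), (2, 2, "200"), (2, 3, "500"), (2, 4, "600"), (2, 5, "100"),
]


def is_bad(cost):
    """Assumed rule: a cost is bad if it is missing or not a number."""
    try:
        value = float(cost)
    except (TypeError, ValueError):
        return True
    return math.isnan(value)


def drop_ids_with_bad_values(rows):
    """Return (kept rows, dropped IDs) after removing every row of any ID
    that has at least one bad cost."""
    # Step 1: look for the anomalous data and note which IDs it belongs to.
    bad_ids = {pid for pid, _year, cost in rows if is_bad(cost)}
    # Step 2: remove ALL observations for those IDs, not just the bad rows.
    kept = [row for row in rows if row[0] not in bad_ids]
    return kept, bad_ids


kept_rows, dropped_ids = drop_ids_with_bad_values(rows)
print("Dropped IDs:", sorted(dropped_ids))
for row in kept_rows:
    print(row)
```

The point of the two steps is that the bad value is flagged once per row, but
the removal is applied per ID, so the good years of an affected individual
are dropped along with the bad one.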
Hope you can help,
Murray Lowe