This example suggests various kinds of problems.
Whenever CarManufacturer is empty, you could
pull across the value from the second variable
like this:
replace CarManufacturer = CarModel if mi(CarManufacturer)
but that leaves e.g. "Ford Excursion" as both CarManufacturer
and CarModel, which replaces one problem by another.
I would try another way: concatenate all these into a single
variable, and then start again.
That is
gen Car = CarManufacturer + " " + CarModel + " " + CarEngine
or
egen Car = concat(CarManufacturer CarModel CarEngine), p(" ")
Then two simple clean-ups are to trim spaces
replace Car = trim(Car)
and perhaps to remove isolated periods
replace Car = subinstr(Car, " .", " ",.)
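To make the steps so far concrete, here is a minimal sketch that rebuilds the eleven example records and runs the concatenation and clean-up. It assumes all three variables are strings and that CarEngine holds a literal "." where nothing was recorded; the string widths in -input- are my guesses. The sketch removes the isolated periods before trimming, so that no trailing space is left behind.

```stata
* Rebuild the example records (string widths are assumptions)
clear
input str15 CarManufacturer str20 CarModel str10 CarEngine
"Ford"        "Mustang"           "."
"Chevrolet"   "Starcraft"         "."
"Ford"        "Galaxy"            "."
"Honda"       "Civic"             "1.4 I S"
"Toyota"      "Avensis"           "."
""            "Ford Excursion"    "."
""            "BMW 520 I Touring" "520 I"
""            "318"               "BMW"
"BMW"         "320 I"             "320 I"
"Alfra Romeo" "Spider"            "."
""            "Alfa Romeo"        "."
end

* Concatenate the three fields into one working variable
gen Car = CarManufacturer + " " + CarModel + " " + CarEngine

* Remove isolated periods, then trim leading and trailing spaces
replace Car = subinstr(Car, " .", " ", .)
replace Car = trim(Car)

list Car, noobs
```

Note that -subinstr()- with " ." would also change a value such as " .5", so check the engine sizes in the full dataset before relying on it.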
Now it starts getting serious. Two tools that might come
in handy are the -word()- function and the -split- command.
split Car
will -split- the variable into several, each containing
one "word".
tab Car1
or
levels Car1
will expose problems like "318" in obs 8
and the inconsistency between "Alfa" and "Alfra".
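The -word()- route can be sketched as follows, assuming Car has been created as above. The variable name first is just an illustration, and -levelsof- is the name under which -levels- appears in later versions of Stata. The last line shows the kind of detailed fix that such a listing tends to suggest.

```stata
* First word of Car, without creating every variable that -split- would
gen first = word(Car, 1)

* Show the distinct first words so that oddities stand out
levelsof first

* One detailed fix: correct the misspelling "Alfra"
replace Car = subinstr(Car, "Alfra ", "Alfa ", .)
```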
You are probably going to end up with a do-file
mixing all sorts of general and detailed
changes.
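One general change of the kind asked about might look like the sketch below. It is only a rough first pass, not a complete solution: it assumes Car exists as above, uses -levelsof- to put the distinct nonmissing values of CarManufacturer into a local macro (makers is a made-up name), and fills in an empty CarManufacturer whenever one of those values occurs somewhere in Car.

```stata
* Distinct nonmissing values already recorded in CarManufacturer
levelsof CarManufacturer, local(makers)

* Fill empty CarManufacturer where a known manufacturer occurs in Car
foreach m of local makers {
    replace CarManufacturer = "`m'" if CarManufacturer == "" & strpos(Car, "`m'") > 0
}
```

This matches on substrings, so a short name can be found inside a longer word, and a manufacturer that is never spelled correctly in CarManufacturer will not be picked up. Spelling fixes should therefore come first, and the results need checking by eye.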
Nick
I have discovered errors in my dataset: some of my data seem to be
recorded in the wrong variable, and the variable in which they should
have been recorded is left missing. A few examples (missing values
are marked as ""):
Record  CarManufacturer  CarModel           CarEngine
1       Ford             Mustang            .
2       Chevrolet        Starcraft          .
3       Ford             Galaxy             .
4       Honda            Civic              1.4 I S
5       Toyota           Avensis            .
6       ""               Ford Excursion     .
7       ""               BMW 520 I Touring  520 I
8       ""               318                BMW
9       BMW              320 I              320 I
10      Alfra Romeo      Spider             .
11      ""               Alfa Romeo         .
What I wish to do is to search each record for an expression that
also occurs as a distinct value of CarManufacturer, and then copy it
into CarManufacturer. I have failed both at creating tests across
records and at fetching the unique values of CarManufacturer into an
object that I can then check against. But then again, I'm no seasoned
veteran in this game.
Is there any way of pulling this off in Stata?