Let's underline that this can all be done with strings. There is no need to resort to -encode- or otherwise to convert to numeric.
Missing, i.e. empty, strings sort first. Thus after -input- and -trim()-, Martin's code can be slimmed to
bys year Prof (Uni) : replace Uni = Uni[_N] if missing(Uni)
-- without any need for an extra variable.
However, there is no check here for different non-missing values within groups of -year Prof-.
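A minimal sketch of such a check, assuming the variable names from Martin's example below. The names -differs- and -anydiff- are made up for illustration. Because empty strings sort first, the last value in each group is non-missing whenever the group has any non-missing value, so every non-missing value can be compared with it.

```stata
clear
input year str10 Uni str10 Prof
1990 "Harvard" "S Smith"
1990 ""        "S Smith"
1990 "UCLA"    "P Williams"
1990 "Yale"    "K John"
1991 ""        "K Evert"
1991 "Oxford"  "K Evert"
1991 ""        "K Evert"
end

* flag non-missing values that differ from the last value in their group
bysort year Prof (Uni): gen byte differs = Uni != Uni[_N] & !missing(Uni)

* show any conflicting observations (none in these data)
list if differs, sepby(year Prof)

* mark whole groups that contain a conflict
bysort year Prof: egen byte anydiff = max(differs)

* fill in only where the group has no conflicting values
bysort year Prof (Uni): replace Uni = Uni[_N] if missing(Uni) & !anydiff
```

Groups with conflicting values are left alone here, so that they can be inspected by hand rather than filled in silently.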
In the same territory, note that -egen, mode()- takes string arguments as well as numeric, so can be used for imputation. However, the direct route that Martin exemplifies has many advantages.
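For comparison, here is a sketch of the -egen, mode()- route, again assuming the variable names from Martin's example. -Uni_mode- is a made-up name. As I understand it, -mode()- ignores missing values by default and returns missing when a group has tied modes unless an option such as -minmode- or -maxmode- is given, so check the help for -egen- before relying on this.

```stata
clear
input year str10 Uni str10 Prof
1990 "Harvard" "S Smith"
1990 ""        "S Smith"
1990 "UCLA"    "P Williams"
1990 "Yale"    "K John"
1991 ""        "K Evert"
1991 "Oxford"  "K Evert"
1991 ""        "K Evert"
end

* most frequent non-missing university name within each year-professor group
bysort year Prof: egen Uni_mode = mode(Uni)

* impute only where the original name is empty
replace Uni = Uni_mode if missing(Uni)

list, noobs sepby(year Prof)
```

Unlike the direct route, this needs an extra variable, but it chooses the most frequent name when a group holds more than one distinct non-missing value.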
Nick
Martin Weiss
*************
clear*
inp year str10(Uni Prof)
1990 Harvard " S Smith"
1990 "" "S Smith"
1990 UCLA "P Williams"
1990 Yale " K John"
1991 "" "K Evert"
1991 Oxford "K Evert"
1991 "" "K Evert"
end
replace Uni=trim(Uni)
replace Prof=trim(Prof)
compress
gen byte nonmiss=!mi(Uni)
//replace with last obs
bys year Prof (nonmiss): /*
*/ replace Uni=Uni[_N] /*
*/ if nonmiss==0
l, noo sepby(year Prof)
*************
joe j
Thanks. (Your suggestion helped me create a variable that takes a
numeric value, instead of the university name; this is definitely an
improvement.)
This is what the data look like:
Year  University  Professor
1990  Harvard     S Smith
1990  ---------   S Smith
1990  UCLA        P Williams
1990  Yale        K John
1991  ---------   K Evert
1991  Oxford      K Evert
What I want is to replace the missing names above, in 1990 with
Harvard and in 1991 with Oxford.
On Thu, Oct 8, 2009 at 11:59 AM, Martin Weiss wrote:
> You should turn the string into a numeric variable via -encode-. Then
> -egen- can go to work. Also provide an excerpt of your data and show
> what you want to happen to them...
joe j
> In my data I have a string variable "University", which lists
> university names. In some years the names are missing. Two other
> variables I have are "Professor" and "Year". The same "Professor" and
> "University" can occur multiple times in a year.
>
> The problem I have is that there are quite a few University names that
> are missing. What I want to do is to replace as many missing
> University names as possible, by assuming that: when a professor is
> linked to a university at least once in a year, she is linked to the
> same university during that year - so the missing university name when
> her name occurs again in the same year can be replaced (why there are
> missing university names is a complicated story:)).
> I tried the following in Stata (it's foolish, I know):
>
> bysort year professor: egen University_all=mean(University)
>
> But I get the warning "type mismatch".