On Mar 19, 2009, at 4:13 PM, Ekaterina Hertog wrote:
I have got a dataset which contains dates of birth for individuals and
these dates of birth look as follows: 19560413 and I am trying to
turn them into date variables Stata can recognise.
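To explore this issue, let's first create a simple toy dataset. The
variable is declared as long because Stata's default float type cannot
hold every 8-digit integer exactly, so odd values such as 19560413
would be silently rounded:
// Begin part 1 of example
input long date_of_birth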
19560413
19601223
19550721
19700105
end
list
// End part 1 of example
It is a numeric variable and I have turned it into string.
OK, but we can roll that step into those listed below, rather than
create an extra variable that you likely won't need later anyway.
The problem is that the following approach:
gen birth_date = date(strbirth_date, "DMY")
format birth_date %td
does not work; I just get missing values. Presumably that is because
my date variable is not in the order: day - month - year, but rather
year - month - day.
So then do not tell Stata to use the wrong order! Consider:
// Begin part 2 of example
gen birth_date = date(string(date_of_birth,"%8.0f"),"YMD")
format birth_date %td
list
// End part 2 of example
Notice the use of "YMD" -- the order in which the date elements appear
-- rather than "DMY". This is alluded to in -help dates_and_times-
when the "mask" of the -date()- function is mentioned; since only one
example ("MDY") is given for -date()-, one might be forgiven for
thinking that other masks are not possible. Yet your attempted mask
doesn't match the example in the help file... nor is it appropriate
for your data.
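To make the role of the mask concrete, here is a small sketch (the
literal date strings are made up for illustration, not taken from the
original data). The same date is written three ways and each mask
simply names the order in which the elements appear:

```stata
// Begin mask illustration
// The same date in three layouts; the mask gives the order of the elements
display date("19560413", "YMD")
display date("13/04/1956", "DMY")
display date("April 13, 1956", "MDY")
// For comparison, the same date built directly from its parts
display mdy(4, 13, 1956)
// A mask that does not match the data yields missing (.)
display date("19560413", "DMY")
// End mask illustration
```

The first four lines should all display the same number (days since
01jan1960), while the last displays a missing value, which is the
symptom described in the question.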
I then thought I would redo the variable into a correct order and
first tried to create 3 separate string variables out of each date:
one for year, one for month and one for day.
I tried to do it as follows:
generate strbirth_date= string(date_of_birth, "%08.0f")
gen yob = substr(strbirth_date,1,4)
gen mob = substr(strbirth_date,5,6)
gen dob = substr(strbirth_date,7,8)
As a result 19560413 turned into: yob=1956
mob=0413
dob=13
I do not understand why the month of birth (mob) did not
transform correctly and what I can do next.
Perhaps you thought Stata was Excel, or some other program(ming
language) in which you specify the starting and ending characters for
your substring extraction? But in -help string_functions-, it is
clearly explained that the first number (n1) in -substr(s, n1, n2)- is
the position from the start of the string, but the second number (n2)
is the *length* of the substring. Hence, the correct way to extract
the date elements you want is:
// Begin part 3 of example
gen yob = substr(string(date_of_birth,"%8.0f"),1,4)
gen mob = substr(string(date_of_birth,"%8.0f"),5,2)
gen dob = substr(string(date_of_birth,"%8.0f"),7,2)
list
// End part 3 of example
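As a quick check of the start-and-length behaviour, the following
sketch applies -substr()- to a literal string rather than to the
dataset, and then shows one possible next step: converting the pieces
to numbers with -real()- and passing them to -mdy()-. This is only an
illustration; the one-line -date()- approach in part 2 remains simpler.

```stata
// Begin substr illustration
// Start at position 5 and take 6 characters: only 4 remain, so "0413"
display substr("19560413", 5, 6)
// Start at position 5 and take 2 characters: "04"
display substr("19560413", 5, 2)
// Start at position 7 and take 2 characters: "13"
display substr("19560413", 7, 2)
// Reassemble the pieces into a Stata date; mdy() takes month, day, year
display mdy(real(substr("19560413", 5, 2)), real(substr("19560413", 7, 2)), real(substr("19560413", 1, 4)))
// End substr illustration
```

The first line reproduces the mob=0413 result from the question, and
the last line should display the same number that -date()- returns
with the "YMD" mask.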
I would be very grateful for any advice as to how I can turn my date
variable into a variable Stata 10 can recognise.
The date (and string) functions in Stata are powerful, so they are
worth learning. However, to use them correctly, there really is no
substitute for reading the help files (or printed manuals) carefully.
Hope this helps,
Mike
P.S. The specification of the mask for the -date()- function has
changed from lower case in Stata 9 and earlier to upper case in Stata
10 (and, I suspect, later). This can cause older programs that use
the -date()- function, originally written for earlier versions of
Stata, to misbehave or outright fail when run with Stata 10. A
-version 9- command at the start of the program should remedy that
situation, although I haven't checked.