Sometimes, datasets contain data that look like dates but are actually stored as strings. We must convert these string dates to actual date variables before we can use them to calculate things like age. Fortunately, Stata's date() function makes this easy.
Let's begin by opening and describing an example dataset from the Stata website.
. use https://www.stata.com/users/youtube/rawdata.dta, clear
(Fictitious data based on the National Health and Nutrition Examination Survey)

. describe

Contains data from https://www.stata.com/users/youtube/rawdata.dta
 Observations:         1,268                  Fictitious data based on the National Health and Nutrition Examination Survey
    Variables:            10                  6 Jul 2016 11:17
                                              (_dta has notes)
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
id            str6      %9s                   Identification Number
age           byte      %9.0g
sex           byte      %9.0g                 Sex
race          str5      %9s                   Race
height        float     %9.0g                 height (cm)
weight        float     %9.0g                 weight (kg)
sbp           int       %9.0g                 Systolic blood pressure (mm/Hg)
dbp           int       %9.0g                 Diastolic blood pressure (mm/Hg)
chol          str3      %9s                   serum cholesterol (mg/dL)
dob           str18     %18s
-------------------------------------------------------------------------------
The last variable in the description is named dob, and it does not have a variable label. The Storage type column tells us that dob is stored as an 18-character string variable. Let's list the first five observations to see what it contains.
. list dob in 1/5
     +-----------------+
     |             dob |
     |-----------------|
  1. |                 |
  2. |                 |
  3. | August 24, 1943 |
  4. |    May 08, 1954 |
  5. |   June 25, 1950 |
     +-----------------+
The first two observations are empty, but observations three through five contain dates. The variable name dob probably stands for date of birth. We would like to calculate each person's age as of a particular date, but we can't do that while the dates are stored as strings.
We can convert string dates to numeric dates by using Stata's date() function with the generate command.
. generate daten = date(dob, "MDY")
(2 missing values generated)
The first argument of the date() function is the variable that contains the dates stored as strings. The second argument gives the order of the month, day, and year, specified with the letters M, D, and Y in double quotes. Our string dates list the month first, the day second, and the year third, so the second argument in our date() function is "MDY". The string dates could also have looked like "8/24/1943", and date() with the "MDY" mask would return the same dates. Two-digit years such as "8/24/43" are different: date() must be told which century to use, either in the mask (for example, "MD19Y") or with its optional third argument, topyear. Let's list the first five observations to check our work.
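To try date() without downloading the dataset, here is a minimal sketch that enters the first five dob values listed above by hand. The extra date strings passed to display are illustrative assumptions, not values from rawdata.dta. The assert line is one way to confirm that date() returned missing only where dob was empty, which is what the "(2 missing values generated)" note leads us to expect.

```stata
* Minimal sketch: the first five dob values from the listing,
* entered by hand so the example runs without rawdata.dta
clear
input str18 dob
""
""
"August 24, 1943"
"May 08, 1954"
"June 25, 1950"
end

* Convert the strings: month first, day second, year third
generate daten = date(dob, "MDY")

* Stop with an error if any nonempty string failed to convert
assert missing(daten) == missing(dob)

* Illustrative strings (not in the dataset): the mask describes
* the order of the parts, not how each part is spelled
display date("8/24/1943", "MDY")
display date("24 Aug 1943", "DMY")
display date("1943-08-24", "YMD")

* Two-digit years need a century: "19Y" in the mask, or a topyear
display date("8/24/43", "MD19Y")
display date("8/24/43", "MDY", 2000)
```

Each display line should print -5974, the same value date() stores for "August 24, 1943".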
. list dob daten in 1/5
     +----------------------------+
     |             dob      daten |
     |----------------------------|
  1. |                          . |
  2. |                          . |
  3. | August 24, 1943      -5974 |
  4. |    May 08, 1954      -2064 |
  5. |   June 25, 1950      -3477 |
     +----------------------------+
The values in daten do not look like dates; they look like unfamiliar integers. That is because Stata stores each date as the number of days before (negative) or after (positive) January 1, 1960, which is day 0. So the third observation of daten tells us that August 24, 1943, is 5,974 days before January 1, 1960. That is useful information to Stata, but dates in the form "-5974" are hard for us to interpret. We can make the dates easier to read by formatting them.
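Because a daily date is just a count of days from January 1, 1960, the values in daten can be reproduced directly. This sketch uses mdy(), which builds a date from month, day, and year numbers, and the td() pseudofunction, which reads a day-month-year literal; both return the same kind of number that date() stores.

```stata
* Day 0 of Stata's daily-date scale: prints 0
display mdy(1, 1, 1960)

* August 24, 1943 falls before day 0: prints -5974
display mdy(8, 24, 1943)

* td() reads a date literal and returns the same number: prints -5974
display td(24aug1943)

* Subtracting two dates counts the days between them: prints 5974
display td(01jan1960) - td(24aug1943)
```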
. format %tdMonth_DD,_CCYY daten
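The syntax of format may look strange at first. %td tells Stata that daten holds daily dates, and the rest of the format spells out the layout: Month is the full month name, DD is the two-digit day, CCYY is the four-digit year, the comma is printed as is, and each underscore prints a space. You can watch the video linked below to see how to format dates using Stata's graphical user interface (GUI), or you can type help datetime to learn more about dates and their formats.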
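Formatting changes only how a date is displayed; the stored number stays -5974. As a sketch, the string() function can render that one value with the default %td format, with the format used above, and with a year-month-day layout (the last format is an illustrative choice, not part of the original example).

```stata
* One stored value, three display formats
local d = mdy(8, 24, 1943)

* Default daily format: 24aug1943
display string(`d', "%td")

* The format applied to daten above: August 24, 1943
display string(`d', "%tdMonth_DD,_CCYY")

* Year-month-day, NN being the two-digit month number: 1943-08-24
display string(`d', "%tdCCYY-NN-DD")
```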
Let's list the first five observations.
. list dob daten in 1/5
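     +----------------------------------+
     |             dob            daten |
     |----------------------------------|
  1. |                                . |
  2. |                                . |
  3. | August 24, 1943  August 24, 1943 |
  4. |    May 08, 1954     May 08, 1954 |
  5. |   June 25, 1950    June 25, 1950 |
     +----------------------------------+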
Now daten looks just like dob when we list it. Unlike dob, however, daten holds numeric dates, so we can use it to calculate things like age as of January 1, 2000. The example below uses Stata's age() function, which returns age in completed years.
. generate age2000 = age(daten, date("1/1/2000", "MDY"))
(2 missing values generated)

. list dob daten age2000 in 1/5
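     +-------------------------------------------+
     |             dob            daten  age2000 |
     |-------------------------------------------|
  1. |                                .        . |
  2. |                                .        . |
  3. | August 24, 1943  August 24, 1943       56 |
  4. |    May 08, 1954     May 08, 1954       45 |
  5. |   June 25, 1950    June 25, 1950       49 |
     +-------------------------------------------+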
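Because age() counts completed years, a person's age goes up only on their birthday. This sketch checks the three ages listed above one date at a time and shows the step at a birthday; td() supplies the reference dates as literals instead of parsing "1/1/2000" with date(). The August 2000 dates are illustrative, and the sketch assumes a Stata release that includes age().

```stata
* Ages on January 1, 2000, matching the listing: 56, 45, 49
display age(mdy(8, 24, 1943), td(01jan2000))
display age(mdy(5, 8, 1954), td(01jan2000))
display age(mdy(6, 25, 1950), td(01jan2000))

* Completed years: 56 the day before the 2000 birthday, 57 on it
display age(td(24aug1943), td(23aug2000))
display age(td(24aug1943), td(24aug2000))

* A missing birth date gives a missing age, as in observations 1 and 2
display age(., td(01jan2000))
```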
You can watch a demonstration of these commands by clicking on the link to the YouTube video below. You can read more about these commands by clicking on the links to the Stata manual entries below.
Watch Data management: How to create a date variable from a date stored as a string.
Read more in the Stata Data Management Reference Manual; see [D] describe, [D] format, [D] generate, and [D] list. In the Stata Functions Reference Manual, see [F] date() and [F] age(). And in the Stata User’s Guide, see [U] dates.