Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: imputing dates into a string date
From: Sergiy Radyakin <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: imputing dates into a string date
Date: Fri, 7 Jun 2013 15:47:36 -0400
The discussion so far sounds quite confusing, and I still see no
reason to use regular expressions instead of the much more intuitive
-replace-.
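For the specific case raised in the thread (a value such as "/01/2012" with nothing before the first slash), a minimal sketch with -replace- might look like the following. The variable names dx and dx_clean come from the quoted question; the three example values and the default day "01" are only assumptions for illustration. As reported in the question, subinstr() with an empty search string leaves the value unchanged, so the sketch tests the first character instead.

```stata
clear
input str10 dx
"/01/2012"
"XX/01/2012"
"15/01/2012"
end

generate str10 dx_clean = dx

* A value that starts with "/" has no day part: prepend the default day.
replace dx_clean = "01" + dx_clean if substr(dx_clean, 1, 1) == "/"

* The older "XX" placeholder can still be handled with subinstr().
replace dx_clean = subinstr(dx_clean, "XX", "01", 1)

list dx dx_clean
```

This only covers a missing day; missing months or years need their own rules.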
Here is my approach (type in Stata and not in the browser):
net from http://www.adeptanalytics.org/radyakin/stata/cleandate/
Best, Sergiy.
. clear
.
. input str10 dirty
dirty
1. "//"
2. "14//"
3. "01/xx/2001"
4. "xx/01/2001"
5. "01/01/xxxx"
6. end
.
. list
     +------------+
     |      dirty |
     |------------|
  1. |         // |
  2. |       14// |
  3. | 01/xx/2001 |
  4. | xx/01/2001 |
  5. | 01/01/xxxx |
     +------------+
. cleandate dirty, gen(clean) d(15) m(06) y(2012) sep("/")
. list
     +-------------------------+
     |      dirty        clean |
     |-------------------------|
  1. |         //   15/06/2012 |
  2. |       14//   14/06/2012 |
  3. | 01/xx/2001   01/06/2001 |
  4. | xx/01/2001   15/01/2001 |
  5. | 01/01/xxxx   01/01/2012 |
     +-------------------------+
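For anyone who cannot install the package, roughly the same result can be sketched with official commands only. This is not how -cleandate- is implemented (its internals are not shown in this thread); it is just the -split- idea quoted below, extended to treat "xx" and "xxxx" as missing. The names p1, p2, p3 and clean_dt are made up for the example, and the defaults 15/06/2012 simply mirror the options used above.

```stata
clear
input str10 dirty
"//"
"14//"
"01/xx/2001"
"xx/01/2001"
"01/01/xxxx"
end

* Split into day, month and year parts (all created as strings).
split dirty, generate(p) parse("/")

* Treat the "xx" and "xxxx" placeholders as missing.
foreach v of varlist p1 p2 p3 {
    replace `v' = "" if inlist(lower(`v'), "xx", "xxxx")
}

* Fill each part with its own default.
replace p1 = "15"   if p1 == ""
replace p2 = "06"   if p2 == ""
replace p3 = "2012" if p3 == ""

generate str10 clean = p1 + "/" + p2 + "/" + p3

* date() returns missing when the pieces do not form a valid date,
* so a missing clean_dt flags a bad imputation (e.g. 31/06).
generate int clean_dt = date(clean, "DMY")
format clean_dt %td

list dirty clean clean_dt
```

Keeping the three defaults in separate -replace- lines makes it easy to change one of them, for instance to use the first of the month instead of the 15th.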
On Fri, Jun 7, 2013 at 8:48 AM, Tim Evans <[email protected]> wrote:
> Nick, Joseph,
>
> Thank you for the comments - and I readily acknowledge the potential bias of imputing dates. The Stata tip is potentially one avenue for me to explore - but unfortunately I don't have access to that.
>
> Best wishes
>
> Tim
>
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: 07 June 2013 13:12
> To: [email protected]
> Subject: Re: st: imputing dates into a string date
>
> Just to spell out what will be obvious to Joseph and Tim: Sometimes other dates provide constraints on what the dates might be.
> Nick
> [email protected]
>
>
> On 7 June 2013 12:51, Joseph Coveney <[email protected]> wrote:
>> Tim Evans wrote:
>>
>> Some time ago I had a problem with imputing dates into a string
>> variable where the date took the form:
>>
>> XX/01/2012
>>
>> In the thread below a solution was provided which worked great,
>> however, I now have data that takes the form:
>>
>> /01/2012
>>
>> To this, I would like to impute a day of "1", but having tried to
>> amend the original code below
>>
>> g dx_clean = subinstr(dx, "XX", "01", 1)
>>
>> to
>>
>> g dx_clean = subinstr(dx, "", "01", 1)
>>
>> The result is that I return the same value i.e.
>> XX/01/2012
>>
>> Does anyone have a suggestion of how I can handle this please?
>>
>> --------------------------------------------------------------------------------
>>
>> If you've got missing elements other than just the day, it might be
>> better to use -split-, and impute the days, months and years
>> separately with their different defaults. You can then re-assemble
>> the elements with simple string concatenation (or convert the imputed dates to a Stata date).
>>
>> Joseph Coveney
>>
>> . version 12.1
>>
>> .
>> . clear *
>>
>> . set more off
>>
>> .
>> . input str10 dx
>>
>> dx
>> 1. "01//2001"
>> 2. "/01/2001"
>> 3. "01/01/"
>> 4. end
>>
>> .
>> . split dx, generate(d_) parse(/)
>> variables created as string:
>> d_1 d_2 d_3
>>
>> . replace d_1 = "15" if missing(d_1) // Missing days as approx. midmonth
>> (1 real change made)
>>
>> . replace d_2 = "06" if missing(d_2) // Missing months as approx. midyear
>> (1 real change made)
>>
>> . replace d_3 = "2012" if missing(d_3) // Missing year as most recent full year
>> (1 real change made)
>>
>> .
>> . generate int imputed_dt = date(d_3 + d_2 + d_1, "YMD")
>>
>> . format imputed_dt %tdCCYY-NN-DD
>>
>> .
>> . generate str10 clean_dx = d_1 + "/" + d_2 + "/" + d_3
>>
>> . list dx clean_dx imputed_dt, noobs abbreviate(20)
>>
>>   +------------------------------------+
>>   |       dx     clean_dx   imputed_dt |
>>   |------------------------------------|
>>   | 01//2001   01/06/2001   2001-06-01 |
>>   | /01/2001   15/01/2001   2001-01-15 |
>>   |   01/01/   01/01/2012   2012-01-01 |
>>   +------------------------------------+
>>
>> .
>> . exit
>>
>> end of do-file
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/