Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: problem using -clock- with military time
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: problem using -clock- with military time
Date: Sat, 2 Jun 2012 09:30:21 +0100
There are many other ways of tackling this problem. Here are a few
more comments. Others should be able to suggest yet more.
The question was posed as one of inserting a "0" after the space
whenever the second part of the date is too short, i.e. three digits
not four.
That means we should focus on identifying the space and inserting the
"0", which in Stata just means changing " " to " 0", as there isn't an
"insert in string" function. (There isn't a "delete from string"
function, either: both can be just special cases of -subinstr()-.)
The assumption is that there should be precisely one space.
replace Arrived = trim(itrim(Arrived))
does our best to make that so. -trim()- removes any leading or
trailing spaces, while -itrim()- reduces all multiple internal spaces
to single spaces. That -itrim()- didn't appear in the previous
posting. I feel comfortable with making any such changes as they can't
affect the meaning of a date string. Those concerned with absolute
data integrity should work with a copy of the original variable.
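A minimal sketch of working with a copy, for anyone who wants to leave the original variable alone. The name arrive_clean is hypothetical, and the stray spaces in the second value are invented here purely to show what -trim()- and -itrim()- do.

```stata
clear
input str20 ArrivedOnPCU
"2011/04/06 1630"
" 2010/07/18  700 "
end

* keep the original intact; clean a copy instead
* (arrive_clean is a hypothetical name)
gen arrive_clean = trim(itrim(ArrivedOnPCU))

* leading, trailing and doubled internal spaces should now be gone
list
```

The shorter name Arrived used elsewhere in this thread relies on Stata accepting unambiguous abbreviations of variable names, so it stands for ArrivedOnPCU only while no other variable name starts the same way.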
We should check that there is precisely one space. After what we have
just done, and in any case, that would mean that there are precisely
two words. In Stata, words are whatever are separated by spaces
(except that " " and `" "' bind tighter than spaces separate), so
"frog toad" are two words, and so are "123 456" and "2011/04/06 1630".
Stata has a -wordcount()- function, so
assert wordcount(Arrived) == 2
asserts that that is so, and you will get an error message if it
isn't. (The principle is, very much, "No news is good news", but if
there is bad news, there are fixes needed.) Many Stata beginners would
do something like this here
gen nwords = wordcount(Arrived)
tab nwords
but for problems like this you don't need a new variable and you can
insist Stata does the checking. (Conversely, there are more open-ended
problems in which looking at the patterns shown by the table is
exactly the right thing to do.) As there can be only two words
replace Arrived = subinstr(Arrived, " ", " 0", 1) if length(word(Arrived, 2)) == 3
is an alternative to what was posted previously.
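Putting the check and the fix together, here is a sketch using the four example values from the original question. The full variable name is spelled out rather than abbreviated, and the final -assert- is an extra check added here, not part of the earlier posting.

```stata
clear
input str15 ArrivedOnPCU
"2011/04/06 1630"
"2010/07/18 700"
"2011/09/06 400"
"2011/06/23 130"
end

* tidy the spaces, then insist on exactly two words
replace ArrivedOnPCU = trim(itrim(ArrivedOnPCU))
assert wordcount(ArrivedOnPCU) == 2

* pad three-digit times by changing the single space to " 0"
replace ArrivedOnPCU = subinstr(ArrivedOnPCU, " ", " 0", 1) if length(word(ArrivedOnPCU, 2)) == 3

* extra check (an addition here): every time part now has four digits
assert length(word(ArrivedOnPCU, 2)) == 4
list
```

If either -assert- fails, Stata stops with an error, which is the point: no output from -assert- means the data look as expected.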
Another way to think about it is that it appears that there are two
kinds of date, long and short, so we could work with
-length(Arrived)-, which should be 15 or 14. For problems like this, I
tend to copy and paste examples and feed them to -display-, as in
. di length("2011/04/06 1630")
15
because Stata is better at counting than I am. So -if length(Arrived)
== 14- identifies short dates that need fixing.
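A sketch of the length-based variant, followed by the conversion that was the original aim. It assumes every value is either 14 or 15 characters long after trimming, and it uses the "YMDhm" mask from the original question, which was reported there to work once the leading zero is present.

```stata
clear
input str15 ArrivedOnPCU
"2011/04/06 1630"
"2010/07/18 700"
"2011/09/06 400"
"2011/06/23 130"
end

replace ArrivedOnPCU = trim(itrim(ArrivedOnPCU))

* short dates (14 characters) need the "0" after the space
replace ArrivedOnPCU = subinstr(ArrivedOnPCU, " ", " 0", 1) if length(ArrivedOnPCU) == 14

* convert to a Stata datetime; double is needed for %tc values
gen double pcutime = clock(ArrivedOnPCU, "YMDhm")
format pcutime %tc

* any missing value here would mean a string that -clock()- could not read
assert !missing(pcutime)
list
```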
Nick
On Sat, Jun 2, 2012 at 12:00 AM, Nick Cox <[email protected]> wrote:
> clear
> input str15 ArrivedOnPCU
> "2011/04/06 1630"
> "2010/07/18 700"
> "2011/09/06 400"
> "2011/06/23 130"
> end
> replace Arrived = trim(Arrived)
> replace Arrived = subinstr(Arrived, " ", " 0", 1) if length(word(Arrived, -1)) == 3
> list
>
> This example boosts my prejudice that few parts of Stata are so
> unfairly overlooked as the basic string functions. See also
>
> Cox, N.J. 2011. Speaking Stata: Fun and fluency with functions. The
> Stata Journal 11(3): 460-471
>
> Abstract. Functions are the unsung heroes of Stata. This column is a
> tour of functions that might easily be missed or underestimated, with
> a potpourri of tips, tricks, and examples for a wide range of basic
> problems.
>
> for a review.
>
> On Fri, Jun 1, 2012 at 11:39 PM, Steve Nakoneshny <[email protected]> wrote:
>
>> I have been provided with a dataset containing date and time variables in string format. I wish to convert these to SIF type using the -clock- function; however, I have run into a small problem given that the times are formatted as military time (sadly without the leading zero). The code -gen double pcutime = clock(ArrivedOnPCU, "YMDhm")- executes imperfectly.
>>
>> After formatting pcutime to %tc, I can see that some of the times translate imperfectly:
>>
>> ArrivedOnPCU       pcutime
>> 2011/04/06 1630    06apr2011 16:30:00
>> 2010/07/18 700     .
>> 2011/09/06 400     .
>> 2011/06/23 130     23jun2011 13:00:00
>>
>> If I manually edit the second obs to read as "2010/07/18 0700" and -replace pcutime = clock(ArrivedOnPCU, "YMDhm")-, pcutime displays 18jul2010 07:00:00. It is pretty obvious to me that I'm choosing the wrong mask in the -clock- function, which would explain both the missing values in pcutime and the incorrect times (i.e. 0130 translating to 13:00).
>>
>> I've tried various permutations of hm/HM/HHMM/hhmm to try to adjust, but to no avail. Can anybody suggest a better mask for me to use? Or perhaps some relatively simple means of inserting a leading "0" into the time portion of the string prior to using -clock-?