Today at the Davis R Users’ Group, Bonnie Dixon gave a tutorial on the various ways to handle dates and times in R. Bonnie provided this great script, which walks through essential classes, functions, and packages. Here it is piped through knitr::spin. The original R script can be found as a gist here.
Date/time classes
Three date/time classes are built into R: Date, POSIXct, and POSIXlt.
Date
This is the class to use if you have only dates, but no times, in your data.
create a date:
## [1] "2012-07-22"
non-standard formats must be specified:
## [1] "2011-04-20"
## [1] "2010-10-06"
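The calls behind the three dates printed above are not shown here. A minimal sketch that would reproduce them follows; the variable names and the exact input strings are assumptions, and `%B` (full month name) depends on the session locale.

```r
# Hypothetical names dt1..dt3; input strings are assumed, not taken from the original script.
dt1 <- as.Date("2012-07-22")                             # ISO format needs no format string
dt2 <- as.Date("04/20/2011", format = "%m/%d/%Y")        # month/day/four-digit year
dt3 <- as.Date("October 6, 2010", format = "%B %d, %Y")  # %B is locale-dependent
dt1
dt2
dt3
```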
see list of format symbols:
calculations with dates:
find the difference between dates:
## Time difference of 459 days
## Time difference of 65.57 weeks
Add or subtract days:
## [1] "2011-04-30"
## [1] "2011-04-10"
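As a sketch of the date arithmetic above (the names `dt1` and `dt2` are assumptions), subtracting two Date objects gives a difftime in days, `difftime()` lets you choose the units, and adding a number shifts a date by that many days:

```r
dt1 <- as.Date("2012-07-22")
dt2 <- as.Date("2011-04-20")

dt1 - dt2                            # difference in days
difftime(dt1, dt2, units = "weeks")  # same difference, expressed in weeks
dt2 + 10                             # ten days later
dt2 - 10                             # ten days earlier
```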
create a vector of dates and find the intervals between them:
## [1] "2010-07-22" "2011-04-20" "2012-10-06"
## Time differences in days
## [1] 272 535
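A sketch of the vector example, assuming the name `three.dates`: `diff()` returns the gaps between consecutive elements.

```r
three.dates <- as.Date(c("2010-07-22", "2011-04-20", "2012-10-06"))
three.dates
diff(three.dates)  # intervals between consecutive dates, in days
```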
create a sequence of dates:
## [1] "2012-07-22" "2012-07-29" "2012-08-05" "2012-08-12" "2012-08-19"
## [6] "2012-08-26"
## [1] "2012-07-22" "2012-08-05" "2012-08-19" "2012-09-02" "2012-09-16"
## [6] "2012-09-30"
## [1] "2012-07-22" "2012-08-05" "2012-08-19" "2012-09-02" "2012-09-16"
## [6] "2012-09-30"
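The three sequences above can be generated with `seq()`. This sketch assumes the variable names; the `by` argument accepts either a number of days or a string such as `"week"` or `"2 weeks"`, which is why the last two results are identical.

```r
dt1 <- as.Date("2012-07-22")

weekly     <- seq(dt1, length.out = 6, by = "week")     # one-week steps
biweekly.a <- seq(dt1, length.out = 6, by = 14)         # 14-day steps
biweekly.b <- seq(dt1, length.out = 6, by = "2 weeks")  # same steps, given as a string
weekly
biweekly.a
biweekly.b
```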
see the internal integer representation (days since 1970-01-01):
## [1] 15543
## Time difference of 15543 days
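A sketch of how to inspect that representation (the name `dt1` is an assumption): `unclass()` strips the class and exposes the stored day count, which matches the difference from the 1970-01-01 origin.

```r
dt1 <- as.Date("2012-07-22")
unclass(dt1)                 # stored number of days
dt1 - as.Date("1970-01-01")  # the same count, as a difftime
```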
POSIXct
If you have times in your data, this is usually the best class to use.
create some POSIXct objects:
## [1] "2013-07-24 23:55:26 PDT"
## [1] "2013-07-25 08:32:07 PDT"
specify the time zone:
## [1] "2010-12-01 11:42:03 GMT"
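A sketch of creating POSIXct objects like those above. The variable names and input strings are assumptions. The original output shows the machine's local zone (PDT); here the zone is set explicitly to `"America/Los_Angeles"` so the result does not depend on where the code runs.

```r
pacific <- "America/Los_Angeles"  # assumed stand-in for the original local time zone

tm1 <- as.POSIXct("2013-07-24 23:55:26", tz = pacific)
tm2 <- as.POSIXct("25072013 08:32:07", format = "%d%m%Y %H:%M:%S", tz = pacific)
tm3 <- as.POSIXct("2010-12-01 11:42:03", tz = "GMT")  # time zone given explicitly
tm1
tm2
tm3
```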
some calculations with times
compare times:
## [1] TRUE
Add or subtract seconds:
## [1] "2013-07-24 23:55:56 PDT"
## [1] "2013-07-24 23:54:56 PDT"
find the difference between times:
## Time difference of 8.611 hours
automatically adjusts for daylight saving time:
## Time difference of 7.611 hours
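The calculations above can be sketched as follows. The variable names and the explicit `"America/Los_Angeles"` zone are assumptions, as are the two dates chosen for the daylight saving example: they straddle the US spring-forward on 2013-03-10, so the same clock times are one hour closer together.

```r
pacific <- "America/Los_Angeles"  # assumed; the original used the local zone
tm1 <- as.POSIXct("2013-07-24 23:55:26", tz = pacific)
tm2 <- as.POSIXct("2013-07-25 08:32:07", tz = pacific)

tm1 < tm2  # comparison
tm1 + 30   # 30 seconds later
tm1 - 30   # 30 seconds earlier
tm2 - tm1  # difference between the two times

# Same clock times across the spring-forward night: one hour is skipped.
dst.gap <- as.POSIXct("2013-03-10 08:32:07", tz = pacific) -
  as.POSIXct("2013-03-09 23:55:26", tz = pacific)
dst.gap
```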
Get the current time (in POSIXct by default):
## [1] "2014-02-10 18:26:01 PST"
see the internal numeric representation (seconds since 1970-01-01 00:00:00 UTC):
## [1] 1.375e+09
## attr(,"tzone")
## [1] ""
## Time difference of 1.375e+09 secs
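A sketch of inspecting the stored value, with an assumed variable name and an explicit time zone. Because the zone is set here, the `tzone` attribute will read `"America/Los_Angeles"` rather than the empty string shown above.

```r
tm1 <- as.POSIXct("2013-07-24 23:55:26", tz = "America/Los_Angeles")

unclass(tm1)  # seconds since the epoch, plus the tzone attribute
difftime(tm1, as.POSIXct("1970-01-01 00:00:00", tz = "UTC"), units = "secs")
```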
POSIXlt
This class enables easy extraction of specific components of a time. (“ct” stands for calendar time and “lt” stands for local time. “lt” also helps one remember that POSIXlt objects are lists.)
create a time:
## [1] "2013-07-24 23:55:26"
## $sec
## [1] 26
##
## $min
## [1] 55
##
## $hour
## [1] 23
##
## $mday
## [1] 24
##
## $mon
## [1] 6
##
## $year
## [1] 113
##
## $wday
## [1] 3
##
## $yday
## [1] 204
##
## $isdst
## [1] 1
## sec min hour mday mon year wday yday isdst
## 26 55 23 24 6 113 3 204 1
extract components of a time object:
## [1] 26
## [1] 3
truncate or round off the time:
## [1] "2013-07-24"
## [1] "2013-07-24 23:55:00"
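A sketch of the POSIXlt steps above, with an assumed variable name and an explicit time zone. Note that `mon` counts from 0, `year` counts from 1900, and `wday` counts from Sunday = 0. Recent versions of R may list a few extra components (such as `zone` and `gmtoff`) when the object is unclassed.

```r
tm1.lt <- as.POSIXlt("2013-07-24 23:55:26", tz = "America/Los_Angeles")

unclass(tm1.lt)        # the underlying list of components
tm1.lt$sec             # seconds
tm1.lt$wday            # day of week, 0 = Sunday
trunc(tm1.lt, "days")  # drop the time of day
round(tm1.lt, "mins")  # round to the nearest minute
```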
chron
This class is a good option when you don’t need to deal with timezones. It requires the package chron.
## Loading required package: chron
##
## Attaching package: 'chron'
##
## The following objects are masked from 'package:lubridate':
##
## days, hours, minutes, seconds, years
create some times:
## [1] (07/24/13 23:55:26)
## [1] (07/25/13 08:32:07)
extract just the date:
## day
## 07/24/13
compare times:
## [1] TRUE
add days:
## [1] (08/03/13 23:55:26)
calculate the difference between times:
## [1] 08:36:41
## Time difference of 8.611 hours
does not adjust for daylight saving time:
## [1] 08:36:41
Detach the chron package as it will interfere with lubridate later in this script.
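A sketch of the chron steps, assuming the chron package is installed. The variable names are assumptions, and the objects are built with `chron()` and its default `m/d/y` and `h:m:s` formats, which may differ from the constructor used in the original script.

```r
library(chron)

# Hypothetical names; chron() defaults to "m/d/y" dates and "h:m:s" times.
tm1.c <- chron(dates. = "07/24/13", times. = "23:55:26")
tm2.c <- chron(dates. = "07/25/13", times. = "08:32:07")

dates(tm1.c)           # just the date part
tm1.c < tm2.c          # comparison
tm1.c + 10             # ten days later
gap <- tm2.c - tm1.c   # difference, stored as a fraction of a day
gap

# No time zone is attached, so the spring-forward night is not adjusted for.
dst.gap <- chron(dates. = "03/10/13", times. = "08:32:07") -
  chron(dates. = "03/09/13", times. = "23:55:26")
dst.gap

detach("package:chron")  # avoid masking lubridate functions later
```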
Summary of date/time classes
- When you just have dates, use Date.
- When you have times, POSIXct is usually the best,
- but POSIXlt enables easy extraction of specific components
- and chron is simplest when you don’t need to deal with timezones and daylight saving time.
Manipulating times and dates
lubridate
This package is a wrapper for POSIXct with more intuitive syntax.
create a time:
## [1] "2013-07-24 23:55:26 UTC"
## [1] "2013-07-25 08:32:00 UTC"
## [1] "2013-07-25 04:00:00 UTC"
## [1] "2013-07-26 UTC"
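A sketch of the parsing calls, assuming the lubridate package is installed. The variable names and input strings are assumptions. Each parser is named for the order of the components in its input, and in recent lubridate versions `dmy()` returns a Date, so the last value prints without "UTC".

```r
library(lubridate)

# Hypothetical names and input strings.
tm1.lub <- ymd_hms("2013-07-24 23:55:26")  # year-month-day hour:minute:second
tm2.lub <- mdy_hm("07/25/13 08:32")        # month/day/year hour:minute
tm3.lub <- ydm_hm("2013-25-07 04:00")      # year-day-month hour:minute
tm4.lub <- dmy("26072013")                 # day, month, year with no separators
tm1.lub
tm2.lub
tm3.lub
tm4.lub
```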
some manipulations: extract or reassign components:
## [1] 2013
## [1] 30
## [1] Wed
## Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat
## [1] 23
## [1] "UTC"
## [1] "2013-07-25 08:32:07 UTC"
converting to decimal hours can facilitate some types of calculations:
## [1] 23.92
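A sketch of the accessor functions behind the output above, with assumed variable names. The same functions can be used on the left-hand side of an assignment to change a component.

```r
library(lubridate)

tm1.lub <- ymd_hms("2013-07-24 23:55:26")
tm2.lub <- mdy_hm("07/25/13 08:32")

year(tm1.lub)
week(tm1.lub)
wday(tm1.lub, label = TRUE)  # labelled day of the week
hour(tm1.lub)
tz(tm1.lub)

second(tm2.lub) <- 7         # reassign the seconds component
tm2.lub

# Clock time as decimal hours
tm1.dechr <- hour(tm1.lub) + minute(tm1.lub) / 60 + second(tm1.lub) / 3600
tm1.dechr
```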
Lubridate distinguishes between four types of objects: instants, intervals, durations, and periods. An instant is a specific moment in time. Intervals, durations, and periods are all ways of recording time spans.
Dates and times parsed in lubridate are instants:
## [1] TRUE
round an instant:
## [1] "2013-07-24 23:55:00 UTC"
## [1] "2013-07-25 UTC"
get the current time or date as an instant:
## [1] "2014-02-10 18:26:02 PST"
## [1] "2014-02-10"
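A sketch of the instant checks and rounding shown above (the variable name is an assumption). `now()` and `today()` return the current time and date, so their output differs on every run.

```r
library(lubridate)

tm1.lub <- ymd_hms("2013-07-24 23:55:26")

is.instant(tm1.lub)            # parsed date-times are instants
round_date(tm1.lub, "minute")  # nearest minute
round_date(tm1.lub, "day")     # nearest day; 23:55 rounds up to the next date
now()                          # current time
today()                        # current date
```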
Note that lubridate uses UTC as its default time zone.
see an instant in a different time zone:
## [1] "2013-07-24 16:55:26 PDT"
change the time zone of an instant (keeping the same clock time):
## [1] "2013-07-24 23:55:26 PDT"
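The two time zone operations differ in what they hold fixed. In this sketch (the variable name and the choice of `"America/Los_Angeles"` are assumptions), `with_tz()` keeps the moment and changes the displayed clock time, while `force_tz()` keeps the clock time and changes the moment.

```r
library(lubridate)

tm1.lub <- ymd_hms("2013-07-24 23:55:26")  # UTC by default

same.moment <- with_tz(tm1.lub, "America/Los_Angeles")   # same instant, Pacific clock
same.clock  <- force_tz(tm1.lub, "America/Los_Angeles")  # same clock time, new instant
same.moment
same.clock
```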
some calculations with instants. Note that the units are seconds:
## Time difference of 8.611 hours
## [1] TRUE
## [1] "2013-07-24 23:55:56 UTC"
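A sketch of these calculations with assumed variable names. Adding a bare number to an instant adds that many seconds.

```r
library(lubridate)

tm1.lub <- ymd_hms("2013-07-24 23:55:26")
tm2.lub <- ymd_hms("2013-07-25 08:32:07")

tm2.lub - tm1.lub  # time difference
tm2.lub > tm1.lub  # comparison
tm1.lub + 30       # 30 seconds later
```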
An interval is the span of time that occurs between two specified instants.
## [1] 2013-07-24 23:55:26 UTC--2013-07-25 08:32:07 UTC
Check whether a certain instant occurred within a specified interval:
## [1] TRUE
## [1] FALSE
determine whether two intervals overlap:
## [1] 2013-07-25 06:03:00 UTC--2013-07-25 20:23:00 UTC
## [1] TRUE
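A sketch of the interval operations above. The names `in.bed` and `daylight` and the two test instants are assumptions chosen to match the printed results.

```r
library(lubridate)

in.bed <- interval(ymd_hms("2013-07-24 23:55:26"), ymd_hms("2013-07-25 08:32:07"))
in.bed

ymd_hms("2013-07-25 03:00:00") %within% in.bed  # inside the interval
ymd_hms("2013-07-24 12:00:00") %within% in.bed  # before it starts

daylight <- interval(ymd_hm("2013-07-25 06:03"), ymd_hm("2013-07-25 20:23"))
daylight
int_overlaps(in.bed, daylight)                  # do the two spans share any time?
```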
A duration is a time span not anchored to specific start and end times. It has an exact, fixed length, and is stored internally in seconds.
create some durations:
## [1] "600s (~10 minutes)"
## [1] "432000s (~5 days)"
## [1] "31536000s (~365 days)"
## [1] "31001s (~8.61 hours)"
arithmetic with durations:
## [1] "2013-07-24 23:45:26 UTC"
## [1] "475200s (~5.5 days)"
## [1] 0.01935
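A sketch of the duration examples with assumed variable names. One caveat: the output above shows `dyears(1)` as 365 days, while recent lubridate versions define a duration year as 365.25 days, so that line may print differently today.

```r
library(lubridate)

tm1.lub <- ymd_hms("2013-07-24 23:55:26")
in.bed  <- interval(tm1.lub, ymd_hms("2013-07-25 08:32:07"))

ten.minutes <- dminutes(10)
five.days   <- ddays(5)
one.year    <- dyears(1)          # length depends on the lubridate version
in.bed.dur  <- as.duration(in.bed)  # exact length of the interval in seconds

tm1.lub - ten.minutes     # ten minutes earlier
five.days + dhours(12)    # durations add exactly
ten.minutes / in.bed.dur  # ratio of two durations is a plain number
```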
A period is a time span not anchored to specific start and end times, and measured in units larger than seconds with inexact lengths.
create some periods:
## [1] "21d 0H 0M 0S"
## [1] "4H 0M 0S"
arithmetic with periods:
## [1] "2013-08-16 UTC"
## [1] "6m 12d 0H 0M 0S"
## estimate only: convert to intervals for accuracy
## [1] 0.108
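A sketch of the period examples. The variable names are assumptions. Dividing one period by another can only be estimated, because months do not have a fixed length, and the exact message lubridate prints for that case varies by version.

```r
library(lubridate)

three.weeks <- weeks(3)
four.hours  <- hours(4)
three.weeks
four.hours

dmy("26072013") + three.weeks        # calendar arithmetic: three weeks later
sabbatical <- months(6) + days(12)   # periods with mixed units
sabbatical
ratio <- three.weeks / sabbatical    # an estimate based on average month length
ratio
```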
Calculating mean clock times
Say we have a vector of clock times in decimal hours, and we want to calculate the mean clock time.
## [1] 23.9 0.5 22.7 0.1 23.3 1.2 23.6
## [1] 13.61
The clock has a circular scale, which ends where it begins, so we need to use circular statistics. (For more info on circular statistics see http://en.wikipedia.org/wiki/Mean_of_circular_quantities.)
Get the package, psych.
## [1] 23.9
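To show what a circular mean does, here is a base R sketch that needs no extra package. The function name `circular_mean_hours` is hypothetical and this is not the psych implementation: it maps each clock time to an angle on a 24-hour circle, averages the sines and cosines, and converts the resulting angle back to hours.

```r
# Hypothetical helper: circular mean of clock times given in decimal hours.
circular_mean_hours <- function(hours) {
  angles <- hours * 2 * pi / 24                              # hours -> radians
  mean.angle <- atan2(mean(sin(angles)), mean(cos(angles)))  # direction of the mean vector
  (mean.angle * 24 / (2 * pi)) %% 24                         # radians -> hours in [0, 24)
}

bed.times <- c(23.9, 0.5, 22.7, 0.1, 23.3, 1.2, 23.6)
mean(bed.times)                 # arithmetic mean lands near midday, which is wrong
circular_mean_hours(bed.times)  # circular mean lands just before midnight
```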
An example of using times and dates in a data frame
Here is a data frame with a week of hypothetical times of going to bed and getting up for one person, and the total amount of sleep time obtained each night according to a sleep monitoring device.
sleep <- data.frame(
  bed.time = ymd_hms("2013-09-01 23:05:24", "2013-09-02 22:51:09", "2013-09-04 00:09:16",
                     "2013-09-04 23:43:31", "2013-09-06 00:17:41", "2013-09-06 22:42:27",
                     "2013-09-08 00:22:27"),
  rise.time = ymd_hms("2013-09-02 08:03:29", "2013-09-03 07:34:21", "2013-09-04 07:45:06",
                      "2013-09-05 07:07:17", "2013-09-06 08:17:13", "2013-09-07 06:52:11",
                      "2013-09-08 07:15:19"),
  sleep.time = dhours(c(6.74, 7.92, 7.01, 6.23, 6.34, 7.42, 6.45))
)
sleep
## bed.time rise.time sleep.time
## 1 2013-09-01 23:05:24 2013-09-02 08:03:29 24264s (~6.74 hours)
## 2 2013-09-02 22:51:09 2013-09-03 07:34:21 28512s (~7.92 hours)
## 3 2013-09-04 00:09:16 2013-09-04 07:45:06 25236s (~7.01 hours)
## 4 2013-09-04 23:43:31 2013-09-05 07:07:17 22428s (~6.23 hours)
## 5 2013-09-06 00:17:41 2013-09-06 08:17:13 22824s (~6.34 hours)
## 6 2013-09-06 22:42:27 2013-09-07 06:52:11 26712s (~7.42 hours)
## 7 2013-09-08 00:22:27 2013-09-08 07:15:19 23220s (~6.45 hours)
We want to calculate sleep efficiency, the percent of time in bed spent asleep.
## bed.time rise.time sleep.time efficiency
## 1 2013-09-01 23:05:24 2013-09-02 08:03:29 24264s (~6.74 hours) 75.2
## 2 2013-09-02 22:51:09 2013-09-03 07:34:21 28512s (~7.92 hours) 90.8
## 3 2013-09-04 00:09:16 2013-09-04 07:45:06 25236s (~7.01 hours) 92.3
## 4 2013-09-04 23:43:31 2013-09-05 07:07:17 22428s (~6.23 hours) 84.2
## 5 2013-09-06 00:17:41 2013-09-06 08:17:13 22824s (~6.34 hours) 79.3
## 6 2013-09-06 22:42:27 2013-09-07 06:52:11 26712s (~7.42 hours) 90.9
## 7 2013-09-08 00:22:27 2013-09-08 07:15:19 23220s (~6.45 hours) 93.7
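The efficiency column is hours asleep divided by hours in bed, times 100. The sketch below recomputes it in base R from the same values, converting the time in bed to numeric hours first; the vector names are assumptions and the original script may have done the division directly on lubridate objects.

```r
# Same hypothetical week as in the data frame, rebuilt with base R only.
bed <- as.POSIXct(c("2013-09-01 23:05:24", "2013-09-02 22:51:09", "2013-09-04 00:09:16",
                    "2013-09-04 23:43:31", "2013-09-06 00:17:41", "2013-09-06 22:42:27",
                    "2013-09-08 00:22:27"), tz = "UTC")
rise <- as.POSIXct(c("2013-09-02 08:03:29", "2013-09-03 07:34:21", "2013-09-04 07:45:06",
                     "2013-09-05 07:07:17", "2013-09-06 08:17:13", "2013-09-07 06:52:11",
                     "2013-09-08 07:15:19"), tz = "UTC")
asleep.hours <- c(6.74, 7.92, 7.01, 6.23, 6.34, 7.42, 6.45)

in.bed.hours <- as.numeric(difftime(rise, bed, units = "hours"))  # time in bed per night
efficiency <- round(asleep.hours / in.bed.hours * 100, 1)         # percent of time in bed asleep
efficiency
```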
Now let’s calculate the mean of each column:
## Error: 'x' must be numeric
## [1] 23.6
## [1] 7.559
## [1] 6.873
## [1] 86.63
We can also plot sleep duration and efficiency across the week:
par(mar = c(5, 4, 4, 4))
plot(round_date(sleep$rise.time, "day"), sleep$efficiency, type = "o", col = "blue",
     xlab = "Morning", ylab = NA)
par(new = TRUE)
plot(round_date(sleep$rise.time, "day"), sleep$sleep.time/3600, type = "o",
     col = "red", axes = FALSE, ylab = NA, xlab = NA)
axis(side = 4)
mtext(side = 4, line = 2.5, col = "red", "Sleep duration")
mtext(side = 2, line = 2.5, col = "blue", "Sleep efficiency")
More resources on times and dates
date and time tutorials for R:
- http://www.stat.berkeley.edu/classes/s133/dates.html
- http://science.nature.nps.gov/im/datamgmt/statistics/r/fundamentals/dates.cfm
- http://en.wikibooks.org/wiki/R_Programming/Times_and_Dates
lubridate:
time zone and daylight saving time info:
- http://www.timeanddate.com/
- http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
- http://www.twinsun.com/tz/tz-link.htm
- Also see the R help file at ?Sys.timezone