The Python standard library’s handling of dates and times is notoriously awful. By default, timezones are largely ignored, and doing any kind of semi-useful operation on a datetime object is unintuitive and time-consuming (just try converting a naive datetime to a timestamp).
If you’re using Mongo as your datastore, you’ll probably already have some awareness that Mongo exposes BSON ISO dates as Python datetime objects. However, the details of how this works are somewhat non-obvious, especially when you consider naive vs timezone-aware datetime objects.
So I decided to dig in and document what actually happens, as much for my own benefit as everyone else’s.
So, what happens?
PyMongo uses BSON to encode / decode the documents which it saves to / retrieves from the underlying data store.
BSON applies the following rules when it encounters a field which is a Python datetime:
Internally, BSON encodes a datetime field as a Unix timestamp, i.e. milliseconds since the Epoch. In order to do this it uses the following code:
    if isinstance(value, datetime.datetime):
        if value.utcoffset() is not None:
            value = value - value.utcoffset()
        millis = int(calendar.timegm(value.timetuple()) * 1000 +
                     value.microsecond / 1000)
        return BSONDAT + name + struct.pack("<q", millis)
So in a nutshell:
- If the given datetime is timezone-aware, BSON will apply the UTC offset for that timezone to the datetime object so that when it is converted to an Epoch timestamp it is using the equivalent UTC time.
- If the given datetime is naive, BSON assumes the datetime is in UTC and directly converts it to an Epoch timestamp. This means you need to be very careful about saving naive datetimes which are not in UTC.
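To make the two rules concrete, here is a minimal standalone sketch that mirrors the encoding logic quoted above using only the standard library. The helper name `to_bson_millis` and the sample dates are made up for illustration and are not part of PyMongo or BSON.

```python
import calendar
import datetime


def to_bson_millis(value):
    """Illustrative helper (not a PyMongo API): milliseconds since the Epoch.

    Aware datetimes are shifted by their UTC offset first; naive datetimes
    are taken at face value, i.e. assumed to already be in UTC.
    """
    if value.utcoffset() is not None:
        value = value - value.utcoffset()
    return calendar.timegm(value.timetuple()) * 1000 + value.microsecond // 1000


# Hypothetical sample values: 12:00 at UTC+2 is the same instant as 10:00 UTC.
plus_two = datetime.timezone(datetime.timedelta(hours=2))
aware = datetime.datetime(2014, 1, 1, 12, 0, tzinfo=plus_two)
naive_utc = datetime.datetime(2014, 1, 1, 10, 0)
# The same wall-clock time as `aware`, but with the timezone dropped.
naive_local = datetime.datetime(2014, 1, 1, 12, 0)

aware_millis = to_bson_millis(aware)
naive_utc_millis = to_bson_millis(naive_utc)
naive_local_millis = to_bson_millis(naive_local)

# The aware value and the naive UTC value land on the same timestamp,
# while the naive local value is silently stored two hours too late.
print(aware_millis == naive_utc_millis)
print((naive_local_millis - naive_utc_millis) / 3600000.0)
```

The last comparison is the trap described in the second bullet: a naive local time is stored as if it were UTC, and nothing warns you about it.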
Given that a BSON date is just an Epoch timestamp, BSON (and to some extent Mongo) then simply needs to create a
datetime from that timestamp when it encounters such a field in a loaded document.
This is the code it uses to do this (note that this only applies as of pymongo 1.7):
    def _get_date(data, position, as_class, tz_aware, uuid_subtype):
        # struct.unpack returns a tuple, so take its first element
        millis = struct.unpack("<q", data[position:position + 8])[0]
        diff = millis % 1000
        seconds = (millis - diff) / 1000
        position += 8
        if tz_aware:
            dt = EPOCH_AWARE + datetime.timedelta(seconds=seconds)
        else:
            dt = EPOCH_NAIVE + datetime.timedelta(seconds=seconds)
        return dt.replace(microsecond=diff * 1000), position
So in a nutshell:
- If tz_aware=True is specified (more on this below), the returned datetime is created as timezone-aware and is set up with UTC as the time zone.
- If tz_aware=False is specified, then you are returned a naive datetime whose value is the UTC time.
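The decoding side can be sketched the same way. The snippet below is a simplified stand-in for the quoted function, not PyMongo's actual code: it takes the millisecond value directly instead of unpacking bytes, and the name `from_bson_millis` and the sample timestamp are assumptions for illustration.

```python
import datetime

# Stand-ins for the EPOCH_AWARE / EPOCH_NAIVE constants used in the quoted code.
EPOCH_AWARE = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
EPOCH_NAIVE = datetime.datetime(1970, 1, 1)


def from_bson_millis(millis, tz_aware=False):
    """Illustrative helper (not a PyMongo API): Epoch milliseconds to datetime."""
    diff = millis % 1000
    seconds = (millis - diff) // 1000
    epoch = EPOCH_AWARE if tz_aware else EPOCH_NAIVE
    dt = epoch + datetime.timedelta(seconds=seconds)
    return dt.replace(microsecond=diff * 1000)


# Hypothetical stored value: 2014-01-01 10:00:00.123 UTC.
stored_millis = 1388570400123

naive_result = from_bson_millis(stored_millis, tz_aware=False)
aware_result = from_bson_millis(stored_millis, tz_aware=True)

print(naive_result.tzinfo)       # no timezone attached
print(aware_result.utcoffset())  # a zero offset, i.e. UTC
```

Either way the wall-clock fields are identical; the only difference is whether a UTC tzinfo is attached to the result.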
By default, PyMongo will pass tz_aware=False to BSON when asking it to decode documents, so your loaded documents will contain naive datetime objects. To get PyMongo to return timezone-aware datetime objects, you’ll need to initialize your MongoClient object with tz_aware=True.
PyMongo/BSON’s approach to datetime objects, as outlined above, respects timezone information when saving (by applying the UTC offset) but then essentially discards it: only the UTC instant reaches the database. That means that if retaining the timezone of a given datetime is important to you, you’ll need to store it separately and re-apply it to the datetime when it is loaded back out of the database (check out the excellent Arrow library to make this super simple).
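As a rough sketch of the "store it separately" idea, the snippet below keeps the UTC offset in minutes next to the datetime and re-applies it after loading. It uses only the standard library rather than Arrow, the document is a plain dict rather than a real Mongo round trip, and the helper and field names (`split_for_storage`, `restore`, `created_at_offset_minutes`) are invented for illustration.

```python
import datetime


def split_for_storage(aware_dt):
    """Illustrative helper: return (naive UTC datetime, offset in minutes)."""
    offset = aware_dt.utcoffset()
    utc_naive = (aware_dt - offset).replace(tzinfo=None)
    return utc_naive, int(offset.total_seconds() // 60)


def restore(utc_naive, offset_minutes):
    """Illustrative helper: re-attach the stored offset to a loaded UTC value."""
    tz = datetime.timezone(datetime.timedelta(minutes=offset_minutes))
    return utc_naive.replace(tzinfo=datetime.timezone.utc).astimezone(tz)


# Hypothetical sample value: 09:30 at UTC-5.
minus_five = datetime.timezone(datetime.timedelta(hours=-5))
original = datetime.datetime(2014, 6, 1, 9, 30, tzinfo=minus_five)

stored_dt, stored_offset = split_for_storage(original)
# A plain dict stands in for the document that would be saved and loaded.
document = {"created_at": stored_dt, "created_at_offset_minutes": stored_offset}

restored = restore(document["created_at"], document["created_at_offset_minutes"])
print(restored == original)
print(restored.utcoffset() == original.utcoffset())
```

A fixed offset does not carry daylight-saving rules, so storing a timezone name instead may be the better choice if you need to do arithmetic on the restored value.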