In Pint we chose wrapping (instead of subclassing) because we want to support quantities even if NumPy is not installed. It has worked really nicely for us, covering most of the common cases even for numpy arrays. But wrapping adds its own issues: any function that calls numpy.asanyarray will erase the information that this is a quantity (see my issue here: numpy/numpy#4072), and some NumPy functions do not recognize a Quantity object as an array at all. We also mention these limitations in the Pint documentation. In spite of these limitations, we chose wrapping anyway. In any case, as was mentioned before in the thread, custom dtypes and duck typing would be great for this. Regarding interoperating, it would be great, and it would be even better if we could move to one blessed solution under the pydata umbrella (or similar).
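To make the asanyarray problem concrete, here is a minimal sketch of why a wrapper (as opposed to an ndarray subclass) loses its unit: `numpy.asanyarray` only passes ndarray *subclasses* through unchanged, so a wrapper gets coerced to a plain array. The `WrappedQuantity` class below is a hypothetical stand-in, not Pint's actual implementation.

```python
import numpy as np

class WrappedQuantity:
    """Wraps an array plus a unit string instead of subclassing ndarray."""
    def __init__(self, magnitude, unit):
        self.magnitude = np.asarray(magnitude)
        self.unit = unit

    def __array__(self, dtype=None):
        # NumPy uses this hook to coerce the wrapper to a plain ndarray,
        # silently dropping the unit in the process.
        return np.asarray(self.magnitude, dtype=dtype)

q = WrappedQuantity([1.0, 2.0, 3.0], "meter")
a = np.asanyarray(q)
print(type(a))             # plain numpy.ndarray, not WrappedQuantity
print(hasattr(a, "unit"))  # False: the unit information is gone
```

An ndarray subclass would survive `asanyarray` (that is the point of the function), which is exactly the trade-off between wrapping and subclassing described above.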
I do agree with your views. It would be nice to see some code using xarray and units (as if this was an already implemented feature). When we prototyped Pint, we tried putting Quantity objects inside a numpy array. It was working fine, but the performance and memory hit was too large. This suggests that tagging on the outside is the better approach. We were convinced that our current design was right when we wrote the first code using it. So far, so good, but with the current state of duck array typing in NumPy, it's really hard to be happy with this. Until __numpy_ufunc__ lands, we can't override operations like np.sqrt in a way that is remotely feasible for dask.arrays (we can't afford to load big arrays into memory). Likewise, we need overrides for standard numpy array utility functions like concatenate. But the worst part is that the lack of standard interfaces means we lose the possibility of composing different array backends with your Quantity type: it will only be able to wrap dask or numpy arrays, not sparse matrices or bolt arrays or some other type yet to be invented. Hence why I think custom dtypes would really be the ideal solution. If you had an actual (parametric) dtype for units (e.g., Quantity), then you would get all those dtype-agnostic methods for free, which would make your life as an implementer much easier. Something like Quantity really only needs to override compute operations so that they can propagate dtypes; there shouldn't be a need to override methods like concatenate. Once we have all that duck-array stuff, then yes, you certainly could write a duck-array Quantity type that can wrap generic duck-arrays.
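The ufunc-override mechanism discussed here later landed in NumPy (>= 1.13) as `__array_ufunc__`. As a rough illustration of the "only override compute operations to propagate units" idea, here is a hypothetical `DuckQuantity` that intercepts a ufunc, does the unit bookkeeping itself, and defers the numeric work to whatever array it wraps. Only multiplication is handled, and the unit algebra is a toy; none of this is Pint's or dask's real machinery.

```python
import numpy as np

class DuckQuantity:
    def __init__(self, data, unit):
        self.data = data   # any array-like backend (numpy, dask, ...)
        self.unit = unit

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Toy unit propagation: handle plain multiplication only.
        if ufunc is np.multiply and method == "__call__":
            mags = [x.data if isinstance(x, DuckQuantity) else x
                    for x in inputs]
            units = [x.unit for x in inputs if isinstance(x, DuckQuantity)]
            return DuckQuantity(ufunc(*mags, **kwargs), "*".join(units))
        return NotImplemented  # let NumPy raise for unsupported operations

a = DuckQuantity(np.array([2.0, 3.0]), "m")
b = DuckQuantity(np.array([4.0, 5.0]), "s")
c = np.multiply(a, b)  # NumPy dispatches to DuckQuantity.__array_ufunc__
print(c.unit)          # m*s
print(c.data)          # [ 8. 15.]
```

The point is that the numeric work (`ufunc(*mags)`) stays with the wrapped backend, so the same wrapper could in principle sit on top of any duck array once such interfaces are standard.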
It would certainly be possible to extend dask.array to handle units, in either of the ways you suggest. Although you could allow Quantity objects inside dask.arrays, I don't like that approach, because static checks like units really should be done only once, when arrays are constructed (akin to dtype checks), rather than at evaluation time and for every chunk. It may be that in this case one would not use Quantity proper, but rather just the parts of units where the real magic happens: the Unit class (which does the unit conversion) and quantity_helpers.py (which tells what unit conversion is necessary for a given ufunc). I think the parts that truly wrap might be separated from those that override ndarray methods, and I would be willing to implement that if there is a good reason (like making dask quantities possible).
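To illustrate the "check once at construction" point, here is a minimal sketch in which unit agreement is resolved when the expression is built, so per-chunk evaluation involves no unit logic at all. The `SCALE` table and `LazyQuantity` class are hypothetical stand-ins for real unit-conversion machinery and for dask's chunked arrays; they are not actual dask or astropy APIs.

```python
import numpy as np

# Toy conversion table: (from_unit, to_unit) -> multiplicative factor.
SCALE = {("km", "m"): 1000.0, ("m", "km"): 0.001,
         ("m", "m"): 1.0, ("km", "km"): 1.0}

class LazyQuantity:
    def __init__(self, chunks, unit):
        self.chunks = chunks  # list of plain ndarrays (stand-in for dask chunks)
        self.unit = unit

    def __add__(self, other):
        # The unit check (and KeyError for incompatible units) happens HERE,
        # once, when the expression is built -- like a dtype check.
        factor = SCALE[(other.unit, self.unit)]
        # Per-chunk work is pure arithmetic with no unit bookkeeping.
        new = [a + factor * b for a, b in zip(self.chunks, other.chunks)]
        return LazyQuantity(new, self.unit)

x = LazyQuantity([np.array([1.0, 2.0]), np.array([3.0])], "m")
y = LazyQuantity([np.array([0.001, 0.002]), np.array([0.003])], "km")
z = x + y
print(z.unit)                            # m
print([c.tolist() for c in z.chunks])    # [[2.0, 4.0], [6.0]]
```

In a real dask integration the conversion factor would be baked into the task graph once, rather than rediscovered for every chunk at evaluation time, which is the efficiency argument made above.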
As one who thinks unit support is probably the single best thing astropy has (and as co-maintainer of astropy.units), I thought I'd pipe in: why would it be a problem that astropy's Quantity is an ndarray subclass? I must admit not having used dask arrays, but since they use ndarray internally for the pieces, shouldn't the fact that Quantity has the same interface/methods make it relatively easy to swap ndarray for Quantity internally? This would also not seem to be that hard, given that astropy's Quantity is really just a wrapper around ndarray that carries a Unit instance. I'd be quite happy to help think about this (surely it cannot be as bad as it is for MaskedArray ;-). Alternatively, maybe it is easier to tag on the outside rather than the inside.