In this work we show how these new object storage systems can be combined with Python libraries, such as xarray and Dask for distributed analysis and Zarr for data storage in object stores, allowing computations to be parallelized easily and without scalability restrictions.
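As a minimal sketch of this pattern, the snippet below builds a small xarray Dataset, chunks it so its arrays become Dask arrays, and computes a reduction that Dask can execute in parallel. The variable names and dataset contents are illustrative only, and the example assumes xarray, Dask, and NumPy are installed.

```python
import numpy as np
import xarray as xr

# A small in-memory example dataset; in practice data would typically be
# opened lazily, e.g. from a Zarr store in object storage.
ds = xr.Dataset(
    {"temperature": (("time", "lat", "lon"), np.random.rand(4, 3, 2))},
    coords={"time": np.arange(4), "lat": [10.0, 20.0, 30.0], "lon": [0.0, 5.0]},
)

# Chunking turns the underlying arrays into Dask arrays, so reductions
# like .mean() build a task graph that Dask evaluates in parallel.
chunked = ds.chunk({"time": 2})
mean_temp = chunked["temperature"].mean(dim="time").compute()
```

Nothing in the analysis code changes when the same dataset is backed by an object store instead of local memory; only the opening step differs.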
Dataset.to_zarr([store, mode, synchronizer, …]): Write dataset contents to a Zarr group.
save_mfdataset(datasets, paths[, mode, …]): Write multiple datasets to disk as netCDF files simultaneously.
Dataset.to_array([dim, name]): Convert this dataset into an xarray.DataArray.
Dataset.to_dataframe(): Convert this dataset into a pandas.DataFrame.