Using Xarray
As of rex v0.2.99, you can read NREL data files using the popular open-source library xarray. You can learn more about the benefits of using xarray here.
Basic Usage
Opening a single file
To read in an NREL data file, simply pip install NREL-rex and then supply engine="rex" to the xarray open_dataset function:
import os
import xarray as xr
from rex import TESTDATADIR
WTK_2012_FP = os.path.join(TESTDATADIR, 'wtk', 'ri_100_wtk_2012.h5')
ds = xr.open_dataset(WTK_2012_FP, engine="rex")
ds
<xarray.Dataset> Size: 32MB
Dimensions: (time: 8784, gid: 200)
Coordinates:
time_index (time) datetime64[ns] 70kB ...
* time (time) datetime64[ns] 70kB 2012-01-01 ... 2012-12-31T...
* gid (gid) int64 2kB 0 1 2 3 4 5 ... 194 195 196 197 198 199
latitude (gid) float32 800B ...
longitude (gid) float32 800B ...
country (gid) |S13 3kB ...
state (gid) |S4 800B ...
county (gid) |S22 4kB ...
timezone (gid) int16 400B ...
elevation (gid) int16 400B ...
offshore (gid) int16 400B ...
Data variables:
pressure_0m (time, gid) uint16 4MB ...
pressure_100m (time, gid) uint16 4MB ...
pressure_200m (time, gid) uint16 4MB ...
temperature_100m (time, gid) int16 4MB ...
temperature_80m (time, gid) int16 4MB ...
winddirection_100m (time, gid) uint16 4MB ...
winddirection_80m (time, gid) uint16 4MB ...
windspeed_100m (time, gid) uint16 4MB ...
windspeed_80m (time, gid) uint16 4MB ...
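The result is an ordinary xarray.Dataset, so the usual xarray inspection and selection tools apply. As a minimal sketch using names from the repr above (the attributes printed are whatever metadata the file provides; neither line loads the underlying array data, as explained in the next section):
# Inspect a variable's metadata and pull out a single site by its gid
print(ds["windspeed_100m"].attrs)
site0 = ds["windspeed_100m"].sel(gid=0)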
Lazy Loading - rex
Note that the operation above (reading a file using xr.open_dataset) did not immediately load arrays into memory. This is because the rex backend provides lazily-loaded arrays.
Note
This is not dask, since we did not specify the chunks=... parameter in xr.open_dataset.
You can verify this for yourself by checking the private _data attribute:
print(ds["windspeed_100m"].variable._data)
MemoryCachedArray(array=CopyOnWriteArray(array=LazilyIndexedArray(array=<rex.external.rexarray.RexArrayWrapper object at 0x7fdea431dae0>, key=BasicIndexer((slice(None, None, None), slice(None, None, None))))))
You can see that we have a LazilyIndexedArray instance around the custom RexArrayWrapper object. Notably, there is no data shown.
You can load the data into memory like so, but be careful: doing this for the full-sized WIND/NSRDB files will load tens or hundreds of GB into memory:
print(ds["windspeed_100m"].data)
[[ 7.25 7.13 6.9 ... 8.7 8.66 8.45]
[ 8.02 7.7 8.12 ... 6.02 5.98 6.51]
[10.23 9.76 9.82 ... 7.15 7.51 7.69]
...
[ 8.74 8.78 9.19 ... 11.97 12.17 12.43]
[10.34 10.33 10.41 ... 12.87 12.9 13. ]
[10.34 10.43 10.74 ... 14.77 14.85 14.82]]
Now if we check the _data attribute again, we can see that the data values have been loaded:
print(ds["windspeed_100m"].variable._data)
MemoryCachedArray(array=NumpyIndexingAdapter(array=array([[ 7.25, 7.13, 6.9 , ..., 8.7 , 8.66, 8.45],
[ 8.02, 7.7 , 8.12, ..., 6.02, 5.98, 6.51],
[10.23, 9.76, 9.82, ..., 7.15, 7.51, 7.69],
...,
[ 8.74, 8.78, 9.19, ..., 11.97, 12.17, 12.43],
[10.34, 10.33, 10.41, ..., 12.87, 12.9 , 13. ],
[10.34, 10.43, 10.74, ..., 14.77, 14.85, 14.82]], shape=(8784, 200))))
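If you only need part of a variable, you can slice the lazy array first so that only the selected region is ever read from disk. A minimal sketch, assuming a fresh session where the data has not already been cached:
# Slicing a lazily-indexed variable stays lazy; the shape is known,
# but nothing is read from disk until .values (or .data) is accessed
subset = ds["windspeed_100m"].isel(gid=slice(0, 10))
print(subset.sizes)     # time: 8784, gid: 10 -- still lazy
values = subset.values  # reads only these 10 sites into memory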
Operations on these arrays are not lazy and will cause the underlying data to be loaded into memory:
print(ds["windspeed_80m"] * 2)
<xarray.DataArray 'windspeed_80m' (time: 8784, gid: 200)> Size: 14MB
array([[12.96, 12.82, 12.46, ..., 17.2 , 17.16, 16.76],
[14.2 , 13.58, 14.5 , ..., 11.82, 11.24, 11.78],
[18.64, 17.32, 17.26, ..., 14.18, 14.96, 15.32],
...,
[16.04, 16.2 , 17.02, ..., 23.8 , 24.22, 24.74],
[18.92, 18.92, 18.9 , ..., 25.56, 25.64, 25.86],
[18.9 , 19.18, 19.74, ..., 29.3 , 29.46, 29.42]], shape=(8784, 200))
Coordinates:
time_index (time) datetime64[ns] 70kB ...
* time (time) datetime64[ns] 70kB 2012-01-01 ... 2012-12-31T23:00:00
* gid (gid) int64 2kB 0 1 2 3 4 5 6 7 ... 193 194 195 196 197 198 199
latitude (gid) float32 800B ...
longitude (gid) float32 800B ...
country (gid) |S13 3kB ...
state (gid) |S4 800B ...
county (gid) |S22 4kB ...
timezone (gid) int16 400B ...
elevation (gid) int16 400B ...
offshore (gid) int16 400B ...
Lazy Loading - dask
We can also request that our data be read in lazily using dask. The easiest way to do this is to provide a chunks=... parameter in xr.open_dataset:
ds_dask = xr.open_dataset(WTK_2012_FP, engine="rex", chunks="auto")
ds_dask
<xarray.Dataset> Size: 32MB
Dimensions: (time: 8784, gid: 200)
Coordinates:
time_index (time) datetime64[ns] 70kB dask.array<chunksize=(8784,), meta=np.ndarray>
* time (time) datetime64[ns] 70kB 2012-01-01 ... 2012-12-31T...
* gid (gid) int64 2kB 0 1 2 3 4 5 ... 194 195 196 197 198 199
latitude (gid) float32 800B dask.array<chunksize=(200,), meta=np.ndarray>
longitude (gid) float32 800B dask.array<chunksize=(200,), meta=np.ndarray>
country (gid) |S13 3kB dask.array<chunksize=(200,), meta=np.ndarray>
state (gid) |S4 800B dask.array<chunksize=(200,), meta=np.ndarray>
county (gid) |S22 4kB dask.array<chunksize=(200,), meta=np.ndarray>
timezone (gid) int16 400B dask.array<chunksize=(200,), meta=np.ndarray>
elevation (gid) int16 400B dask.array<chunksize=(200,), meta=np.ndarray>
offshore (gid) int16 400B dask.array<chunksize=(200,), meta=np.ndarray>
Data variables:
pressure_0m (time, gid) uint16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
pressure_100m (time, gid) uint16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
pressure_200m (time, gid) uint16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
temperature_100m (time, gid) int16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
temperature_80m (time, gid) int16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
winddirection_100m (time, gid) uint16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
winddirection_80m (time, gid) uint16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
windspeed_100m (time, gid) uint16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
windspeed_80m (time, gid) uint16 4MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
We can immediately tell that dask is being used because the data are represented by dask arrays. Operations on this dataset are lazy:
print(ds_dask["windspeed_100m"].mean())
<xarray.DataArray 'windspeed_100m' ()> Size: 8B
dask.array<mean_agg-aggregate, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>
We can see that no value has been computed yet. To run the computation, we have to call the .compute() method:
print(ds_dask["windspeed_100m"].mean().compute())
<xarray.DataArray 'windspeed_100m' ()> Size: 8B
array(7.65926428)
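Instead of chunks="auto", you can also pass an explicit chunk specification. A hypothetical example, with the chunk size chosen to suit a time-series access pattern:
# Chunk along time only; each chunk spans all 200 sites, so per-site
# time-series reductions touch a small number of chunks
ds_dask = xr.open_dataset(WTK_2012_FP, engine="rex", chunks={"time": 1000})
print(ds_dask["windspeed_100m"].chunks)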
For more information on using dask with xarray, see this guide.
Opening Multiple Files
You can use xr.open_mfdataset to open multiple NREL data files at once.
Important
By default, xarray does not assume that the coordinate data (i.e. meta variables) match across files. As a result, it will try to load all coordinates and compare them during the concatenation step. This process involves significant I/O and can drastically increase the runtime of xr.open_mfdataset calls. Since the underlying assumption in rex-style data is that the meta remains consistent across files, we can tell xarray to skip this validation by passing compat="override" and coords="minimal" to the xr.open_mfdataset call.
import os
import xarray as xr
from rex import TESTDATADIR
WTK_FPS = os.path.join(TESTDATADIR, 'wtk', 'ri_100_wtk_20*.h5')
ds = xr.open_mfdataset(WTK_FPS, engine="rex", compat="override", coords="minimal")
ds
<xarray.Dataset> Size: 63MB
Dimensions: (time: 17544, gid: 200)
Coordinates:
time_index (time) datetime64[ns] 140kB dask.array<chunksize=(8784,), meta=np.ndarray>
* time (time) datetime64[ns] 140kB 2012-01-01 ... 2013-12-31...
* gid (gid) int64 2kB 0 1 2 3 4 5 ... 194 195 196 197 198 199
latitude (gid) float32 800B dask.array<chunksize=(200,), meta=np.ndarray>
longitude (gid) float32 800B dask.array<chunksize=(200,), meta=np.ndarray>
country (gid) |S13 3kB dask.array<chunksize=(200,), meta=np.ndarray>
state (gid) |S4 800B dask.array<chunksize=(200,), meta=np.ndarray>
county (gid) |S22 4kB dask.array<chunksize=(200,), meta=np.ndarray>
timezone (gid) int16 400B dask.array<chunksize=(200,), meta=np.ndarray>
elevation (gid) int16 400B dask.array<chunksize=(200,), meta=np.ndarray>
offshore (gid) int16 400B dask.array<chunksize=(200,), meta=np.ndarray>
Data variables:
pressure_0m (time, gid) uint16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
pressure_100m (time, gid) uint16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
pressure_200m (time, gid) uint16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
temperature_100m (time, gid) int16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
temperature_80m (time, gid) int16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
winddirection_100m (time, gid) uint16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
winddirection_80m (time, gid) uint16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
windspeed_100m (time, gid) uint16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
windspeed_80m (time, gid) uint16 7MB dask.array<chunksize=(8784, 200), meta=np.ndarray>
The shape of time indicates that two years of data have been loaded. You can also verify this directly:
print(ds.time[[0, -1]])
<xarray.DataArray 'time' (time: 2)> Size: 16B
array(['2012-01-01T00:00:00.000000000', '2013-12-31T23:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
time_index (time) datetime64[ns] 16B dask.array<chunksize=(2,), meta=np.ndarray>
* time (time) datetime64[ns] 16B 2012-01-01 2013-12-31T23:00:00
Attributes:
standard_name: time
long_name: time
calendar: proleptic_gregorian
time_zone: UTC
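Because time is a datetime index, you can also use xarray's partial datetime string indexing to subset the concatenated dataset. For example, to select just the 2013 data (this remains lazy until computed):
# Partial string indexing on the time coordinate selects all of 2013
ws_2013 = ds["windspeed_100m"].sel(time="2013")
print(ws_2013.sizes)  # time: 8760, gid: 200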
Remote Files
You can also use xarray to open remote files directly. If you do not have access to the NREL HPC, this guide is for you. See instructions here on where to find these file paths on S3 and HSDS.
Files on S3
For files on S3, you do not need to do anything beyond providing the S3 file path:
import xarray as xr
ds = xr.open_dataset("s3://nrel-pds-nsrdb/current/nsrdb_1998.h5", engine="rex")
ds
<xarray.Dataset> Size: 2TB
Dimensions: (time: 17520, gid: 2018267)
Coordinates:
time_index (time) datetime64[ns] 140kB ...
* time (time) datetime64[ns] 140kB 1998-01-01 ... 1998...
* gid (gid) int64 16MB 0 1 2 ... 2018264 2018265 2018266
latitude (gid) float32 8MB ...
longitude (gid) float32 8MB ...
elevation (gid) int16 4MB ...
timezone (gid) int16 4MB ...
country (gid) |S36 73MB ...
state (gid) |S31 63MB ...
county (gid) |S51 103MB ...
Data variables: (12/26)
air_temperature (time, gid) int16 71GB ...
alpha (time, gid) uint8 35GB ...
aod (time, gid) uint16 71GB ...
asymmetry (time, gid) int8 35GB ...
cld_opd_dcomp (time, gid) uint16 71GB ...
cld_press_acha (time, gid) uint16 71GB ...
... ...
ssa (time, gid) uint8 35GB ...
surface_albedo (time, gid) uint8 35GB ...
surface_pressure (time, gid) uint16 71GB ...
total_precipitable_water (time, gid) uint8 35GB ...
wind_direction (time, gid) uint16 71GB ...
wind_speed (time, gid) uint16 71GB ...
Attributes:
version: 3.2.2
Just like before, the data is lazy-loaded, so reading in the file does not take too long. However, once you start processing the data, it will need to be downloaded, which can be time-consuming.
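To keep transfers manageable, select the subset you need before triggering the download. A minimal sketch using a variable from the repr above (48 half-hourly steps is one day of NSRDB data):
# Download only the first day of data for the first 100 sites;
# .load() triggers the actual transfer from S3
sample = ds["wind_speed"].isel(time=slice(0, 48), gid=slice(0, 100)).load()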
Files on HSDS
A more performant option is to use HSDS (see this guide on setting up your own local HSDS server):
import xarray as xr
ds = xr.open_dataset("/nrel/nsrdb/v3/nsrdb_1998.h5", engine="rex", hsds=True)
ds
<xarray.Dataset> Size: 2TB
Dimensions: (time: 17520, gid: 2018392)
Coordinates: (12/13)
time_index (time) datetime64[ns] 140kB ...
* time (time) datetime64[ns] 140kB 1998-01-01 ... 1998...
* gid (gid) int64 16MB 0 1 2 ... 2018389 2018390 2018391
latitude (gid) float32 8MB ...
longitude (gid) float32 8MB ...
elevation (gid) float32 8MB ...
... ...
country (gid) |S30 61MB ...
state (gid) |S30 61MB ...
county (gid) |S30 61MB ...
urban (gid) |S30 61MB ...
population (gid) int32 8MB ...
landcover (gid) int16 4MB ...
Data variables: (12/25)
air_temperature (time, gid) int8 35GB ...
alpha (time, gid) int16 71GB ...
aod (time, gid) int16 71GB ...
asymmetry (time, gid) int16 71GB ...
cld_opd_dcomp (time, gid) int16 71GB ...
cld_reff_dcomp (time, gid) int16 71GB ...
... ...
ssa (time, gid) int16 71GB ...
surface_albedo (time, gid) int16 71GB ...
surface_pressure (time, gid) int16 71GB ...
total_precipitable_water (time, gid) int16 71GB ...
wind_direction (time, gid) int16 71GB ...
wind_speed (time, gid) int16 71GB ...
Attributes:
Version: 3.0.6
Opening Multiple Remote Files
xr.open_mfdataset does not support the wildcard (*) syntax for remote files, so to open multiple files on S3, you have to list them out explicitly:
import xarray as xr
files = [
"s3://nrel-pds-nsrdb/current/nsrdb_1998.h5",
"s3://nrel-pds-nsrdb/current/nsrdb_1999.h5",
]
ds = xr.open_mfdataset(files, engine="rex", compat="override", coords="minimal")
ds
<xarray.Dataset> Size: 3TB
Dimensions: (time: 35040, gid: 2018267)
Coordinates:
time_index (time) datetime64[ns] 280kB dask.array<chunksize=(17520,), meta=np.ndarray>
* time (time) datetime64[ns] 280kB 1998-01-01 ... 1999...
* gid (gid) int64 16MB 0 1 2 ... 2018264 2018265 2018266
latitude (gid) float32 8MB dask.array<chunksize=(2018267,), meta=np.ndarray>
longitude (gid) float32 8MB dask.array<chunksize=(2018267,), meta=np.ndarray>
elevation (gid) int16 4MB dask.array<chunksize=(2018267,), meta=np.ndarray>
timezone (gid) int16 4MB dask.array<chunksize=(2018267,), meta=np.ndarray>
country (gid) |S36 73MB dask.array<chunksize=(2018267,), meta=np.ndarray>
state (gid) |S31 63MB dask.array<chunksize=(2018267,), meta=np.ndarray>
county (gid) |S51 103MB dask.array<chunksize=(2018267,), meta=np.ndarray>
Data variables: (12/26)
air_temperature (time, gid) int16 141GB dask.array<chunksize=(2000, 500), meta=np.ndarray>
alpha (time, gid) uint8 71GB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
aod (time, gid) uint16 141GB dask.array<chunksize=(2000, 500), meta=np.ndarray>
asymmetry (time, gid) int8 71GB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
cld_opd_dcomp (time, gid) uint16 141GB dask.array<chunksize=(2000, 500), meta=np.ndarray>
cld_press_acha (time, gid) uint16 141GB dask.array<chunksize=(2000, 500), meta=np.ndarray>
... ...
ssa (time, gid) uint8 71GB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
surface_albedo (time, gid) uint8 71GB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
surface_pressure (time, gid) uint16 141GB dask.array<chunksize=(2000, 500), meta=np.ndarray>
total_precipitable_water (time, gid) uint8 71GB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
wind_direction (time, gid) uint16 141GB dask.array<chunksize=(2000, 500), meta=np.ndarray>
wind_speed (time, gid) uint16 141GB dask.array<chunksize=(2000, 500), meta=np.ndarray>
Attributes:
version: 3.2.2
Due to technical limitations, you cannot use xr.open_mfdataset to open multiple files via HSDS. Instead, you can use the rex.open_mfdataset_hsds function, which does accept wildcard inputs:
import xarray as xr
from rex import open_mfdataset_hsds
ds = open_mfdataset_hsds("/nrel/nsrdb/v3/nsrdb_199*.h5", compat="override", coords="minimal")
ds
<xarray.Dataset> Size: 3TB
Dimensions: (time: 35040, gid: 2018392)
Coordinates: (12/13)
time_index (time) datetime64[ns] 280kB dask.array<chunksize=(17520,), meta=np.ndarray>
* time (time) datetime64[ns] 280kB 1998-01-01 ... 1999...
* gid (gid) int64 16MB 0 1 2 ... 2018389 2018390 2018391
latitude (gid) float32 8MB dask.array<chunksize=(252299,), meta=np.ndarray>
longitude (gid) float32 8MB dask.array<chunksize=(252299,), meta=np.ndarray>
elevation (gid) float32 8MB dask.array<chunksize=(12856,), meta=np.ndarray>
... ...
country (gid) |S30 61MB dask.array<chunksize=(12856,), meta=np.ndarray>
state (gid) |S30 61MB dask.array<chunksize=(12856,), meta=np.ndarray>
county (gid) |S30 61MB dask.array<chunksize=(12856,), meta=np.ndarray>
urban (gid) |S30 61MB dask.array<chunksize=(12856,), meta=np.ndarray>
population (gid) int32 8MB dask.array<chunksize=(12856,), meta=np.ndarray>
landcover (gid) int16 4MB dask.array<chunksize=(12856,), meta=np.ndarray>
Data variables: (12/25)
air_temperature (time, gid) int8 71GB dask.array<chunksize=(2688, 744), meta=np.ndarray>
alpha (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
aod (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
asymmetry (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
cld_opd_dcomp (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
cld_reff_dcomp (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
... ...
ssa (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
surface_albedo (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
surface_pressure (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
total_precipitable_water (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
wind_direction (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
wind_speed (time, gid) int16 141GB dask.array<chunksize=(2688, 372), meta=np.ndarray>
Attributes:
Version: 3.0.6
The object returned by this function is a standard xarray.Dataset, so you can plug it directly into your analysis workflow.
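For example, a typical downstream reduction might look like the following sketch (the variable name is taken from the repr above). The computation streams the data chunk by chunk, and only the reduced result is kept in memory:
# Mean air temperature per site across both years
mean_temp = ds["air_temperature"].mean(dim="time").compute()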
Parallel Computing with dask
Although your computations are lazy if you load your data with dask, they still only run on a single process (or thread pool) by default (see here for more info on the dask schedulers). In order to quickly and easily parallelize your computations, you can use dask-distributed.
To start off, install the required library:
$ pip install distributed --upgrade
Next, you should start a dask client that controls your parallelization scheme:
from dask.distributed import Client
client = Client(n_workers=4, memory_limit='10GB')
In this example, we have told dask to use 4 worker processes with a maximum of 10GB of memory each. Once this client is running, you can write your data analysis code as normal. Any dask computations you do will be performed in chunks across the 4 workers:
import os
import xarray as xr
from rex import TESTDATADIR
WTK_2012_FP = os.path.join(TESTDATADIR, 'wtk', 'ri_100_wtk_2012.h5')
ds = xr.open_dataset(WTK_2012_FP, engine="rex", chunks="auto")
ds["windspeed_100m"].mean(dim="time").compute()
<xarray.DataArray 'windspeed_100m' (gid: 200)> Size: 2kB
array([6.87944558, 6.93480874, 6.99835383, 6.93864071, 6.94729167,
7.10710155, 7.20108265, 7.16832536, 7.17782559, 7.32525729,
7.31806466, 7.33072291, 7.28809312, 7.22210041, 7.26382286,
7.22113616, 7.07780852, 7.08420082, 7.06300888, 7.29566712,
7.39969945, 7.45076844, 7.43958447, 7.43899704, 7.46535405,
7.42845401, 7.29795651, 7.19493852, 7.05814663, 6.88442168,
6.85640938, 6.85859176, 7.05063525, 7.09214253, 7.14200478,
7.23776412, 7.31980874, 7.27634107, 7.23299977, 7.30763434,
7.36336179, 7.29896175, 7.14903347, 7.01208447, 6.89744877,
6.75741234, 6.70825137, 6.7998816 , 6.95533584, 6.86520833,
6.89120105, 6.97641507, 7.03485883, 7.18894353, 7.24894923,
7.16365437, 6.94202641, 6.83164959, 7.01116462, 7.16896744,
7.12047473, 6.9952163 , 6.87737705, 6.76693875, 6.66649021,
6.64727345, 6.77149704, 6.85299522, 7.09365551, 6.93533356,
6.93294627, 7.01532673, 7.03204235, 7.06432377, 7.13546903,
7.18949567, 7.31379098, 7.25667464, 7.01673725, 6.76937158,
6.77199567, 7.02523338, 7.18999203, 7.13933629, 7.02185109,
6.8873224 , 6.74311362, 6.63313183, 6.56818306, 6.56203097,
6.55499772, 7.82230419, 7.6846209 , 7.01713912, 7.38534608,
7.27920993, 7.14663138, 7.10545879, 7.03550091, 6.89493739,
6.97185109, 7.14635246, 7.99603256, 7.87955373, 8.10967896,
8.04715392, 7.94161544, 7.81445811, 8.21366576, 8.16868056,
8.10710155, 8.00840392, 7.90002732, 7.6702015 , 8.24766735,
8.20477117, 8.14225638, 8.05135246, 7.95503985, 7.78786544,
8.31860997, 8.29575364, 8.25664276, 8.19562955, 8.11789617,
8.02574112, 7.91357013, 8.36183402, 8.35696949, 8.3436612 ,
8.32018784, 8.28632969, 8.23200137, 8.17405738, 8.09175319,
8.00542919, 7.86773224, 8.4832434 , 8.4748133 , 8.44929986,
8.42680328, 8.40229508, 8.38972336, 8.37218807, 8.3485633 ,
8.32098588, 8.27343579, 8.23153119, 8.16400159, 8.08327641,
7.98567851, 7.80577186, 8.50659722, 8.50146403, 8.50739982,
8.51652436, 8.51465733, 8.51314094, 8.49275273, 8.47009904,
8.44045765, 8.42123406, 8.39814663, 8.37284381, 8.34804303,
8.30322746, 8.27097791, 8.21462773, 8.13777778, 8.05820355,
7.94375455, 7.79726207, 8.52080829, 8.52497837, 8.52421107,
8.53359403, 8.53315574, 8.53251138, 8.53214367, 8.53220856,
8.53655624, 8.52449681, 8.50425888, 8.4733857 , 8.45109517,
8.42566826, 8.39944331, 8.37924408, 8.33734403, 8.31360087,
8.26787568, 8.20503074, 8.13375228, 8.04971881, 7.95212659,
8.56010474, 8.55197177, 8.54179417, 8.54154599, 8.53991234])
Coordinates:
* gid (gid) int64 2kB 0 1 2 3 4 5 6 7 ... 193 194 195 196 197 198 199
latitude (gid) float32 800B 41.96 41.98 42.0 41.9 ... 40.91 40.93 40.95
longitude (gid) float32 800B -71.79 -71.79 -71.78 ... -71.79 -71.78 -71.78
country (gid) |S13 3kB b'United States' b'United States' ... b'None'
state (gid) |S4 800B b'RI' b'RI' b'RI' ... b'None' b'None' b'None'
county (gid) |S22 4kB b'Providence' b'Providence' ... b'None' b'None'
timezone (gid) int16 400B -5 -5 -5 -5 -5 -5 -5 -5 ... -5 -5 -5 -5 -5 -5 -5
elevation (gid) int16 400B 202 178 174 195 201 170 187 ... 0 0 0 0 0 0 0
offshore (gid) int16 400B 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 1
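While a long computation is running, you can monitor task progress and per-worker memory usage in the dask dashboard (viewing it requires the bokeh package):
# Link to the live diagnostics dashboard for the client started above
print(client.dashboard_link)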
Remember that in order for the computations to be distributed using dask, you must load your data into dask arrays. The easiest way to do this is to specify chunks=... when you read in the data (as we've done above).
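If you already have a dataset open without chunks, you can also convert its variables to dask arrays after the fact using xarray's standard .chunk() method:
# Re-chunk an already-open dataset so subsequent operations use dask
ds = ds.chunk({"time": 1000})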
Case Studies
Once you have opened the file with xarray, you can take full advantage of the library's powerful features to perform data analysis tasks. Check out some examples of this below: