You can use multiprocessing to parallelize the downloads and conversion to Arrow by passing num_proc= to load_dataset.
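As a rough sketch of what that looks like in practice: the `pick_num_proc` and `load_in_parallel` helpers below are hypothetical names made up for illustration, and the cap of 8 workers is an arbitrary assumption. Only `load_dataset` and its `num_proc` argument come from the `datasets` library.

```python
import os


def pick_num_proc(max_workers=8):
    """Hypothetical helper: pick a worker count between 1 and max_workers,
    bounded by the number of CPUs the machine reports."""
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(max_workers, cpus))


def load_in_parallel(dataset_name, **kwargs):
    """Hypothetical wrapper: call load_dataset with num_proc set so that
    downloading and Arrow conversion are spread over several processes."""
    # Imported here so pick_num_proc can be used without the library installed.
    from datasets import load_dataset

    return load_dataset(dataset_name, num_proc=pick_num_proc(), **kwargs)
```

Calling `load_in_parallel("username/dataset_name")` (a placeholder repository id) would then be roughly equivalent to `load_dataset("username/dataset_name", num_proc=N)`. The speedup presumably depends on the dataset being split across several files, since a single file gives the workers nothing to divide.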