I processed my dataset into several shards. If I want to load them as a single dataset, I can concatenate them, but indexing all of the files takes a while. Is there a quicker way to load the shards as one dataset, for example by memory-mapping several dataset shards?