When importing large datasets into a Big Table, it is necessary to upload the data in stages. This avoids the fragility of long-running requests and allows for higher throughput by parallelizing the upload process. This process is called “stashing”.

Please read our introduction to stashing to understand how stashing works before looking at bulk imports specifically.

Overview

The general sequence of operations for a large import is:

1. Create stash identifier: create a stable stash ID that will associate all subsequent data uploads.

2. Upload data subsets: upload the data in stages of 500 to 1,000 rows, using the stash ID from step one.

3. Finalize import: once all data has been uploaded, use the stash ID to import the full dataset as a single atomic table operation.

Create Stash ID

To simplify coordination, parallelization, and idempotency of the upload process, the stash ID is a value that you create from information relevant to your domain.

For instance, a daily import process might have a stash ID of 20240501-import. Or, an import specific to a single customer might have a stash ID of customer-381-import.

You are responsible for ensuring that the stash ID is unique and stable across associated uploads.
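
For example, a stash ID for a daily import can be derived from the date, and one for a customer-specific import from the customer’s identifier. The helpers below are a minimal sketch; the naming scheme is illustrative and not something the API requires.

// Build stable, unique stash IDs from domain information.
// The naming schemes below are illustrative only; any string that is
// unique and stable across the related uploads will work.
function dailyImportStashId(date: Date = new Date()): string {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${yyyy}${mm}${dd}-import`; // e.g. "20240501-import"
}

function customerImportStashId(customerId: number): string {
  return `customer-${customerId}-import`; // e.g. "customer-381-import"
}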

Upload Data

Once you have a stable stash ID, you can use the stash data endpoint to upload the data in chunks.

Chunks can be sent in parallel to speed up the upload of large datasets. Use the same stash ID across uploads to ensure the final dataset is complete, and use the serial to control the order of the chunks within the stash.

As an example, consider two stash requests made with the stash ID 20240501-import, one with serial 1 and one with serial 2, that together create a final dataset of two rows. The trailing path parameter in each request (1 or 2) is the serial: the row uploaded under serial 1 comes first in the stash and the row uploaded under serial 2 comes second, even if the requests are processed in a different order. Each request body is a JSON array of row objects, for example:

[
    {
        "Name": "Alex",
        "Age": 30,
        "Birthday": "2024-07-03T10:24:08.285Z"
    }
]
The above is just an example. In practice, you should include more than one row per stash chunk, and if your complete dataset is only 2 rows, you do not need to use stashing at all. See Limits for guidance.
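
As a concrete sketch of chunked, parallel uploads, the TypeScript below splits a dataset into chunks and PUTs each one to the stash data endpoint. The endpoint path (/stashes/{stashID}/{serial}), the Bearer authorization header, and the chunk size are assumptions for illustration; check the stash data endpoint reference for the exact request shape.

// Sketch: upload a dataset in parallel chunks under a single stash ID.
// ASSUMPTIONS: the endpoint path and auth header below are illustrative;
// consult the stash data endpoint reference for the exact details.
const API_BASE = "https://api.glideapps.com";
const token = process.env.GLIDE_API_TOKEN ?? "";

type Row = Record<string, unknown>;

async function uploadChunk(stashId: string, serial: number, rows: Row[]): Promise<void> {
  const response = await fetch(`${API_BASE}/stashes/${stashId}/${serial}`, {
    method: "PUT",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify(rows),
  });
  if (!response.ok) {
    throw new Error(`Stash chunk ${serial} failed with status ${response.status}`);
  }
}

async function uploadDataset(stashId: string, rows: Row[], chunkSize = 500): Promise<void> {
  const chunks: Row[][] = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    chunks.push(rows.slice(i, i + chunkSize));
  }
  // Serials control row order within the stash, so chunks can be
  // uploaded in parallel without losing the original ordering.
  await Promise.all(chunks.map((chunk, index) => uploadChunk(stashId, index + 1, chunk)));
}

Calling uploadDataset("20240501-import", rows) leaves the stash ready to be finalized, as described below.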

Finalize Import

Once all the data to be imported has been uploaded, you can use the stash ID with one of the Glide API’s table endpoints to import the full dataset in a single atomic operation.

Create New Table

To create a table with the data stashed under the stash ID 20240501-import, you can use the create table endpoint with a $stashID reference to 20240501-import instead of the actual row values.

{
    "name": "New Table",
    "schema": {
        "columns": [ ... ]
    },
    "rows": {
        "$stashID": "20240501-import"
    }
}
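
For illustration, the request might be issued like the sketch below, assuming the create table endpoint is POST https://api.glideapps.com/tables with a Bearer token; the path and the omitted column definitions are assumptions, so refer to the create table endpoint reference for the exact request shape.

// Sketch: create a new table whose rows come from the stash.
// ASSUMPTION: the "/tables" path and headers are illustrative.
const createResponse = await fetch("https://api.glideapps.com/tables", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.GLIDE_API_TOKEN}`,
  },
  body: JSON.stringify({
    name: "New Table",
    schema: {
      columns: [
        // Column definitions omitted; see the table schema documentation.
      ],
    },
    // Reference the stash instead of inlining row values.
    rows: { $stashID: "20240501-import" },
  }),
});
if (!createResponse.ok) {
  throw new Error(`Create table failed with status ${createResponse.status}`);
}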

Add Rows to Table

To add data to an existing table, you can use the add rows to table endpoint with a $stashID reference to 20240501-import instead of the actual row values.

{
    "$stashID": "20240501-import"
}
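
A corresponding sketch, assuming the add rows endpoint has the form POST https://api.glideapps.com/tables/{tableID}/rows; the path and the tableID value are placeholders, so check the add rows to table endpoint reference.

// Sketch: append the stashed rows to an existing table.
// ASSUMPTIONS: the path shape and tableID below are placeholders.
const tableID = "your-table-id";
const addRowsResponse = await fetch(`https://api.glideapps.com/tables/${tableID}/rows`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.GLIDE_API_TOKEN}`,
  },
  body: JSON.stringify({ $stashID: "20240501-import" }),
});
if (!addRowsResponse.ok) {
  throw new Error(`Add rows failed with status ${addRowsResponse.status}`);
}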

Overwrite Table

To reset an existing table’s data, you can use the overwrite table endpoint with a $stashID reference to 20240501-import instead of the actual row values.

{
    "$stashID": "20240501-import"
}
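
And a final sketch for overwriting, assuming the overwrite table endpoint has the form PUT https://api.glideapps.com/tables/{tableID}; again the path and tableID are placeholders, so check the overwrite table endpoint reference.

// Sketch: replace all rows of an existing table with the stashed data.
// ASSUMPTIONS: the path shape and tableID below are placeholders.
const tableID = "your-table-id";
const overwriteResponse = await fetch(`https://api.glideapps.com/tables/${tableID}`, {
  method: "PUT",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.GLIDE_API_TOKEN}`,
  },
  body: JSON.stringify({ $stashID: "20240501-import" }),
});
if (!overwriteResponse.ok) {
  throw new Error(`Overwrite table failed with status ${overwriteResponse.status}`);
}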