Bulk Import
Importing large datasets into a Big Table
When importing large datasets into a Big Table, it is necessary to upload the data in stages. This avoids the fragility of long-running requests and allows for higher throughput by parallelizing the upload. This process is called “stashing”.
Please read our introduction to stashing to understand how stashing works before looking at bulk imports specifically.
Overview
The general sequence of operations for a large import is:
Create stash identifier
Create a stable stash ID that all subsequent data uploads will be associated with.
Upload data subsets
Upload the data in chunks of 500 to 1,000 rows, using the stash ID from step one.
Finalize import
Once all data has been uploaded, use the stash ID to import the full dataset as a single atomic table operation.
Create Stash ID
To simplify coordination, parallelization, and idempotency of the upload process, you create the stash ID yourself from information that is meaningful in your domain.
For instance, a daily import process might have a stash ID of 20240501-import. Or, an import specific to a single customer might have a stash ID of customer-381-import.
You are responsible for ensuring that the stash ID is unique and stable across associated uploads.
Upload Data
Once you have a stable stash ID, you can use the stash data endpoint to upload the data in chunks.
Chunks can be sent in parallel to speed up the upload of large datasets. Use the same stash ID across uploads to ensure the final dataset is complete, and use the serial in the request path to control the order of the chunks within the stash, as in the sketch below.
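As a sketch of that pattern, the following TypeScript splits a dataset into chunks and uploads them concurrently under one stash ID. The base URL, the /stashes/{stashID}/{serial} path, and the Bearer authorization header are assumptions for illustration; consult the stash data endpoint reference for the exact request shape.

```ts
// Sketch: split rows into chunks and upload them in parallel under one stash ID.
// The endpoint path and auth header are assumptions, not the confirmed API shape.

const API_BASE = "https://api.glideapps.com"; // assumed base URL
const API_KEY = process.env.GLIDE_API_KEY;    // hypothetical env variable for your API key

async function uploadInChunks(stashID: string, rows: object[], chunkSize = 500) {
  const uploads: Promise<Response>[] = [];
  for (let i = 0; i * chunkSize < rows.length; i++) {
    const chunk = rows.slice(i * chunkSize, (i + 1) * chunkSize);
    // The serial (i + 1) fixes this chunk's position in the final dataset,
    // no matter when the request actually completes.
    uploads.push(
      fetch(`${API_BASE}/stashes/${stashID}/${i + 1}`, {
        method: "PUT",
        headers: {
          Authorization: `Bearer ${API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(chunk),
      })
    );
  }
  // Chunks are sent concurrently; the serials preserve their order in the stash.
  await Promise.all(uploads);
}
```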
As an example, the following stash requests will create a final dataset consisting of the two rows identified by the stash ID 20240501-import. The trailing parameters of 1 and 2 in the request path are the serial IDs. The data in serial 1 will come first in the stash, and the data in serial 2 will come second, even if the requests are processed in a different order.
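A minimal sketch of those two requests, reusing the assumed API_BASE and API_KEY constants from the sketch above; the row objects are placeholders:

```ts
// Two stash uploads for stash ID "20240501-import"; row data is illustrative only.
await fetch(`${API_BASE}/stashes/20240501-import/1`, {
  method: "PUT",
  headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify([{ name: "Row A" }]), // serial 1: first in the final dataset
});

await fetch(`${API_BASE}/stashes/20240501-import/2`, {
  method: "PUT",
  headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify([{ name: "Row B" }]), // serial 2: second, regardless of arrival order
});
```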
Finalize Import
Once all the data to be imported has been uploaded, you can use the stash ID with one of the Glide API’s table endpoints to import the full dataset in a single atomic operation.
Create New Table
To create a table with the data of the stash ID 20240501-import, you can use the create table endpoint with a stashID reference of 20240501-import instead of the actual row values.
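A hedged sketch of such a request, again reusing the assumed constants from above. The endpoint path, the table name, the schema shape, and the $stashID reference field are illustrative assumptions; check the create table endpoint reference for the exact body:

```ts
// Create a new table whose rows come from the stash rather than inline values.
await fetch(`${API_BASE}/tables`, {
  method: "POST",
  headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    name: "Imported Data", // hypothetical table name
    schema: {
      columns: [{ id: "name", displayName: "Name", type: "string" }], // hypothetical schema
    },
    rows: { $stashID: "20240501-import" }, // stash reference instead of row values
  }),
});
```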
Add Rows to Table
To add data to an existing table, you can use the add rows to table endpoint with a stashID reference of 20240501-import instead of the actual row values.
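A similar sketch for appending rows; the path segment my-table-id is a placeholder table ID, and the body shape is an assumption:

```ts
// Append the stashed rows to an existing table in one atomic operation.
await fetch(`${API_BASE}/tables/my-table-id/rows`, {
  method: "POST",
  headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({ rows: { $stashID: "20240501-import" } }),
});
```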
Overwrite Table
To reset an existing table’s data, you can use the overwrite table endpoint with a stashID reference of 20240501-import instead of the actual row values.
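And a sketch for the overwrite case, under the same assumptions as the previous examples:

```ts
// Replace the table's entire contents with the stashed rows in one atomic operation.
await fetch(`${API_BASE}/tables/my-table-id`, {
  method: "PUT",
  headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({ rows: { $stashID: "20240501-import" } }),
});
```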