Chunking arrays in JavaScript
I recently ran into an issue inserting a large amount of data into an SQLite database in my React Native app. I had purposely created a large mock data set to test what would happen should the user attempt something similar.
And I instantly ran into an SQLite limit, specifically number 9: the maximum number of host parameters in a single SQL statement.
QueryFailedError:
too many SQL variables (code 1 SQLITE_ERROR):, while compiling:
INSERT INTO "table_name"("Id", "columnOne", "columnTwo") VALUES (...)
Apparently, this is to protect against excessive memory allocation. You can read more about it in the link above.
The problematic query:
// TypeORM
await connection
.createQueryBuilder()
.insert()
.into("table_name")
.values(largeArray.map((item) => item))
.execute()
What we can do here is chunk our data. We can then run multiple queries instead of just one to insert the same amount of data.
To illustrate the data that we currently have, I'll use an array of strings:
const beforeChunking = ["1", "2", "3", "4", "5"]
What we want to have is an array of arrays with a specified chunk size. For example, with a chunk size of 2:
const afterChunking = [["1", "2"], ["3", "4"], ["5"]]
We can achieve this by utilizing splice(). We create a function that takes in the data to chunk and the required chunk size.
const getArrayAsChunks = (array, chunkSize) => {
  const result = []
  // Copy the input so splice() doesn't mutate the original array
  const data = array.slice(0)
  // Check the length rather than data[0], which would stop early on falsy values like "" or 0
  while (data.length > 0) {
    // Remove up to chunkSize elements from the front and push them as one chunk
    result.push(data.splice(0, chunkSize))
  }
  return result
}
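As a quick sanity check, calling the helper on the example array from earlier produces the chunked shape we described (the expected output is shown as a comment):
const chunked = getArrayAsChunks(beforeChunking, 2)
console.log(chunked)
// [ [ "1", "2" ], [ "3", "4" ], [ "5" ] ]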
splice() removes up to chunkSize elements from the front of data and returns them as the next chunk. The remaining elements shift towards the start of the array, so the loop keeps peeling off chunks until data is empty.
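If splice() is unfamiliar, here's a tiny standalone example (separate from the database code) showing how each call removes elements from the front and shrinks the source array:
const data = ["1", "2", "3", "4", "5"]
const firstChunk = data.splice(0, 2)
// firstChunk is ["1", "2"]; data is now ["3", "4", "5"]
const secondChunk = data.splice(0, 2)
// secondChunk is ["3", "4"]; data is now ["5"]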
Now we can run our insert queries like so:
const chunksArray = getArrayAsChunks(largeArray, 100)

// Kick off one insert per chunk and wait for all of them to finish
await Promise.all(
  chunksArray.map(async (oneChunk) => {
    await connection
      .createQueryBuilder()
      .insert()
      .into("table_name")
      .values(oneChunk)
      .execute()
  })
)
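Promise.all runs the chunked inserts concurrently; if that ever proves too heavy for the device, a plain for...of loop is a simple sequential alternative (this variation is my own sketch, not part of the original code):
for (const oneChunk of chunksArray) {
  // Insert one chunk at a time, waiting for each query to finish
  await connection
    .createQueryBuilder()
    .insert()
    .into("table_name")
    .values(oneChunk)
    .execute()
}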
This took less than two seconds to insert ~2000 rows on a low-end Android phone, so I left it at that. But one performance improvement we could make here is to define the SQL query ourselves rather than letting the ORM build it; generating the query many times over incurs a cost.
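As a rough sketch of what that could look like, assuming the "Id" column is auto-generated and using TypeORM's connection.query() with a hand-built parameterized statement (the helper name and details here are illustrative, not from the original post):
const insertChunkRaw = async (oneChunk) => {
  // One "(?, ?)" group per row; SQLite fills the ? placeholders from the flat parameter list
  const placeholders = oneChunk.map(() => "(?, ?)").join(", ")
  const parameters = oneChunk.flatMap((item) => [item.columnOne, item.columnTwo])
  // Run the raw statement instead of rebuilding it with the query builder for every chunk
  await connection.query(
    `INSERT INTO "table_name" ("columnOne", "columnTwo") VALUES ${placeholders}`,
    parameters
  )
}
Each chunk would then go through insertChunkRaw() instead of the query builder.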