Serving Binary Files from SQLite
My blog software (owl-blogs) uses a single SQLite database to store everything, including all uploaded files. I'm aware that storing large files in a relational database isn't best practice. It started out as a placeholder implementation, but I liked the idea of having a single file I can back up.
One frequently stated argument against storing binary blobs in relational databases is read performance, but I didn't find any benchmarks supporting this claim. Therefore I built a small test setup to measure the difference between serving binary files out of a SQLite database and serving them directly from the file system.
As my blog is written in Go, I created a simple server that resembles my blog's setup. It uses sqlx and go-sqlite3 for the database handling and net/http for the static file server:
package main

import (
    "log"
    "net/http"

    "github.com/jmoiron/sqlx"
    _ "github.com/mattn/go-sqlite3"
)

type sqlBinaryFile struct {
    Data []byte `db:"data"`
}

type sqlHandler struct {
    Db *sqlx.DB
}

func (h *sqlHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // The {filename} path parameter requires the Go 1.22+ routing patterns.
    id := r.PathValue("filename")
    var sqlFile sqlBinaryFile
    // Return 404 instead of an empty 200 response for unknown ids.
    if err := h.Db.Get(&sqlFile, "SELECT data FROM files WHERE id = ?", id); err != nil {
        http.NotFound(w, r)
        return
    }
    w.Write(sqlFile.Data)
}

func main() {
    db := sqlx.MustOpen("sqlite3", "files.db")
    sql := &sqlHandler{Db: db}
    fs := http.StripPrefix("/dir/", http.FileServer(http.Dir("./static")))

    http.Handle("/dir/", fs)
    http.Handle("/sqlite/{filename}", sql)

    log.Print("Listening on :3000...")
    if err := http.ListenAndServe(":3000", nil); err != nil {
        log.Fatal(err)
    }
}
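The handler above does only what the benchmark needs. For real traffic you would also want a Content-Type header and support for range requests; a minimal sketch using http.ServeContent (serveBlob is a hypothetical name; it assumes the types from above plus the additional imports "bytes" and "time"):

func (h *sqlHandler) serveBlob(w http.ResponseWriter, r *http.Request) {
    id := r.PathValue("filename")
    var sqlFile sqlBinaryFile
    if err := h.Db.Get(&sqlFile, "SELECT data FROM files WHERE id = ?", id); err != nil {
        http.NotFound(w, r)
        return
    }
    // ServeContent derives the Content-Type from the file name (or by
    // sniffing the data) and handles Range and conditional requests.
    http.ServeContent(w, r, id, time.Time{}, bytes.NewReader(sqlFile.Data))
}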
As a test set, I created 2000 files between 200 kB and 4 MB in size using a simple Python script:
import os
import random

for i in range(2000):
    os.system(f"head -c {random.randint(200, 4000)}K </dev/urandom > static/{i:05d}.bin")
The SQLite database was created with this script:
import os
import sqlite3

os.remove("files.db")
con = sqlite3.connect("files.db")
cur = con.cursor()
cur.execute("CREATE TABLE files( id VARCHAR(255) PRIMARY KEY, data BLOB NOT NULL )")
for f in os.listdir("static"):
    print(f)
    data = open("static/" + f, "rb").read()
    cur.execute("INSERT INTO files(id, data) VALUES (?, ?)", (f, data))
con.commit()
To benchmark the server I created two files listing all file URLs (one for SQLite, one for the filesystem) and used siege to run the benchmark with this configuration:
siege -f urls_sqlite.txt -c 1 -b --time=10s -j
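The URL list files are just newline-separated URLs, one per test file. A small sketch of how they can be generated in Go (the name urls_dir.txt for the filesystem variant is my assumption):

package main

import (
    "fmt"
    "log"
    "os"
)

// Writes one URL per line for each of the 2000 generated test files,
// once for the SQLite route and once for the filesystem route.
func main() {
    sqlite, err := os.Create("urls_sqlite.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer sqlite.Close()

    dir, err := os.Create("urls_dir.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer dir.Close()

    for i := 0; i < 2000; i++ {
        fmt.Fprintf(sqlite, "http://localhost:3000/sqlite/%05d.bin\n", i)
        fmt.Fprintf(dir, "http://localhost:3000/dir/%05d.bin\n", i)
    }
}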
The test was executed on my laptop:
- CPU: Intel(R) Core(TM) i7-8565U @ 1.80 GHz
- CPU max frequency: 4600 MHz
- Memory: 16 GB
I ran the test with different concurrency levels and plotted the results:
For a low-throughput system (such as my blog) the difference between SQLite and the filesystem is small enough to not care about. The possible throughput (transactions/second) of the filesystem is ~2.3 times higher, and its response time grows more slowly with increased concurrency.
For the time being I will stick with my SQLite solution. Should my blog ever get really popular, I can easily change the implementation of the binary repository.
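That swap stays easy because the blog only needs to talk to a small interface, behind which a filesystem (or S3) backend can later be dropped in. A sketch of such an interface (the names are hypothetical; owl-blogs may structure this differently):

// BinaryRepository abstracts where uploaded files are stored, so the
// SQLite-backed implementation can be swapped for a filesystem one
// without touching the rest of the blog.
type BinaryRepository interface {
    Get(id string) ([]byte, error)
    Save(id string, data []byte) error
}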