Libove Blog

Personal Blog about anything - mostly programming, cooking and random thoughts

Serving Binary Files from SQLite

My blog software (owl-blogs) uses a single SQLite database to store everything, including all files uploaded. I'm aware that storing large files in a relational database isn't best practice. It started out as a placeholder implementation, but I liked the idea to have a single file I can backup.

One reason against storing binary blobs in relational databases often stated is read performance, but I didn't find any benchmarks supporting this claim. Therefore I built a small test setup to see the difference between serving binary files out of a SQLite database vs serving from the file system directly.

As my blog is written in Go, I created the a simple server similar to my blog. It uses sqlx and go-sqlite3 for the database handling and net/http for the static file server

package main

import (
	"log"
	"net/http"

	"github.com/jmoiron/sqlx"
	_ "github.com/mattn/go-sqlite3"
)

type sqlBinaryFile struct {
	Data []byte `db:"data"`
}

type sqlHandler struct {
	Db *sqlx.DB
}

func (h *sqlHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	id := r.PathValue("filename")
	var sqlFile sqlBinaryFile
	h.Db.Get(&sqlFile, "SELECT data FROM files WHERE id = ?", id)
	w.Write(sqlFile.Data)
}

func main() {

	db := sqlx.MustOpen("sqlite3", "files.db")
	sql := &sqlHandler{Db: db}

	fs := http.StripPrefix("/dir/", http.FileServer(http.Dir("./static")))
	http.Handle("/dir/", fs)
	http.Handle("/sqlite/{filename}", sql)

	log.Print("Listening on :3000...")
	err := http.ListenAndServe(":3000", nil)
	if err != nil {
		log.Fatal(err)
	}
}

As a test set I created 2000 files between 200kb and 4MB in size using a simple python script:

import os
import random

for i in range(2000):
    os.system(f"head -c {(random.randint(200, 4000))}K </dev/urandom > static/{i:05d}.bin")

The SQLite database was created with this script:

import os
import sqlite3

os.remove("files.db")
con = sqlite3.connect("files.db")
cur = con.cursor()

cur.execute("CREATE TABLE files( id VARCHAR(255) PRIMARY KEY, data BLOB NOT NULL )")

for f in os.listdir("static"):
    print(f)
    data = open("static/" + f, "rb").read()
    cur.execute("INSERT INTO files(id, data) VALUES (?, ?)", (f, data))

con.commit()

To benchmark the server I created two files listing all file URLs (one for sqlite, one fot filesystem) and used siege to run the benchmark with this configuration.

siege -f urls_sqlite.txt -c 1 -b --time=10s -j

The test was executed on my laptop:

  • CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  • CPU max MHz: 4600,0000
  • Memory: 16 GB

I ran the test with different concurrency and plotted the results:

Two plots comparing transactions and response time between SQLite and filesystem

For a low throughput system (such as my blog) the difference between SQLite and the filesystem is small enough to not care about. The possible throughput (transaction/second) of the filesystem is ~2.3 times higher. The response time grows slower with increased concurrency.

For the time being I will stick with my SQLite solution. Once my blog gets really popular I can easily change the implementation of the binary repository.

#hosting #sqlite #go #benchmark #server