I recently added hashtag support to owl-blogs. The initial reason for this was to to make post more discoverable via ActivityPub, but I found it helpful to further categories posts.
The implementation was quiet simple. I use the markdown renderer goldmark which has a plugin for hashtags.
As tags are also part of microformats2, I wanted to mark the hashtags accordingly.
This is currently not possible with the hashtag extension.
I've extended this to allow adding arbitrary attributes to the link tag (Related Pull Request).
Until this is merged into the main repository I'll use my own version, which can be done by adding a replace directive to the go.mod
My blog software (owl-blogs) uses a single SQLite database to store everything, including all files uploaded. I'm aware that storing large files in a relational database isn't best practice. It started out as a placeholder implementation, but I liked the idea to have a single file I can backup.
One reason against storing binary blobs in relational databases often stated is read performance, but I didn't find any benchmarks supporting this claim. Therefore I built a small test setup to see the difference between serving binary files out of a SQLite database vs serving from the file system directly.
As my blog is written in Go, I created the a simple server similar to my blog. It uses sqlx and go-sqlite3 for the database handling and net/http for the static file server
package main
import (
"log"
"net/http"
"github.com/jmoiron/sqlx"
_ "github.com/mattn/go-sqlite3"
)
type sqlBinaryFile struct {
Data []byte `db:"data"`
}
type sqlHandler struct {
Db *sqlx.DB
}
func (h *sqlHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
id := r.PathValue("filename")
var sqlFile sqlBinaryFile
h.Db.Get(&sqlFile, "SELECT data FROM files WHERE id = ?", id)
w.Write(sqlFile.Data)
}
func main() {
db := sqlx.MustOpen("sqlite3", "files.db")
sql := &sqlHandler{Db: db}
fs := http.StripPrefix("/dir/", http.FileServer(http.Dir("./static")))
http.Handle("/dir/", fs)
http.Handle("/sqlite/{filename}", sql)
log.Print("Listening on :3000...")
err := http.ListenAndServe(":3000", nil)
if err != nil {
log.Fatal(err)
}
}
As a test set I created 2000 files between 200kb and 4MB in size using a simple python script:
import os
import random
for i in range(2000):
os.system(f"head -c {(random.randint(200, 4000))}K </dev/urandom > static/{i:05d}.bin")
The SQLite database was created with this script:
import os
import sqlite3
os.remove("files.db")
con = sqlite3.connect("files.db")
cur = con.cursor()
cur.execute("CREATE TABLE files( id VARCHAR(255) PRIMARY KEY, data BLOB NOT NULL )")
for f in os.listdir("static"):
print(f)
data = open("static/" + f, "rb").read()
cur.execute("INSERT INTO files(id, data) VALUES (?, ?)", (f, data))
con.commit()
To benchmark the server I created two files listing all file URLs (one for sqlite, one fot filesystem) and used siege to run the benchmark with this configuration.
siege -f urls_sqlite.txt -c 1 -b --time=10s -j
The test was executed on my laptop:
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
CPU max MHz: 4600,0000
Memory: 16 GB
I ran the test with different concurrency and plotted the results:
For a low throughput system (such as my blog) the difference between SQLite and the filesystem is small enough to not care about.
The possible throughput (transaction/second) of the filesystem is ~2.3 times higher.
The response time grows slower with increased concurrency.
For the time being I will stick with my SQLite solution.
Once my blog gets really popular I can easily change the implementation of the binary repository.