I recently added hashtag support to owl-blogs. The initial reason for this was to to make post more discoverable via ActivityPub, but I found it helpful to further categories posts.
The implementation was quiet simple. I use the markdown renderer goldmark which has a plugin for hashtags.
As tags are also part of microformats2, I wanted to mark the hashtags accordingly.
This is currently not possible with the hashtag extension.
I've extended this to allow adding arbitrary attributes to the link tag (Related Pull Request).
Until this is merged into the main repository I'll use my own version, which can be done by adding a replace directive to the go.mod
A nice and simple solution to add comments to your blog.
I've decided to go the bit more complicated route and added #ActivityPub support to my blog directly. Any interaction will show up below the posts. However this requires a backend and will not work with a static site generator.
Thumbnails use the same file format as their parent files assuming the user already chose the best format for their images. The thumbnails are created with a width of 620px, equal to the content width of the blogs main body. If the image is already small enough the image data is simply copied.
The URL of a thumbnail can simply be generated by replacing /media/ with /thumbnail/.
I will still write some tests and see if any errors occur on my blog before merging this feature into the main branch.
My blog software (owl-blogs) uses a single SQLite database to store everything, including all files uploaded. I'm aware that storing large files in a relational database isn't best practice. It started out as a placeholder implementation, but I liked the idea to have a single file I can backup.
One reason against storing binary blobs in relational databases often stated is read performance, but I didn't find any benchmarks supporting this claim. Therefore I built a small test setup to see the difference between serving binary files out of a SQLite database vs serving from the file system directly.
As my blog is written in Go, I created the a simple server similar to my blog. It uses sqlx and go-sqlite3 for the database handling and net/http for the static file server
package main
import (
"log"
"net/http"
"github.com/jmoiron/sqlx"
_ "github.com/mattn/go-sqlite3"
)
type sqlBinaryFile struct {
Data []byte `db:"data"`
}
type sqlHandler struct {
Db *sqlx.DB
}
func (h *sqlHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
id := r.PathValue("filename")
var sqlFile sqlBinaryFile
h.Db.Get(&sqlFile, "SELECT data FROM files WHERE id = ?", id)
w.Write(sqlFile.Data)
}
func main() {
db := sqlx.MustOpen("sqlite3", "files.db")
sql := &sqlHandler{Db: db}
fs := http.StripPrefix("/dir/", http.FileServer(http.Dir("./static")))
http.Handle("/dir/", fs)
http.Handle("/sqlite/{filename}", sql)
log.Print("Listening on :3000...")
err := http.ListenAndServe(":3000", nil)
if err != nil {
log.Fatal(err)
}
}
As a test set I created 2000 files between 200kb and 4MB in size using a simple python script:
import os
import random
for i in range(2000):
os.system(f"head -c {(random.randint(200, 4000))}K </dev/urandom > static/{i:05d}.bin")
The SQLite database was created with this script:
import os
import sqlite3
os.remove("files.db")
con = sqlite3.connect("files.db")
cur = con.cursor()
cur.execute("CREATE TABLE files( id VARCHAR(255) PRIMARY KEY, data BLOB NOT NULL )")
for f in os.listdir("static"):
print(f)
data = open("static/" + f, "rb").read()
cur.execute("INSERT INTO files(id, data) VALUES (?, ?)", (f, data))
con.commit()
To benchmark the server I created two files listing all file URLs (one for sqlite, one fot filesystem) and used siege to run the benchmark with this configuration.
siege -f urls_sqlite.txt -c 1 -b --time=10s -j
The test was executed on my laptop:
CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
CPU max MHz: 4600,0000
Memory: 16 GB
I ran the test with different concurrency and plotted the results:
For a low throughput system (such as my blog) the difference between SQLite and the filesystem is small enough to not care about.
The possible throughput (transaction/second) of the filesystem is ~2.3 times higher.
The response time grows slower with increased concurrency.
For the time being I will stick with my SQLite solution.
Once my blog gets really popular I can easily change the implementation of the binary repository.