Libove Blog

Personal Blog about anything - mostly programming, cooking and random thoughts


Link: Situational Awareness - The Decade Ahead

Link: https://situational-awareness.ai/

Thesis arguing that #AGI will be reached in the next decade. I have not read the full text yet.

Problems I see with the argument so far:

The Data Wall

The data wall is a hard, open problem: larger models will either need massively more data or have to be orders of magnitude more efficient with the given data.

More data is not available at that scale. The latest models are already trained on virtually the entire internet. The author argues that a strategy similar to AlphaGo, where the model created its own training data by playing against itself, could be deployed for #GenAI. I find this implausible, as generating intelligent text that is beneficial for training already requires a more capable AI.

Similarly, making models more data-efficient is still an open problem, but I don't know enough about this area to evaluate how likely a breakthrough is.

Additionally, any new datasets going forward will be poisoned by AI output, as generative models are already used at massive scale to create "content". Research suggests that training on such data degrades the performance of models.

Large Scale Computing

Even if the data wall is broken, scaling the models will need massive computing power, consuming ever more resources and energy. The general public will only tolerate this as long as it is overall beneficial to them. There are already industries (mainly creative ones) being massively disrupted by the current generation of GenAI. Philosophy Tube, in Here's What Ethical AI Really Means, shows how strikes and collective action can be tools to prevent the development of more powerful AI systems.

#AI



Thumbnails for owl-blogs

I'm currently working on the thumbnail implementation for #owl-blogs and deployed the feature branch to my blog.

Thumbnails use the same file format as their parent files, assuming the user already chose the best format for their images. They are created with a width of 620px, equal to the content width of the blog's main body. If the image is already small enough, the image data is simply copied.
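Roughly, the resizing could look like this; a minimal sketch assuming the golang.org/x/image/draw package, not the actual owl-blogs implementation:

package main

import (
	"bytes"
	"image"
	"image/jpeg"
	"image/png"
	"log"
	"os"

	"golang.org/x/image/draw"
)

// Content width of the blog's main body.
const thumbnailWidth = 620

// createThumbnail scales an image down to thumbnailWidth, keeping the
// aspect ratio and the parent file's format. Images that are already
// small enough are returned unchanged.
func createThumbnail(data []byte) ([]byte, error) {
	src, format, err := image.Decode(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	bounds := src.Bounds()
	if bounds.Dx() <= thumbnailWidth {
		// Already small enough: simply copy the image data.
		return data, nil
	}
	height := bounds.Dy() * thumbnailWidth / bounds.Dx()
	dst := image.NewRGBA(image.Rect(0, 0, thumbnailWidth, height))
	draw.CatmullRom.Scale(dst, dst.Bounds(), src, bounds, draw.Over, nil)

	// Re-encode in the same format as the parent file.
	var buf bytes.Buffer
	switch format {
	case "png":
		err = png.Encode(&buf, dst)
	default:
		err = jpeg.Encode(&buf, dst, nil)
	}
	return buf.Bytes(), err
}

func main() {
	orig, err := os.ReadFile("example.jpg") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	thumb, err := createThumbnail(orig)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("example_thumb.jpg", thumb, 0644); err != nil {
		log.Fatal(err)
	}
}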

The URL of a thumbnail can simply be generated by replacing /media/ with /thumbnail/.
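For example, with a hypothetical file name:

package main

import (
	"fmt"
	"strings"
)

func main() {
	mediaURL := "/media/example.jpg" // hypothetical media URL
	fmt.Println(strings.Replace(mediaURL, "/media/", "/thumbnail/", 1))
	// Output: /thumbnail/example.jpg
}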

I will still write some tests and see if any errors occur on my blog before merging this feature into the main branch.

#dev #blog


Serving Binary Files from SQLite

My blog software (owl-blogs) uses a single SQLite database to store everything, including all uploaded files. I'm aware that storing large files in a relational database isn't best practice. It started out as a placeholder implementation, but I liked the idea of having a single file I can back up.

An often-stated reason against storing binary blobs in relational databases is read performance, but I didn't find any benchmarks supporting this claim. Therefore I built a small test setup to measure the difference between serving binary files out of a SQLite database and serving them directly from the file system.

As my blog is written in Go, I created a simple server similar to my blog. It uses sqlx and go-sqlite3 for the database handling and net/http for the static file server:

package main

import (
	"log"
	"net/http"

	"github.com/jmoiron/sqlx"
	_ "github.com/mattn/go-sqlite3"
)

type sqlBinaryFile struct {
	Data []byte `db:"data"`
}

type sqlHandler struct {
	Db *sqlx.DB
}

func (h *sqlHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	id := r.PathValue("filename")
	var sqlFile sqlBinaryFile
	// Unknown IDs (or any other query error) end the request with a 404.
	err := h.Db.Get(&sqlFile, "SELECT data FROM files WHERE id = ?", id)
	if err != nil {
		http.NotFound(w, r)
		return
	}
	w.Write(sqlFile.Data)
}

func main() {

	db := sqlx.MustOpen("sqlite3", "files.db")
	sql := &sqlHandler{Db: db}

	fs := http.StripPrefix("/dir/", http.FileServer(http.Dir("./static")))
	http.Handle("/dir/", fs)
	http.Handle("/sqlite/{filename}", sql)

	log.Print("Listening on :3000...")
	err := http.ListenAndServe(":3000", nil)
	if err != nil {
		log.Fatal(err)
	}
}

As a test set, I created 2000 files between 200 KB and 4 MB in size using a simple Python script:

import os
import random

for i in range(2000):
    os.system(f"head -c {(random.randint(200, 4000))}K </dev/urandom > static/{i:05d}.bin")

The SQLite database was created with this script:

import os
import sqlite3

if os.path.exists("files.db"):
    os.remove("files.db")
con = sqlite3.connect("files.db")
cur = con.cursor()

cur.execute("CREATE TABLE files( id VARCHAR(255) PRIMARY KEY, data BLOB NOT NULL )")

for f in os.listdir("static"):
    print(f)
    data = open("static/" + f, "rb").read()
    cur.execute("INSERT INTO files(id, data) VALUES (?, ?)", (f, data))

con.commit()
con.close()

To benchmark the server, I created two files listing all file URLs (one for SQLite, one for the filesystem) and used siege to run the benchmark with this configuration:

siege -f urls_sqlite.txt -c 1 -b --time=10s -j
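The URL lists themselves can be generated with a short helper; a sketch, assuming the test server from above is listening on localhost:3000:

package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	entries, err := os.ReadDir("./static")
	if err != nil {
		log.Fatal(err)
	}
	sqlite, err := os.Create("urls_sqlite.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer sqlite.Close()
	dir, err := os.Create("urls_dir.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer dir.Close()
	for _, e := range entries {
		// One URL per line, matching the two routes of the test server.
		fmt.Fprintf(sqlite, "http://localhost:3000/sqlite/%s\n", e.Name())
		fmt.Fprintf(dir, "http://localhost:3000/dir/%s\n", e.Name())
	}
}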

The test was executed on my laptop:

  • CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  • CPU max MHz: 4600
  • Memory: 16 GB

I ran the test with different concurrency and plotted the results:

Two plots comparing transactions and response time between SQLite and filesystem

For a low-throughput system (such as my blog) the difference between SQLite and the filesystem is small enough not to matter. The possible throughput (transactions/second) of the filesystem is ~2.3 times higher, and its response time grows more slowly with increasing concurrency.

For the time being I will stick with my SQLite solution. Once my blog gets really popular, I can easily change the implementation of the binary repository.
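A hypothetical sketch of such an abstraction (the names are illustrative, not the actual owl-blogs API):

package repository

// BinaryRepository decouples the blog from its storage backend. Swapping
// SQLite for the file system later only needs a second implementation.
type BinaryRepository interface {
	Get(id string) ([]byte, error)
	Save(id string, data []byte) error
}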

#hosting #sqlite #go #benchmark #server



Punk


Black and white profile picture of an alpaca. Its hair partially covers its eyes.

#photography