Libove Blog

Personal Blog about anything - mostly programming, cooking and random thoughts

owl-blogs 0.3.2

I've worked on some smaller features and improvements for owl-blogs.

Main Features:

  • Lists and Tags now have RSS Feeds.
  • Nicer 404 Pages

For development I took the time to setup the "end-to-end" tests using go test instead of the previous pytest setup. This vastly simplifies testing and its much quicker.

To test #ActivityPub functionality I use a small mock server, which content can be controlled during testing.

#owlblogs



Rust GTK: Button with Custom Icon with gtk-rs using Embedded Images

This guide is written fro gtk-rs and GTK 4.6

For my DungeonPlanner project I wanted to use custom icons for the tool buttons. Unfortunately the documentation for this is rather slim. As an additional constraint, the image data should be embedded in the binary as I like to compile and ship a single binary file (as long as this is possible).

Resulting tool bar in DungeonPlanner with custom icons

The nearest approach I could find was to create buttons with downloaded images. The snippet is for GTK3 so it had to be adjusted for GTK4, but it contained the right function names to know what to search for :).

This is the final code I ended up with:

let button = Button::new();
let bytes = include_bytes!("../../assets/icons/add_chamber.png");
let g_bytes = glib::Bytes::from(&bytes.to_vec());
let stream = MemoryInputStream::from_bytes(&g_bytes);
let pixbuf = Pixbuf::from_stream(&stream, Cancellable::NONE).unwrap();
let texture = Texture::for_pixbuf(&pixbuf);
let image = Image::from_paintable(Some(&texture));
button.set_child(Some(&image));

We start by creating a button. Instead of using the ButtonBuilder as you would normally do, I'm just creating an "empty" button. It should be possible to still use the builder, as we are just replacing the child content at the end.

let button = Button::new();

Next we need to load our image data. As I want my images to be embedded in the binary I use the include_bytes! macro. The raw bytes are then turned into a glib:Bytes struct and finally into a MemoryInputStream. The stream is needed to parse the image data.

let bytes = include_bytes!("../../assets/icons/add_chamber.png");
let g_bytes = glib::Bytes::from(&bytes.to_vec());
let stream = MemoryInputStream::from_bytes(&g_bytes);

The next goal is to create an Image object containing our embedded image. With GTK 4.6 we could still use Image::from_pixbuf but this will be deprecated in GTK 4.12. Instead we have to do an extra step and create a Texture and use Image::from_paintable. The texture can simply be created from a Pixbuf, which is created by using the Pixbuf::from_stream function

let pixbuf = Pixbuf::from_stream(&stream, Cancellable::NONE).unwrap();
let texture = Texture::for_pixbuf(&pixbuf);
let image = Image::from_paintable(Some(&texture));

Finally we can set the child of our button to the image and our icon button is done. The same approach also works for ToggleButton.

button.set_child(Some(&image));

#gtk #gtk4 #rust #gtkrs


Missed Shot at Artificial General Intelligence

The stated goal of OpenAI is "to ensure that artificial general intelligence benefits all of humanity". With the release of ChatGPT, they might have missed humanity’s only shot at creating an Artificial General Intelligence (AGI).

It's all about data

The performance ("intelligence") of large language models (LLMs) mainly depends on the scale of its training data and the size of the model [1]. To create a better LLM you need to increase its size, and train it on more data. The architecture and configuration of an LLM can almost be neglected in comparison.

However, the intelligence of a model does not grow linearly with its size [1]. There is a diminishing return on increasing the size of LLMs. If you increase the size of model and training set by 10x, you will only get an increase in performance as the previous 10x increase accomplished. This explains why vast amounts of data are required to effectively train large language model.

ChatGPT 3 was trained on 500 billion tokens [2]. There are no official numbers (that I could find) on how much data was used to train ChatGPT 4, but rumors in the AI community state that it was trained on 13 trillion tokens. With this numbers the performance step from 3 to 4 took an 26x increase in data.

The estimation for the size of publicly available, de duplicated data is 320 trillion tokens [4]. This is 24.6x more data than ChatGPT4 was (likely) trained on. If these numbers are correct, the performance of LLMs will only increase as much as it increased from ChatGPT3 to ChatGPT4 before we ran out of data.

I doubt that this will be enough to reach AGI level intelligence.

Poisoned Data

Now you might say "we produce more data every day, models can get better in the future". We could just wait some decades, train new models from time to time and see a gradual increase in performance. And one day we suddenly have an AGI. But the release of ChatGPT created a problem. It poisoned any data collected after November 30, 2022.

The release of ChatGPT was the wet dream of every spammer, bot operator, troll and wannabe influencer. Suddenly you could create seemingly high quality content, indistinguishable from human written text, at virtually zero cost. Ever since the internet gets filled with LLM generated content. And this is a problem for all future trainings.

Models that are trained on their own generations (or data created by other models) start to forget, there performance declines [3]. Therefore you have to avoid training on AI generated content, otherwise the increase in data may decrease your performance. As it is virtually impossible to clean a dataset from AI generated text, any data collected after Nov. 2022 should be avoided. Maybe you can still use a few more months or years of data, but at some point more data will hurt the model more than it helps.

The publicly available data that we got now is all we will get to train an AGI. If we need more data it will have to collected at an exorbitant price to ensure it's not poisoned by AI generated data.

We reached a dead end

The size of current training sets and potentially available data shows that we will not reach AGI levels with the current state of the art approaches. Due to data poisoning we will not get substantially more usable data. Thereby AGI is (currently) not achievable. Opening pandora's box of accessible generative AI may killed our chance of creating an artificial general intelligence.

If we want to build an AGI we will have to do it with the data we have now.

#AI #AGI #LLM #GenAI


Szechuan Pfeffer Udon Nudeln

Servings: 1 Portion , Prep Time:

Ingredients

  • 100 g Udon Nudeln
  • Handvoll Sojaschnetzel
  • 1 kleine Zucchini
  • 1 kleine Karotte
  • 1 EL Szechuan Pfeffer
  • 1 Knoblauchzehe
  • 1 cm Ingwer
  • 1 EL schwarze Bohnenpaste
  • 1 EL Sojasauce
  • 1 EL Agavendicksaft
  • 200 ml Wasser

Instructions

Dieses Rezept war ein bisschen frei Schnauze, ist aber wirklich gut geworden. Die Mengen sind im nachhinein geschätzt, also abschmecken und gegebenfalls anpassen.

Szechuan Pfeffer Udon Nudeln

  • Sojaschnetzeln kochen
  • Knoblauch und Ingwer fein hacken
  • Alle Gewürze und Wasser mischen
  • Karotten und Zucchini in dünne Scheiben schneiden
  • Nudeln nach Packung kochen
  • Sojaschnetzel anbraten
  • Zucchini und Karotten für 2 Minuten mitanbraten
  • Mit Gewürzmischung ablöschen
  • Sauce reduzieren und Nudeln unterheben

#vegan #nudeln #udon #zucchini #karotten #szechuan #szechuanpfeffer


Random Tables for DnD 5e

List of random tables I've created to quickly create shops or treasures when DMing.

Spells

Magic Items

Armor

Rings

Rods

Staves

Weapons

Wondrous Items

Generators

Gamemaster Guide Tables

#DnD #5e #random #generator #tabletop


Nudeln mit Erbsen

Servings: 3 Portionen , Prep Time:

Ingredients

  • 250g Nudeln
  • 200g TK Erbsen
  • 200ml vegane Sahne
  • 1 Zwiebel
  • 1 Knoblauchzehe
  • 50g Hefeflocken
  • 1 EL Olivenöl
  • 1 EL Tomatenmark
  • 1 TL Paprikapulver
  • 1 TL Oregano
  • Salz und Pfeffer

Instructions

Ein einfaches und schnelles Nudelgericht ideal für Kleinkinder. Schmeckt gut mit veganem Parmesan.

  • Zwiebel würfeln und Knoblauch fein hacken
  • Nudeln kochen
  • Zwiebeln und Knoblauch in Olivenöl glasig anbraten
  • Tomatenmark für 2-3 Minuten mitanbraten
  • Erbsen, Hefeflocken und Gewürze hinzugeben, kurz mit anbraten
  • Sahne und 100ml Wasser hinzugeben und gut verrühren
  • Sauce unter gelegentlichem Rühren aufkochen
  • Nudeln abgießen und alles gut durchmischen
  • Mit Salz und Pfeffer abschmecken

#vegan #rezept #nudeln #erbsen


Veganer Pesto Nudelsalat mit Rucola

Servings: 4 Portionen , Prep Time:

Ingredients

  • 500g Fusilli Nudeln
  • 250g Cherry Tomaten
  • 125g Rucola
  • 1 Glas veganes grünes Pesto
  • 2 EL Agavendicksaft
  • 2 EL Senf, mittelscharf
  • 50g veganer Parmesan (optional)

Instructions

Die vegane Version meines Lieblings-Nudelsalats.

  • Nudeln kochen und abkühlen lassen oder mit kalten Wasser abkühlen
  • Rucola waschen und abtropfen lassen
  • Tomaten halbieren oder vierteln
  • Nudeln, Tomaten, Pesto, Agavendicksaft und Senf vermischen
  • Mit veganem Parmesan abschmecken (optional)
  • Rucola untermischen

#vegan #rezept #nudelsalat #einfach


Microformat Categories for Hashtags in owl-blogs

I recently added hashtag support to owl-blogs. The initial reason for this was to to make post more discoverable via ActivityPub, but I found it helpful to further categories posts. The implementation was quiet simple. I use the markdown renderer goldmark which has a plugin for hashtags.

As tags are also part of microformats2, I wanted to mark the hashtags accordingly. This is currently not possible with the hashtag extension.

I've extended this to allow adding arbitrary attributes to the link tag (Related Pull Request). Until this is merged into the main repository I'll use my own version, which can be done by adding a replace directive to the go.mod

replace go.abhg.dev/goldmark/hashtag => github.com/H4kor/goldmark-hashtag v0.0.0-20240619193802-bec327f5be38

#go #dev #owlblogs #markdown #indieweb