My own algorithm turned out to be quite slow when compared to networkx.
For this reason I also reimplemented the networkx algorithm, but with multithreading support.
The most important feature for me was the ability to load the graph from a binary file.
While networkx uses too much memory while ingesting the graph data, I can effortlessly write it to a file and load it in graph_force.
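For illustration, writing the edges as packed integer pairs could look like the sketch below. The exact binary layout and loader function graph_force expects are assumptions here, so check its README before relying on this.

```python
import struct

# Toy edge list; in practice this comes from the crawled peers data.
edges = [(0, 1), (1, 2), (2, 0)]

# One edge = two little-endian u32 node indices.
# NOTE: this exact layout is an assumption for illustration; check the
# graph_force README for the binary format and loader function it expects.
with open("edges.bin", "wb") as f:
    for source, target in edges:
        f.write(struct.pack("<II", source, target))
```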
At the moment this library fulfils my needs, but by publishing it I commit to maintaining it.
Maybe this is useful to someone else.
Creative usage of WebAssembly as a "universal" binary, running on every machine.
I'm currently using both Rust and Go. I'll keep this in mind if I ever want to combine them :)
I just realised I never wrote about this small project I created back in 2020.
The show Community had a bunch of Twitter accounts for the characters on the show.
They wrote tweets in character every now and then, which added some additional character interactions to the show.
A few times the Twitter interactions were also referenced in the show.
I've scraped the tweets of all (known) accounts and created a website to browse all Community tweets.
I tried to assign the tweets to episodes based on the airing dates.
With the current acquisition of Twitter by Musk and the ensuing "turbulence", it might be a good idea to check the project again and see if more data can be preserved.
It would be a shame if this piece of my favorite show disappeared forever.
The show also had a homepage for the fictional community college.
Sadly the website is no longer available, but it's preserved in the Internet Archive.
Sadly the videos of the A/V Department did not survive.
After scraping "all" Mastodon instances, I wanted to visualize the graph of instances.
My expectation is that this is a (quite dense) social graph.
To bring order to such a graph, a force-directed graph layout can be used.
I previously used the networkx implementation for this.
However, the peers graph I currently have is too large, with 24007 nodes and over 80 million edges.
When trying to import this into networkx I simply ran out of memory (16 GB).
After asking for advice on Mastodon, I tried out Gephi, which also ran out of memory before loading the entire graph.
Because I still want to visualize this graph, I've decided to write my own graph layout program.
For the first implementation I followed the slides of a lecture at KIT.
This gave me some janky but promising results, as I was able to load my graph and an iteration only required ~10 seconds.
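As a rough illustration of the general idea behind such a layout (a simplified sketch, not my actual implementation): every pair of nodes repels each other, every edge pulls its endpoints together like a spring, and each iteration nudges the positions along the net force.

```python
import numpy as np


def layout_step(pos, edges, repulsion=0.1, attraction=0.01, step=0.1):
    """One iteration of a naive force-directed layout (O(n^2) repulsion)."""
    forces = np.zeros_like(pos)

    # Repulsion: every pair of nodes pushes each other apart.
    delta = pos[:, None, :] - pos[None, :, :]      # pairwise difference vectors
    dist = np.linalg.norm(delta, axis=-1) + 1e-9   # avoid division by zero
    forces += (repulsion * delta / dist[..., None] ** 2).sum(axis=1)

    # Attraction: every edge pulls its endpoints together like a spring.
    for a, b in edges:
        d = pos[b] - pos[a]
        forces[a] += attraction * d
        forces[b] -= attraction * d

    # Move every node a small step along its net force.
    return pos + step * forces


# Toy usage on a small ring graph.
rng = np.random.default_rng(0)
pos = rng.random((5, 2))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
for _ in range(100):
    pos = layout_step(pos, edges)
```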
To validate my implementation I created a debug graph, consisting of 2000 nodes with 4 clusters of different sizes.
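The debug graph was built along these lines (a sketch; the cluster sizes and edge counts here are made up): dense random edges inside each cluster, with only a handful of edges between clusters.

```python
import random

random.seed(42)

# Hypothetical cluster sizes adding up to 2000 nodes.
cluster_sizes = [200, 400, 600, 800]
clusters, edges, offset = [], [], 0
for size in cluster_sizes:
    clusters.append(range(offset, offset + size))
    offset += size

# Dense random edges inside each cluster ...
for cluster in clusters:
    for node in cluster:
        for neighbour in random.sample(list(cluster), 5):
            if node != neighbour:
                edges.append((node, neighbour))

# ... and only a few edges between clusters.
for _ in range(50):
    a, b = random.sample(clusters, 2)
    edges.append((random.choice(a), random.choice(b)))
```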
After this first implementation I took pen and paper and thought about the problem a bit.
This led to an improved version, with a simpler model leading to faster execution times and quicker convergence.
Embedding the Mastodon instances graph is still challenging.
The algorithm creates oscillations in the graph, which I suspect are introduced by one (or multiple) large cliques.
I will post an update soon.
Over the weekend I decided to explore the Fediverse a bit, especially Mastodon.
As the network is decentralized, my first step was to create a list of "instances" (servers running Mastodon).
Luckily, Mastodon has an API endpoint from which you can get the list of peers of an instance.
Using this API I was able to start with a single instance and find over 89503 Mastodon servers (only ~24000 of these were reachable and exposed their peers).
For my first steps I used requests.
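A minimal version of that request looks roughly like this (the starting instance is just a placeholder):

```python
import requests

# Placeholder starting instance; any reachable Mastodon server works.
instance = "mastodon.social"

# /api/v1/instance/peers returns a JSON list of domain names known to this instance.
response = requests.get(f"https://{instance}/api/v1/instance/peers", timeout=10)
peers = response.json()
print(f"{instance} knows {len(peers)} peers")
```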
As this was too slow, with many servers not responding, I switched to aiohttp to run multiple concurrent requests.
I used a main loop which started new request tasks and waited for these tasks to finish.
Whenever a request task finished, I wrote the result to an SQLite database for later analysis and started another request task.
This achieved a good throughput and crawled the "entire" Mastodon world in a few hours.
I might add a cleaned-up version of the script in the future.
Things I learned:
This was my first time seriously working with asyncio and aiohttp in Python.
asyncio.wait returns a done and a pending set. I used this to process finished requests and afterwards replaced my task list with the pending set.
sqlite does not work well with asyncio. This is why I stored the results in the main loop.
I found it easiest to catch all errors in my request function and return a tuple with a success indicator (return result, success) instead of letting errors be raised inside the async task. The sketch after this list illustrates the pattern.
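A condensed sketch of the whole pattern (the names, the concurrency limit and the database schema are made up for illustration, not copied from my script):

```python
import asyncio
import json
import sqlite3

import aiohttp

MAX_CONCURRENT = 100  # hypothetical concurrency limit


async def fetch_peers(session, instance):
    """Fetch the peer list of one instance; never raises, returns (instance, peers, success)."""
    try:
        url = f"https://{instance}/api/v1/instance/peers"
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
            peers = await resp.json()
            return instance, peers if isinstance(peers, list) else [], True
    except Exception:
        return instance, [], False


async def crawl(seed):
    db = sqlite3.connect("peers.db")
    db.execute("CREATE TABLE IF NOT EXISTS peers (instance TEXT PRIMARY KEY, peers TEXT, ok INTEGER)")

    seen, queue = {seed}, [seed]
    async with aiohttp.ClientSession() as session:
        tasks = set()
        while queue or tasks:
            # Top up the pool of running request tasks.
            while queue and len(tasks) < MAX_CONCURRENT:
                tasks.add(asyncio.create_task(fetch_peers(session, queue.pop())))

            # asyncio.wait hands back the finished and the still-pending tasks;
            # the pending set simply becomes the new task pool.
            done, tasks = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

            for task in done:
                instance, peers, ok = task.result()
                # SQLite writes happen here in the main loop, not inside the tasks.
                db.execute(
                    "INSERT OR REPLACE INTO peers VALUES (?, ?, ?)",
                    (instance, json.dumps(peers), int(ok)),
                )
                db.commit()
                for peer in peers:
                    if peer not in seen:
                        seen.add(peer)
                        queue.append(peer)


asyncio.run(crawl("mastodon.social"))
```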
I just finished implementing IndieAuth, which allows me to log in to sites and tools that support it using my blog.
The main reason I've implemented this is that I want to use Micropub to write and publish.
At the moment creating a new entry is a bit cumbersome.
It's ok for longer posts, but it prevents me from doing things like bookmarks or check-ins.
If you want to implement IndieAuth yourself I recommend using the living standard document directly.
I started out using the wiki page as a reference and only later realised that it was outdated (the outdated content has been removed in the meantime).
IndieAuth is basically OAuth2 with a few additions and conventions. If you are already familiar with OAuth2 you probably won't have many problems implementing it.
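To give an idea of its shape, this is roughly what the authorization request looks like from the client side (all URLs here are placeholders, and the authorization endpoint would normally be discovered from the user's site):

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Placeholders: the real authorization endpoint is discovered from the
# user's site (via its indieauth-metadata / authorization_endpoint link).
authorization_endpoint = "https://example.com/auth"
client_id = "https://client.example.net/"
redirect_uri = "https://client.example.net/callback"

# PKCE: the verifier stays with the client, only the challenge goes into the URL.
verifier = secrets.token_urlsafe(32)
challenge = (
    base64.urlsafe_b64encode(hashlib.sha256(verifier.encode()).digest())
    .rstrip(b"=")
    .decode()
)

params = {
    "response_type": "code",
    "client_id": client_id,
    "redirect_uri": redirect_uri,
    "state": secrets.token_urlsafe(16),
    "code_challenge": challenge,
    "code_challenge_method": "S256",
    "scope": "profile",
    "me": "https://my-blog.example.org/",  # the identity being claimed
}
print(f"{authorization_endpoint}?{urlencode(params)}")
```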
My (first) implementation can be found in this pull request.
Before I continue with Micropub, I have to do some refactoring as the code is still a bit rough around the edges.
The books are set in a future version of the solar system where virtually everything has been turned into smart matter and the boundaries between "reality" and the virtual world are blurred to the point where they almost no longer matter. The series tells the story of "Jean le Flambeur" and "Mieli", who travel the solar system to fulfil their duties to a Goddess. The journey takes them to Mars, Earth and Saturn.