Link: https://orgmode.org/
Want to check this out, but don't know if I'm willing to learn Emacs for this.
Personal Blog about anything - mostly programming, cooking and random thoughts
Extended the editor to allow replies. Still have to add a scraper to fill in the title of the referenced page automatically.
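A title scraper like the one mentioned above could look roughly like this. It is a minimal sketch using only Python's standard library. The names TitleParser and extract_title are made up for illustration, and fetching the page itself is left out.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace and newlines into single spaces.
        return " ".join("".join(self._parts).split())


def extract_title(html):
    """Return the page title, or None if the document has none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title or None
```

The HTML string would come from whatever HTTP client the blog already uses. Pages without a title yield None, so the editor can fall back to showing the bare URL.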
I'm taking part in Advent of Code 2022 and post my solutions on GitHub.
This is my first article written with an editor on my website!
Up until now I wrote all my articles either directly on the server or on my PC. For this I use a convoluted system, partially working on the server, partially on my PC, with git and ssh in between. As this process is tedious, it keeps me from writing short updates.
The online editor is still very limited, but I will extend it as needed. Additionally I've adjusted some of the internal structure of the blog to allow for different types of posts. This will allow me to also write short notes, which will not show up in the main feed.
I've published my graph embedding library graph_force on GitHub and PyPI. I wrote about the process of building this a few days ago.
My own algorithm turned out to be quite slow when compared to networkx. For this reason I also reimplemented the networkx algorithm, but with multithreading support. The most important feature for me was the ability to load the graph from a binary file. While networkx used too much memory while ingesting the graph data, I can effortlessly write it to a file and load it in graph_force.
At the moment this library fulfils my needs, but by publishing it I commit to maintaining it. Maybe it is useful to someone else.
Creative usage of WebAssembly as a "universal" binary, running on every machine. I'm currently using both Rust and Go. Will keep this in mind if I ever want to combine them :)
I just realised I never wrote about this small project I created back in 2020. The show Community had a bunch of Twitter accounts for the characters on the show. They wrote tweets in character every now and then, which added some additional character interactions to the show. A few times the Twitter interactions were also referenced in the show.
I've scraped the tweets of all (known) accounts and created a website to browse all Community tweets. I tried to assign the tweets to episodes based on the airing dates. With the recent acquisition of Twitter by Musk and the "turbulence" that followed, it might be a good idea to check the project again to see if more data can be preserved. It would be a shame if this piece of my favorite show disappeared forever.
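One possible way to map tweets to episodes by airing date is a binary search over the sorted air dates. The sketch below assumes that a tweet belongs to the most recent episode aired on or before the tweet's date. That rule and the function name assign_episode are assumptions for illustration, not necessarily what the project does.

```python
from bisect import bisect_right
from datetime import date


def assign_episode(tweet_date, episodes):
    """Return the name of the latest episode aired on or before tweet_date.

    episodes is a list of (air_date, name) tuples sorted by air_date.
    Tweets older than the first episode map to None.
    """
    air_dates = [air_date for air_date, _ in episodes]
    index = bisect_right(air_dates, tweet_date)
    return episodes[index - 1][1] if index > 0 else None
```

With placeholder data such as [(date(2020, 1, 2), "ep1"), (date(2020, 1, 9), "ep2")], a tweet from date(2020, 1, 5) would be assigned to "ep1".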
The source and prepared datasets can be found on GitHub.
The show also had a homepage for the fictional community college. The website is no longer available, but it's in the Internet Archive. Sadly the videos of the A/V Department did not survive.
After scraping "all" Mastodon instances, I wanted to visualize the graph of instances. My expectation is that this is a (quite dense) social graph. To bring order into such a graph, a force-directed graph model can be used. I previously used the networkx implementation for this. However, the peers graph I currently have is too large, with 24007 nodes and over 80 million edges. When trying to import this into networkx I simply run out of memory (16GB). After asking for advice on Mastodon, I tried out Gephi, which also ran out of memory before loading the entire graph.
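As a rough illustration of what a force-directed model does, here is a minimal pure-Python sketch in the style of Fruchterman and Reingold: all nodes repel each other, edges pull their endpoints together, and a shrinking "temperature" caps the step size. It is not the networkx implementation and not the code from this post. The function name force_layout and all constants are assumptions, and the O(n^2) repulsion loop only suits small graphs.

```python
import math
import random


def force_layout(n, edges, iterations=100, seed=0):
    """Return a list of (x, y) positions for nodes 0..n-1."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    k = math.sqrt(1.0 / n)  # ideal edge length for a unit square
    temp = 0.1              # maximum step size, lowered every iteration

    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]

        # Repulsion between every pair of nodes.
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[i][0] += dx / dist * force
                disp[i][1] += dy / dist * force
                disp[j][0] -= dx / dist * force
                disp[j][1] -= dy / dist * force

        # Attraction along edges.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node along its net force, capped by the temperature.
        for i in range(n):
            length = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            step = min(length, temp)
            pos[i][0] += disp[i][0] / length * step
            pos[i][1] += disp[i][1] / length * step
        temp *= 0.95

    return [(x, y) for x, y in pos]
```

Calling force_layout(4, [(0, 1), (1, 2), (2, 3)]) returns four coordinate pairs that can be handed to any plotting library. For a graph with millions of edges the all-pairs repulsion step is the part that needs a smarter approach.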
Because I still want to visualize this graph, I've decided to write my own graph layout program. For the first implementation I followed the slides of a lecture at KIT. This gave me some janky but promising results, as I was able to load my graph and an iteration only required ~10 seconds. To validate my implementation I created a debug graph, consisting of 2000 nodes in 4 clusters of different sizes.
After this first implementation I took pen and paper and thought about the problem a bit. This led to an improved version, with a simpler model leading to faster execution times and quicker convergence.
Embedding the Mastodon instances graph is still challenging. The algorithm creates oscillations in the graph, which I suspect are caused by one (or multiple) large cliques. I will post an update soon.
Update:
Bonus image of the Florentine families graph:
Over the weekend I decided to explore the Fediverse a bit, especially Mastodon. As the network is decentralized, my first step was to create a list of "instances" (servers running Mastodon). Luckily Mastodon has an API endpoint that returns the list of peers of an instance. Using this API I was able to start with a single instance and find over 89503 Mastodon servers (only ~24000 of these also worked and exposed their peers).
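The discovery step can be sketched as a breadth-first walk over the peers lists. The sketch below uses Mastodon's /api/v1/instance/peers endpoint, which returns a JSON array of domain names. The function names, the max_requests limit and the synchronous urllib client are assumptions chosen to keep the example self-contained, not the script used for the crawl.

```python
import json
from collections import deque
from urllib.request import urlopen


def fetch_peers(domain, timeout=10):
    """Ask one instance for its peers; returns a list of domain names."""
    url = f"https://{domain}/api/v1/instance/peers"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def discover(start, fetch=fetch_peers, max_requests=100):
    """Walk the peers graph breadth-first, starting at one instance.

    Returns (peers_of, seen): the peers of every instance that answered,
    and the set of all domains that were mentioned anywhere.
    """
    peers_of = {}
    seen = {start}
    queue = deque([start])
    requests_made = 0
    while queue and requests_made < max_requests:
        domain = queue.popleft()
        requests_made += 1
        try:
            peers = fetch(domain)
        except Exception:
            # Dead, blocked or misconfigured instances are simply skipped.
            continue
        peers_of[domain] = peers
        for peer in peers:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return peers_of, seen
```

The gap between len(seen) and len(peers_of) corresponds to the difference described above between servers that were found and servers that actually answered.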
For my first steps I used requests. As this was too slow, with many servers not responding, I switched to aiohttp to run multiple concurrent requests.
I used a main loop which started new request tasks and waited for these tasks to finish. Whenever a request task finished I wrote the result into an SQLite database for later analysis and started another request task. This achieved a good throughput and crawled the "entire" Mastodon world in a few hours.
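The loop described above can be sketched with asyncio.wait and an SQLite table. To keep the example runnable without network access, fake_request stands in for the real aiohttp call. The function names, the table layout and the concurrency limit are assumptions, not the original script.

```python
import asyncio
import sqlite3


async def fake_request(domain):
    """Stand-in for an aiohttp request. Returns (domain, peers, success)."""
    await asyncio.sleep(0.01)
    if domain.endswith(".down"):
        return domain, None, False
    return domain, [f"peer-of-{domain}"], True


async def crawl(domains, request=fake_request, concurrency=3, db_path=":memory:"):
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(domain TEXT PRIMARY KEY, success INTEGER, peers TEXT)"
    )
    todo = list(domains)
    pending = set()
    while todo or pending:
        # Top up the set of running tasks.
        while todo and len(pending) < concurrency:
            pending.add(asyncio.create_task(request(todo.pop())))
        # Sleep until at least one task has finished.
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            domain, peers, success = task.result()
            db.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
                (domain, int(success), ",".join(peers or [])),
            )
        db.commit()
    return db
```

Running asyncio.run(crawl(["a.example", "b.example", "c.down"])) returns the database connection with one row per domain. In a real crawler, newly discovered peers would be appended to todo inside the loop.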
I might add a cleaned up version of the script in the future.
asyncio and aiohttp in Python.
asyncio.wait returns a done and a pending set. I used this to process done requests and afterwards replaced my tasks list with the pending return value.
Return the outcome as a value (return result, success) instead of allowing raised errors in the async task.
I just finished implementing IndieAuth, which allows me to log in to sites and tools that support it, using my blog. The main reason I've implemented this is that I want to use Micropub to write and publish. At the moment creating a new entry is a bit cumbersome. It's ok for longer posts but it prevents me from doing things like bookmarks or check-ins.
If you want to implement IndieAuth yourself, I recommend using the living standard document directly. I started out using the wiki page as a reference and later realised that it was outdated (the outdated content has been removed in the meantime). IndieAuth is basically OAuth2 with a few additions and conventions. If you are already familiar with OAuth2, you probably won't have many problems implementing this. My (first) implementation can be found in this pull request.
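One of the OAuth2 pieces that the living standard builds on is PKCE (RFC 7636) with the S256 method. The sketch below shows how a server could check the code_verifier sent with the token request against the code_challenge stored during the authorization request. The function names are made up for illustration and this is not the code from the pull request.

```python
import base64
import hashlib
import hmac


def challenge_for(code_verifier):
    """Compute the S256 code challenge for a PKCE code verifier."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    # base64url encoding without the trailing "=" padding, as in RFC 7636.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(code_verifier, stored_challenge):
    """Return True if the verifier matches the challenge stored earlier."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(challenge_for(code_verifier), stored_challenge)
```

The server would keep the challenge together with the issued authorization code and call verify_pkce when the client redeems that code.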
Before I continue with Micropub, I have to do some refactoring as the code is still a bit rough around the edges.
Things I learned while implementing IndieAuth: