Rogue-Bench
After a couple months of work, she's finally done. Rogue-Bench is my contribution to the already overflowing list of LLM benchmarks.
This is an extension of my previous work here, so check that out first for some context.
Ever wonder if LLMs can play Rogue? I sure did!
It seems like a match made in heaven: a fully text-based game meets a text-loving model. In this post, I detail a bit about putting together a small package to make this possible.
Don't get your hopes up, though. I don't do any long-running benchmarks.
Check out the code for this post here.
I'm a fan of the Ibis framework. To me, DataFrame APIs feel great, and leaving pure Python to write a SQL query in a nasty triple-quoted string does not.
In this post, I motivate and give a proof of concept for an Ibis "compatibility checker" that, given an Ibis expression, tells you which backends it can run on.
This post isn't meant to sell Ibis, just to motivate why I think it could be useful for a specific set of people.
Code for this post can be found here.
This post is an exploration of the Library of Babel. Specifically, I try to get an idea of how many readable pages of books there are, using "runnable Python program" as a proxy for readability.
This post contains slight spoilers for the novella A Short Stay in Hell by Steven L. Peck, so buyer beware.
Code for this post can be found here.