bioRxiv first said

While endlessly scrolling through my Twitter feed during lockdown, I stumbled upon the bot New New York Times (@NYT_first_said). The idea of the bot is simple, effective, and at times hilarious: every time a word appears in the New York Times that has never been published there before, the bot tweets that word. The result is a jumble of funny made-up words, slang, onomatopoeias, and jokes. Here’s an example:

I instantly loved this idea for its simplicity and effectiveness. My first instinct was to steal it, and that is what I did. Scientists also often use words that they made up themselves or that are extremely niche, so I thought it would be funny to apply this idea to bioRxiv. For those of you who don’t know: bioRxiv is the most popular pre-print server in the biological sciences, and it receives over 3,000 research article submissions per month.

Going through the full text of all these publications would be intractable, but the abstracts are easily accessible using a web crawler (I used biorxiv-retriever). I set up a Linux server and had it crawl all abstracts ever published on bioRxiv, extract the words, and build a library of the unique words used in these abstracts. When selecting words I had to do some clean-up, like removing punctuation, and I also had to impose some selection criteria. Biological scientists use a lot of acronyms for genes, proteins, brain regions, etc. These acronyms are not particularly funny, and since they are almost always written in capitals, I got rid of them by excluding every word that contains at least one capital letter. I also excluded words with more than one hyphen to filter out constructions like “to-be-determined”.

With this word library in hand, I set up a cron job on my server that runs a Python script every day. The script crawls that day’s new abstracts, cleans up the words, and checks them against the library. Because there are usually a lot of new words each day, the script randomly picks five to tweet, with a variable delay of up to four hours between the tweets. And that’s it, it works! Here is the very first tweet it produced:

If you like this Twitter bot, you can give it a follow at @bioRxiv_first, and you can find all the code in my GitHub repository.
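
For anyone who wants a feel for the logic without opening the repository, the filtering and daily-check steps described above can be sketched roughly as follows. This is a minimal illustration and not the actual code of the bot: the function names (`clean_word`, `extract_words`, `pick_new_words`, `tweet_delays`) are made up for this example, the abstracts are assumed to already be available as plain strings, and the crawling and tweeting parts are left out entirely.

```python
import random
import string

# Characters stripped from both ends of a token (ASCII punctuation plus curly quotes).
STRIP_CHARS = string.punctuation + "“”‘’"


def clean_word(token):
    """Strip surrounding punctuation; return None if the word should be excluded."""
    word = token.strip(STRIP_CHARS)
    if not word:
        return None
    # Any capital letter is treated as a sign of an acronym (or a name).
    if any(ch.isupper() for ch in word):
        return None
    # More than one hyphen: constructions like "to-be-determined".
    if word.count("-") > 1:
        return None
    return word


def extract_words(abstract):
    """Return the set of acceptable words in one abstract."""
    words = set()
    for token in abstract.split():
        word = clean_word(token)
        if word is not None:
            words.add(word)
    return words


def pick_new_words(abstracts, library, k=5, rng=random):
    """Find words not yet in the library, add them, and pick up to k at random."""
    todays_words = set()
    for abstract in abstracts:
        todays_words |= extract_words(abstract)
    new_words = sorted(todays_words - library)
    library |= todays_words  # remember everything seen today
    return rng.sample(new_words, min(k, len(new_words)))


def tweet_delays(n, max_hours=4, rng=random):
    """Random waiting times in seconds, one per tweet, each at most max_hours."""
    return [rng.uniform(0, max_hours * 3600) for _ in range(n)]


if __name__ == "__main__":
    library = {"the", "cell"}
    abstracts = ["The cell shows blorpification (novel), and a to-be-determined DNA role."]
    picked = pick_new_words(abstracts, library)
    for word, delay in zip(picked, tweet_delays(len(picked))):
        print(f"would wait {delay / 3600:.1f} h, then tweet: {word}")
```

Note that the sketch updates the library with every word seen that day, not only the five that get picked, so a word that loses the draw is never tweeted later. Whether the real bot behaves the same way is an assumption on my part.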