The Most Useless Dataset

The prompt was simple: create an open-source dataset and publish it to GitHub.

In an effort to make the world a less serious place, the dataset that I’ve tried to create is an entirely useless one: a dataset of the people who have viewed the dataset.

That was the intention, at least. In its current state it’s barely a proof of concept, although I think I have laid most of the foundations for the project at this stage.

The code lives here, and the (currently broken) resulting website lives here.

This project is a total experiment in the workings of git, GitHub, and servers. I’m running a Node.js server on Heroku, which builds the server from a GitHub repo containing the server files.

The server is also running a Node-based wrapper for git, so it sees itself as a git repo, a clone of the original. The server fetches data from the GitHub repo that it is built from (the original) and updates its local files to match, while on the client side the contents of raw.txt are rendered on screen.
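As a rough sketch of that fetch-and-update step (not the project's actual code): the example below shells out to the git command line instead of using a Node wrapper library, and it assumes the remote is named origin and the branch is master. The function names syncCommands and syncAndRead are made up for illustration.

```javascript
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Git argument lists that bring the local clone in line with the remote branch.
// The remote name "origin" is an assumption.
function syncCommands(branch) {
  return [
    ["fetch", "origin", branch],
    // Fast-forward only, so the sync fails loudly if the histories have diverged.
    ["merge", "--ff-only", `origin/${branch}`],
  ];
}

// Run the sync inside repoDir, then return the current contents of raw.txt.
// The default branch name "master" is an assumption.
function syncAndRead(repoDir, branch = "master") {
  for (const args of syncCommands(branch)) {
    execFileSync("git", args, { cwd: repoDir, stdio: "pipe" });
  }
  return fs.readFileSync(path.join(repoDir, "raw.txt"), "utf8");
}
```

The string returned by syncAndRead is what the server would hand to the client for rendering.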

Finally, when a user navigates to the site, the server makes a change to its local raw.txt based on user information, commits it to its local branch, and pushes that to the remote branch (main branch/origin/master?). I have hidden my credentials in the environment variables of the server to allow for authorised pushing.
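The append, commit, and push cycle could look something like the sketch below. Everything specific in it is an assumption made for illustration: the tab-separated record format, the environment variable names GITHUB_USER, GITHUB_TOKEN, and GITHUB_REPO, the target branch master, and the use of the git command line in place of a Node wrapper.

```javascript
const { execFileSync } = require("node:child_process");
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical record format: one tab-separated line per visit.
// Tabs and newlines inside a field are collapsed so each visit stays on one line.
function formatVisit(visit) {
  const clean = (value) => String(value).replace(/[\t\r\n]+/g, " ");
  return [visit.time, visit.ip, visit.userAgent].map(clean).join("\t") + "\n";
}

// Build an authenticated HTTPS push URL from environment variables.
// The variable names are assumptions; GITHUB_REPO is expected as "owner/name".
function pushUrl(env) {
  const user = encodeURIComponent(env.GITHUB_USER);
  const token = encodeURIComponent(env.GITHUB_TOKEN);
  return `https://${user}:${token}@github.com/${env.GITHUB_REPO}.git`;
}

// Append one visit to raw.txt, commit it, and push it to the remote.
// The commit step needs user.name and user.email to be configured in the repo.
function recordVisit(repoDir, visit, env = process.env) {
  const git = (...args) =>
    execFileSync("git", args, { cwd: repoDir, stdio: "pipe" });
  fs.appendFileSync(path.join(repoDir, "raw.txt"), formatVisit(visit));
  git("add", "raw.txt");
  git("commit", "-m", `Add visit at ${visit.time}`);
  git("push", pushUrl(env), "HEAD:master"); // branch name is an assumption
}
```

Because execFileSync throws when git exits with a non-zero status, a failed commit or push would surface as an exception instead of passing silently.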


The problem is that it doesn’t really work that well. That is partly because the way I’m updating files is a bit slow and chunks of data seem to go missing, but mostly because I think something is going wrong in the interactions between Heroku, Node.js, and git. The process works fine when I run it locally, but once I push to Heroku the GitHub files do not update, even though I receive no errors or crashes.
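One way to narrow down a failure that produces no errors is to run each git step through a small helper that captures the exit status and stderr, so that nothing fails quietly. The sketch below is a generic diagnostic aid and not a fix; runLogged is a made-up name, and it assumes the git steps are run as child processes.

```javascript
const { spawnSync } = require("node:child_process");

// Run a command, capture its exit status and output, and log any failure.
// Only the command and its first argument are logged, so a push URL that
// carries credentials does not end up in the log line.
function runLogged(command, args, options = {}) {
  const result = spawnSync(command, args, { encoding: "utf8", ...options });
  const outcome = {
    ok: result.status === 0 && !result.error,
    status: result.status,
    stdout: result.stdout || "",
    stderr: result.stderr || "",
    error: result.error ? result.error.message : null,
  };
  if (!outcome.ok) {
    console.error(`${command} ${args[0] || ""} failed`, {
      status: outcome.status,
      stderr: outcome.stderr,
      error: outcome.error,
    });
  }
  return outcome;
}
```

Calling runLogged("git", ["rev-parse", "--is-inside-work-tree"], { cwd: repoDir }) on the deployed server, for example, would show whether the directory is recognised as a git repo there at all, and runLogged("git", ["status"], { cwd: repoDir }) would show what git thinks has changed.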

I’ve learned a lot about navigating git in the command line, so I don’t consider this a complete failure, but I’m a little disappointed that the final step in the process was the one that broke everything…