Example papers I worked on:
[ESA1], [CPM1], [SEA1].
During that time I learned a lot about programming and computer architecture, but what I aim to share in this blog is some experience on running the experiments. I will mostly focus on the less known issues that arise often. The idea is to be brief, bring attention to some specific technique or issue, share some scripts or code snippets, and give links for further reading. Most importantly, however, I want to highlight where in the process of implementing/running the code this specific issue arises, how to avoid it, or take advantage of it, without dwelling on the technical details.
I chose to cover these topics because there is quite a lot to say and yet the materials on this topic are very scattered over the internet. In addition, I hope this will provide some forum for discussion and I will learn something in the process as well. Theory blogs seem to be covered quite well these days but it seems to me there is comparatively little discussion on experimental algorithmics.
Examples of things I would like to cover are: out of order execution (= reducing the cost of cache misses), page cache (i.e., why running experiments involving streaming/EM algorithms requires extra care), restricting the amount of RAM using GRUB options (and how to deal with it, when altering GRUB settings is not in your power), controlling the number of threads (commands: numactl and taskset), dealing with NUMA systems (i.e., how to get the same time measurement every time), measuring peak RAM allocation, measuring peak disk space allocation, cache-aligned memory allocation, managing data structures with complicated allocation patterns, asynchronous I/O (+ conditional variables), and a bit about choosing the right buffer size when doing sequential disk scans.
The material comes from the experience of writing code and running experiments in C++ on Linux/PC platform, but I think a lot of this transfers well to Mac/OSX. In many cases, the snippets I share can also be found inside the implementations of my code that can be found on my Github profile (most of the code I've written it still, however, available on the PADS webpage. I am in the process of updating them and migrating to github). For more updates on current activities, check out my Twitter page.
The material comes from the experience of writing code and running experiments in C++ on Linux/PC platform, but I think a lot of this transfers well to Mac/OSX. In many cases, the snippets I share can also be found inside the implementations of my code that can be found on my Github profile (most of the code I've written it still, however, available on the PADS webpage. I am in the process of updating them and migrating to github). For more updates on current activities, check out my Twitter page.