The issues identified in part one of this post led to yet another search for solutions to the problem of making (especially passive) measurement repeatable. Of course, this has been done before, but I took as an initial principle that the social aspects of the problem must be solved socially, and worked from there. What emerged was a set of requirements and an architecture for a computing environment, together with a set of associated administrative processes, that allows analysis of network traffic data while both minimizing risk to the privacy of the network’s end users and ensuring spatial and temporal repeatability of the experiment. For lack of a better name, I decided to call an instance of a collection of data using this architecture an analysis vault.
The key principle behind this architecture is that if data can be open, it should be; if not, then everything else must be.
I spent quite a lot of time in 2014 thinking about the following problem: if I hand you a paper that claims something about the Internet, based on data I cannot show you because I am bound by a nondisclosure agreement due to corporate confidentiality or user privacy issues, generated by code which is ostensibly available under an open-source license but which is neither intended to run outside my environment, nor tested to ensure it will produce correct results in all cases, nor maintained to ensure it is compatible with newer versions of the compiler, interpreter, or libraries it requires, what reason have I given you to believe what I say?
I’ve been reading Tom Standage’s “Writing on the Wall” of late, which I can heartily recommend. It’s less subtle than “The Victorian Internet”, which counts among my favorite books of all time, but that was written before Twitter, and Twitter’s made us all less subtle, I think. What strikes me about his new book is not his thesis — that the “social media revolution” is nothing really new, just the application of new technology to our apparently instinctive love of gossip — but how well it illustrates that much of the present public policy debate over new media technology is very, very old.
The QoF TCP-performance-aware IPFIX flow meter I’ve been working on, on and off, for about a year now seems to produce halfway plausible results and hardly crashes at all anymore, which means it’s time to follow the time-honored path of real artists and ship it already: see here, or if you’re really serious about it, just track master on GitHub.