I’m off to New York in a couple of weeks to present a paper at PAM (which I mentioned here, though sadly the flashy automated demo I was hoping to build was a bit optimistic). The question: “is it safe to turn on ECN on client machines by default, completing the end-to-end deployment of a simple fifteen-year-old protocol that gives us a better way to signal network congestion than simply dropping packets on the floor?” The answer is: “define safe.” Our key findings:
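For concreteness, here is what “turning on ECN on a client machine” looks like in practice on Linux, as a sketch; the sysctl shown is Linux’s TCP ECN knob, and other operating systems expose the equivalent setting differently:

```shell
# Inspect the current ECN mode. Linux uses three values:
#   0 = ECN disabled
#   1 = request ECN on outgoing connections (the "client on by default" case)
#   2 = accept ECN only when the peer requests it (the usual default)
cat /proc/sys/net/ipv4/tcp_ecn

# Enable ECN negotiation for outgoing connections
sudo sysctl -w net.ipv4.tcp_ecn=1
```

The safety question in the paper is precisely what happens to connectivity when a large population of clients flips that value from 2 to 1.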
The issues identified in part one of this post led to yet another search for solutions to the problem of making (especially passive) measurement repeatable. Of course, this has been done before, but I took as an initial principle that the social aspects of the problem must be solved socially, and worked from there. What emerged was a set of requirements and an architecture for a computing environment, together with a set of associated administrative processes, that allows analysis of network traffic data while minimizing risk to the privacy of the network’s end users and ensuring spatial and temporal repeatability of the experiment. For lack of a better name, I decided to call an instance of a collection of data using this architecture an analysis vault.
The key principle behind this architecture is this: if the data can be open, it should be; if it cannot, then everything else must be.
Part one of this post painted a somewhat bleak picture of the state of Internet measurement as a science. The dreariness will continue later this month in part two. And yet there seems to be quite a lot of measuring the Internet going on. It can’t all be that bad, can it?
I spent quite a lot of time in 2014 thinking about the following problem. Suppose I hand you a paper that claims something about the Internet, based on data I cannot show you because I am bound by a nondisclosure agreement, whether for corporate confidentiality or user privacy reasons. The data were generated by code which is ostensibly available under an open-source license, but which is neither intended to run outside my environment, nor tested to ensure it produces correct results in all cases, nor maintained to remain compatible with newer versions of the compiler, interpreter, or libraries it requires. What reason have I given you to believe what I say?