Abstract
Raha is the first general tool that can analyze probable degradation of traffic engineered networks under arbitrary failures and traffic shifts to prevent outages. Raha addresses a significant gap in prior work, which considers only (1) $\le k$ failures; (2) specific traffic engineering schemes; and (3) the maximum impact of failures irrespective of the network design point.
Our insight is to formulate the problem in terms of heuristic analysis, where one seeks to maximize the performance gap between the network design point (i.e., the network with no failures) and the network under failures. We invent techniques that exploit the mechanisms within heuristic analysis tools to encode the problem in a form these tools can handle. We present extensive experiments on Microsoft’s production network and on topologies from Topology Zoo that demonstrate Raha is scalable and can effectively solve the problem. We use Raha to propose capacity augments that allow operators to mitigate potential problems and avoid future outages. Our results show Raha can find $\ge 2\times$ higher degradations than tools that consider only up to $2$ failures.
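In its simplest form, the gap-maximization view described above can be sketched as follows. The symbols are illustrative assumptions, not notation taken from the paper: $P(d, \emptyset)$ stands for the performance the traffic engineering scheme achieves for demand $d$ at the design point, and $P(d, f)$ for its performance under failure scenario $f$.

$$\max_{d \in \mathcal{D},\; f \in \mathcal{F}} \; \bigl[\, P(d, \emptyset) - P(d, f) \,\bigr]$$

Here $\mathcal{D}$ and $\mathcal{F}$ are hypothetical sets of traffic demands and failure scenarios under consideration. Bounding $\mathcal{F}$ to at most $k$ simultaneous failures would recover the restricted setting of prior work.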
BibTeX Citation
Coming soon!