Routing (COS 598A)
Today: Root-Cause Analysis
Jennifer Rexford
/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm

Outline
    Network troubleshooting
        Motivation for network troubleshooting
        Investigating from the edge vs. inside
    Active probing
        Traceroute
        Mapping IP addresses to AS numbers
    Passive monitoring
        Analyzing BGP update streams
        Identifying location and cause of routing change
        Limitations of the approach

Network Troubleshooting
    “Why can’t I reach ?”
    “Why is the performance bad?”

Reachability Problems: What Could be Wrong?
    End-host problem
        Web server down
        DNS server down, or misconfigured
    Forwarding-path problem
        Packet filter or firewall restricting access
        Mismatch in Maximum Transmission Unit (MTU)
    Routing problem
        User or server disconnected from
        Blackhole dropping all packets
        Persistent loop

Performance Problem: What Could be Wrong?
    End-host problems
        Overloaded Web server
        Overloaded DNS server
        Overloaded user machine
    Forwarding-path problem
        High round-trip time
        Link congestion
    Routing problem
        Long-term routing instability
        Transient disruption during convergence

Motivation for Troubleshooting
    Improving performance
        Detect, diagnose, and fix the problem
        Pick a path through another provider
        Pick a different path in any network
    Establishing accountability
        Enforce Service Level Agreements
        Rate service providers
    Characterizing the
        Understand causes of performance problems
        Understand challenges of troubleshooting

Troubleshooting Outside vs. Inside
    Outside: network edge
        Who: users, researchers, and operators troubleshooting problems outside the network
        Data: ping/traceroute, public feeds of BGP updates, and public measurement platforms
        Challenges: inference from very limited data
    Inside: from inside the network
        Who: operators running the network
        Data: SNMP, fault data, traffic measurement, route monitors, and router configuration files
        Challenges: collecting and joining the data
    Today

Active Probing

Pros and Cons of Active Probing
    Advantages
        Can run from any en