HE network problems?

Okay, I'll be the first… What's the story with Hurricane Electric today? It's been "browning out" for some time now.

20 Replies

You wouldn't happen to be in the North East, would you?

We've had two other reports from people on Bell Sympatico in Ontario. With info provided by one, I was able to determine the problem was a Telia link in Chicago. I've contacted the HE NOC and they confirmed that the Telia link in Equinix Chicago was saturated and they are working to upgrade the link capacity.

My node is totally unresponsive right now. Actually it comes and goes. Is this what you're seeing? How do you know it's HE? What kind of diagnostic are you using?

No, I'm in Texas, actually. Here's a traceroute to he.net.

# traceroute he.net
traceroute to he.net (216.218.186.2), 30 hops max, 52 byte packets
 1  192.168.1.1 (192.168.1.1)  0.690 ms  0.681 ms  0.886 ms
 2  10.33.128.1 (10.33.128.1)  7.265 ms  7.511 ms  7.544 ms
 3  gig2-1.austtxk-rtr2.austin.rr.com (66.68.0.53)  8.432 ms  8.755 ms  7.943 ms
 4  srp0-0.austtxrdc-rtr2.austin.rr.com (24.27.12.34)  8.657 ms  7.768 ms  7.789 ms
 5  pos10-3.austtxrdc-rtr4.texas.rr.com (66.68.1.106)  63.898 ms  65.673 ms  65.490 ms
 6  son0-1-3.hstqtxl3-rtr1.texas.rr.com (24.93.37.61)  65.750 ms  64.234 ms  63.387 ms
 7  pop1-hou-P0-2.atdn.net (66.185.133.153)  17.900 ms pop1-hou-P0-1.atdn.net (66.185.133.145)  19.072 ms pop1-hou-P0-0.atdn.net (66.185.133.157)  17.586 ms
 8  bb1-hou-P2-0.atdn.net (66.185.150.148)  64.928 ms  62.528 ms  76.737 ms
 9  bb1-atm-P7-0.atdn.net (66.185.152.184)  164.928 ms  276.444 ms  202.914 ms
10  pop2-atm-P0-0.atdn.net (66.185.147.209)  36.849 ms  37.650 ms  36.511 ms
11  Telia-atm.atdn.net (66.185.138.46)  87.784 ms  86.298 ms  83.688 ms
12  dls-bb1-link.telia.net (213.248.80.146)  104.869 ms  103.197 ms  105.358 ms
13  hurricane-113208-dls-bb1.c.telia.net (213.248.92.22)  106.121 ms  104.466 ms  103.746 ms
14  pos5-0.gsr12012.lax.he.net (66.160.184.5)  433.108 ms  379.999 ms  449.236 ms
15  * * *
16  * * *
17  * *                                                                                                  

Just noticed that you have "Location: Austin" in the forums. Could you provide the output of a traceroute from you to your Linode? Even better, if you have a Linux box local to you, the output of "mtr --report $yourlinodesaddress". mtr is available in the mtr-tiny package for Debian and Ubuntu users.

@miallen:

My node is totally unresponsive right now. Actually it comes and goes. Is this what you're seeing? How do you know it's HE? What kind of diagnostic are you using?

That's exactly what I'm seeing, yes; it comes and goes, and when it comes it's really slow (~500ms pings). Tracing that route to he.net makes me believe it's a network congestion issue at the datacenter.

````
$ mtr --report host18.linode.com
HOST: local Loss% Snt Last Avg Best Wrst StDev
1. 192.168.1.1 0.0% 10 0.6 0.8 0.6 1.8 0.4
2. 10.33.128.1 0.0% 10 7.5 16.7 7.4 89.5 25.6
3. gig2-1.austtxk-rtr2.austin.r 0.0% 10 7.9 8.2 7.6 9.9 0.7
4. srp0-0.austtxrdc-rtr2.austin 0.0% 10 8.2 9.6 8.0 12.7 2.0
5. pos10-3.austtxrdc-rtr4.texas 0.0% 10 65.1 64.7 63.4 66.4 1.0
6. son0-0-0.hstqtxl3-rtr1.texas 0.0% 10 105.8 70.9 63.5 105.8 12.9
7. pop1-hou-P0-1.atdn.net 0.0% 10 16.6 16.9 16.2 18.6 0.7
8. bb1-hou-P2-0.atdn.net 10.0% 10 64.7 65.7 63.8 72.3 2.6
9. bb1-atm-P7-0.atdn.net 20.0% 10 84.2 85.4 83.5 87.6 1.4
10. pop2-atm-P0-3.atdn.net 0.0% 10 37.0 38.7 36.8 44.7 2.8
11. Telia-atm.atdn.net 0.0% 10 85.0 87.3 83.5 100.0 5.1
12. dls-bb1-link.telia.net 0.0% 10 103.4 103.8 103.0 105.6 1.0
13. hurricane-113208-dls-bb1.c.t 0.0% 10 103.7 104.0 102.2 112.0 2.8
14. pos5-0.gsr12012.lax.he.net 10.0% 10 389.7 393.7 377.5 413.4 12.1
15. pos3-2.gsr12416.pao.he.net 0.0% 10 130.6 248.6 127.4 344.7 79.7
16. pos5-0.gsr12416.fmt.he.net 10.0% 10 355.2 250.3 154.4 381.4 87.9
17. pos8-0.gsr12012.fmt.he.net 10.0% 10 130.0 132.8 129.2 155.0 8.3
18. host18.fremont.linode.com 10.0% 10 130.2 121.2 78.9 139.2 24.0

````

Well now I'm getting consistent sub-50ms pings to my node. Maybe they've fixed it.

But don't let them think they got away without us noticing, Mike! :-)

Whoops, nope, unresponsive again. I'll not jump to such strange conclusions in the future.

@Xan:

No, I'm in Texas, actually. Here's a traceroute to he.net.

# traceroute he.net
traceroute to he.net (216.218.186.2), 30 hops max, 52 byte packets
<snip>
13  hurricane-113208-dls-bb1.c.telia.net (213.248.92.22)  106.121 ms  104.466 ms  103.746 ms
14  pos5-0.gsr12012.lax.he.net (66.160.184.5)  433.108 ms  379.999 ms  449.236 ms

I had assumed hop 13 here was in Chicago based on the path of the traffic from the other customer in Ontario. It seems from your traceroute it is more likely that it is in Equinix, Los Angeles. At any rate, this is the same router that the Ontario customers were having issues with so it looks like your problems are related after all.

Here's the perspective of another server I have in Dallas:

# mtr --report host18.linode.com
HOST: layer0                      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 241.160.92.64.reverse.layere  0.0%    10    0.5   2.3   0.5  16.7   5.1
  2. 10.1.3.13                     0.0%    10    0.6   0.6   0.6   0.8   0.1
  3. 216.39.69.53                  0.0%    10    0.5   0.6   0.4   0.8   0.1
  4. bhr2-po-1.fortworthda1.savvi  0.0%    10    0.4   0.5   0.4   0.6   0.1
  5. dcr2-so-3-2-0.dallas.savvis.  0.0%    10    1.6   1.6   1.4   1.7   0.1
  6. dcr1-so-7-2-0.losangeles.sav  0.0%    10   34.1  34.1  33.9  34.9   0.3
  7. bpr3-so-7-0-0.losangelesequi  0.0%    10   34.2  34.2  34.0  34.3   0.1
  8. 208.174.196.54                0.0%    10   33.8  38.0  33.8  75.2  13.1
  9. hurricane-108839-las-bb1.c.t  0.0%    10  359.7 342.5 318.3 359.8  15.3
 10. ???                          100.0    10    0.0   0.0   0.0   0.0   0.0

````

mtr --report 66.220.1.142

HOST LOSS RCVD SENT BEST AVG WORST
??? 100% 0 16 0.00 0.00 0.00
bogus.union.nj.panjde.comcast.net 0% 16 16 11.37 13.35 22.42
po10-ur02.union.nj.panjde.comcast.net 0% 16 16 11.38 13.11 15.44
po10-ur01.jerseycity.nj.panjde.comcast.net 0% 16 16 12.01 13.50 17.35
po10-ur02.jerseycity.nj.panjde.comcast.net 0% 16 16 11.84 13.76 15.93
po10-ur01.narlington.nj.panjde.comcast.net 0% 16 16 12.75 14.32 17.82
po70-ar01.verona.nj.panjde.comcast.net 0% 16 16 13.13 24.51 178.02
ar01.plainfield.nj.panjde.comcast.net 0% 16 16 13.60 15.59 23.77
GE-2-0-cr01.plainfield.nj.core.comcast.net 0% 16 16 13.46 14.83 17.13
12.118.149.9 0% 16 16 14.38 16.95 23.59
tbr2-p014001.n54ny.ip.att.net 0% 16 16 15.51 17.22 19.74
12.123.0.93 0% 16 16 14.96 16.29 18.34
nyk-b2-link.telia.net 0% 16 16 14.29 17.00 30.07
nyk-bb2-pos1-1-0.telia.net 0% 16 16 14.57 15.81 17.08
chi-bb1-pos7-0-0-0.telia.net 0% 16 16 35.42 38.68 48.66
dls-bb1-pos6-0-0.telia.net 0% 16 16 56.44 61.10 108.59
hurricane-113208-dls-bb1.c.telia.net 0% 16 16 56.19 57.69 58.84
pos5-0.gsr12012.lax.he.net 0% 16 16 89.36 92.35 100.59
??? 100% 0 16 0.00 0.00 0.00
````

@miallen:

My node is totally unresponsive right now. Actually it comes and goes. Is this what you're seeing? How do you know it's HE? What kind of diagnostic are you using?

The traceroute and mtr output is what one would expect from a saturated link. Saturation can happen when a link lacks the capacity to handle normal surge traffic at the busiest part of the day. Another frequent cause is an outage on one link that pushes much more traffic than normal onto another link that is still up. On finding this, I contacted the HE NOC, and they confirmed that the problem was indeed a saturated link and that they were working with Telia to get it upgraded.
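One way to check that separate reports point at the same handoff is to intersect the hop names from two traces. Here is a minimal Python sketch of that idea, using hop names copied from the Texas traceroute and the New Jersey mtr report earlier in this thread. `shared_hops` is a hypothetical helper, not part of any tool, and it assumes full hostnames; mtr's narrower report format truncates names, which would need prefix matching instead.

```python
from typing import List


def shared_hops(first: List[str], *others: List[str]) -> List[str]:
    """Hostnames present in every trace, kept in the order of the first trace."""
    common = set(first).intersection(*others)
    return [host for host in first if host in common]


# Tail of the Texas traceroute (hops 11-14), copied from the thread.
TEXAS = [
    "Telia-atm.atdn.net",
    "dls-bb1-link.telia.net",
    "hurricane-113208-dls-bb1.c.telia.net",
    "pos5-0.gsr12012.lax.he.net",
]

# Tail of the New Jersey mtr report, copied from the thread.
NEW_JERSEY = [
    "nyk-b2-link.telia.net",
    "nyk-bb2-pos1-1-0.telia.net",
    "chi-bb1-pos7-0-0-0.telia.net",
    "dls-bb1-pos6-0-0.telia.net",
    "hurricane-113208-dls-bb1.c.telia.net",
    "pos5-0.gsr12012.lax.he.net",
]

if __name__ == "__main__":
    # Print the hops that both paths traverse.
    for host in shared_hops(TEXAS, NEW_JERSEY):
        print(host)
```

With these two lists, the only names in common are the Telia router facing HE and the first HE router behind it, which is the pair of hops the reports keep pointing at.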

This is the trace from Coral Springs, Florida:

HOST: ?????                      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. router.local                  0.0%    10    1.1   1.1   0.9   1.6   0.3
  2. 10.125.192.1                  0.0%    10    8.3  10.4   6.9  22.1   4.4
  3. hsrp1-cs.myacc.net            0.0%    10    7.9  10.0   7.5  15.5   2.4
  4. border5.3-5.advancecable-7.m  0.0%    10    9.5  13.1   9.5  29.4   6.7
  5. core2.pc1.bbnet1.mia003.pnap  0.0%    10   16.2  11.8   9.6  16.5   2.6
  6. 12.118.175.81                 0.0%    10    9.3  10.8   9.3  13.0   1.3
  7. tbr2-p013603.ormfl.ip.att.ne  0.0%    10   26.8  29.3  26.7  38.1   3.5
  8. tbr1-cl1474.attga.ip.att.net  0.0%    10   27.0  28.5  26.9  33.4   2.0
  9. ggr2-ge00.attga.ip.att.net    0.0%    10   26.4  44.9  25.8 181.8  48.8
 10. 192.205.33.42                 0.0%    10   28.7  30.7  25.3  55.8   9.4
 11. dls-bb1-link.telia.net        0.0%    10   54.4  53.9  47.3  91.1  13.3
 12. hurricane-113208-dls-bb1.c.t  0.0%    10   48.7  49.1  45.5  60.0   4.0
 13. pos5-0.gsr12012.lax.he.net   10.0%    10  366.4 349.0 319.9 366.4  15.3
 14. pos3-2.gsr12416.pao.he.net   10.0%    10  239.0 112.2  88.0 239.0  48.7
 15. pos5-0.gsr12416.fmt.he.net   10.0%    10  169.0 212.9 122.3 359.2  83.5
 16. pos10-0.gsr12012.fmt.he.net   0.0%    10   90.8  90.8  87.7 106.2   5.5
 17. host16.fremont.linode.com    10.0%    10   88.8  89.5  88.5  91.2   1.2

Excuse my ignorance but why do the high latencies show up on the second router after the Telia link?

@raman:

This is the trace from Coral Springs, Florida:

HOST: ?????                      Loss%   Snt   Last   Avg  Best  Wrst StDev
<snip>
 12. hurricane-113208-dls-bb1.c.t  0.0%    10   48.7  49.1  45.5  60.0   4.0
 13. pos5-0.gsr12012.lax.he.net   10.0%    10  366.4 349.0 319.9 366.4  15.3

Excuse my ignorance but why do the high latencies show up on the second router after the Telia link?

Number 12 is a Telia router. Number 13 is an HE router. The latency and packet loss start at hop 13 and continue through the later hops. This indicates the problem is the link between these two routers.
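For anyone who wants to apply the same reading to their own report, here is a rough Python sketch, not an official tool. It assumes the `mtr --report` column order shown in this thread (Loss%, Snt, Last, Avg, Best, Wrst, StDev). The names `Hop`, `parse_report`, `largest_latency_jump` and `loss_reaches_destination` are made up for illustration, and "largest rise in Avg between adjacent responding hops" is only a simple heuristic for "where the latency starts".

```python
import re
from typing import List, NamedTuple, Tuple


class Hop(NamedTuple):
    number: int
    host: str
    loss_pct: float
    avg_ms: float


# One hop row of `mtr --report`, assuming the column order
# Loss% Snt Last Avg Best Wrst StDev (as in the reports in this thread).
ROW = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%?\s+\d+"
    r"\s+[\d.]+\s+([\d.]+)\s+[\d.]+\s+[\d.]+\s+[\d.]+\s*$"
)


def parse_report(text: str) -> List[Hop]:
    """Turn the hop rows of a report into Hop records; other lines are skipped."""
    hops = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            hops.append(Hop(int(m.group(1)), m.group(2),
                            float(m.group(3)), float(m.group(4))))
    return hops


def largest_latency_jump(hops: List[Hop]) -> Tuple[Hop, Hop]:
    """Return the adjacent pair of responding hops with the biggest rise in Avg."""
    # Hops with 100% loss report 0.0 ms, so leave them out of the comparison.
    responding = [h for h in hops if h.loss_pct < 100.0]
    if len(responding) < 2:
        raise ValueError("need at least two responding hops")
    pairs = zip(responding, responding[1:])
    return max(pairs, key=lambda p: p[1].avg_ms - p[0].avg_ms)


def loss_reaches_destination(hops: List[Hop]) -> bool:
    """True if the last hop itself reports loss."""
    return bool(hops) and hops[-1].loss_pct > 0.0


# Hops 11-17 of the Coral Springs report from this thread.
SAMPLE = """\
 11. dls-bb1-link.telia.net        0.0%    10   54.4  53.9  47.3  91.1  13.3
 12. hurricane-113208-dls-bb1.c.t  0.0%    10   48.7  49.1  45.5  60.0   4.0
 13. pos5-0.gsr12012.lax.he.net   10.0%    10  366.4 349.0 319.9 366.4  15.3
 14. pos3-2.gsr12416.pao.he.net   10.0%    10  239.0 112.2  88.0 239.0  48.7
 15. pos5-0.gsr12416.fmt.he.net   10.0%    10  169.0 212.9 122.3 359.2  83.5
 16. pos10-0.gsr12012.fmt.he.net   0.0%    10   90.8  90.8  87.7 106.2   5.5
 17. host16.fremont.linode.com    10.0%    10   88.8  89.5  88.5  91.2   1.2
"""

if __name__ == "__main__":
    hops = parse_report(SAMPLE)
    before, after = largest_latency_jump(hops)
    print(f"suspect link: hop {before.number} ({before.host}) "
          f"-> hop {after.number} ({after.host})")
    print("loss reaches destination:", loss_reaches_destination(hops))
```

On the Coral Springs rows this picks hops 12 and 13, matching the manual reading. The destination check is there because loss that shows up only at a middle hop, and not at the final one, is often just that router rate-limiting its own ICMP replies rather than dropping forwarded traffic.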

@mikegrb:

Number 12 is a Telia router. Number 13 is an HE router. The latency and packet loss start at hop 13 and continue through the later hops. This indicates the problem is the link between these two routers.

Got it, I guess the router name got chopped off in the report…

Cheers,

Raman

Just to add to the entertainment, I'm on the crap route as well. Here's the trace from Southern Oregon. Yay Qwest for routing this the hard way: I'm right on I-5, and the trunk line to MAE-West is buried 30 feet from my house. But no, my packets need to do a Grand Tour and get chomped by Telia's mishap, whatever that is.

traceroute to www.taupehat.com (64.62.231.41), 30 hops max, 40 byte packets
 1  * * *
 2  eugn-dsl-gw01-193.eugn.qwest.net (0.0.0.0)  41.127 ms  40.991 ms  41.097 ms
 3  eugn-agw1.inet.qwest.net (67.42.192.93)  41.322 ms  41.303 ms  56.599 ms
 4  egn-core-01.inet.qwest.net (205.171.150.33)  41.102 ms  41.082 ms  40.921 ms
 5  stl-core-01.inet.qwest.net (205.171.5.125)  48.698 ms  47.446 ms  58.311 ms
 6  svx-core-02.inet.qwest.net (67.14.1.58)  64.940 ms  64.945 ms  72.331 ms
 7  sjp-brdr-01.inet.qwest.net (205.171.214.138)  64.826 ms  64.921 ms  65.471 ms
 8  sjo-bb1-geth1-2-0.telia.net (213.248.86.13)  65.049 ms  64.498 ms  73.814 ms
 9  las-bb1-pos7-0-0-0.telia.net (213.248.80.17)  104.238 ms  76.053 ms  76.322 ms
10  hurricane-108839-las-bb1.c.telia.net (213.248.94.42)  454.248 ms  461.861 ms  506.956 ms
11  pos3-2.gsr12416.pao.he.net (65.19.129.1)  708.738 ms  637.735 ms  458.944 ms
12  pos5-0.gsr12416.fmt.he.net (216.218.229.33)  608.642 ms  475.644 ms  531.779 ms
13  pos8-0.gsr12012.fmt.he.net (66.220.20.138)  458.224 ms  461.637 ms  456.687 ms
14  taupehat.com (64.62.231.41)  451.295 ms  478.868 ms  483.318 ms

I thought I'd add the results from my own Linode at TP in Dallas:

$ mtr --report host18.linode.com
HOST: newton.betadome.net         Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. up3.linode.com                0.0%    10    0.9   0.7   0.6   0.9   0.1
  2. vl1.dsr02.dllstx2.theplanet.  0.0%    10    0.7  18.8   0.6 117.3  40.1
  3. vl22.dsr02.dllstx3.theplanet  0.0%    10    0.9  10.2   0.9  93.2  29.2
  4. 25.7f.5546.static.theplanet.  0.0%    10    1.1   1.0   0.9   1.1   0.1
  5. dal-ix.he.net                 0.0%    10    1.0   0.9   0.9   1.1   0.1
  6. pos5-0.gsr12012.lax.he.net    0.0%    10   35.4  35.4  35.3  35.6   0.1
  7. pos3-2.gsr12416.pao.he.net    0.0%    10   49.1  98.6  48.9 160.6  48.5
  8. pos5-0.gsr12416.fmt.he.net    0.0%    10   49.5  82.6  49.4 303.5  81.3
  9. pos10-0.gsr12012.fmt.he.net   0.0%    10   49.8  49.5  49.4  49.8   0.1
 10. host18.fremont.linode.com     0.0%    10   49.9  50.0  49.6  50.5   0.3

Something's up at HE, that's for sure.

Any update on this? My site is slow as well…

anubis cdrom # mtr --report host18.linode.com
HOST: anubis                      Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 192.168.0.1                   0.0%    10    0.8   1.1   0.8   3.2   0.7
  2. ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
  3. GE-1-6-ur04.beaverton.or.bve  0.0%    10   10.9  10.3   8.9  11.3   0.8
  4. 10g-9-2-ur03.beaverton.or.bv  0.0%    10   10.1  13.1   9.0  25.9   5.1
  5. 10g-9-2-ar01.beaverton.or.bv 50.0%    10   10.3  10.3   9.6  12.2   1.1
  6. 12.119.199.21                 0.0%    10   32.3  31.6  29.9  34.1   1.2
  7. 12.127.6.42                   0.0%    10   34.8  34.1  31.5  39.1   2.1
  8. tbr2-cl10.sffca.ip.att.net    0.0%    10   32.9  32.2  30.6  34.8   1.1
  9. 12.122.82.169                 0.0%    10   33.3  37.4  30.1  54.3   7.6
 10. 192.205.33.26                 0.0%    10   35.8  33.5  31.7  35.8   1.2
 11. las-bb1-pos7-0-0-0.telia.net  0.0%    10   42.7  54.3  40.9 162.4  38.0
 12. hurricane-108839-las-bb1.c.t 10.0%    10  487.6 467.0 428.4 487.6  19.9
 13. pos3-2.gsr12416.pao.he.net   10.0%    10  698.3 500.0 431.1 698.3  77.6
 14. pos5-0.gsr12416.fmt.he.net   10.0%    10  503.3 482.8 440.5 522.5  28.5
 15. pos10-0.gsr12012.fmt.he.net  10.0%    10  500.8 478.4 449.8 502.2  23.0
 16. host18.fremont.linode.com    10.0%    10  500.6 476.6 456.5 500.6  20.0

Looks like they fixed it. I was typing very slowly and suddenly it was fast. Just two minutes ago…

# mtr --report 66.220.1.142
HOST                                    LOSS  RCVD SENT    BEST     AVG   WORST
???                                     100%     0   16    0.00    0.00    0.00
ge-2-1-ur01.union.nj.panjde.comcast.net    0%    16   16   11.56   13.15   14.73
po10-ur02.union.nj.panjde.comcast.net     0%    16   16   11.05   13.14   16.52
po10-ur01.jerseycity.nj.panjde.comcast.net    0%    16   16   11.93   13.81   19.66
po10-ur02.jerseycity.nj.panjde.comcast.net    0%    16   16   11.57   19.96  110.82
po10-ur01.narlington.nj.panjde.comcast.net    0%    16   16   11.93   14.28   21.52
po70-ar01.verona.nj.panjde.comcast.net    0%    16   16   12.64   14.78   18.15
ar01.plainfield.nj.panjde.comcast.net     0%    16   16   13.81   15.51   19.57
GE-2-0-cr01.plainfield.nj.core.comcast.net    0%    16   16   13.99   15.49   17.04
12.118.149.9                              0%    16   16   14.25   15.51   16.85
tbr2-p014001.n54ny.ip.att.net             0%    16   16   15.42   17.04   19.49
12.123.0.93                               0%    16   16   14.86   18.43   36.70
nyk-b2-link.telia.net                     0%    16   16   14.83   16.61   25.72
nyk-bb1-pos7-2-0.telia.net                0%    16   16   14.74   16.54   18.19
chi-bb1-pos6-0-0-0.telia.net              0%    16   16   35.28   37.17   41.15
dls-bb1-pos6-0-0.telia.net                0%    16   16   56.30   58.28   64.94
hurricane-113208-dls-bb1.c.telia.net      0%    16   16   56.29   57.77   59.05
pos5-0.gsr12012.lax.he.net                7%    15   16  329.76  349.14  372.88
pos3-2.gsr12416.pao.he.net                0%    16   16  100.80  186.87  364.52
pos5-0.gsr12416.fmt.he.net                0%    16   16   97.25  210.64  462.22
pos10-0.gsr12012.fmt.he.net               7%    15   16   93.03   95.69   98.28
li4-142.members.linode.com                7%    15   16   94.39   96.84  105.06

After the network problems, my node (on host56) is off, and I can't boot it. Boot fails with:

xen_linode_boot: failed to get domid
xen_linode_boot: warning - li-network might not have ran

But I can ssh to the host…

How about the other people on this Xen host? Were you able to boot?
