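Over the twelve days between December 25th and January 5th, Linode saw more than a hundred denial-of-service attacks against every major part of our infrastructure, some of them seriously disrupting service for hundreds of thousands of Linode customers. I'd like to follow up on my earlier update by providing more insight into how we were attacked and what we're doing to stop it from happening again.
Essentially, the attackers moved up our stack in roughly this order:
- Layer 7 ("400 Bad Request") attacks toward our public-facing websites
- Volumetric attacks toward our websites, authoritative nameservers, and other public services
- Volumetric attacks toward Linode network infrastructure
- Volumetric attacks toward our colocation providers' network infrastructure
Most of the attacks were simple volumetric attacks. A volumetric attack is the most common type of distributed denial-of-service (DDoS) attack, in which a cannon of garbage traffic is pointed at an IP address, wiping the intended victim off the Internet. It is the virtual equivalent of intentionally causing a traffic jam with a fleet of rental cars, and the pervasiveness of these attacks has caused hundreds of billions of dollars in economic loss globally.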
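Typically, Linode sees several dozen volumetric attacks aimed at our customers each day. However, these attacks almost never affect the wider Linode network, because we protect ourselves with a tool called remote-triggered blackholing. When an IP address is "blackholed," the Internet collectively agrees to drop all traffic destined for that IP address, preventing both good and bad traffic from reaching it. For content networks like Linode, which have hundreds of thousands of IPs, blackholing is a blunt but crucial weapon in our arsenal, giving us the ability to "cut off a finger to save the hand": that is, to sacrifice the customer who is being attacked in order to keep the others online.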
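To make the blackholing trade-off concrete, here is a minimal sketch of the idea using longest-prefix matching. It is an illustration only, not Linode's tooling: the routing table, the `"discard"` next hop, the function names, and the 203.0.113.0/24 documentation prefix are all assumptions made for the example.

```python
import ipaddress

# Hypothetical routing table: prefix -> next hop.
# "discard" stands in for a null route (all assumed names).
routes = {
    ipaddress.ip_network("203.0.113.0/24"): "customer-vlan",
}

def blackhole(table, address):
    """Add a /32 discard route for a single attacked address."""
    table[ipaddress.ip_network(f"{address}/32")] = "discard"

def lookup(table, address):
    """Return the next hop of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]

blackhole(routes, "203.0.113.10")
print(lookup(routes, "203.0.113.10"))  # discard
print(lookup(routes, "203.0.113.11"))  # customer-vlan
```

The more specific /32 wins over the /24, so the attacked address loses all traffic, good and bad, while its neighbours in the same subnet stay reachable.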
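Blackholing fails as an effective mitigation under one obvious but important circumstance: when the IP being targeted (say, some critical piece of infrastructure) can't go offline without taking others down with it. The examples that usually come to mind are "servers of servers," like API endpoints or DNS servers, which form the foundation of other infrastructure. While many of the attacks were aimed at our "servers of servers," the hardest ones for us to mitigate turned out to be those pointed directly at our own and our colocation providers' network infrastructure.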
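Secondary Addresses
The attacks leveled against our network infrastructure were relatively straightforward, but mitigating them was not. As an artifact of history, we segment customers into individual /24 subnets, meaning that our routers must have a "secondary" IP address inside each of these subnets for customers to use as their network gateway.
Over time, our routers have amassed hundreds of these secondary addresses, each one a potential target for attack. Of course, this was not the first time our routers had been attacked directly. Typically, we take special measures to send blackhole advertisements to our upstreams without blackholing in our core, stopping the attack while allowing customer traffic to pass as usual. However, we were unprepared for a scenario in which someone rapidly and unpredictably attacked many dozens of different secondary IPs on our routers. There were a couple of reasons for this. First, mitigating attacks on network gear required manual intervention by network engineers, which was slow and error-prone. Second, our upstream providers were only able to accept a limited number of blackhole advertisements, in order to limit the potential for damage in case of error.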
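The scale of the problem can be sketched with a little address arithmetic. Everything specific below is an assumption for illustration: the 10.0.0.0/8 supernet, the convention that the gateway is the first host (.1) of each /24, the count of 300 subnets, the 40 attacked addresses, and the cap of 25 advertisements are hypothetical and not Linode's real figures.

```python
import ipaddress

def gateway_addresses(supernet, count):
    """Return one assumed gateway (.1) for each of the first `count` /24s."""
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=24)
    return [subnet.network_address + 1
            for _, subnet in zip(range(count), subnets)]

def uncovered(attacked, max_advertisements):
    """Addresses left exposed once the upstream's advertisement cap is hit."""
    return attacked[max_advertisements:]

# Hypothetical router carrying 300 customer subnets.
gateways = gateway_addresses("10.0.0.0/8", 300)

# Hypothetical attack that rotates through 40 of those gateway addresses
# while the upstream accepts at most 25 blackhole advertisements.
attacked = gateways[:40]
exposed = uncovered(attacked, 25)
print(len(gateways), len(exposed))  # 300 15
```

Under these assumed numbers, every one of the 300 gateways is a valid target, and any attack that touches more addresses than the upstream's cap leaves some of them unprotected, no matter how quickly engineers react.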
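After several days of playing cat-and-mouse with the attacker, we were able to work with our colocation providers to either blackhole all of our secondary addresses or, where blackholing wasn't possible, drop the traffic at the edges of their transit providers' networks.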
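Cross-Connects
The attacks targeting our colocation providers were just as straightforward, but even harder to mitigate. Once our routers could no longer be attacked directly, our colocation partners and their transit providers became the next logical target, specifically their cross-connects. A cross-connect can generally be thought of as the physical link between any two routers on the Internet. Each side of this physical link needs an IP address so that the two routers can communicate with each other, and it was those IP addresses that were targeted.
As was the case with our own infrastructure, this method of attack was not novel in and of itself. What made it so effective was the rapidity and unpredictability of the attacks. In many of our datacenters, dozens of different IPs within the upstream networks were attacked, requiring a level of focus and coordination between our colocation partners and their transit providers which was difficult to maintain. Our longest outage by far, over 30 hours in Atlanta, can be directly attributed to frequent breakdowns in communication between Linode staff and people who were sometimes four degrees removed from us. We were eventually able to close this attack vector completely after some stubborn transit providers finally acknowledged that their infrastructure was under attack and successfully put measures in place to stop it.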
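Lessons Learned
On a personal level, we're embarrassed that something like this could have happened, and we've learned some hard lessons from the experience.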
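Lesson one: don't depend on middlemen
In hindsight, we believe the longer outages could have been avoided if we had not been relying on our colocation partners for IP transit. There are two specific reasons for this. First, in several instances we were led to believe that our colocation providers simply had more IP transit capacity than they actually did. Several times, the amount of attack traffic directed at Linode was so large that our colocation providers had no choice but to temporarily de-peer with the Linode network until the attacks ended. Second, successfully mitigating some of the more nuanced attacks required the direct involvement of senior network engineers from different Tier 1 providers. At 4am on a holiday weekend, our colocation partners became an extra, unnecessary barrier between us and the people who could fix our problems.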
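Lesson two: absorb larger attacks
Linode's capacity management strategy for IP transit has been simple: when our peak daily utilization starts approaching 50% of our overall capacity, it's time to get more links. This strategy is standard for carrier networks, but we now understand that it is inadequate for content networks like ours. To put some real numbers on this, our smaller datacenter networks have a total IP transit capacity of 40Gbps. That may seem like a lot of capacity to many of you, but in the context of an 80Gbps DDoS that can't be blackholed, having only 20Gbps of headroom left us with crippling packet loss for the duration of the attack.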
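The arithmetic behind that packet loss can be sketched as follows. This is a simplified model, not a measurement: it assumes legitimate traffic sits at the 50% threshold (20Gbps), that attack and legitimate traffic share the same links, and that a saturated link drops packets uniformly. The function names are made up for the example.

```python
def headroom_gbps(capacity, peak_utilization):
    """Spare capacity left when peak traffic uses the given fraction."""
    return capacity * (1 - peak_utilization)

def expected_loss(capacity, legit, attack):
    """Fraction of all offered traffic dropped, assuming uniform drops."""
    offered = legit + attack
    if offered <= capacity:
        return 0.0
    return 1 - capacity / offered

# Figures from the post: 40Gbps of transit, a 50% upgrade threshold,
# and an 80Gbps attack that cannot be blackholed.
print(headroom_gbps(40, 0.5))                 # 20.0
print(f"{expected_loss(40, 20, 80):.0%}")     # 60%

# The same traffic against 200Gbps of capacity fits without loss.
print(f"{expected_loss(200, 20, 80):.0%}")    # 0%
```

Under these assumptions, roughly 60% of all packets, legitimate ones included, would be dropped on a 40Gbps network, which is why headroom sized for organic growth says little about the ability to ride out an attack.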
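Lesson three: let customers know what's happening
It's important that we acknowledge when we fail, and our lack of detailed communication during the early days of the attack was a big failure. Providing detailed technical updates during a crisis can only be done by those with detailed knowledge of the current state of affairs. Usually, those people are also the ones who are firefighting. After things settled down and we reviewed our public communications, we came to the conclusion that our fear of wording something poorly and causing undue panic led us to speak more ambiguously in our status updates than we should have. This was wrong, and going forward, a designated technical point-person will be responsible for communicating in detail during major events like this. Additionally, our status page now allows customers to be alerted about service issues by email and SMS via the "Subscribe to Updates" link.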
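Our Future is Brighter Than Our Past
With these lessons in mind, we'd like you to know how we are putting them into practice. First, the easy part: we've mitigated the threat of attacks against our public-facing servers by implementing DDoS mitigation. Our nameservers are now protected by Cloudflare, and our websites are now protected by powerful commercial traffic scrubbing appliances. Additionally, we've made sure that the emergency mitigation techniques put in place during these holiday attacks have been made permanent.
By themselves, these measures put us in a place where we're confident that the types of attacks that happened over the holidays can't happen again. Still, we need to do more. So today I'm excited to announce that Linode will be overhauling our entire datacenter connectivity strategy, backhauling 200 gigabits of transit and peering capacity from major regional points of presence into each of our locations.
Here is an overview of the forthcoming infrastructure improvements to our Newark datacenter, which will be the first to receive these capacity upgrades.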
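The headliner of this architecture is the optical transport networks that we have already begun building. These networks will provide fully diverse paths to some of the most important PoPs in the region, giving Linode access to hundreds of different carrier options and thousands of direct peering partners. Compared to our existing architecture, the benefits of this upgrade are obvious. We will be taking control of our entire infrastructure, right up to the very edge of the Internet. This means that, rather than depending on middlemen for IP transit, we will be in direct partnership with the carriers we depend on for service.
Additionally, Linode will quintuple the amount of bandwidth currently available to us, allowing us to absorb extremely large DDoS attacks until they are properly mitigated. As attack sizes grow in the future, this architecture will quickly scale to meet their demands without any major new capital investment.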
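Final Words
Lastly, sincere apologies are in order. As a company that hosts critical infrastructure for our customers, we are trusted with the responsibility of keeping that infrastructure online. We hope the transparency and forward thinking in this post can regain some of that trust. We would also like to thank you for your kind words of understanding and support. Many of us had our holidays ruined by these relentless attacks, and it's a difficult thing to try to explain to our loved ones. Support from the community has really helped. We encourage you to post your questions or comments below.
Comments (67)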
Thanks for your great work. My VPS was running well during these days.
Good postmortem analysis – thanks for being candid.
Thanks for being honest and forthcoming about this and the issues you addressed-both on the technical and PR sides-as well as the steps you are taking to better your company.
Kimo.
You people are awesome and have great stamina. We are satisfied customer from Pakistan.
I’ll never stop buying linodes!!
You guys are rock stars in my book, and I appreciate the transparency. More tech companies need to live and breathe that these days, or else find themselves losing the game to cheaper competitors.
While I haven’t been a fan of how some past incidents were handled, I still give Linode a 5-star rating. Good job!
Things happen. Those of us who network or sysadmin know that when you're fighting fires, figuring out what is going on, and fielding calls from angry clients, the last thing you have time for is updating everyone. Hell…you may not even know everything that is going on for a couple of days or more with huge attacks.
This is a good postmortem and your ability to learn and adapt and invest in your own infrastructure is why I love and continue to be a Linode fanatic.
Keep it up you guys. Sorry Christmas was such a bummer.
May the Network be with you!
Can’t thank the Linode team enough for your dedication. The livelihood of thousands rests in your hands, and I feel like this whole event further proves how well qualified you guys are to be doing what you’re doing.
The only part of this that really bothers me is the idea that if I get a DDOS, Linode is just going to blackhole me, and me alone. Doesn’t that mean that I have to give in to ransom demands from attackers?
I really appreciate this. We were waiting for this to take the decision if we will stay in linode or move away, and we are staying.
I strongly agree that being more transparent would have helped a LOT.
I’d like to know, though, when is scheduled the above change in the rest of the datacenters. I’m not using newark right now and would like to know when my datacenter will have it : )
Thanks a lot,
Rodrigo
@Mogden – for people who are attacked regularly, we suggest Cloudflare or others in the DDoS protection market. I’m not sure what the future holds on this subject, but rest assured that it really bothers us too.
Thanks for the update. Any time frame for other datacenters to be updated? My linodes are in Atlanta and we suffered almost three days of downtime.
Cheers
We had 2 linodes, one of them in the Atlanta datacenter. We did not experience any issues during the holidays, but I was worried all the same. Thanks for the explanation and amazing work. I honestly hope your families can understand the situation.
Amazing company!
Like Rodrigo, this is a huge thing to us. I was honestly feeling that it was going the usual corporate way with silence and deniability, just waiting for the furore to die down. It really makes a difference to hear not only the details of the response/mitigation activities, which we appreciate, but also acknowledgement of the position we were put into when communication was sparse.
It goes a long way.
Thanks again.
Mark.
Great to hear we could help you get protected.
swiner@cloudflare.com
@mogden – if you’re the one being DDoS’d then you deserve to be blackholed. I don’t pay for my linodes so that you can be targeted with a DDoS and my linodes get taken down!!
Thank you for the analysis and a break down of what took place, and most importantly, thank you for being honest with customers!
Cheers!
I’m obviously a huge fan of Linode, but I wonder if this attack will force them to re-evaluate their “3 strikes” policy towards hosted sites which come under DDoS attack. As this attack should have taught them, it’s indiscriminate, and there’s not a whole lot a small website owner can do to mitigate it. We rely on Linode to be able to deal with this, and punishing the victim is hardly a fair solution.
And attacks started minutes after posting updates. http://status.linode.com/incidents/mkcgnmjmnnln
I’ve a message for Linode especially Chris, please invest more and more on infrastructure if you want to stay in the game otherwise, you’ll be overtaken by heavily funded startups in this domain. We know you have innovative mind and excellent technology but this alone is not sufficient for you to win in this domain. I like performance and flexibility of Linode but moved to DO just because I needed to setup my stuff at Japan and Singapore data-centers and Japan DC is sold out. 3 out of 6 locations are sold out and you are not yet expanding? How will you compete?
Come out of your box and look at your neighbors. It was painful to move to Digital Ocean for me but I had to take this decision. I am still using Linode for some of my stuff will continue using it until I need redundancy or you expand.
There’s nothing that I love more than the amount of technical detail that you provide to us on these cases, and even with some minor updates.
I love being a Linode customer, no DDoS will get that away from me 😀
Thanks for this post, Alex. This was a rough period for everyone involved and affected but I am extremely impressed by Linode making the effort to hopefully prevent the same scenario from happening again.
There were many lessons to be learned from this – both for Linode and for customers.
Linode appears to have realized what they needed to do and that is fantastic. Instead of saying sh*t happens and going about business as usual you are actively working to make sure it doesn’t happen again. Well done.
We (customers) need to cover our own bases too. For anything critical or even slightly important you need to have a plan in place in the event of a Linode outage (regardless of the reason).
I have now split some of my services and are far better placed to recover quickly in the event something like this were to happen again. Linode had always been so reliable that I got complacent. Lesson definitely learned.
In my case my costs have now increased as I am now paying other providers in addition to what I have and will continue to pay Linode, but the ability to keep some important services online is worth it.
Thank you to everyone at Linode for your hard work and for looking out for your customers.
Some of our big clients suffered with the downtime on those days but, with several VPS and more online each day, we never accepted any offer from others players. This kind of behaviour make us confident with the team and give us peace of mind that we’re in good hands.
Thank you for the update and respect with your customers.
Hostcare Internet
Thank you for being open, good luck with your new defences and I hope that you catch up on your family time!
Linode user here. Thanks for the transparency. I wasn’t directly affected but I appreciate the openness on the issue. It’s a welcome change to most companies now. I plan to keep using Linode just because of how cool you all handled the situation. Keep up the good work!
Cloudflare will probably help with your DDoS but they aren’t infallible as any other vendor.. But what happens when they get hit really hard themselves? I’d recommend getting a second DNS provider.
See Also: https://blog.thousandeyes.com/ultradns-ddos-affects-major-web-services/
https://blog.thousandeyes.com/ultradns-outage-october-2015/
I was beginning to wonder if such a note would arrive. The explanation is useful and I’m feeling as though things are safer than before.
Thank you for being transparent about what happened. That was a truly hellish attack. Getting slammed with a sophisticated and highly targeted 80 Gbit DDoS is stressful for any network admin and I’m glad that Linode succeeded in weathering the storm.
I am really impressed with way you have handled this whole situation, your company’s honesty and explanation is more than anyone could have expected. I’m sure there were many hours invested, not only in locating and fixing the problem on top of adding the double protection; but even in your letter to your customers. I hope all your customers are as loyal to your company as you have been with them. Way to step up your game, keep up the good work. Wishes for much more success……
Thank you very much for the detailed breakdown of what went wrong and what you plan to do to prevent this in the future. I have to say though, technical reasons and justifications aside, Linode has a lot to learn in regards to communication. I know you acknowledge that in your blog post but for many people (myself included) it’s too little way too late. It’s taken you 30 days to write a blog post that could’ve been written in hours. For 30 days people have been sitting on the fence wondering exactly what you guys are doing and whether or not they should jump ship. For many people (myself included), the absence of this response and the overall feeling that it has been so long since you said you were going to provide an update, that honestly you were just going to push this to the side and hope it went away, has directly contributed to Linode losing a significant amount of business from us.
I don’t want my response to turn into some Linode bashing post, but I want you to be aware that your failure to provide sufficient information and responses is the biggest problem here – for me, at least. It hit your reputation hard and caused us to lose a significant amount of trust in your company and services. DDoS attacks happen, and we know you guys were working extremely hard to deal with those. You reminded us often enough in your status updates. What we really wanted to know was that the worst was over and that you identified your weaknesses and were addressing those. The longer we had to wait for this information, the less trust we had/have in you.
I’d like to end this on a more positive note. All of the above said, your services are fantastic overall and I’d love to come back to Linode in the future, once you’ve performed all of the changes you have mentioned here. Just please, improve on your communications!
Long-time Linode customer…I wasn’t affected by the outage, but I’m really glad you’ve taken the time to write up what happened. Thanks for being transparent and generally awesome.
Alex, this caught my attention: “… requiring a level of focus and coordination between our colocation partners and their transit providers which was difficult to maintain.”
How did you structure this communication? What tools / technologies did you use or tried to use?
This is a nicely put article. I only have amazing things to say about Linode and its staff. Awesome post!
As a long time customer and a fellow network administrator I just wanted to say that I do really appreciate all your hard work. Respect.
Sounds an exciting project Alex, good luck!
Any news on continued security farces at Linode? and ‘The Best Practices not invented here’ approach.. For example to reset 2FA
—
Should you need us to disable your Two-Factor Authentication, the following information is required:
An image of the front and back of the payment card on file, which clearly shows both the last 6 digits and owner of the card.
An image of the front and back of the matching government-issued photo ID.
—
A) Photoshop CC in 2 mins, you have no idea what my CC should look like.
B) You can’t verify government ID so say 5 minute photoshop.
Woohoo for 2FA, known as 2 f… alls
Thanks for the update, and letting us know that things will be better handled in the future. Both technically and on the communication front.
Any idea who attacked and why?
Linode – you are the best. Thanks for your service.
Thanks for the update. As a long time linode customer, it is appreciated.
For you guys complaining about being kicked out in case of a DDoS, I recommend getting DDoS protection for your linodes. There are a lot of cheap options right there that can be integrated easily.
Some one recommended CloudFlare and they are great. You can also look at Sucuri:
http://sucuri.net/website-firewall/
Or Incapsula:
https://incapsula.com
Both great products and solutions. Stay safe!
200g? this years ddos was 800gbps…
good postmortem. now can you explain what happened with the “leaked” credentials and the fact that we had to reset the passwords.
thank you
These attacks could happen to anyone and any provider. Keep up the good work!
Great article and the right way to handle these kinds of problems. Transparency and constructive retros are the way to go.
I think you did great job considering the size of the attack. That’s why continue to use Linode for my virtual machines. Thank you for your support and keep up the good work.
Thank you for the clear and concise explanation. I look forward to you rolling out your upgrades and continue to be a happy customer with Linode.
Cisco routers, seriously?
Juniper high end routers take a gigantic steaming dump all over Cisco.
@Jake that’s essentially what ASRs are 😉
If you want to do it on the cheap side and be safe, get some cheaper but very good equipment from Huawei (give them a call). You might think the Chinese cannot be better than Cisco, but Cisco is now also made in China. Also, I’m sorry, but you need some DDoS protection (expensive). You cannot just nullroute your customers… you have to protect them. If a cheap company like OVH can do it, why can’t you…
Looks like you guys need to hire someone with real experience in network engineering (worked at ISP level), not just some cheap undergraduate out of university.
You need to rely more on anycast, have reserved capacity, etc.
After reading this, I would not host my sites on linode. You guys look amateur (sorry).
I appreciate this honest insight, but I’ve moved back to a local server since these attacks made access to my Linode difficult or impossible, and always-on, always-accessible was my main reason for moving to Linode in the first place. Sorry, and better luck in the future.
I like the transparency, even delayed. I like that you’re taking steps. I DON’T like that your “security appliances” block ALL ICMP packets including the “Packet Too Big” messages required for path MTU discovery and breaking my ability to access the Manager over my VPN.
Buying blended internet direct from your colo provider is a bad idea (as it seems you have learned the hard way).
You should be getting your transit direct from diverse carriers… this is networking 101.
Love the armchair quarterbacks giving their input. Now, for you QBs, where is your massive company you are running and making decisions and learning lessons from? Oh you don’t have one and you don’t work for one? Sit back and let Linode do their job, they are by far the best provider out there. The cost of this type of infrastructure is gigantic and you wanna-be QBs have no idea what it takes to run a business.
Great job Linode. I know I’ve made the right choice by using you.
Excellent. I knew you guys were “on it”. I really appreciate the detail you provided.
Thank you for releasing this honest and detailed report
Regarding CloudFlare, did you shop around for any other DNS DDOS protection services? The reason I ask is because CloudFlare happily caches too many dodgy websites. Some sources that may be of interest:
http://news.netcraft.com/archives/2015/10/12/certificate-authorities-issue-hundreds-of-deceptive-ssl-certificates-to-fraudsters.html (large number of phishing certificates issued by CloudFlare)
http://www.crimeflare.com (non-profit that investigates CloudFlare and its customers)
I appreciate the update, but I find it a bit late too.
Also, I don’t really get why Mr. Forster is the one signing this post.
And don’t get me wrong, I have nothing against him; I don’t doubt his intentions or knowledge.
But I expected a statement from someone at the top of the food chain. This was also one of my main problems when the events happened: it’s as if nobody in top management cared until one of the engineers realized that they couldn’t stay silent anymore.
I still have that feeling, and it is pretty alarming.
It’s time to move to IPv6-only internet. Attacking a single address will become impractical if a host can have millions of them changed automatically in an unpredictable way.
Appreciate the info.
It is a minor point, I know, but status.linode.com should either be un-available over https, or have its own cert.
try this in chrome…
https://status.linode.com
Thanks Linode Team for acknowledging your challenges, and courageously taking adaptive actions 🙂
Great job! Didn’t know such a story ongoing since my site was on all the time. Really appreciate all the hard work of LINODE support team!
Thank you for the very interesting update. Best of luck for the future.
I’m also quite curious on who could benefit from such attacks in the first place.
I am using Cloud Flare to protect the blog from DDOS attack, is there any other best application available to replace cloudflare? Is there a way to stop the DDOS or brute force attack for wordpress sites?
Great write up & good to see such honesty and transparency. I think it is important for readers of this to understand that DDoS attacks can affect anyone at any time on any host. Obviously when you are on the receiving end of a nullroute it is not nice, but It’s important to note though that providers do not want for you to have downtime, but if a DDoS directed at you is affecting other customers and you don’t have some form of mitigation, there is seldom any other option than to take this action. As they said, ‘cut off a finger to save the hand’. I’m quite sure that if someone else is being DDoS’d that you would prefer to see them nullrouted than have your own service impacted, so that has to work both ways in my eyes.
It’s important to look at the issue objectively – DDoS attacks are not going to go away and really if you have concerns around protection then this does mean paying for a mitigation service, especially if outages will be more costly than the monthly sub.
@Srinivas – You’ll need a CloudFlare business plan for DDoS attack mitigation. Simply being behind CloudFlare on a free plan won’t give you this protection, and there isn’t another service that I am aware of that provides free DDoS protection without at least having some other paid service. Keep in mind that CloudFlare isn’t an application, but rather a service which is totally separate from your Wordpress sites. If you want to run something locally to stop a brute force attack then have a look at a plugin such as Wordfence, which is very effective. Another good plugin is iQ Block Country which uses GeoLocation – you can lock down your back end to whitelisted countries only. Plugins are not infallible, but they definitely add extra security. Another good way to stop brute force attacks is by not using obvious account names for the administration area of your site…lots of tools will try to brute force on usernames like ‘admin’ – as with any security approach, it’s all about the layers!
As a final note, I do always find it interesting when posts like this attract the critics who dish out ‘advice’ about how X and Y should have already been done, or that they are amateur, etc. I would like to know which fairytale jobs they have at companies that have everything 100% perfect with 100% uptime and 0% chance of outages or attacks…
Fair play Linode, tip of the cap.
Thank you for your honesty and transparency. Very very good post. Thank you for your hard work during the attacks even on holidays. Keep pushing Linode Team!
Yeah, thank you also for your transparency. I remember what happened; everything brought us to tears, and I think that, like many people, we planned to move to another company. Even a few days ago I compared with AWS, reading their docs for RDS, EC2, ELB, S3, etc., but even with far fewer options and possibilities than Amazon, Linode remains the better company for us, with great, responsive support and faster, cheaper solutions.
I started with Linode 4 years ago, I loved the service and I am not going to go away from you guys. I know how painful firefighting could be, thanks to your team for working so hard. And please do everything that could prevent this from repeating.
Hello,
In the article you said the following:
“our nameservers are now protected by Cloudflare, and our websites are now protected by powerful commercial traffic scrubbing appliances.”
But it seems that is no longer the case. Did you move away from Cloudflare protection? If so, why? Many hosting giants now rely on Cloudflare protection.
Thank you for this update and the recent additional high memory and $5 options.