Is It Broken Everywhere or Just for Me with Omri Sass
About this title
When your website stops working at 3 AM, you need to answer one question fast: is it my code, or is a big cloud provider having problems? Omri Sass from Datadog explains updog.ai, a tool that monitors whether major services like AWS, Cloudflare, and others are actually working. Instead of asking people to report problems the way Downdetector does, updog uses real data from thousands of computers to detect when services go down. Omri shares why this took 6 years to build, how they process massive amounts of data with machine learning, and why cloud providers have been strangely upset about these tools existing.
About Omri:
Omri Sass is a Director of Product Management at Datadog, where he leads and supports a team of 25+ product managers driving initiatives across Bits AI SRE, Data Observability, Service Management, and most recently, the launch of updog.ai. Outside of work, Omri is an avid sci-fi reader, a dedicated yoga practitioner, and happily outmatched by his cat.
Show Highlights:
(02:12) What is Updog and How Does It Work
(03:38) Why Knowing If It's a Global Problem Matters
(04:01) The Problem With Testing Every Endpoint Yourself
(05:52) How Datadog Discovered EC2 Outages From Their Own Systems
(10:38) When AWS Regions Go Down and Cascade Failures
(13:13) What Happens When Services Rebuild Completely
(16:29) The Most Important Learning During a 3 AM Incident
(20:11) Why This Took So Long to Build
(23:40) When Datadog Going Down Isn't Critical Path
(25:22) How They Picked Which AWS Services to Monitor
(27:07) What Comes Next for Updog
(30:11) Where to Find Omri and Updog
Links:
Datadog: datadoghq.com
Omri’s LinkedIn: https://www.linkedin.com/in/omri-sass-65632a14/
Sponsored by:
duckbillhq.com