Is It Broken Everywhere or Just for Me with Omri Sass

About this title

When your website stops working at 3 AM, you need to answer one question fast: is it my code, or is a big cloud provider having problems? Omri Sass from Datadog explains updog.ai, a tool that monitors whether major services like AWS, Cloudflare, and others are actually working. Instead of relying on user-submitted problem reports the way Downdetector does, updog uses real data from thousands of computers to detect when services go down. Omri shares why this took 6 years to build, how they process massive amounts of data with machine learning, and why cloud providers have been strangely upset that these tools exist.



About Omri:

Omri Sass is a Director of Product Management at Datadog, where he leads and supports a team of 25+ product managers driving initiatives across Bits AI SRE, Data Observability, Service Management, and most recently, the launch of updog.ai. Outside of work, Omri is an avid sci-fi reader, a dedicated yoga practitioner, and happily outmatched by his cat.


Show Highlights:

(02:12) What Is Updog and How Does It Work
(03:38) Why Knowing If It's a Global Problem Matters
(04:01) The Problem With Testing Every Endpoint Yourself
(05:52) How Datadog Discovered EC2 Outages From Their Own Systems
(10:38) When AWS Regions Go Down and Cascade Failures
(13:13) What Happens When Services Rebuild Completely
(16:29) The Most Important Learning During a 3 AM Incident
(20:11) Why This Took So Long to Build
(23:40) When Datadog Going Down Isn't Critical Path
(25:22) How They Picked Which AWS Services to Monitor
(27:07) What Comes Next for Updog
(30:11) Where to Find Omri and Updog


Links:

Datadog: datadoghq.com

Omri’s LinkedIn: https://www.linkedin.com/in/omri-sass-65632a14/

Sponsored by:
duckbillhq.com
