Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency - Henning Jacobs
Talk given at JAX DevOps London on 2019-05-15
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reduce the impact on latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
How to Automate Performance Tuning for Apache Spark - Databricks
Spark has made writing big data pipelines much easier than before. But a lot of effort is required to maintain performant and stable data pipelines in production over time. Did I choose the right type of infrastructure for my application? Did I set the Spark configurations correctly? Can my application keep running smoothly as the volume of ingested data grows over time? How to make sure that my pipeline always finishes on time and meets its SLA?
These questions are not easy to answer even for a handful of jobs, and this maintenance work can become a real burden as you scale to dozens, hundreds, or thousands of jobs. This talk will review what we found to be the most useful piece of information and parameters to look at for manual tuning, and the different options available to engineers who want to automate this work, from open-source tools to managed services provided by the data platform or third parties like the Data Mechanics platform.
Robby Morgan presented on Bazaarvoice's large-scale use of Solr. Bazaarvoice uses Solr to index over 250 million documents and handle up to 10,000 queries per second. They deployed Solr across multiple data centers for high availability. Key lessons included ensuring adequate RAM, simulating performance before large deployments, and challenges with cross-data center replication and schema changes. Overall, Solr provided fast search but real-time updates and elastic scaling required additional work.
This document discusses elastic distributed deep learning training at scale on-premises and in the cloud. It introduces the architecture of elastic distributed training, which combines high performance synchronization techniques like distributed data parallel with session scheduling and elastic scaling to provide flexibility. This allows training jobs to automatically scale up and down resources based on policies while maintaining high performance. It aims to make distributed training transparent to frameworks like TensorFlow and PyTorch.
16 artifacts to capture when there is a production problem - Tier1 app
Production problems are tricky to troubleshoot if proper diagnostic information isn’t captured. In this session, 16 important artifacts that you need to capture and the effective tools that you can use to analyze those artifacts are discussed.
The document provides information on application performance tuning education. It discusses key performance metrics like TPS and considerations for CPU usage, memory usage, garbage collection. It then summarizes Java/Tomcat performance tuning factors and garbage collection options. The last part discusses Java profiling and troubleshooting tools like JDK tools, HPROF, jhat, jmap, jstack, jstat and jvisualvm. It also provides an example Tomcat shell script configuration for setting JVM options and using profiling agents.
Talk held at DevOps Gathering 2019 in Bochum on 2019-03-13.
Abstract: This talk will address one of the most common challenges of organizations adopting Kubernetes on a medium to large scale: how to keep cloud costs under control without babysitting each and every deployment and cluster configuration? How to operate 80+ Kubernetes clusters in a cost-efficient way for 200+ autonomous development teams?
This talk provides insights on how Zalando approaches this problem with central cost optimizations (e.g. Spot), cost monitoring/alerting, active measures to reduce resource slack, and automated cluster housekeeping. We will focus on how to ingrain cost efficiency in tooling and developer workflows while balancing rigid cost control with developer convenience and without impacting availability or performance. We will show our use case running Kubernetes on AWS, but all shown tools are open source and can be applied to most other infrastructure environments.
Uncover the hidden challenges that plague production environments in this eye-opening session. Join us as we explore the five most common performance problems that emerge in live systems. Gain invaluable insights into detecting these issues early on, before they wreak havoc on your operations. Discover practical solutions that empower you to address these challenges head-on, ensuring optimal performance and seamless user experiences.
How to build a feedback loop in software - Sandeep Joshi
The document discusses how to build a feedback loop using a PID controller in software systems. It begins with an overview of why PID controllers are useful when the system to be controlled can be modeled as a "black box" and the goal is to maintain an output value. It then covers how to implement a PID controller by defining the setpoint, sensor output, control input, and PID calculation. The document provides examples of PID controllers in software systems like Golang garbage collection, Apache Spark, and Linux. It also discusses best practices like tuning parameters and avoiding issues like windup.
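The ingredients listed above (setpoint, sensor output, control input, and the PID calculation) can be sketched in a few lines of Go. This is a generic illustration of the technique, not code from the talk; the gains and setpoint are arbitrary:

```go
package main

import "fmt"

// pid holds the controller state: the three gains plus the running
// integral and the previous error (needed for the derivative term).
type pid struct {
	kp, ki, kd float64
	integral   float64
	prevErr    float64
}

// step takes the setpoint, the latest sensor reading, and the time
// elapsed since the last update, and returns the new control input.
func (p *pid) step(setpoint, measured, dt float64) float64 {
	err := setpoint - measured
	p.integral += err * dt
	derivative := (err - p.prevErr) / dt
	p.prevErr = err
	return p.kp*err + p.ki*p.integral + p.kd*derivative
}

func main() {
	// A purely proportional controller driving a value toward a setpoint of 10.
	c := &pid{kp: 0.5}
	value := 0.0
	for i := 0; i < 5; i++ {
		value += c.step(10, value, 1)
		fmt.Printf("%.2f\n", value) // converges toward 10
	}
}
```

Anti-windup (clamping the integral term) would be added in `step` for a production controller, as the talk notes.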
Speedrunning the Open Street Map osm2pgsql Loader - GregSmith458515
The Open Street Map project provides invaluable data that keeps driving users toward the PostGIS and PostgreSQL stacks. Loading today’s full Planet data set takes a 120GB XML file and unrolls it into over a terabyte of database data. Crunchy’s benchmark labs have followed the expansion of that Planet data over the last six database releases, as the re-ignition of the CPU wars combined with parallel execution features landing in the database. We’ll take a look at that data evolution, which server configurations worked, and which metrics techniques still matter in the all SSD era.
There are at least 40 to 50 different formats of GC logs. Here, we explain the commonly used GC log formats, tricks, patterns, and tools to analyze them effectively.
JCON Online 2021, International Java Community Conference, 07.10.21, Moritz Kammerer (@Moritz Kammerer, Expert Software Engineer at QAware).
In this talk we took a look at how microservices can be developed with Micronaut. In the slides you can find out whether it kept its promise.
High Performance Erlang - Pitfalls and Solutions - Yinghai Lu
Presented at Erlang Factory 2016, San Francisco, CA.
Erlang is widely used for building concurrent applications. However, when we push the performance of our Erlang based application to handle millions of concurrent clients, some Erlang scalability issues begin to show and some conventional programming paradigm of Erlang no longer hold. We would like to share some of these issue and how we address them. In addition, we share some of our experience on how to profile an Erlang application to identify bottlenecks.
We will take a deep look at some of the basic mechanisms of Erlang and show how they behave under high load and parallelism, which includes message delivery, process management and shared data structures such as maps and ETS tables. We will demonstrate their limitations and propose techniques to alleviate the issues.
We will also share profiling techniques on how to find those bottlenecks in Erlang applications across different levels. We will share techniques for writing highly performant Erlang applications.
Enterprise application performance - Understanding & Learnings - Dhaval Shah
This document discusses enterprise application performance, including:
- Performance basics like response time, throughput, and availability
- Common metrics like response time, transactions per second, and concurrent users
- Factors that affect performance such as software issues, configuration settings, and hardware resources
- Case studies where the author analyzed memory leaks, optimized services, and addressed an inability to meet non-functional requirements
- Learnings around heap dump analysis, hotspot identification, and database monitoring
For the Docker users out there, Sematext's DevOps Evangelist, Stefan Thies, goes through a number of different Docker monitoring options, points out their pros and cons, and offers solutions for Docker monitoring. Webinar contains actionable content, diagrams and how-to steps.
The document provides an overview of how to read and understand garbage collection (GC) log lines from different Java vendors and JVM versions. It begins by explaining the parts of a basic GC log line for the OpenJDK GC log format. It then discusses GC log lines for G1 GC and CMS GC in more detail. Finally, it shares examples of GC log formats from IBM JVMs and different levels of information provided. The document aims to help readers learn to correctly interpret GC logs and analyze GC behavior.
This document discusses benchmarking HTTP/2 using the h2load tool. It provides examples of using h2load to test various HTTP/2 configurations and protocols. The document also summarizes several experiments comparing performance of HTTP/2 with different settings, such as with or without domain sharding, combo handling, and different servers like ATS and nghttpx. It concludes that we need to consider server capacity for HTTP/2 deployments and that h2load is not perfect, providing opportunities for contribution.
Talk given at the London AICamp meetup on 13 July 2023. It's an introduction to building open-source ChatGPT-like chat bots and some of the considerations to keep in mind while training/tuning them using Airflow.
Similar to Running a Go App in Kubernetes: CPU Impacts
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throughput... - ScyllaDB
In this presentation, we explore how standard profiling and monitoring methods may fall short in identifying bottlenecks in low-latency data ingestion workflows. Instead, we showcase the power of simple yet clever methods that can uncover hidden performance limitations.
Attendees will discover unconventional techniques, including clever logging, targeted instrumentation, and specialized metrics, to pinpoint bottlenecks accurately. Real-world use cases will be presented to demonstrate the effectiveness of these methods. By the end of the session, attendees will be equipped with alternative approaches to identify bottlenecks and optimize their low-latency data ingestion workflows for high throughput.
Mitigating the Impact of State Management in Cloud Stream Processing Systems - ScyllaDB
Stream processing is a crucial component of modern data infrastructure, but constructing an efficient and scalable stream processing system can be challenging. Decoupling compute and storage architecture has emerged as an effective solution to these challenges, but it can introduce high latency issues, especially when dealing with complex continuous queries that necessitate managing extra-large internal states.
In this talk, we focus on addressing the high latency issues associated with S3 storage in stream processing systems that employ a decoupled compute and storage architecture. We delve into the root causes of latency in this context and explore various techniques to minimize the impact of S3 latency on stream processing performance. Our proposed approach is to implement a tiered storage mechanism that leverages a blend of high-performance and low-cost storage tiers to reduce data movement between the compute and storage layers while maintaining efficient processing.
Throughout the talk, we will present experimental results that demonstrate the effectiveness of our approach in mitigating the impact of S3 latency on stream processing. By the end of the talk, attendees will have gained insights into how to optimize their stream processing systems for reduced latency and improved cost-efficiency.
Measuring the Impact of Network Latency at Twitter - ScyllaDB
Widya Salim and Victor Ma will outline the causal impact analysis, framework, and key learnings used to quantify the impact of reducing Twitter's network latency.
Architecting a High-Performance (Open Source) Distributed Message Queuing System - ScyllaDB
BlazingMQ is a new open source* distributed message queuing system developed at and published by Bloomberg. It provides highly-performant queues to applications for asynchronous, efficient, and reliable communication. This system has been used at scale at Bloomberg for eight years, where it moves terabytes of data and billions of messages across tens of thousands of queues in production every day.
BlazingMQ provides highly-available, fault-tolerant queues courtesy of replication based on the Raft consensus algorithm. In addition, it provides a rich set of enterprise message routing strategies, enabling users to implement a variety of scenarios for message processing.
Written in C++ from the ground up, BlazingMQ has been architected with low latency as one of its core requirements. This has resulted in some unique design and implementation choices at all levels of the system, such as its lock-free threading model, custom memory allocators, compact wire protocol, multi-hop network topology, and more.
This talk will provide an overview of BlazingMQ. We will then delve into the system’s core design principles, architecture, and implementation details in order to explore the crucial role they play in its performance and reliability.
*BlazingMQ will be released as open source between now and P99 (exact timing is still TBD)
Noise Canceling RUM by Tim Vereecke, Akamai - ScyllaDB
Noisy Real User Monitoring (RUM) data can ruin your P99!
We introduce a fresh concept called "Human Visible Navigations" (HVN) to tackle this risk: we focus on the experiences you actually care about when talking about the speed of our sites:
- Human: We exclude noise coming from bots and synthetic measurements.
- Visible: We remove any partial or fully hidden experiences. These tend to be very slow but users don’t see this slowness.
- Navigations: We ignore lightning fast back-forward navigations which usually have few optimisation opportunities.
Adopting Human Visible Navigations provides you with these key benefits:
- Fewer changes staying below the radar
- Fewer data fluctuations
- Fewer blindspots when finding bottlenecks
- Better correlation with business metrics
This is supported by plenty of real-world examples coming from the world's largest scale-modeling site (6M monthly visits), in combination with aggregated data from the brand new rumarchive.com (open source).
After attending this session, your P99 and other percentiles will become less noisy and easier to tune!
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con... - ScyllaDB
In this session, Tanel introduces a new open source eBPF tool for efficiently sampling both on-CPU events and off-CPU events for every thread (task) in the OS. Linux standard performance tools (like perf) allow you to easily profile on-CPU threads doing work, but if we want to include the off-CPU timing and reasons for the full picture, things get complicated. Combining eBPF task state arrays with periodic sampling for profiling allows us to get both a system-level overview of where threads spend their time, even when blocked and sleeping, and allow us to drill down into individual thread level, to understand why.
Performance Budgets for the Real World by Tammy Everts - ScyllaDB
Performance budgets have been around for more than ten years. Over those years, we’ve learned a lot about what works, what doesn’t, and what we need to improve. In this session, Tammy revisits old assumptions about performance budgets and offers some new best practices. Topics include:
• Understanding performance budgets vs. performance goals
• Aligning budgets with user experience
• Pros and cons of Core Web Vitals
• How to stay on top of your budgets to fight regressions
Using Libtracecmd to Analyze Your Latency and Performance Troubles - ScyllaDB
Trying to figure out why your application is responding late can be difficult, especially if it is because of interference from the operating system. This talk will briefly go over how to write a C program that can analyze what in the Linux system is interfering with your application. It will use trace-cmd to enable kernel trace events as well as tracing lock functions, and it will then go over a quick tutorial on how to use libtracecmd to read the created trace.dat file and uncover the cause of interference to your application.
Reducing P99 Latencies with Generational ZGC - ScyllaDB
With the low-latency garbage collector ZGC, GC pause times are no longer a big problem in Java. With sub-millisecond pause times there are instead other things in the GC and JVM that can cause application threads to experience unexpected latencies. This talk will dig into a specific use where the GC pauses are no longer the cause of unexpected latencies and look at how adding generations to ZGC help lower the p99 application latencies.
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X - ScyllaDB
Linters are a type of database! They are a collection of lint rules — queries that look for rule violations to report — plus a way to execute those queries over a source code dataset.
This is a case study about using database ideas to build a linter that looks for breaking changes in Rust library APIs. Maintainability and performance are key: new Rust releases tend to have mutually-incompatible ways of representing API information, and we cannot afford to reimplement and optimize dozens of rules for each Rust version separately. Fortunately, databases don't require rewriting queries when the underlying storage format or query plan changes! This allows us to ship massive optimizations and support multiple Rust versions without making any changes to the queries that describe lint rules.
"Ship now, optimize later" can be a sustainable development practice after all — join us to see how!
How Netflix Builds High Performance Applications at Global Scale - ScyllaDB
We all want to build applications that are blazingly fast. We also want to scale them to users all over the world. Can the two happen together? Can users in the slowest of environments also get a fast experience? Learn how we do this at Netflix: how we understand every user's needs and preferences and build high performance applications that work for every user, every time.
Conquering Load Balancing: Experiences from ScyllaDB Drivers - ScyllaDB
Load balancing seems simple on the surface, with algorithms like round-robin, but the real world loves throwing curveballs. Join me in this session as we delve into the intricacies of load balancing within ScyllaDB Drivers. Discover firsthand experiences from our journey in driver development, where we employed the Power of Two Choices algorithm, optimized the implementation of load balancing in Rust Driver, mitigated cloud costs through zone-aware load balancing and combated the issue of overloading a particular core of ScyllaDB. Be prepared to delve into the practical and theoretical aspects of load balancing, gaining valuable insights along the way.
Interaction Latency: Square's User-Centric Mobile Performance Metric - ScyllaDB
Mobile performance metrics often take inspiration from the backend world and measure resource usage (CPU usage, memory usage, etc) and workload durations (how long a piece of code takes to run).
However, mobile apps are used by humans and the app performance directly impacts their experience, so we should primarily track user-centric mobile performance metrics. Following the lead of tech giants, the mobile industry at large is now adopting the tracking of app launch time and smoothness (jank during motion).
At Square, our customers spend most of their time in the app long after it's launched, and they don't scroll much, so app launch time and smoothness aren't critical metrics. What should we track instead?
This talk will introduce you to Interaction Latency, a user-centric mobile performance metric inspired by the Web Vitals metric "Interaction to Next Paint" (web.dev/inp). We'll go over why apps need to track this, how to properly implement its tracking (it's tricky!), how to aggregate this metric, and what thresholds you should target.
How to Avoid Learning the Linux-Kernel Memory Model - ScyllaDB
The Linux-kernel memory model (LKMM) is a powerful tool for developing highly concurrent Linux-kernel code, but it also has a steep learning curve. Wouldn't it be great to get most of LKMM's benefits without the learning curve?
This talk will describe how to do exactly that by using the standard Linux-kernel APIs (locking, reference counting, RCU) along with a few simple rules of thumb, thus gaining most of LKMM's power with less learning. And the full LKMM is always there when you need it!
99.99% of Your Traces are Trash by Paige Cruz - ScyllaDB
Distributed tracing is still finding its footing in many organizations today, one challenge to overcome is the data volume - keeping 100% of your traces is expensive and unnecessary. Enter sampling - head vs tail how do you decide? Let’s look at the design of Sifter and get familiar with why tail-based sampling is the way to enact a cost-effective tracing solution while actually increasing the system’s observability.
Square's Lessons Learned from Implementing a Key-Value Store with Raft - ScyllaDB
To put it simply, Raft is used to make a use case (e.g., key-value store, indexing system) more fault tolerant to increase availability using replication (despite server and network failures). Raft has been gaining ground due to its simplicity without sacrificing consistency and performance.
Although we'll cover Raft's building blocks, this is not about the Raft algorithm; it is more about the micro-lessons one can learn from building fault-tolerant, strongly consistent distributed systems using Raft. Things like majority agreement rule (quorum), write-ahead log, split votes & randomness to reduce contention, heartbeats, split-brain syndrome, snapshots & logs replay, client requests dedupe & idempotency, consistency guarantees (linearizability), leases & stale reads, batching & streaming, parallelizing persisting & broadcasting, version control, and more!
And believe it or not, you might be using some of these techniques without even realizing it!
This is inspired by Raft paper (raft.github.io), publications & courses on Raft, and an attempt to implement a key-value store using Raft as a side project.
A Deep Dive Into Concurrent React by Matheus Albuquerque - ScyllaDB
Writing fluid user interfaces becomes more and more challenging as the application complexity increases. In this talk, we’ll explore how proper scheduling improves your app’s experience by diving into some of the concurrent React features, understanding their rationales, and how they work under the hood.
The Latency Stack: Discovering Surprising Sources of Latency - ScyllaDB
Usually, when an API call is slow, developers blame ourselves and our code. We held a lock too long, or used a blocking operation, or built an inefficient query. But often, the simple picture of latency as “the time a server takes to process a message” hides a great deal of end-to-end complexity. Debugging tail latencies requires unpacking the abstractions that we normally ignore: virtualization, hidden queues, and network behavior.
In this talk, I’ll describe how developers can diagnose more sources of delay and failure by building a more realistic and broad understanding of networked services. I’ll give some real-world cases when high end-to-end latency or elevated failure rates occurred due to factors we ordinarily might not even measure. Some examples include TCP SYN retransmission; virtualization on the client; and surprising behavior from AWS load balancers. Unfortunately, many measurement techniques don’t cover anything but the portion most directly under developer control. But developers can do better by comparing multiple measurements, applying Little’s law, investing in eBPF probes, and paying attention to the network layer.
Understanding API performance to find and fix issues faster ultimately means understanding the entire stack: the client, your code, and the underlying infrastructure.
From its vantage point in the kernel, eBPF provides a platform for building a new generation of infrastructure tools for things like observability, security and networking. These kinds of facilities used to be implemented as libraries, and then in container environments they were often deployed as sidecars. In this talk let's consider why eBPF can offer numerous advantages over these models, particularly when it comes to performance.
Running a Go App in Kubernetes: CPU Impacts
1. Running a Go App in Kubernetes: CPU Impacts
Teiva Harsanyi, SRE at Google
2. Teiva Harsanyi
SRE at Google
■ SRE in the Borg ML team
■ 100 Go Mistakes author — 100go.co/book
3. Introduction
● K8s is not straightforward; it's easy to get things wrong
● Unveil some Go & k8s complexity
● Discuss the impacts of running a Go app inside k8s, with a focus on CPU
5. Go Scheduling
3 key components:
■ G: Goroutine
■ M: OS thread (machine)
■ P: CPU core (processor)
Main actors:
■ OS scheduler: assigns an M on a P
■ Go scheduler: assigns a G on an M
12. GOMAXPROCS
Variable that defines the number of M (OS threads) that can execute
user-level Go code simultaneously
runtime.GOMAXPROCS(8) // Set GOMAXPROCS to 8
n := runtime.GOMAXPROCS(0) // Get the current value of GOMAXPROCS
Notes:
■ If an M is blocked, the Go scheduler can spin up more Ms
■ The GC derives the limit of how much CPU it should consume from GOMAXPROCS
13. k8s Deployment Config
spec:
  containers:
  - name: img
    image: img:latest
    resources:
      requests:
        cpu: "2000m"  # <--- Guaranteed minimum amount of CPU resources
      limits:
        cpu: "4000m"  # <--- Maximum amount of CPU resources
14. k8s Scheduler
k8s uses the Completely Fair Scheduler (CFS) as its process scheduler.
Two main parameters:
■ cpu.cfs_period_us: the period (set to 100 ms by default)
■ cpu.cfs_quota_us: the amount of CPU time the app can consume during the defined period
Example: if resources.limits.cpu is set to 2000m (= 2 cores), the quota is 2 x 100 = 200 ms
=> In each 100 ms period, the app can consume up to 200 ms of CPU time (summed across cores)
15. Wait... What’s the default value of GOMAXPROCS?
It is equal to runtime.NumCPU(), the number of CPU cores.
Example: Kubernetes runs the app on a 4-CPU-core machine with:
spec:
  resources:
    limits:
      cpu: 1000m
=> In this situation, will GOMAXPROCS be equal to 4 or to 1?
It will be equal to 4: Go isn’t CFS-aware (github.com/golang/go/issues/33803)
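Because Go ignores the CFS quota, a common workaround is to set GOMAXPROCS from the cgroup limit at startup; Uber's go.uber.org/automaxprocs library does this automatically. Below is a minimal sketch of the same idea, handling only the cgroup v2 format (on Linux the input would come from /sys/fs/cgroup/cpu.max; the function name and fallback policy are assumptions):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxProcsFromCPUMax derives a GOMAXPROCS value from the contents of the
// cgroup v2 cpu.max file, which holds "<quota> <period>" in microseconds,
// or "max <period>" when no CPU limit is configured.
func maxProcsFromCPUMax(cpuMax string, fallback int) int {
	fields := strings.Fields(strings.TrimSpace(cpuMax))
	if len(fields) != 2 || fields[0] == "max" {
		return fallback // no CPU limit configured
	}
	quota, err1 := strconv.Atoi(fields[0])
	period, err2 := strconv.Atoi(fields[1])
	if err1 != nil || err2 != nil || period == 0 {
		return fallback
	}
	n := quota / period // round down: a fractional core can't run a thread full-time
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// A 2-core limit (200 ms quota per 100 ms period) => GOMAXPROCS of 2.
	fmt.Println(maxProcsFromCPUMax("200000 100000", 4)) // 2
	// No limit => fall back to the core count.
	fmt.Println(maxProcsFromCPUMax("max 100000", 4)) // 4
}
```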
19. Why? rps < 20
GOMAXPROCS=4; quota: 100 ms per 100 ms period. Below 20 requests per second, the CPU time consumed across cores 0-3 stays under the quota in every period:
■ Period 1: 28 + 53 + 18 ms => used: 99 ms ✅
■ Period 2: 19 + 17 + 25 + 25 ms => used: 86 ms ✅
■ Period 3: 25 + 28 + 19 + 18 ms => used: 90 ms ✅
No throttling occurs.
20. Why? rps >= 20
GOMAXPROCS=4; quota: 100 ms per 100 ms period. At 20+ requests per second, all four cores run for 25 ms in each period (4 x 25 ms => used: 100 ms ❌): the quota is exhausted, and the app is throttled for the remainder of every period. This repeats period after period, adding latency to in-flight requests.
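Throttling like this shows up in the container's cgroup statistics. A sketch that parses the contents of the cgroup v2 cpu.stat file (the sample input and function name are illustrative; a real tool would read /sys/fs/cgroup/cpu.stat):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// throttledRatio parses cgroup v2 cpu.stat contents ("key value" lines)
// and returns the fraction of CFS periods in which the app was throttled.
func throttledRatio(cpuStat string) float64 {
	stats := map[string]int{}
	for _, line := range strings.Split(cpuStat, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		if v, err := strconv.Atoi(fields[1]); err == nil {
			stats[fields[0]] = v
		}
	}
	if stats["nr_periods"] == 0 {
		return 0
	}
	return float64(stats["nr_throttled"]) / float64(stats["nr_periods"])
}

func main() {
	sample := "usage_usec 9000000\nnr_periods 400\nnr_throttled 120\nthrottled_usec 2500000"
	fmt.Printf("%.0f%% of periods throttled\n", 100*throttledRatio(sample))
}
```

A consistently high ratio is a strong hint that GOMAXPROCS and the CPU limit are mismatched.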
21. Solution
Is the solution to remove the limit? Yes, in most cases.
Main benefit:
■ A workload can use all the idle CPU in the node
Main drawbacks:
■ May increase latency variance
■ Be careful in some specific conditions; for example, if a workload has a direct correlation between CPU and memory usage, watch out for OOMs
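Concretely, "removing the limit" means keeping the CPU request (so scheduling and fair sharing still work) while omitting limits.cpu. A sketch of such a spec, following the naming used earlier in the deck:

```yaml
spec:
  containers:
  - name: img
    image: img:latest
    resources:
      requests:
        cpu: "2000m"  # still used for scheduling and CFS shares
      # no limits.cpu: the pod may burst into idle CPU on the node
```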
22. At Google?
■ We do have CPU limits
■ They are not set manually; the CPU limit is recalculated each time a job is rescheduled onto a different machine
■ GOMAXPROCS is also set automatically depending on the CPU limit
24. Conclusion
■ Be aware that Go isn't CFS-aware
■ GOMAXPROCS should reflect the available compute parallelism
■ Be careful with k8s CPU limits
■ Benchmarks for the win