Enable Self-Service Operations: Give specific users access to your existing tools, services, and scripts
Updated Jun 11, 2024 - Groovy
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
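⭐ [Open-source book] An in-depth explanation of kernel networking, Kubernetes, service mesh, containers, and other cloud-native technologies. A practice-tested guide to DevOps and SRE. If you find an error, please open an issue.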
A collection of git utilities, useful extra git scripts, tutorials and other useful articles.
Web UI for Jaeger
Terraform provider for Nobl9
On-Call/DevOps Assistant - Get a head start on fixing alerts with AI investigation
🐒 🔥 Datadog Failure Injection System for Kubernetes
A curated list of awesome DevOps platforms, tools, practices and resources
DevOps Tutorials
Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions
Log data preprocessing in Python
Kaytu's AI platform boosts cloud efficiency by analyzing historical usage and delivering intelligent recommendations—such as optimizing instance sizes—that maintain reliability. Pay for what you need, without compromising your apps.
Terraform Pull Request Automation
Massively parallel SSH client
A Prometheus exporter for pg-promise
A Prometheus exporter for node-postgres
A Prometheus exporter exposing metrics for KafkaJS
An active monitoring software to detect failures before your customers do.