Data Center Observability & Site Reliability Engineer, Chennai
AALUCKS Talent Pro

Position: Data Center Observability & Site Reliability Engineer, Chennai
Department: Information Technology | Role: Full-time | Experience: 8 to 12 Years | Number of Positions: 2 | Location: Chennai
Skillset:
Observability Engineering, Grafana, Loki, Mimir, Alloy agent, Infrastructure metrics (GPU/CPU/K8s), Scripting (Python, Go, Bash), Prometheus, ELK, Docker, Terraform, Excellent English communication skills
Job Description:
Location: Open (should be flexible with Korea time zone)
Experience: 8+ Years
Notice Period: Immediate to 30 Days
We’re looking for a skilled Observability & Site Reliability Engineer to join our team supporting large-scale, enterprise-grade infrastructure. The ideal candidate will have deep experience with observability tools—especially Grafana, Loki, Mimir, and Kubernetes metrics/logs—and a passion for performance, scale, and uptime.
Key Must-Have Skills:
5+ years in Observability Engineering
Expertise in Grafana, Loki, Mimir, Alloy agent
Strong understanding of infrastructure metrics (GPU/CPU/K8s)
Familiarity with scripting (Python, Go, Bash)
Prior exposure to Prometheus, ELK, Docker, Terraform
Flexibility to work with Korean stakeholders and across Korean time zone hours
Role Highlights:
Design and manage the observability stack across large data center infrastructure
Build scalable telemetry systems, dashboards, alerts & reports
Apply SRE practices to ensure system reliability and performance
Troubleshoot real-time issues and support ongoing optimization
Good to Have:
Prior experience working with Korean stakeholders
Knowledge of cloud platforms like AWS, GCP, Azure
Required Qualification:
Bachelor of Engineering / Bachelor of Technology (B.E./B.Tech.) in IT, CS, or E&CE, or MCA
With a leading digital solutions provider
Notable Facts: 12+ Global offices | 500+ Clients | 50+ Countries