Observability Architect
Posted 106 weeks ago
Job Description
Job Summary
We are seeking a seasoned Observability Architect to define and lead our end-to-end observability strategy across highly distributed, cloud-native, and hybrid environments. This role requires a visionary leader with deep hands-on experience in New Relic and a strong working knowledge of other modern observability platforms like Datadog, Prometheus/Grafana, Splunk, OpenTelemetry, and more. You will design scalable, resilient, and intelligent observability solutions that empower engineering, SRE, and DevOps teams to proactively detect issues, optimize performance, and ensure system reliability. This is a senior leadership role with significant influence over platform architecture, monitoring practices, and cultural transformation across global teams.
Key Responsibilities
- Architect and implement full-stack observability platforms, covering metrics, logs, traces, synthetics, real user monitoring (RUM), and business-level telemetry using New Relic and other tools like Datadog, Prometheus, ELK, or AppDynamics.
- Design and enforce observability standards and instrumentation guidelines for microservices, APIs, front-end applications, and legacy systems across hybrid cloud environments.
- Lead OpenTelemetry adoption, ensuring vendor-neutral, portable observability implementations where appropriate.
- Build multi-tool dashboards, health scorecards, SLOs/SLIs, and integrated alerting systems tailored for engineering, operations, and executive consumption.
- Collaborate with engineering and DevOps teams to integrate observability into CI/CD pipelines, GitOps, and progressive delivery workflows.
- Partner with platform, cloud, and security teams to provide end-to-end visibility across AWS, Azure, GCP, and on-prem systems.
- Lead root cause analysis, system-wide incident reviews, and reliability engineering initiatives to reduce MTTR and improve MTBF.
- Evaluate, pilot, and implement new observability tools/technologies aligned with enterprise architecture and scalability requirements.
- Deliver technical mentorship and enablement, evangelizing observability best practices and nurturing a culture of ownership and data-driven decision-making.
- Drive observability governance and maturity models, ensuring compliance, consistency, and alignment with business SLAs and customer experience goals.
Required Qualifications
- 15+ years of overall IT experience, including hands-on application development, system architecture, and operations in complex distributed environments, as well as troubleshooting and integrating applications and cloud technologies with observability tools.
- 5+ years of hands-on experience with observability tools such as New Relic, Datadog, and Prometheus, including APM, infrastructure monitoring, logs, synthetics, alerting, and dashboard creation.
- Proven experience and willingness to work with multiple observability stacks, such as:
  - Datadog, Dynatrace, AppDynamics
  - Prometheus, Grafana, etc.
  - Elasticsearch, Fluentd, Kibana (EFK/ELK)
  - Splunk, OpenTelemetry
- Solid knowledge of Kubernetes, service mesh (e.g., Istio), containerization (Docker) and orchestration strategies.
- Strong experience with DevOps and SRE disciplines, including CI/CD, IaC (Terraform, Ansible), and incident response workflows.
- Fluency in one or more programming/scripting languages: Java, Python, Go, Node.js, Bash.
- Hands-on expertise in cloud-native observability services (e.g., CloudWatch, Azure Monitor, GCP Operations Suite).
- Excellent communication and stakeholder management skills, with the ability to align technical strategies with business goals.
Preferred Qualifications
- Architect-level certifications in New Relic, Datadog, Kubernetes, AWS/Azure/GCP, or SRE/DevOps practices.
- Experience with enterprise observability rollouts, including organizational change management.
- Understanding of ITIL, TOGAF, or COBIT frameworks as they relate to monitoring and service management.
- Familiarity with AI/ML-driven observability, anomaly detection, and predictive alerting.
Why Join Us?
- Lead enterprise-scale observability transformations impacting customer experience, reliability, and operational excellence.
- Work in a tool-diverse environment, solving complex monitoring challenges across multiple platforms.
- Collaborate with high-performing teams across development, SRE, platform engineering, and security.
- Influence strategy, tooling, and architecture decisions at the intersection of engineering, operations, and business.
Contact
Unit #E1J, First Floor, Tower B, Godrej Eternia, Plot #70, Industrial Area, Phase 1, Chandigarh, Chandigarh, 160002
Phone: +91 - 7814302836