Design, implement, and operate scalable Elastic Stack (ELK) solutions for logs, metrics, traces, and events.
Own end-to-end log ingestion pipelines using Beats, Logstash, Elastic Agent, and custom integrations.
Perform log parsing, filtering, cleanup, normalization, and enrichment using Grok, conditionals, processors, ingest pipelines, and ECS standards.
Define and implement ingestion best practices for performance and reliability.
Configure and maintain Kibana dashboards, visualizations, Lens, and Canvas for operational and business observability use cases.
Use Elastic Observability/SIEM and Elastic APM to instrument applications, collect and correlate logs, metrics, and traces, analyze performance, and visualize service dependencies.
Create and manage Elastic Machine Learning jobs (multi-metric anomaly detection, forecasting) and interpret the results to generate insights and alerts.
Integrate Elasticsearch with other observability tools such as:
Prometheus & Grafana (metrics collection and visualization).
SolarWinds and Dynatrace (infrastructure monitoring and APM).
Correlate logs, metrics, traces, and events across platforms to enable unified observability.
Design observability solutions that support operations, infrastructure, and application teams.
Set up Kibana alert rules and write advanced Watcher scripts.
Leverage Elastic AI Assistant, including LLM integrations in cloud environments (especially AWS), to enhance investigation, analysis, and insights.
Manage Elasticsearch clusters, including:
Installation, patching, and upgrades of the ELK stack.
Node roles, index lifecycle management (ILM), shard strategies, and data tiers.
Security (users, roles, API keys, TLS).
Performance tuning, scaling, and troubleshooting.
Apply ELK cluster management best practices for stability, availability, and resiliency.
Monitor cluster health and proactively address capacity and performance issues.
Instrument and observe AWS workloads, including EC2, Lambda, ECS/EKS, API Gateway, RDS, S3, and other supporting services.
Integrate observability deployments (e.g., Logstash deployments) into DevSecOps practices.
Automate operational tasks (e.g., data extraction, cleaning, transformation, and reconciliation) where applicable, using scripting or programming languages such as Python.
Requirements:
Strong understanding of observability concepts, e.g., which telemetry is important, the golden signals, how to monitor, and how to derive insights.
Able to propose solutions that uplift observability maturity in the organization.
Strong hands-on experience with Elasticsearch, Logstash, Kibana, and Elastic ML.
Strong know-how in log ingestion, parsing, Grok patterns, filtering, and enrichment.
Experience managing and operating production enterprise ELK clusters.
Experience with monitoring tools such as SolarWinds, Prometheus, Slack, Grafana, Dynatrace, or similar tools.
Good understanding of AWS services (EC2, S3, Lambda, VPC, CloudWatch) relevant to observability.
Familiarity with REST APIs, AI/ML, LLMs, RAG, graph databases, OpenTelemetry (OTel), and emerging observability intelligence concepts.
Experience with topology mapping or service dependency visualization.
Strong scripting and automation skills.
Experience with CI/CD pipelines and deployment automation for Logstash pipelines or Kibana dashboards and Canvas workpads.
Good understanding of infrastructure (servers, network, storage) and application tech stack monitoring.
Ability to ensure observability configurations meet security and compliance requirements.
Familiarity with Erlang, Java, and MQ application architectures, to understand application behavior and identify useful observability telemetry, would be an advantage.
Strong communication and stakeholder engagement skills, with the ability to translate complex telemetry data into clear, actionable insights.
Strong sense of ownership and accountability, with a high level of commitment to delivery, quality, and outcomes.