About this role
The challenge
As a Site Reliability Engineer / Systems Administrator, you’ll play a key role in scaling and strengthening the reliability of Jotelulu’s cloud platform. Your mission will be to monitor and optimize cloud systems, automate processes, ensure effective incident management, and maintain a robust, scalable, and secure infrastructure that supports mission-critical services.
You’ll be part of an Operations & SRE environment focused on reliability, performance, and continuous improvement. Working within a highly collaborative engineering setup, you will contribute to building and maintaining infrastructure across multiple availability zones, ensuring stability and operational excellence while supporting the growth of the platform.
Collaboration will be essential. You’ll work closely with DevOps, Product, and Development teams to build reliable services, support infrastructure decisions, lead incident response, proactively detect risks, and ensure systems and teams can scale efficiently and confidently.
Requirements that are important for us
We are looking for a SysAdmin / SRE with strong experience in cloud infrastructure, systems administration, and reliability practices, capable of operating and improving large-scale environments.
Relevant experience and expected outcomes:
- Proven experience managing large-scale cloud or MSP infrastructures.
- Expert-level Linux systems administration.
- Experience with Windows Server (2012–2025) in production environments.
- Strong troubleshooting skills across systems, networking, storage, and application layers.
- Solid networking knowledge including TCP/IP, DNS, load balancing, firewalling, BGP, and network virtualization.
- Experience with storage solutions such as Ceph, NFS or similar technologies.
- Familiarity with IaaS orchestration platforms such as CloudStack or similar.
- Experience implementing and maintaining monitoring and observability tools.
- Experience with Infrastructure as Code and automation using Ansible.
- Experience designing or maintaining CI/CD pipelines.
- Knowledge of databases such as MySQL, MariaDB or PostgreSQL.
- Strong understanding of ITIL processes for incident, problem, and change management.
- Strong documentation practices and focus on operational excellence.
Key skills and expected impact:
- Strong analytical mindset focused on reliability, scalability, and continuous improvement.
- Ability to monitor systems, detect risks proactively, and minimize downtime.
- Capability to lead incident response and ensure effective resolution.
- Strong communication skills in Spanish and an intermediate level of English.
- Ability to collaborate across teams and contribute to infrastructure and operational improvements.
- Experience optimizing distributed systems performance.
- Knowledge of advanced security and system hardening practices.
- Ability to improve operational workflows and work with ticketing systems.
Tools
- Operating Systems: Linux, Windows Server.
- Automation: Ansible, scripting (Bash, Python, PowerShell).
- CI/CD: pipeline implementations.
- Monitoring & Observability: Zabbix, Prometheus, Grafana, ELK Stack.
- Storage: Ceph, NFS or similar.
- Orchestration: CloudStack, OpenStack.
- Databases: MySQL, MariaDB, PostgreSQL.
- Collaboration & ITSM: tools aligned with ITIL practices.
Originally posted on Himalayas