Aktana is looking for a results-driven DevOps Engineer who is passionate about leading-edge Cloud and SaaS technologies, and who can think about the big picture while remaining a hands-on system architect and a mentor to their team.
The ideal candidate is driven to build highly scalable, fault-tolerant, and easy-to-administer SaaS infrastructure for deploying, configuring, monitoring, maintaining, and troubleshooting Aktana services. You are proactive and organized, diligent about documentation, can’t sleep unless Nagios has everything covered, and don’t feel a job is done until it’s automated for the next time.
This is an opportunity to join our operations team and help shape its growth, its processes, and the way our overall infrastructure is run.
Essential Duties & Responsibilities:
- Installing, configuring, monitoring, and maintaining Aktana SaaS services across different environments, including internal development, testing, and staging environments as well as production.
- Monitoring systems, databases and networks for proper operation and performance.
- Providing 7×24 on-call support for the operations infrastructure.
- Establishing recommended configurations for the applications' operating environment, including the computer hardware, storage, software, and configuration necessary to properly host our applications.
- Establishing standard processes for diagnosing issues, tracking status and escalating issues within and outside the group.
- Establishing product and process improvement plans to reduce support effort and increase product availability and scalability.
- Establishing operational objectives, strategies and work plans to improve current operations as well as planning for future products and customer requirements.
- Establishing and assuring adherence to budgets, schedules, work plans and performance requirements.
- Combining technical implementation with cross-functional collaboration to meet our business goals in a fast-paced environment.
- Working together with engineering teams on design, reliability and maintenance issues.
The candidate is expected to be 100% hands-on, self-motivated, proactive, and solution-oriented. They must be willing to mentor and challenge their staff, lead technical projects, assist team members in meeting their individual goals, and promote a positive attitude and work culture.
Required Experience & Skillset:
- Excellent troubleshooting, debugging, and problem-solving skills.
- Experience in managing SaaS Operations and Infrastructure is a must.
- Excellent scripting skills in Python, Perl, and/or shell.
- Experience with provisioning clusters on AWS EC2 / RDS / S3 / EMR, Rackspace, and/or Google Compute.
- Experience with build/deploy infrastructure (e.g. Jenkins, Rundeck) and build tools (e.g. Maven, Ant, Make, CMake).
- Demonstrated low-level OS experience (paging, swapping, load, user, kernel analysis) and practical file system experience (I/O, clustering, NFS, CIFS, Fibre Channel, iSCSI, etc.).
- Demonstrable knowledge of and experience in networking/distributed computing, routing, and client/server programming on Linux and Unix, including TCP/IP, UDP, ports, multicast, unicast, traceroute, ping, and DNS.
- Infrastructure Engineering: Proven experience in capacity planning, performance tuning, and infrastructure architecture.
- Understanding of how to scale web, application, and data systems horizontally and vertically.
- Hands-on experience with RDBMS installation, administration, and tuning, preferably with MySQL.
- Familiarity with OS-level system metrics (e.g. number of processes, threads, handles, virtual and physical memory); required for Linux.
- Knowledge of OS-specific performance monitoring tools (Performance Viewer, vmstat, mpstat, iostat, sar).
- Knowledge of installation, configuration, and monitoring of Apache HTTP Server.
- Experience with load balancer concepts including HA, VIPs, and SNAT.
- Fundamental knowledge of core Enterprise Linux (Red Hat/CentOS) with a focus on building, maintaining, securing, and performance tuning systems.
- Experience with virtual infrastructure platforms is a must.
- Experience with Java/J2EE platforms. Knowledge of JVM tuning and troubleshooting.
- Experience with SNMP-based NMS monitoring systems for performance trend analysis, as well as alerting on the Nagios platform.
- Experience managing CapEx planning, contract/vendor relations, and asset inventory.
- Must be able to work a flexible work schedule that may include nights, weekends, and holidays.
- BS/MS degree in Computer Science or a related field, or equivalent work experience.