Job Description
You will lead our client's forward-deployed reliability function: assisting 24/7 customer support, hands-on hotfixes, and onsite remediation. You'll dive into code when needed, ship patches safely, build customer-facing tools and internal utilities, and assist in a small triage pod. You're equal parts incident commander, field engineer, and player-coach who turns real-world issues into stable, scalable solutions.
What You'll Do
- Assist 24/7 Reliability: Participate in an on-call rotation covering critical incidents across software, firmware, networking/LTE, and on-site hardware setups (docks, sensors, EXT/compute, cabling).
- Hands-on Engineering: Triage issues, reproduce defects, write/merge hotfixes, and upstream fixes to core services with tests, feature flags, and safe deploys.
- Onsite Remediation: Travel to customer sites for high-severity issues, new site stabilizations, and complex integrations; coordinate with local stakeholders and vendors to meet customer needs.
- Build Tools & Products: Create small services, scripts, and dashboards that solve recurring customer pain (telemetry, log scrapers, health checks, alerting, one-click diagnostics).
- Run Incident Management: Establish SEV levels, lead bridges, drive comms (internal/external), deliver RCAs within SLA, and track corrective actions to closure.
- Operational Excellence: Maintain runbooks, golden signals/SLOs, playbooks, and site commissioning checklists; push automation to eliminate manual toil.
- Partner Cross-Functionally: Work tightly with Customer Success, Product, and Core Eng to prioritize fixes, capture field insights, and harden releases before wide rollouts.
- Security & Compliance: Handle sensitive data responsibly, follow access controls, and ensure logs/RCAs meet public-safety expectations.
- Readiness & Training: Level up CS and field partners with training, shadowing, and cert programs; ensure every site has a clear break-glass plan.
Qualifications
- Bachelor's degree in Electrical, Mechanical, or Systems Engineering, or a related technical field (or equivalent experience).
- 2–4+ years in Technical Support/Platform/Firmware roles with direct on-call ownership; 1–2+ years leading small teams or rotations.
- Strong debugging across hardware and software: backend services (Python/Go/Node), web clients, Linux, networking (LTE/VPN/DNS), edge devices, or embedded/Linux SBCs (e.g., Jetson), with basic hardware troubleshooting knowledge.
- Proven incident management in production environments; disciplined use of logs, metrics, traces, and safe rollout patterns (canary, feature flags, rollback).
- Comfortable reading and modifying code (C++/Shell/Python), writing tests, and merging hotfixes to production under pressure.
- Excellent internal and external customer communication: clear, calm, and accountability-first.
- Ability to travel on short notice; valid driver's license; comfortable around rooftops/docks and light hands-on hardware work.
- Able to lift 50 lbs for short durations.
- Current Part 107 commercial drone pilot's license, or the ability to obtain one.
Nice to Have
- Public safety, robotics, autonomy, or telecom experience.
- Cloud ops (GCP/AWS), Terraform, container orchestration, CI/CD.
- C++, Python, React/TypeScript familiarity.
- FAA/DFR familiarity; RF/interference troubleshooting; ADS-B/UTM exposure.
- Background working with secure environments and audit-ready processes.
- Prior experience with DJI systems.
On-Call & Travel Expectations
- On-call: Participates in and manages a 24/7 rotation; off-hours and weekends as needed.
- Travel: ~30–60%, variable; spikes for critical incidents, new city launches, and complex integrations.
Job Tags
Part time, Local area, Weekend work