Applied ML Engineer - GenAI (RAG and AgenticAI Frameworks)
Engineering
Remote
3 - 5 years experience
Responsibilities
We’re hiring a Senior ML Engineer / Applied Scientist who can develop scalable RAG systems, build agentic architectures, optimize LLM inference with vLLM/SGLang, and work with modern vector databases including Qdrant and MongoDB Vector Search, while integrating and routing both open-source and closed-source LLMs.
This is a deeply hands-on, product-focused engineering role.
1. Design & Build Production-Ready RAG Pipelines
● Create end-to-end RAG architectures: ingestion → chunking → embeddings → indexing → retrieval → reranking → grounded generation.
● Integrate open-source models (LLaMA, Qwen, Mistral, Gemma) and closed-source APIs (GPT-4.x, Claude 3, Gemini, Grok).
● Implement:
  ○ Hybrid search (dense + sparse + BM25)
  ○ Multi-hop retrieval
  ○ Query rewriting & decomposition
  ○ Graph-based retrieval
  ○ Cache-first retrieval strategies
● Build multilingual pipelines with strong support for Indic languages + low-resource language documents.
2. Vector Database Engineering (Qdrant + MongoDB Vector Search)
● Build & optimize vector search systems using:
  ○ Qdrant (HNSW, sparse-dense hybrid search, payload filtering, distributed mode)
  ○ MongoDB Atlas Vector Search (embedding fields, hybrid search, faceted filtering, metadata-aware retrieval)
● Implement scalable embedding stores with:
  ○ FAISS
  ○ Pinecone
  ○ Weaviate
  ○ Milvus/Zilliz
  ○ Elastic/OpenSearch
● Optimize indexing, recall, latency, and filtering over large enterprise datasets.
3. Build Agentic Systems (LangGraph, CrewAI, AutoGen, LangChain)
● Create complex multi-agent workflows using:
  ○ LangGraph → graph-based LLM state machines
  ○ CrewAI → role-driven multi-agent collaboration
  ○ AutoGen → LLM-to-LLM conversational agents
  ○ LangChain LCEL → composable agent pipelines
  ○ LlamaIndex Agents
● Implement advanced agentic behaviors:
  ○ ReAct, Reflexion, Tree-of-Thought
  ○ Planner–executor loops
  ○ Tool selection, code execution, browser/search tools
  ○ Long-running agents with memory/state management
4. High-Performance Inference (vLLM, SGLang)
● Deploy and optimize LLM inference with:
  ○ vLLM (PagedAttention, continuous batching, fast KV cache)
  ○ SGLang (router-driven fast inference + structured outputs)
  ○ TensorRT-LLM
  ○ TGI/TGIv2
  ○ Ollama (local inference)
● Implement:
  ○ Model parallel routing
  ○ Quantization (AWQ, GPTQ, FP8/INT4)
  ○ Streaming APIs
  ○ Multi-model load balancing
  ○ Cost vs performance routing (open vs closed models)
5. Document Intelligence & Ingestion Pipelines
● Build ingestion for PDFs, scanned docs, forms, multi-language content.
● Use OCR + layout tools like:
  ○ Unstructured, DocTR, TrOCR, Tesseract, LayoutLMv3
● Metadata extraction, chunking, semantic filtering, and embeddings pipelines.
● Build evaluation using:
  ○ Ragas, DeepEval, LangSmith, TruLens
● Measure:
  ○ Retrieval quality (recall@k, reranker improvements)
  ○ Grounded answer faithfulness
  ○ Hallucination rate
  ○ Agent success/failure rates
  ○ vLLM/SGLang throughput, latency, cost
● Create dashboards for real-time monitoring.
7. Product & Systems Integration
● Work with backend, product, and infra teams to deploy RAG + Agentic pipelines in production.
● Implement safe, consistent, cost-aware model usage across workloads.
● Build robust APIs, error handling, retry logic, and fallbacks.
● 4–8+ years in ML engineering or LLM-based product development.
● Strong experience with RAG systems, LLM grounding, and retrieval optimization.
● Experience with both open-source (LLaMA, Qwen, Mistral, Gemma) and closed-source (GPT-4.x, Claude 3, Gemini, Grok) LLMs.
Hands-on experience with:
● Qdrant (essential): indexing, payload filtering, distributed deployment, HNSW tuning
● MongoDB Atlas Vector Search (essential): embedding schema, indexing, hybrid queries
Plus experience with:
  ○ FAISS
  ○ Milvus/Zilliz
  ○ Pinecone
  ○ Weaviate
  ○ Elastic/OpenSearch vector search
● LangGraph
● CrewAI
● AutoGen
● LangChain LCEL
● LlamaIndex
Strong experience with:
You will build the AI core of real-world product experiences — RAG-driven assistants, automated agents, enterprise knowledge systems, and multilingual search solutions powering millions of users.
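For candidates gauging the expected level, here is a minimal Python sketch of two items named in this posting: fusing dense and BM25 result lists for hybrid search, and scoring retrieval quality with recall@k. Reciprocal rank fusion is only one common fusion choice and is assumed here for illustration; the posting does not prescribe it. The function names, document IDs, and the k=60 constant are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["d3", "d1", "d7", "d2"]
bm25_hits = ["d1", "d4", "d3", "d9"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
relevant_docs = {"d1", "d3", "d4"}  # hypothetical ground-truth labels

print("fused top 3:", fused[:3])
print("recall@3:", recall_at_k(fused, relevant_docs, k=3))
```

In a production pipeline the two input lists would come from the vector store and the sparse index, and recall@k would be tracked over a labeled query set instead of a single toy query.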
Skills Required
Key Skills
Application Tips
- Ensure your resume is up to date
- Highlight relevant experience