Gnani.ai
Job Description

Applied ML Engineer - GenAI (RAG and AgenticAI Frameworks)

Engineering

Remote

3 - 5 years experience


Responsibilities

We’re hiring a Senior ML Engineer / Applied Scientist to develop scalable RAG systems, build agentic architectures, and optimize LLM inference with vLLM/SGLang. The role also covers modern vector databases, including Qdrant and MongoDB Vector Search, and integrating and routing across both open-source and closed-source LLMs.

This is a deeply hands-on, product-focused engineering role.

Key Responsibilities

1. Design & Build Production-Ready RAG Pipelines

● Create end-to-end RAG architectures: ingestion → chunking → embeddings → indexing → retrieval → reranking → grounded generation.
● Integrate open-source models (LLaMA, Qwen, Mistral, Gemma) and closed-source APIs (GPT-4.x, Claude 3, Gemini, Grok).
● Implement:
  ○ Hybrid search (dense + sparse + BM25)
  ○ Multi-hop retrieval
  ○ Query rewriting & decomposition
  ○ Graph-based retrieval
  ○ Cache-first retrieval strategies
● Build multilingual pipelines with strong support for Indic languages + low-resource language documents.
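
As an illustration of the hybrid search item above, one common way to merge a dense ranking with a sparse (for example BM25) ranking is reciprocal rank fusion. This is a minimal sketch, not a description of the team's actual pipeline: the function name, the document ids, and the constant k=60 are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document scores 1 / (k + rank) in every list it appears in;
    k=60 is a commonly used default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense retriever and a sparse retriever.
dense_hits = ["d3", "d1", "d2"]
sparse_hits = ["d1", "d4", "d3"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that appear high in both lists (here `d1` and `d3`) rise to the top, without needing the two retrievers' raw scores to be comparable.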

2. Vector Database Engineering (Qdrant + MongoDB Vector Search)

● Build & optimize vector search systems using:
  ○ Qdrant (HNSW, sparse-dense hybrid search, payload filtering, distributed mode)
  ○ MongoDB Atlas Vector Search (embedding fields, hybrid search, faceted filtering, metadata-aware retrieval)
● Implement scalable embedding stores with:
  ○ FAISS
  ○ Pinecone
  ○ Weaviate
  ○ Milvus/Zilliz
  ○ Elastic/OpenSearch
● Optimize indexing, recall, latency, and filtering over large enterprise datasets.

3. Build Agentic Systems (LangGraph, CrewAI, AutoGen, LangChain)

● Create complex multi-agent workflows using:
  ○ LangGraph → graph-based LLM state machines
  ○ CrewAI → role-driven multi-agent collaboration
  ○ AutoGen → LLM-to-LLM conversational agents
  ○ LangChain LCEL → composable agent pipelines
  ○ LlamaIndex Agents
● Implement advanced agentic behaviors:
  ○ ReAct, Reflexion, Tree-of-Thought
  ○ Planner–executor loops
  ○ Tool selection, code execution, browser/search tools
  ○ Long-running agents with memory/state management

4. High-Performance Inference (vLLM, SGLang)

● Deploy and optimize LLM inference with:
  ○ vLLM (PagedAttention, continuous batching, fast KV cache)
  ○ SGLang (router-driven fast inference + structured outputs)
  ○ TensorRT-LLM
  ○ TGI/TGIv2
  ○ Ollama (local inference)
● Implement:
  ○ Model parallel routing
  ○ Quantization (AWQ, GPTQ, FP8/INT4)
  ○ Streaming APIs
  ○ Multi-model load balancing
  ○ Cost vs performance routing (open vs closed models)
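
To make the cost vs performance routing item concrete, the sketch below picks the cheapest model that meets a minimum quality bar. Everything in it is hypothetical: the model names, the prices, and the 1 to 5 quality scores are placeholders, and a real router would likely also weigh latency, context length, and load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical deployment name
    cost_per_1k_tokens: float  # placeholder price, not a real quote
    quality: int               # placeholder 1-5 score from offline evals


def route(options, min_quality):
    """Return the cheapest option whose quality meets the requested bar."""
    eligible = [m for m in options if m.quality >= min_quality]
    if not eligible:
        raise ValueError(f"no model meets quality >= {min_quality}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Placeholder catalogue mixing self-hosted (open) and API (closed) models.
CATALOGUE = [
    ModelOption("self-hosted-small", 0.10, 2),
    ModelOption("self-hosted-large", 0.40, 4),
    ModelOption("closed-api", 1.50, 5),
]

easy_request = route(CATALOGUE, min_quality=2)
hard_request = route(CATALOGUE, min_quality=5)
```

Simple requests land on the cheap self-hosted model, while requests that demand the highest quality tier go to the more expensive closed API.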

5. Document Intelligence & Ingestion Pipelines

● Build ingestion for PDFs, scanned docs, forms, and multi-language content.
● Use OCR + layout tools like:
  ○ Unstructured, DocTR, TrOCR, Tesseract, LayoutLMv3
● Build metadata extraction, chunking, semantic filtering, and embedding pipelines.

6. Evaluation & Observability

● Build evaluation using:
  ○ Ragas, DeepEval, LangSmith, TruLens
● Measure:
  ○ Retrieval quality (recall@k, reranker improvements)
  ○ Grounded answer faithfulness
  ○ Hallucination rate
  ○ Agent success/failure rates
  ○ vLLM/SGLang throughput, latency, cost
● Create dashboards for real-time monitoring.
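
For readers less familiar with the recall@k metric named above, it is the fraction of relevant documents that show up in the top k retrieved results. A minimal sketch follows; the function name and sample ids are assumptions for illustration, and tools such as Ragas provide their own implementations.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids found in the first k retrieved ids."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant must contain at least one id")
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


# Hypothetical retriever output and ground-truth labels for one query.
retrieved_ids = ["a", "b", "c", "d"]
relevant_ids = {"a", "d", "e"}

recall_3 = recall_at_k(retrieved_ids, relevant_ids, k=3)  # only "a" is found
recall_4 = recall_at_k(retrieved_ids, relevant_ids, k=4)  # "a" and "d" are found
```

Comparing this number before and after a reranker is one simple way to quantify the reranker improvements mentioned in the list.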

7. Product & Systems Integration

● Work with backend, product, and infra teams to deploy RAG + Agentic pipelines in production.
● Implement safe, consistent, cost-aware model usage across workloads.
● Build robust APIs, error handling, retry logic, and fallbacks.
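
The retry logic and fallbacks item can be pictured as follows: try each provider in order, retry transient failures with exponential backoff, and move to the next provider once retries run out. This is a sketch under stated assumptions; `call_with_fallback` and the stub providers are hypothetical, and production code would catch narrower exception types than `Exception`.

```python
import time


def call_with_fallback(providers, prompt, retries=2, base_delay=0.5):
    """Call providers in order, retrying each with exponential backoff."""
    last_error = None
    for provider in providers:
        for attempt in range(retries + 1):
            try:
                return provider(prompt)
            except Exception as exc:  # broad on purpose for the sketch
                last_error = exc
                if attempt < retries:
                    # Wait base_delay, then 2x, 4x, ... before retrying.
                    time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers failed") from last_error


# Stub providers standing in for real model clients.
def always_down(prompt):
    raise ConnectionError("primary model unavailable")


def backup(prompt):
    return f"backup answer to: {prompt}"


# base_delay=0.0 keeps the demo instant; use a real delay in practice.
answer = call_with_fallback([always_down, backup], "hello", base_delay=0.0)
```

Ordering the provider list from preferred to last resort keeps the fallback policy in one place, which also makes it easy to log which provider finally served each request.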

Required Skills

Core ML & AI Skills

● 4–8+ years in ML engineering or LLM-based product development.
● Strong experience with RAG systems, LLM grounding, and retrieval optimization.
● Experience with both open-source (LLaMA, Qwen, Mistral, Gemma) and closed-source (GPT-4.x, Claude 3, Gemini, Grok) LLMs.

Vector Databases

Hands-on experience with:

● Qdrant (essential): indexing, payload filtering, distributed deployment, HNSW tuning
● MongoDB Atlas Vector Search (essential): embedding schema, indexing, hybrid queries

Plus experience with:
  ○ FAISS
  ○ Milvus/Zilliz
  ○ Pinecone
  ○ Weaviate
  ○ Elastic/OpenSearch vector search

Agentic Frameworks

● LangGraph
● CrewAI
● AutoGen
● LangChain LCEL
● LlamaIndex

Inference Frameworks


Strong experience with:

● vLLM
● SGLang
● TensorRT-LLM
● TGI/TGIv2
● Ollama

Document & NLP Skills

● OCR + layout parsing
● Chunking, embeddings, re-ranking
● Indic language data handling

Product Skills

● Strong bias for production readiness, latency/cost optimization, reliability, and API design.

Nice to Have

● Knowledge graph integration.
● Multi-agent debugging and telemetry.
● Retrieval chain-of-thought or agentic reasoning research.
● Experience with long-context models (128k–200k).
● Contributions to Qdrant, LangGraph, CrewAI, vLLM, or SGLang.


Impact

You will build the AI core of real-world product experiences — RAG-driven assistants, automated agents, enterprise knowledge systems, and multilingual search solutions powering millions of users.


Skills Required
Primary Skills
RAG, Agentic Systems, vLLM/SGLang & Vector Databases