Data Science · 🇺🇸 Remote, USA · Full-time · Fully Remote · Posted May 2026

Data Scientist

Build the ML models, evaluation frameworks, and LLM benchmarking tools that underpin Densight Labs' enterprise deployments globally. Fully remote — open to US-based candidates only.

About the role

Densight Labs deploys AI into enterprise workflows across Pakistan, Saudi Arabia, the UAE, and increasingly North America. The Data Scientist builds the technical infrastructure that makes those deployments reliable, measurable, and scalable: LLM evaluation frameworks, RAG pipeline architectures, fine-tuning experiments, and AI quality assurance systems.

This is a fully remote role open to US-based candidates. You will work closely with the core engineering team, informing the technical architecture of new AI deployments, building evaluation tools that measure LLM output quality, and leading the firm's AI benchmarking capability. You are the person who ensures that what we deploy performs.

Responsibilities

  • Build and maintain ML models and evaluation frameworks for enterprise AI deployments
  • Develop LLM fine-tuning pipelines for domain-specific business applications
  • Design and build RAG (Retrieval-Augmented Generation) architectures for client knowledge bases
  • Lead AI quality assurance: define output quality metrics, build evaluation test sets, run regression testing
  • Benchmark AI model performance across Claude, GPT-4, and Gemini for specific enterprise use cases
  • Collaborate with AI Engineers in Pakistan and the GCC on technical architecture decisions
  • Contribute to internal AI capability documentation and playbooks

Requirements

  • MS or PhD in Computer Science, Statistics, Machine Learning, or a related technical field
  • Hands-on experience with LLM fine-tuning (LoRA, QLoRA, or RLHF approaches)
  • Experience building RAG pipelines: chunking strategies, embedding models, vector retrieval
  • Proficiency in Python and ML libraries: PyTorch, Transformers, scikit-learn, LangChain or LlamaIndex
  • Experience designing and running model evaluation frameworks for production LLM applications

Nice to have

  • Experience with enterprise AI deployment in regulated industries (healthcare, financial services)
  • Familiarity with Anthropic's models and API; active Claude user
  • Background in NLP with specific experience in information extraction, summarisation, or classification
  • Contributions to open-source ML or LLM projects

What we offer

  • Competitive market salary benchmarked to US Data Scientist rates, with performance bonuses
  • Full access to every enterprise AI API and tool: Claude, OpenAI, Gemini, Perplexity
  • Monthly learning budget for courses, conferences, and research resources
  • Fully remote — work from anywhere in the United States
  • Flexible hours aligned with the global team (about 4 hours/day of overlap with Pakistan and GCC working hours required)
  • Opportunity to define the technical research agenda as the firm scales to North America