Opportunity Description
Design and build data-centric GenAI methods for synthetic data generation, multimodal data curation, data augmentation, filtering, deduplication, and quality assessment.
Develop and evaluate synthetic data pipelines for text, speech, vision, and multimodal GenAI use cases, including controllable generation, provenance tracking, safety checks, and domain adaptation.
Build evaluation frameworks that connect data quality to downstream GenAI model performance, including benchmark design, ablation studies, error analysis, and model-feedback loops.
Research and implement modern generative AI techniques, including LLM/VLM-based data generation, fine-tuning, instruction tuning, preference optimization, and model-based data labeling.
Build scalable data and ML pipelines for acquisition, cleaning, transformation, metadata extraction, embedding generation, labeling, training, and evaluation.
Develop produ...