Data Analyst
Location: Cupertino (hybrid), CA
Posted On: 04/01/2024
Requirement Code: 67609
Requirement Detail
• Design, develop, and oversee the implementation of processes and tools to assess the quality of data used in LLM training.
• Study dataset and quality requirements, and develop automated metrics to measure and monitor data quality at scale.
• Evaluate annotation data for instruction tuning, preference alignment, and model optimization.
• Proficiency in applying quantitative methods to structured and unstructured data for complex data analysis, pattern recognition, insight generation, and metrics development.
• Experience investigating complex user behavior through deep-dive analyses, as well as designing, analyzing, and interpreting A/B experiments.
• Strong programming skills in data manipulation & processing (SQL & Python preferred).
• Proven expertise in data wrangling and in developing data visualizations and reports with tools such as Tableau and AWS.
• Strong understanding of machine learning principles, especially in the context of NLP and LLMs.
• Collaborate with Data Scientists to scrutinize annotation data and develop strategies for continuous data quality improvement.
• Recommend and implement enhancements to our quality processes, tools, and methodologies based on industry best practices.
• 5+ years of design/test/implementation/consulting experience in data quality management for machine learning model training.
• Demonstrated experience in project management and cross-functional collaboration.
• Exceptional analytical, problem-solving, and organizational skills.
• Strong verbal and written communication skills, and the ability to work effectively across internal and external organizations and virtual teams.
• Fluency in an Asian language such as Mandarin is a big plus.