Job Description
⚡ Research Scientists/Engineers (all levels)
🔍 Focus on Vision Data Infrastructure
🤖 Fundamental AI Research Institute
🌎 San Francisco Bay Area, USA
💸 $250,000 - $600,000 salary + annual bonus
Join one of the few research institutions worldwide with the resources to compete with top AI companies: tens of thousands of GPUs for state-of-the-art research in LLMs, multimodal AI, and agentic AI.
We are seeking AI talent with expertise in building scalable vision data pipelines that support both image/video generative training and multimodal alignment. You’ll design high-performance pipelines for large-scale image and video datasets, enabling efficient pretraining, alignment, and simulation-based data generation.
Responsibilities:
Vision Data Sourcing & Curation
- Collect and organize image and video data from open datasets and the web.
- Handle data cleaning, filtering, deduplication, and metadata generation.
- Ensure ethical and compliant data collection at scale.
Processing & Augmentation
- Build high-throughput pipelines for vision data preprocessing (frame extraction, resolution normalization, format conversion, latent caching).
- Implement GPU-accelerated augmentation and distributed data loading (WebDataset, TFRecords, Parquet).
Synthetic & Simulation-Based Data Generation
- Use simulation tools (e.g., Unreal Engine 5, Isaac Sim, Unity) to generate high-quality synthetic vision data.
- Create specialized datasets for VLM training, visual reasoning, and agent interaction.
Requirements:
- Strong experience with data engineering, computer vision, or machine learning infrastructure.
- Expertise in building and scaling ETL/data pipelines for large unstructured datasets.
- Proficiency with Python, PyTorch, and distributed data frameworks (e.g., Ray, Spark, Dask).
- Experience with WebDataset, TFRecords, Parquet, or similar high-throughput data formats.
- Familiarity with GPU-accelerated preprocessing, NVIDIA DALI, or equivalent systems.
- Understanding of image/video codecs, data compression, and cloud storage optimization.
Preferred Experience:
- Prior work with simulation-based or synthetic data generation using Unreal Engine, Isaac Sim, or Unity.
- Experience curating datasets for multimodal or vision-language model training.
- Knowledge of data ethics, privacy, and compliance frameworks for large-scale AI datasets.
- Experience contributing to open datasets or data-centric AI research.
Why apply:
- Opportunity to join a fast-growing core team that is already driving AI breakthroughs
- Highly competitive salary package
- Work alongside ambitious and bright superstars from tech and academia
- Medical, Dental and Vision Insurance
- Relocation package available
📧 Interested in applying? Please click on the ‘Easy Apply’ button or alternatively email me your resume at [email protected]