Research Scientist - Vision Data Infrastructure Job at Storm3, San Francisco, CA

  • Storm3
  • San Francisco, CA

Job Description

Research Scientists/Engineers (all levels)

🔍 Focus on Vision Data Infrastructure

🤖 Fundamental AI Research Institute

🌎 San Francisco Bay Area, USA

💸 $250,000 - $600,000 salary + annual bonus

Join one of the few research institutions worldwide with the resources to compete with top AI companies: tens of thousands of GPUs for exploring state-of-the-art research in LLMs, multimodal AI, and agentic AI.

We are seeking AI talent with expertise in building scalable vision data pipelines that support both generative image/video training and multimodal alignment. You'll design high-performance pipelines for large-scale image and video datasets, enabling efficient pretraining, alignment, and simulation-based data generation.

Responsibilities:

Vision Data Sourcing & Curation

  • Collect and organize image and video data from open datasets and the web.
  • Handle data cleaning, filtering, deduplication, and metadata generation.
  • Ensure ethical and compliant data collection at scale.

Processing & Augmentation

  • Build high-throughput pipelines for vision data preprocessing (frame extraction, resolution normalization, format conversion, latent caching).
  • Implement GPU-accelerated augmentation and distributed data loading (WebDataset, TFRecords, Parquet).

Synthetic & Simulation-Based Data Generation

  • Use simulation tools (e.g., Unreal Engine 5, Isaac Sim, Unity) to generate high-quality synthetic vision data.
  • Create specialized datasets for VLM training, visual reasoning, and agent interaction.

Requirements:

  • Strong experience with data engineering, computer vision, or machine learning infrastructure.
  • Expertise in building and scaling ETL/data pipelines for large unstructured datasets.
  • Proficiency with Python, PyTorch, and distributed data frameworks (e.g., Ray, Spark, Dask).
  • Experience with WebDataset, TFRecords, Parquet, or similar high-throughput data formats.
  • Familiarity with GPU-accelerated preprocessing, NVIDIA DALI, or equivalent systems.
  • Understanding of image/video codecs, data compression, and cloud storage optimization.

Preferred Experience:

  • Prior work with simulation-based or synthetic data generation using Unreal Engine, Isaac Sim, or Unity.
  • Experience curating datasets for multimodal or vision-language model training.
  • Knowledge of data ethics, privacy, and compliance frameworks for large-scale AI datasets.
  • Experience contributing to open datasets or data-centric AI research.

Why apply:

  • Opportunity to join a fast-growing core team that is already driving AI breakthroughs
  • Highly competitive salary package
  • Work alongside ambitious and bright superstars from tech and academia
  • Medical, Dental and Vision Insurance
  • Relocation package available

📧 Interested in applying? Click the ‘Easy Apply’ button or email me your resume at [email protected]