Sr Data Scientist

Department: Misc
Location: Santa Clara, CA

Position Description

Summary

We’re Blue River, a team of innovators driven to create intelligent machinery that solves monumental problems for our customers. We empower our customers – farmers, construction crews, and foresters – to implement safer and more sustainable solutions, driving increased profitability with less reliance on scarce labor. We believe that focusing on the small stuff – pixel-by-pixel and task-by-task – leads to big gains. 

Blue River Technology is based in Santa Clara, CA. 

Job Responsibilities

  • Define, curate, and manage datasets of images, sensor data, and scenarios that are designed to increase the trust and safety of autonomy.
  • Work closely with data engineers and field data capture technicians to mine fleet data and identify open needs.
  • Define frameworks for cataloging and searching scenario-based data to serve multiple stakeholders, including computer vision and robotics teams.
  • Monitor, investigate, and fix data ingestion issues related to dataset curation for training and testing computer vision algorithms.
  • Investigate data quality and actively participate in conceptualizing and developing short and long-term solutions.
  • Provide data and infrastructure support to internal teams.
  • Provide guidance to improve the stability, security, efficiency, and scalability of image data pipelines.
  • Improve code quality through writing unit tests, automation, and performing code reviews.
  • Examine the correlation between customer experience and virtual performance in comparable scenarios, and adjust as needed. Ensure that defined safety and productivity test cases are adequately covered by curated scenarios.

Requirements

  • Master’s degree in Math, Physics, Data Science, or related field plus 5 years of related experience.
  • Required skills:
    • Implement and deploy computer vision and machine learning-based data pipeline systems using semantic segmentation, image and video classification, object detection, and supervised and unsupervised learning (5 yrs).
    • Experience working with data engineers, data scientists, software engineers, and field staff through the lifecycle of developing and deploying a machine learning system (4 yrs).
    • Perform non-parametric statistical tests and analysis on large image-based data sets using sklearn, scikit-image, scipy, and OpenCV (3 yrs).
    • Write technical documentation, tutorials, and summaries to train data collection teams and conduct on-site training (3 yrs).
    • Deploy scalable cloud-based solutions to mine, preprocess, resize, crop, rectify, and filter image-based data sets (5 yrs).
    • Implement code using Python libraries, including NumPy, SciPy, OpenCV, Pandas, Seaborn, Matplotlib, CUDA, PyTorch, and TensorFlow (5 yrs).
    • Design, implement, debug, and deploy stereo image-based data pipelines using Apache TeamCity, AWS Airflow, Redis, Google AppSheet, Databricks data tables, Celery, and advanced search solutions on Labelbox with open source models such as CLIP and BLIP (6 mos).
    • Design, build, and debug custom Python pipelines using Python's functools module for processing large image datasets, and deploy these pipelines using Docker and Docker Compose (1 yr).
    • Use statistical sampling algorithms to design efficient data collection methods for large stereo camera-based image datasets and coordinate data collection (6 mos).
    • 10% domestic travel required. The position is remote, but it requires domestic travel to test and training sites and regular in-office time (about once a week) to work with local workstations and participate in in-person meetings.

The US annual base salary range for this position is $209,862 – $275,000, along with eligibility for Blue River’s bonus and benefit programs.
