DataHeroes is a fast-growing technology company founded by serial entrepreneurs. It is building the first database for machine learning and data science, with the aim of significantly improving model quality and reducing the time it takes to build models. DataHeroes is backed by top VCs, including AI Fund, led by Professor Andrew Ng, who was the head of Google Brain and chief scientist at Baidu, is a co-founder of Coursera, and is one of the top thought leaders in data science.
The company is building a new Python machine learning library that allows data scientists, data analysts and machine learning engineers to take a small sample of their entire dataset while maintaining its statistical properties, including the edge cases, so that models built on the small sample provide the same quality as models built on the entire dataset. The small sample is kept in a unique data structure that allows users to perform any operation they need, such as data exploration, data cleaning, data labeling, feature extraction, model training, validation and testing, explainability and more, in a fraction of the time these operations take today.
We’re looking for a Python Developer to join a small and proficient team. The developer will work with different databases and cloud providers, and will also develop and maintain machine learning engineering components.
- Education – B.A. or B.Sc. in computer science or an equivalent field from a leading academic institution;
- At least 3 years of Python experience;
- Experience working with cloud environments (AWS, GCP, Azure);
- Experience working with SQL and NoSQL databases;
- Enthusiasm for tackling difficult research questions;
- Proficiency in English – writing, reading and speaking.
- Integrating the DataHeroes library with various SQL and NoSQL databases and with various cloud environments (AWS, GCP, Azure, etc.);
- Enhancing the DataHeroes library by adding new features and capabilities to the unique DataHeroes data structure, such as cross-validation support and adding/removing features to/from the dataset;
- Identifying new datasets across a variety of domains (computer vision, NLP, fraud detection, churn prediction, etc.) to test the library across various use cases, identify its bottlenecks and improve its performance;
- Participating in the development of cluster computation using Spark and similar technologies.
- Joining a small and unique startup at the beginning of its journey and being part of the team designing and shaping the company’s solution;
- Being entitled to company options;
- Participating once or twice a year in company events, usually abroad, in which all company employees get to meet each other face-to-face and have fun.