Lead Data Scientist, Real Estate
Cape Analytics
WITHIN 1 MONTH, YOU’LL
- Develop scientifically rigorous, creative methodologies to continuously improve our machine learning models
- Incorporate machine learning and data-driven decisioning into the core of our infrastructure
- Explore and mine new data sources that will help optimize and validate our models
- Link model capabilities to market needs by customizing models and by designing and running validation studies
WITHIN 3 MONTHS, YOU’LL
- Start to assist in Sprint planning and Quarterly planning with the team
- Contribute to design and automation of model training, model post-processing and evaluation pipelines at scale
- Leverage the extensive data generated by Cape, along with data from external sources, to build structured knowledge about our feature space
- Implement automated solutions for ensuring data quality and delivery
- Contribute to peer mentorship, knowledge bases, and skills transfer
WITHIN 6 MONTHS, YOU’LL
- Be primarily responsible for roadmap planning with the Product team, along with Sprint planning and Quarterly planning
- Present your results internally and externally
- Defend your methodology and incorporate feedback from internal teams as well as customers
- Improve model performance by identifying failure modes using supervised and unsupervised learning techniques
- Ideate and implement data-driven methodologies to help scale model performance across geographical, climatic, and temporal dimensions
THE SKILL SET
- PhD in a STEM field with 5 years of hands-on industry experience, or a Master's in a STEM field with 7 years of hands-on industry experience
- 1-3 years of experience technically managing other data scientists
- A background in the finance or real estate sector is strongly preferred, including familiarity with real estate data such as MLS and other public records, mortgage loans, automated valuation models, asset valuations, cash flow analysis, and risk analysis
- Solid knowledge of statistical techniques, including hypothesis testing, statistical sampling, significance testing, statistical inference, maximum likelihood estimation, and experimental design, among others
- Mastery of supervised and unsupervised algorithms and their implementations, as well as machine learning concepts including regularization, learning curves, hyperparameter optimization, and cross-validation, among others
- Advanced knowledge of and significant programming experience in Python or another scripting language, including relevant libraries such as numpy, pandas, SciPy, and matplotlib
- Familiarity with the Linux environment, including shell scripting, Git, and tools for reproducibility (e.g., virtual environments, Docker)
- Demonstrated expertise in building data tools for ETL and data analysis
- Experience in building meaningful data visualizations using at least one scripting-based visualization tool such as matplotlib, d3.js or bokeh
- Nice to haves: experience designing data schemas and extracting data from SQL and NoSQL databases; experience with GIS systems; experience with modern data technologies, e.g. Spark, PyTorch, Jupyter Notebook, Docker; experience with cloud computing on AWS or GCP