Data science is hot! Changing your job title to data scientist on LinkedIn will trigger a small army of recruiters to swarm you with all kinds of awesome job offers. So in terms of career prospects, jumping on the data science hype train seems like a good bet. But what skills do you need to be a good data scientist? No wait, let’s rephrase that question: what is data science in the first place?
This might seem like a silly question, but if we dig a little deeper we find that data science does not always mean the same thing. Ask different people and you will get anything from market research to hardcore artificial intelligence. Although both of these extremes build on utilizing data, the skills needed for either discipline vary wildly.
So what does data science mean in the context of bol.com? This series of blog posts is meant to shed some light on what we, the data scientists of bol.com, think data science is and what a good data scientist should be able to do. We have identified four key aspects that should be central in the skillset of every bol.com data scientist and asked some of our team members how these aspects feature in their day-to-day work.
When thinking of a solution to a data science problem, you never know in advance whether something is going to work or create actual value. The only real way of knowing is by testing. So, just like ‘real’ science, we have adopted an experiment-driven way of working. Our team members can tell you all about how we try to do this at bol.com.
If there is one piece of advice I could give anyone who wants to do data science at bol.com, it would be ‘keep it simple, stupid’. The field of machine learning is developing rapidly and techniques get ever more complicated. Do you need those techniques? Probably not. Even basic models can be very powerful, and their advantage is that they are often easier to implement and understand. This way you can get your model out there much quicker and start producing meaningful results for bol.com much faster. Is it perfect then? Probably not, but it allows you to validate your model and learn about its shortcomings. And iterating and improving on an existing simple base solution is much easier than building the complicated and sophisticated thing from the start.
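To make this concrete, here is a minimal sketch of the kind of simple base solution we mean: a most-popular-items recommender that you could ship and test long before reaching for matrix factorization or deep learning. The function name and the data are entirely made up for illustration.

```python
from collections import Counter

def most_popular_baseline(purchase_log, k=3):
    """Recommend the k most purchased items overall.

    A deliberately simple model: no personalization, no training,
    but easy to ship, easy to explain, and a solid baseline to
    iterate on and to compare fancier models against.
    """
    counts = Counter(item for _, item in purchase_log)
    return [item for item, _ in counts.most_common(k)]

# Hypothetical purchase log: (customer_id, item_id) pairs.
log = [
    (1, "book"), (2, "book"), (3, "toaster"),
    (4, "book"), (2, "toaster"), (5, "lamp"),
]
print(most_popular_baseline(log, k=2))  # → ['book', 'toaster']
```

A model like this takes minutes to build, yet it gives you something real to measure against: if your sophisticated model cannot beat it in an experiment, that is worth knowing early.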
We fail a lot! We fail fast! And we are extremely proud of this. The more we fail, the more data we generate and the more wrong assumptions we get rid of. Failing is a crucial part of learning, just as A/B testing is a crucial part of a data-driven methodology. However, being good at failing does not automatically mean your failures are beneficial. In all our experiments we want to make sure that the following things are covered:
- the test had a purpose: there was a hypothesis behind it
- the hypothesis was validated
- you learned from it and came up with a new, corrected plan or hypothesis
As long as you keep repeating these three steps, and do them as fast as possible, you will be successful at bol.com. Keep it to the point, keep it simple, keep it fast.
You cannot be a Data Scientist without wanting to validate what you are doing and iterate on your results. Maximize the number of tests to maximize the gain, both in the short run and in the long run. So don’t be afraid to fail: as long as your results have value and you learn from them, it is OK to screw up. Also, don’t wait too long before reaching out to other Data Scientists to help you with a problem. Everybody is happy to help, and insights are always valuable.
Great, you have trained a model which performs pretty well. Now what? Let’s test it on the customers the model was made for and see whether it improves their experience. This is probably the most overlooked and at the same time most important aspect of any data science product. An alteration to an existing product or model cannot go untested! As a data scientist you represent this way of thinking. You understand and conduct A/B/n tests and multivariate tests, or even master bandit algorithms, to statistically validate the hypotheses you began your machine learning process with.
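As an illustration of what statistically validating a hypothesis can look like in its simplest form, here is a sketch of a two-sided two-proportion z-test for an A/B test on conversion rates. The numbers are invented, and in practice you would lean on a battle-tested statistics library or your experimentation platform rather than rolling your own.

```python
import math

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test for an A/B test.

    conv_a, conv_b: number of conversions in variant A and B.
    n_a, n_b: number of visitors in variant A and B.
    Returns the p-value under the null hypothesis that both
    variants have the same conversion rate.
    """
    p_pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided p-value via the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical experiment: variant B converts 5.5% vs. 5.0% for A.
p = ab_test_p_value(conv_a=500, n_a=10_000, conv_b=550, n_b=10_000)
print(f"p-value: {p:.3f}")
```

With these made-up numbers the p-value is above the conventional 0.05 threshold, which is exactly the kind of outcome that keeps you honest: a difference that looks promising on a dashboard can still be noise.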
Most of the discussions within a Data Science project are about checking things: validating models, exploring edge cases, debugging your own assumptions, trying to understand the models you created. All those activities have one thing in common: let the data do the talking. There is little need for people who think they have already figured out what is going on before any data has been studied. The end result should be a methodology that is verified, grounded, and can be explained in a real-life setting.