Our thoughts on how to get started in Data Science.
If you are a developer in an IT company or if you would like to become a developer in the future, if you study applied math at university, or if you are an expert in statistics, an analyst who likes to work with data, or just a technical person interested in IT trends, then you will definitely find this article interesting and useful.
Today, the founders of Mindcraft.ai, CEO Andy Bosyi and R&D Director Mykola Kozlenko, will help you delve into the technical side of this science. They will explain what is hidden behind Data Science, what a data scientist needs to know and learn, and which technologies are now leading Data Science development.
Once you consider the possibilities and prospects of Data Science, you will probably want to expand your expertise and use these tools in your work. Even if you are already a successful developer, more knowledge and skills are always useful.
So, let us begin with the basics. Data Science is a science that works with data and aims to create additional benefits for society or business through the analysis of that data. From a developer's point of view, Data Science is a combination of three main components:
- Programming. Data science cannot exist without coding.
- Statistics. All the science about data is built on the basis of the laws of statistics and probability theory, with the use of applied math.
- Domain. You should understand your client's business area. Without this knowledge you will not be able to analyze your client's data effectively and adequately.
Speaking of the programming part, here is what a beginner might need. If you do not know which programming language to choose, consider R or Python. They are the most popular languages for Data Science work. Initially, R was the dominant language, but Python is now becoming even more popular. MATLAB is also a very useful tool. Why these languages? Compiled languages are not optimal for working with data, because we need to be able to change the code and promptly see how the change affects the result. That is why an interpreted language like Python is so useful. We also use the convenient IPython Notebook (Jupyter Notebook) environment in our work, which lets the whole team work on a project together and provides many other advantages.
There are many libraries and tools that make a data scientist's life easier. For example, scikit-learn is a simple and effective open-source library for working with data, built on NumPy, SciPy and matplotlib. It helps with classification, regression, clustering, dimensionality reduction, model selection and data preprocessing.
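To give a feel for how this looks in practice, here is a minimal sketch of a classification task in scikit-learn. It is our illustration only: the choice of the bundled iris dataset, the logistic regression model, and the 75/25 split are assumptions made for the example, not a recommendation for a real project.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small dataset that ships with scikit-learn (150 flowers, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data so the model is scored on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and classification chained into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Fraction of held-out flowers that were classified correctly.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

In a notebook you can change the model or the split and rerun the cell to see at once how the score moves, which is exactly the quick feedback loop described above.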
Now, we come to the statistical aspects in Data Science. Here is what you need from the sphere of statistics:
- refresh your memory of your university statistics course; it covers a major part of what you need to know,
- understand Bayes' theorem and the basics of probability theory,
- know how to select data, understand how it can be useful, and be able to analyze it,
- know how to extrapolate from data, because even big data does not always represent reality accurately,
- use the right methods to eliminate errors in the data,
- know how to draw a sample correctly and use the right confidence intervals,
- build the right hypotheses: know how to define the null hypothesis and test it against alternative hypotheses.
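Three items from this list (Bayes' theorem, confidence intervals and testing a null hypothesis) can be sketched with nothing but the Python standard library. This is a simplified illustration under our own assumptions: the function names, the diagnostic-test numbers and the sample values are all hypothetical, and the normal approximation is used for brevity (for small samples a t-distribution would be the more careful choice).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bayes_posterior(prior, true_positive_rate, false_positive_rate):
    """P(hypothesis | positive evidence) by Bayes' theorem."""
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the sample mean."""
    standard_error = stdev(sample) / sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(sample)
    return center - z * standard_error, center + z * standard_error


def two_sided_z_test(sample, null_mean):
    """p-value for the null hypothesis 'the true mean equals null_mean'."""
    standard_error = stdev(sample) / sqrt(len(sample))
    z = (mean(sample) - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical diagnostic test: 1% of people have the condition,
# the test catches 95% of them and wrongly flags 5% of healthy people.
posterior = bayes_posterior(0.01, 0.95, 0.05)
print(f"P(condition | positive test) = {posterior:.3f}")  # about 0.161

# Made-up measurements, used only to exercise the functions.
sample = [4.9, 5.1, 5.3, 4.8, 5.0, 5.2, 5.4, 4.7, 5.1, 5.0]
low, high = mean_confidence_interval(sample)
p_value = two_sided_z_test(sample, null_mean=5.0)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f}), p-value: {p_value:.2f}")
```

The Bayes example shows why intuition needs checking: even with a fairly accurate test, a positive result here means only about a 16% chance of actually having the condition, because the condition itself is rare.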
If everything listed above is not just a set of complicated words for you, if you know the meaning of these terms and are familiar with the practical application of these methods, then your way to the world of Data Science will be easier than you might think.
So, if you are familiar with programming and statistics and know how to analyze your client's business, welcome to the world of Data Science, where you will work with such cool things as machine learning, neural networks, deep learning, OCR, etc.
You are probably still not sure whether you want to work with Data Science, and it is difficult to predict whether you will enjoy working in this area. To help you with your first steps, here are some recommendations. Nowadays there are a lot of online and offline courses on Data Science, and it is rather difficult to choose between them. This choice is very important, especially for beginners, because your first experience will determine whether you want to keep moving in this direction. For example, some courses are full of complex mathematics and formulae, and they can put you off Data Science forever. You don't need those. Instead, we recommend the Stanford University course taught by Andrew Ng, co-founder of Coursera. This course is quite simple and understandable, so you can try to start with it.
If you have already learned the basics and want to test your skills, there is a very interesting resource: kaggle.com. There are many different competitions on this website. Companies publish their challenges, and anyone can try to solve them. In the end, those who find the best solutions receive a reward from the company. In addition to these challenges, there are two practice tasks that you can solve simply for yourself to find out how good your skills are. In the first one, you need to predict the likelihood of survival of the people who were on the Titanic. This is how it works: at the beginning you get the data and characteristics for every person who was on board. You need to build a model that can determine whether a person survived or not. In this task it is important to understand which factors mattered and to what extent. Take gender, for instance. Presumably, women were rescued first, which means that women had higher survival rates, and so on. The second task concerns the recognition of handwritten text. If you complete a task and appear in the top 1000, you did a good job. If you get into the top hundred, you can consider yourself a real pro!
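To make the Titanic task more concrete, here is a rough sketch of the idea. Please note the assumptions: the twelve rows below are invented purely for illustration and are NOT real Titanic records, the two features (`is_female`, `passenger_class`) are a simplification of what the competition data contains, and logistic regression is just one possible model.

```python
from sklearn.linear_model import LogisticRegression

# Invented toy rows (NOT real passenger data):
# (is_female, passenger_class, survived)
rows = [
    (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 3, 0), (1, 1, 1), (1, 2, 1),
    (0, 1, 1), (0, 2, 0), (0, 3, 0), (0, 3, 0), (0, 1, 0), (0, 2, 0),
]


def survival_rate(rows, is_female):
    """Share of survivors among passengers of the given gender."""
    outcomes = [survived for female, _, survived in rows if female == is_female]
    return sum(outcomes) / len(outcomes)


print(f"Women: {survival_rate(rows, 1):.2f}, men: {survival_rate(rows, 0):.2f}")

# Fit a model that predicts survival from the two features.
features = [[female, pclass] for female, pclass, _ in rows]
labels = [survived for _, _, survived in rows]
model = LogisticRegression().fit(features, labels)

# The sign and size of each coefficient hint at how much a factor matters:
# a positive value pushes the prediction towards "survived".
coef_female, coef_class = model.coef_[0]
print(f"is_female: {coef_female:+.2f}, passenger_class: {coef_class:+.2f}")

# Predicted survival probability for two hypothetical passengers.
p_woman_first = model.predict_proba([[1, 1]])[0][1]
p_man_third = model.predict_proba([[0, 3]])[0][1]
print(f"Woman, 1st class: {p_woman_first:.2f}; man, 3rd class: {p_man_third:.2f}")
```

On the real competition data you would load the provided files instead of typing rows by hand, add more features, and check the model on passengers it has not seen before.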
So, if you are ready to plunge into the world of Data Science, do it! Do not hesitate, it will definitely be very interesting. You should at least try, and we wish you good luck! And who knows, maybe someday we will work together on one team.
Let us know if you want to be a part of Data Science.
Also, we would like to thank our friends from [bvblogic] for their help with this article.
Nazar Savchenko, OD at MindCraft
Information Technology & Data Science