# Our Thoughts on How to Join Data Science

If you are a developer at an IT company or would like to become one, if you study applied math at university, or if you are a statistics expert, an analyst who likes working with data, or simply a technical person interested in IT trends, you should find this article an interesting and useful starting point for joining Data Science. Today, the founders of Mindcraft.ai, CEO Andy Bosyi and R&D Director Mykola Kozlenko, will help you delve into the technical side of this science. They will explain what is hidden behind Data Science, what a data scientist needs to know and learn, and which technologies are now leading Data Science development.

Once you consider the possibilities and prospects of Data Science, you will probably want to expand your expertise and use these tools in your work. Even if you are already a successful developer, more knowledge and skills are always useful.

So, let us begin with the basics. **Data Science** is a science that works with data and aims to create additional benefits for society or business through the analysis of that data. From the developer's side, Data Science is a combination of

*three main components*:

- Programming. Data science cannot exist without coding.
- Statistics. The whole science of data is built on the laws of statistics and probability theory, together with applied math.
- Domain. You should understand your client's business area. Without this knowledge you will not be able to analyze your client's data effectively and adequately.

As for the **programming** part, here is what a beginner might need. If you do not know which programming language to choose, consider R or Python. They are the most popular languages for Data Science work. Initially R was the dominant language, but Python is now becoming even more popular. MATLAB is also a very useful tool.

Why these languages? Compiled languages are less convenient for exploratory work with data, because we need to be able to change the code and promptly see how the change affects the result. That is why Python is so useful. In our work we also use IPython Notebook (Jupyter Notebook), a convenient environment that lets the whole team work on a project together and offers many other advantages.

There are many existing libraries and tools that simplify the life of a data scientist. One example is scikit-learn, a simple and effective open source library for working with data, built on NumPy, SciPy, and matplotlib. It helps with classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing. Now, we come to the

**statistical aspects** of Data Science. Here is what you need from the field of statistics:

- refresh your memory of the university course in statistics; it covers a major part of what you need to know,
- understand Bayes' theorem and the basics of probability theory,
- know how to select data, understand how it can be useful, and be able to analyze it,
- know how to extrapolate from data, because even big data does not always represent reality accurately,
- use the right methods to eliminate errors in the data,
- know how to draw a sample correctly and use the right confidence intervals,
- formulate the right hypotheses, and know how to define the null hypothesis and test it against alternative hypotheses.
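
A few of the items above (Bayes' theorem, confidence intervals, and testing a null hypothesis) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a method prescribed by the authors. The function names, the diagnostic-test numbers, and the simulated sample are all assumptions made up for the example, and both the interval and the test rely on a large-sample normal approximation.

```python
import math
import random
import statistics
from statistics import NormalDist


def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive result) using Bayes' theorem."""
    # Total probability of a positive result (true positives + false positives).
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the sample mean."""
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(len(sample))
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * std_err, mean + z * std_err


def z_test_p_value(sample, null_mean):
    """Two-sided p-value for H0: the population mean equals null_mean."""
    std_err = statistics.stdev(sample) / math.sqrt(len(sample))
    z = (statistics.fmean(sample) - null_mean) / std_err
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {posterior:.3f}")

# Simulated sample standing in for real measurements (assumed mean 10, sd 2).
random.seed(0)
sample = [random.gauss(10.0, 2.0) for _ in range(200)]

low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")

# Null hypothesis: the true mean is 10.5. A small p-value is evidence against it.
print(f"p-value for H0 (mean = 10.5): {z_test_p_value(sample, 10.5):.4f}")
```

With these assumed numbers the posterior is only about 0.16, even though the test is "95% accurate". This is the kind of result that makes Bayes' theorem worth understanding before drawing conclusions from data. For small samples, a t-based interval and test would be more appropriate than the normal approximation used here.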