Data science: the business case starts with the right questions

You want to get started with data science and are wondering what data analytics and AI can achieve for your company. What is the use of it all? How can you use these technologies? And where do you start? The central element of a solid data science strategy is the right question, which in turn leads to a viable business case. What do you want to achieve? Which data do you have, and which important questions currently go unanswered? Which external factors, such as weather, traffic or the economic situation, affect your results, your processes and the quality of your work? All of these questions must be collected, refined and compared with the data available internally and externally. This is the only way to determine the relevant questions that data analysis needs to answer.

Creating a platform, gaining experience: data lake and data analytics as a kick-starter

If you want to use machine learning, and perhaps deep learning, to gain new insights, you have to be aware that everything starts with data. Without data, and lots of it, you cannot build a meaningful AI case. Data must therefore not only be identified internally and externally, but also stored, merged, aggregated, consolidated and used in calculations. All of this takes place on a company's own platform, which runs either in its own data center ("on premises") or in the cloud. This is where the data is stored and analyzed. Once the data platform (the so-called "data lake") is in place, data must be collected and processed, because analyses, and later AI models, are only possible with a large amount of meaningful ("smart") data. The data can already be evaluated while the platform is being built up, without machine learning or deep learning. It is astonishing, for example, what insights a simple regression analysis can deliver.
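
The merging and aggregation steps described above can be illustrated with a minimal Python sketch. It assumes a tiny, made-up data set: the store names, dates, revenue figures and weather labels are hypothetical and only stand in for internal and external sources; a real data lake would use a database or a data processing framework instead of in-memory lists.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical internal data: daily revenue per store.
sales = [
    {"date": "2024-05-01", "store": "A", "revenue": 1200.0},
    {"date": "2024-05-01", "store": "B", "revenue": 800.0},
    {"date": "2024-05-02", "store": "A", "revenue": 900.0},
    {"date": "2024-05-02", "store": "B", "revenue": 700.0},
]

# Hypothetical external data: weather condition per day.
weather = {
    "2024-05-01": "sunny",
    "2024-05-02": "rainy",
}


def merge_with_weather(sales_rows, weather_by_date):
    """Merge: attach the weather of the matching day to every sales row."""
    return [
        {**row, "weather": weather_by_date.get(row["date"], "unknown")}
        for row in sales_rows
    ]


def average_revenue_by_weather(merged_rows):
    """Aggregate: mean revenue for each weather condition."""
    groups = defaultdict(list)
    for row in merged_rows:
        groups[row["weather"]].append(row["revenue"])
    return {condition: mean(values) for condition, values in groups.items()}


merged = merge_with_weather(sales, weather)
summary = average_revenue_by_weather(merged)
print(summary)
```

Even this simple join of internal and external data answers a first business question (does the weather influence revenue?) without any machine learning.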

Visualization and engineering

Once data is being collected, managed and made available in your own data lake, it makes sense to identify further sources, inside the company and externally, that can supply additional data and enrich the current models. This is complemented by "research" into the data and the visualization of data relationships, using visualization tools such as Microsoft Power BI or Tableau, or libraries for your own applications such as D3.js. Interactive infographics for corporate communication can also be derived from the data. To not only enrich this data further but also make it usable at different points in the organization, interfaces must be developed (so-called REST APIs or GraphQL APIs). They access the data according to defined rules, perform calculations and/or aggregations in real time and deliver the results. These software modules can also serve as intelligent assistants in the context of machine learning and deep learning.
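
To make the idea of such an interface more tangible, here is a small sketch of a REST-style endpoint built only with the Python standard library. Everything specific in it is an assumption for illustration: the `/average` path, the `sensor` query parameter, the sensor names and the readings are hypothetical, and a production API would typically use a web framework, authentication and a real data store.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical sensor readings that would normally live in the data lake.
READINGS = [
    {"sensor": "hall-1", "temperature": 20.5},
    {"sensor": "hall-1", "temperature": 21.5},
    {"sensor": "hall-2", "temperature": 18.0},
]


def average_temperature(sensor):
    """Aggregate on request: mean temperature of one sensor, or None."""
    values = [r["temperature"] for r in READINGS if r["sensor"] == sensor]
    if not values:
        return None
    return sum(values) / len(values)


class AggregationHandler(BaseHTTPRequestHandler):
    """Answers GET /average?sensor=<name> with a JSON document."""

    def do_GET(self):
        url = urlparse(self.path)
        sensor = parse_qs(url.query).get("sensor", [""])[0]
        result = average_temperature(sensor) if url.path == "/average" else None
        if result is None:
            # Unknown path or unknown sensor.
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps({"sensor": sensor, "average": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the sketch server (blocks until interrupted)."""
    HTTPServer(("localhost", port), AggregationHandler).serve_forever()
```

Calling `serve()` and opening `http://localhost:8000/average?sensor=hall-1` in a browser would return the aggregated value as JSON. The aggregation logic is kept in a plain function so that it can be reused and tested independently of the HTTP layer.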

Supreme discipline: deep learning

Deep learning is a subset of artificial intelligence that loosely mimics how the human brain processes data and forms patterns for decision-making. Its neural networks are able to learn from data during use (analogous to the human brain). The data can be structured, but above all it can also be unstructured. Deep learning works with a large number of "neuron layers", which is where the "deep" in the name comes from. The strength of deep learning lies in making sense of precisely this unstructured data, in volumes that humans can no longer process, and in recognizing patterns that humans cannot.

What is data science?

Data science is the science of data analysis. It comprises methods and tools as well as principles and models from mathematics and computer science, with the aim of extracting valuable, relevant information from large volumes of data (big data). This knowledge is used to gain competitive advantages or to identify errors (e.g. in production). Building on comprehensive data analysis on high-performance computers (in the data center or in the cloud), machine learning can then be used to gain knowledge from the data and to make predictions.

What does a data scientist do?

A data scientist processes and analyzes large amounts of data using computer technologies in order to generate information and knowledge from them. The aim is to use the data in such a way that it creates added value (see also Data Science). The data scientist also ensures that current data is always fed into the system (as packages, known as "batching", or continuously, known as "streaming") and monitors the technologies involved. They then prepare the data so that it can be used for further analyses with AI models (machine learning, deep learning, ...). They also condense and visualize the information so that it becomes presentable and thus easy to understand and communicate.
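
The difference between batching and streaming can be sketched in a few lines of Python. The `sensor_feed` source and its values are hypothetical; in practice the records would come from files, message queues or IoT devices, and the function names below are illustrative only.

```python
from itertools import islice


def sensor_feed():
    """Hypothetical source that yields one measurement at a time."""
    for value in [3, 5, 4, 8, 6, 7]:
        yield value


def ingest_batches(source, batch_size):
    """Batching: collect records into packages and hand over each package."""
    iterator = iter(source)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def ingest_stream(source):
    """Streaming: handle each record on arrival, keeping a running mean."""
    count, total = 0, 0.0
    for value in source:
        count += 1
        total += value
        yield total / count


batches = list(ingest_batches(sensor_feed(), batch_size=4))
running_means = list(ingest_stream(sensor_feed()))
print(batches)
print(running_means)
```

With batching, results only become available once a package is complete; with streaming, an updated result exists after every single record, at the cost of keeping state between records.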

Why the hype now?

We live at a time when almost unlimited computing capacity is available (including cloud services from Microsoft, IBM, Amazon or Google). On top of that comes comprehensive mobile networking, not only of people via smartphones and tablets, but also of an increasing number of devices and things (the "Internet of Things") as well as vehicles and buildings. This setup produces an exponentially growing volume of data. Every day. Every minute. Every second. This data can no longer be processed with standard forms of analysis and standard tools, so new concepts and technologies have emerged under the headings of big data and data analytics to process and analyze very large amounts of data in real time. Since the data is available (e.g. open data from states, cities and municipalities as well as from the federal government and the European Union) and computing capacity in the cloud has become affordable for every company, the majority of decisions will in future be made on the basis of data. Data scientists and data analysts are the experts in demand here.

What is the benefit of data science for a company?

Since the initial investment can be quite manageable, every company and every organization should engage with data science in order to gain initial experience. Developing a data science strategy is also encouraged by the fact that data is increasingly available free of charge or at very low cost, and that IoT technologies (keyword: sensor technology) keep getting cheaper. The first benefit comes from setting up your own data lake strategy, which lays the foundation for central data acquisition and storage as well as processing and aggregation. Very useful insights can be obtained from analyzing the consolidated data even without AI. Once a large amount of data is available, either accumulated over time or obtained by merging many different data sets, machine learning or deep learning can uncover new insights and produce predictions and recommendations automatically. This is where the greatest potential lies: in future, companies with a well-thought-out AI strategy and a professional AI platform will make significantly more precise decisions with fewer losses and greater certainty, and thus be more successful.

Regression analysis

Linear regression is a simple mathematical method that derives the relationship between two variables from existing data pairs, so that values for new data can be calculated. In practice, you often have key figures that are described by two characteristics, and you want to examine how closely these are related (correlated). Example: the floor area of a house and its selling price. Linear regression assumes that there is a linear relationship between the two values. The task is to find out exactly where this straight line lies in the graph, so that one key figure can then be inferred from the other. Linear regression can also be used to estimate a trend, which is particularly important if the data is a time series.
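
The house example can be worked through with a short Python sketch of ordinary least squares, the standard way to fit such a straight line. The areas and prices below are invented for illustration, and the helper names `fit_line` and `pearson_r` are not taken from any library.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(xs, ys):
    """Correlation coefficient: how closely the two key figures are related."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical sales: floor area in square metres, price in thousand euros.
areas = [80, 100, 120, 150, 200]
prices = [210, 250, 310, 370, 490]

slope, intercept = fit_line(areas, prices)
correlation = pearson_r(areas, prices)

# Infer one key figure from the other: estimated price of a 130 m² house.
predicted = slope * 130 + intercept
print(f"price ≈ {slope:.2f} * area + {intercept:.2f}, r = {correlation:.3f}")
print(f"estimated price for 130 m²: {predicted:.0f} thousand euros")
```

For a time series, the same fit can be applied with the time index as the x value; the slope is then the estimated trend per time step.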

Machine learning

Machine learning is a central method of data analysis and a sub-category of artificial intelligence. It covers the automated (self-learning) creation of analytical models for pattern recognition. The idea is that systems learn independently from data and recognize patterns in order to then make decisions automatically. It is thus the first step away from a rule-based approach and towards learning from data through pattern recognition.
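
The step from a rule-based approach to learning from data can be illustrated with a deliberately tiny sketch. It uses one of the simplest possible learners, a single learned threshold (sometimes called a decision stump), rather than any particular production method. The temperatures, the defect labels and the fixed limit of 80 degrees are hypothetical.

```python
def rule_based(temperature):
    """Rule-based: an expert fixed the limit in advance."""
    return temperature > 80.0


def learn_threshold(samples):
    """Learned: pick the limit that misclassifies the fewest labelled samples."""
    values = sorted(value for value, _ in samples)
    # Candidate limits: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((value > threshold) != label for value, label in samples)

    return min(candidates, key=errors)


# Hypothetical history: machine temperature and whether the part was defective.
history = [
    (61.0, False), (64.0, False), (66.0, False), (69.0, False),
    (71.0, True), (75.0, True), (78.0, True), (83.0, True),
]

threshold = learn_threshold(history)


def learned(temperature):
    """Decision based on the limit derived from the data."""
    return temperature > threshold


print(f"learned limit: {threshold}")
print(f"75 degrees flagged by rule: {rule_based(75.0)}, by learned model: {learned(75.0)}")
```

The hand-written rule misses the defects that occurred between 71 and 78 degrees, while the learned limit adapts to what the recorded data actually shows. Real machine learning models follow the same principle with many more inputs and parameters.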
