What is Big Data?
Big Data is often defined as a way of handling massive volumes of data, hence the name.
The problem with this definition is that it overlooks a fundamental point: Big Data does deal with large volumes of data, but its main challenge is to extract value from data, whatever its volume.
Technological changes needed to extract value from this data
Today, companies are facing an exponential increase in data. To give a more precise idea, this mass of data can reach several petabytes, and it comes in many different forms.
For example, it may come from logs, social networks, e-commerce transactions, analytics, the Internet of Things, images, audio, video, and so on.
Of course, many companies want to take advantage of this data, whether they have collected it themselves or it is public, such as data from the web or Open Data.
Traditional data processing technologies, such as business intelligence tools and databases, were not designed to handle such volumes. A company can only extract value from this data by going beyond the limits of traditional information systems. There are five such limits, known as the 5 Vs.
What are the 5 Vs?
The first V is Volume: the explosion in the amount of data that must be processed and analyzed. This is the aspect that has received the most attention so far.
The second, Variety, is the difficulty of storing, interpreting and efficiently cross-referencing data from increasingly numerous and diverse sources.
The third, Velocity, is the speed at which data is generated, captured and shared.
Consumers and businesses alike are generating more and more data, on ever shorter timescales.
However, there is still a lag between the speed at which this data is generated and the speed at which it is processed and analyzed, and companies can only fully capitalize on this data if it is collected and shared in real time.
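To make the Velocity point concrete, here is a minimal sketch of processing data as it arrives rather than in a later batch: events are counted per key in fixed, non-overlapping time windows, and each window's result is emitted as soon as it closes. The function name `tumbling_window_counts` and the `(timestamp, key)` event format are hypothetical, and the sketch assumes events arrive in timestamp order; production stream processors also have to handle late or out-of-order events.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Counter]]:
    """Count events per key in fixed, non-overlapping time windows.

    Yields (window_start, counts) as soon as a window closes, so results are
    available while the stream is still running. Assumes events arrive in
    non-decreasing timestamp order; windows with no events are not emitted.
    """
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        elif start != current_start:
            # A newer window has begun: emit the finished one.
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Flush the last, still-open window when the stream ends.
        yield current_start, counts


if __name__ == "__main__":
    clicks = [(0, "click"), (3, "view"), (7, "click"), (12, "click"), (14, "view")]
    for start, counts in tumbling_window_counts(clicks, window_seconds=5):
        print(start, dict(counts))
```

Because the function is a generator, it works equally well on an unbounded source such as a socket or message queue, consuming constant memory per window.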
The fourth, Value, is about monetizing a company’s data, and also about measuring the return on investment of a Big Data implementation.
Finally, the fifth, Veracity, is the ability to rely on the data being processed: depending on how trustworthy it is, data is given more or less weight.
For example, data from social networks must be treated with caution, because its source and objectivity are difficult to assess.
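One simple way to give data "more or less importance" according to a confidence criterion is a confidence-weighted average. The minimal sketch below assumes hypothetical per-source trust scores (`SOURCE_CONFIDENCE`) and a hypothetical helper `weighted_estimate`; how those scores are chosen in practice is a separate, harder problem.

```python
from typing import Dict, List, Tuple

# Hypothetical trust scores per source: 1.0 = fully trusted, 0.0 = ignored.
SOURCE_CONFIDENCE: Dict[str, float] = {
    "crm": 1.0,
    "web_form": 0.7,
    "social_network": 0.3,
}


def weighted_estimate(
    observations: List[Tuple[str, float]], confidence: Dict[str, float]
) -> float:
    """Confidence-weighted mean of (source, value) observations.

    Sources missing from `confidence` get weight 0 and are therefore ignored.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for source, value in observations:
        weight = confidence.get(source, 0.0)
        weighted_sum += weight * value
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no observation comes from a trusted source")
    return weighted_sum / total_weight


if __name__ == "__main__":
    readings = [("crm", 10.0), ("web_form", 12.0), ("social_network", 40.0)]
    print(weighted_estimate(readings, SOURCE_CONFIDENCE))
```

The social-network reading still contributes, but its low weight keeps it from dominating the estimate the way it would in a plain average.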
It is in response to these constraints that Big Data offers a set of technologies capable of overcoming all five limits at once.
This data is then processed and turned into value through a Big Data architecture: a platform that collects the company’s data, often stores it in a data lake (a central repository that holds data of all kinds in its raw form), and then analyzes and monetizes it.
This is the purpose of Big Data: it shifts a company’s focus toward its data, and above all toward the value that data generates for the company.
What is Hadoop?
Hadoop is a free, open-source framework developed under the Apache Software Foundation. Its principle is to split files into large blocks and distribute them across a cluster of machines for processing.
At this scale, we are talking about several petabytes of data spread over clusters of several thousand machines. This design is what allows Hadoop to support distributed, scalable applications, which fits the needs of Big Data well.
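Hadoop’s principle of splitting files into large blocks and spreading them over a cluster can be sketched in a few lines. In HDFS, the Hadoop Distributed File System, the block size defaults to 128 MB in Hadoop 2 and later, and each block is replicated (three copies by default) so that losing one machine does not lose data. The sketch below is a simplified illustration, not HDFS code: the helpers `split_into_blocks` and `place_blocks` are hypothetical, the sizes are tiny, and the round-robin placement is an assumption standing in for HDFS’s real rack-aware policy.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's contents into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes.

    Simplified round-robin policy for illustration only; real HDFS placement
    is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    file_contents = b"0123456789" * 3  # a 30-byte "file"
    blocks = split_into_blocks(file_contents, block_size=8)
    layout = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
    for block_id, block in enumerate(blocks):
        print(block_id, len(block), layout[block_id])
```

Since every block lives on several machines, computation can be scheduled on whichever replica holder is available, which is what lets a cluster of ordinary servers behave reliably.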
But the main reason for Hadoop’s success is economic rather than technical. Previously, processing a huge volume of data required supercomputers and specialized hardware. Hadoop made it possible to run computations on a petabyte of data reliably and in a distributed way on standard servers, and therefore at a lower cost.
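Hadoop’s original processing model for these distributed computations is MapReduce: a map function runs on each block independently, the framework groups the intermediate results by key (the shuffle), and a reduce function aggregates each group. The single-process sketch below counts words to show that data flow; the function names are hypothetical and this is not Hadoop’s Java API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(block: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one block of text."""
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return key, sum(values)


def word_count(blocks: List[str]) -> Dict[str, int]:
    # On a cluster, each block would be mapped on a node that stores it;
    # here the phases simply run one after another in a single process.
    intermediate = (pair for block in blocks for pair in map_phase(block))
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["Big data is big", "data has value"]))
```

Because mappers never share state and each reducer only sees one key, both phases can be spread over as many commodity machines as there are blocks and keys.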
What is NoSQL?
NoSQL (“Not Only SQL”) databases are a family of database management systems that depart from the classic relational SQL model. They have a simpler and more flexible architecture than traditional relational databases.
NoSQL solutions make it possible to spread a database across as many machines as needed, producing a distributed database in which the load can be redistributed dynamically.
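One common technique for spreading data over machines while keeping rebalancing cheap is consistent hashing, used in various forms by distributed stores such as Apache Cassandra. Nodes and keys are hashed onto the same ring, and each key belongs to the next node point clockwise, so adding a node only moves the keys that now fall to it. This is a minimal sketch: the `HashRing` class, the use of MD5, and the 100 virtual points per node are assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple


def stable_hash(key: str) -> int:
    """Deterministic 128-bit hash; Python's built-in hash() for str is salted per process."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring that maps keys to storage nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual points per node, to even out the load
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the node owning `key`: the first ring point at or after its hash."""
        if not self._ring:
            raise LookupError("the ring has no nodes")
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


if __name__ == "__main__":
    ring = HashRing(["db1", "db2", "db3"])
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("db4")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved / len(keys):.0%} of keys moved to the new node")
```

Going from three to four nodes should move roughly a quarter of the keys, all of them to the new node, whereas a naive `hash(key) % number_of_nodes` scheme would reassign about three quarters of them.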
Ultimately, NoSQL databases offer high data-processing performance, a scalable architecture and the ability to handle a wide variety of data. These properties match the needs of Big Data, which is why NoSQL is so popular.