Not all data management problems can be solved by a relational database, particularly when very large volumes of data are involved. IBM saw this and began offering a package of services called BigInsights Core that is based on Apache's open source Hadoop. Hadoop, designed to filter, sort, and manage structured or unstructured data on a large server cluster, consists of two distributed systems:
- MapReduce, a processing framework that schedules computation close to the data. When it comes time to sort or filter, it reads the data from disk in large blocks of 64 or 128 megabytes and assigns the work to processors on or near the machines holding those blocks, rather than shipping the data across the network.
- And HDFS, the Hadoop Distributed File System, which distributes the data across the cluster in the first place and keeps track of where each block lives.
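The programming model behind MapReduce can be illustrated with a word count, the canonical example. The sketch below is purely illustrative Python, not Hadoop's actual Java API: a mapper emits key/value pairs, a shuffle step groups them by key (as Hadoop does between the map and reduce phases), and a reducer aggregates each group. Hadoop runs many mappers and reducers in parallel across the cluster; this version runs them serially on one machine.

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in one line of input.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group values by key, as Hadoop's shuffle phase does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Sum the counts emitted for a single word.
    return key, sum(values)

def word_count(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())

print(word_count(["hadoop sorts data", "hadoop filters data"]))
# prints {'hadoop': 2, 'sorts': 1, 'data': 2, 'filters': 1}
```

The key idea is that mappers and reducers are independent per line and per key, which is what lets Hadoop spread the work across a cluster and run it next to the data HDFS has already distributed.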
After Hadoop has done its work, data warehouses, business analytics systems, and relational databases can take over with a more manageable result set.