Big Data Learning: Concepts

1. Understanding the Concepts

Hadoop

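Hadoop is a distributed systems infrastructure developed by the Apache Software Foundation.
Users can develop distributed programs without understanding the low-level details of distribution, and make full use of a cluster for high-speed computation and storage.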

Spark

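Spark is one of the most popular open source in-memory computing frameworks for big data. It is implemented in Scala, was developed at the AMPLab at UC Berkeley, and was open-sourced in 2010.
It aims to be general-purpose and easy to use, and after rapid growth it became one of the most active Apache open source projects.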

Hive

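Hive is a data warehouse tool built on Hadoop.
It can map structured data files to database tables and provides simple SQL query capability, translating SQL statements into MapReduce jobs for execution.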
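
To make the SQL-to-MapReduce idea more concrete, here is a minimal pure-Python sketch of how a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept could be broken into map, shuffle, and reduce phases. This is only an illustration of the programming model, not Hive's actual query planner; the employees table, the dept column, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept).
ROWS = [
    ("alice", "eng"),
    ("bob", "ops"),
    ("carol", "eng"),
    ("dave", "eng"),
]


def map_phase(rows):
    """Emit a (dept, 1) pair for every row, as a mapper would."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which yields COUNT(*) per dept."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_dept(rows):
    """Run the three phases in sequence on an in-memory list of rows."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(ROWS))
```

In a real cluster the map and reduce phases would run in parallel on many machines and the shuffle would move data over the network; here everything runs in one process so the data flow is easy to follow.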

HBase

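HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system.
With HBase, a large-scale structured storage cluster can be built on inexpensive PC servers.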
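
As a rough picture of what "column-oriented" means here, the sketch below models a table as row key -> "family:qualifier" column -> timestamped versions, with rows scanned in sorted key order. It is a toy in-memory structure loosely modeled on the HBase data model, not the HBase client API; the class TinyColumnStore and its method names are hypothetical.

```python
from collections import defaultdict


class TinyColumnStore:
    """Toy table: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        """Store one version of a cell under an explicit timestamp."""
        self._rows[row_key][column][timestamp] = value

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if it is absent."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for key in sorted(self._rows):
            if start_row <= key < stop_row:
                yield key


if __name__ == "__main__":
    store = TinyColumnStore()
    store.put("user1", "info:name", "Ann", timestamp=1)
    store.put("user1", "info:name", "Anna", timestamp=2)
    print(store.get("user1", "info:name"))
```

Each row can hold a different set of columns, and a read returns the latest version of a cell by default, which is the part of the model this sketch tries to show.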

ZooKeeper

Official description

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
