Hadoop

1. Object Storage

1.1 Aliyun OSS

1.2 AWS S3

2. Hadoop

References:

2.1 Hadoop with S3 (MinIO)

[root@node7131 hadoop-2.10.0]# cat etc/hadoop-minio/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>fs.defaultFS</name>
        <!--value>hdfs://localhost:9000</value-->
        <value>s3a://minio-buc</value>
</property>

<property>
  <name>fs.s3a.endpoint</name>
  <value>http://10.192.71.32:9000</value>
  <description>AWS S3 endpoint to connect to. An up-to-date list is
    provided in the AWS Documentation: regions and endpoints. Without this
    property, the standard region (s3.amazonaws.com) is assumed.
  </description>
</property>

<property>
  <name>fs.s3a.access.key</name>
  <value>ak_123456</value>
  <description>AWS access key ID.
   Omit for IAM role-based or provider-based authentication.</description>
</property>

<property>
  <name>fs.s3a.secret.key</name>
  <value>sk_123456</value>
  <description>AWS secret key.
   Omit for IAM role-based or provider-based authentication.</description>
</property>

<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <description>
    Comma-separated class names of credential provider classes which implement
    com.amazonaws.auth.AWSCredentialsProvider.

    These are loaded and queried in sequence for a valid set of credentials.
    Each listed class must implement one of the following means of
    construction, which are attempted in order:
    1. a public constructor accepting java.net.URI and
        org.apache.hadoop.conf.Configuration,
    2. a public static method named getInstance that accepts no
       arguments and returns an instance of
       com.amazonaws.auth.AWSCredentialsProvider, or
    3. a public default constructor.

    Specifying org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider allows
    anonymous access to a publicly accessible S3 bucket without any credentials.
    Please note that allowing anonymous access to an S3 bucket compromises
    security and therefore is unsuitable for most use cases. It can be useful
    for accessing public data sets without requiring AWS credentials.

    If unspecified, then the default list of credential provider classes,
    queried in sequence, is:
    1. org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider:
       Uses the values of fs.s3a.access.key and fs.s3a.secret.key.
    2. com.amazonaws.auth.EnvironmentVariableCredentialsProvider: supports
        configuration of AWS access key ID and secret access key in
        environment variables named AWS_ACCESS_KEY_ID and
        AWS_SECRET_ACCESS_KEY, as documented in the AWS SDK.
    3. com.amazonaws.auth.InstanceProfileCredentialsProvider: supports use
        of instance profile credentials if running in an EC2 VM.
  </description>
</property>

<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The implementation class of the S3A Filesystem</description>
</property>

</configuration>
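
With this config dir in place, a quick smoke test can be run from the command line. This is only a sketch: it assumes the hadoop-aws and aws-java-sdk-bundle jars shipped under share/hadoop/tools/lib are added to the classpath, and the paths below are illustrative. MinIO endpoints addressed by IP usually fall back to path-style requests automatically; if not, fs.s3a.path.style.access=true may also be needed.

# run from the hadoop-2.10.0 directory; expose the S3A jars to the client
export HADOOP_CLASSPATH="$(pwd)/share/hadoop/tools/lib/*"

# use the dedicated config dir shown above
bin/hadoop --config etc/hadoop-minio fs -mkdir -p s3a://minio-buc/tmp/smoke
bin/hadoop --config etc/hadoop-minio fs -put /etc/hosts s3a://minio-buc/tmp/smoke/
bin/hadoop --config etc/hadoop-minio fs -ls s3a://minio-buc/tmp/smoke/
bin/hadoop --config etc/hadoop-minio fs -cat s3a://minio-buc/tmp/smoke/hosts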

2.2 Hadoop with HAS S3

On top of the S3A settings from 2.1, the only extra property needed here is the legacy (V2) signing algorithm:

<property>
  <name>fs.s3a.signing-algorithm</name>
  <value>S3SignerType</value>
  <description>Override the default signing algorithm so legacy
    S3 implementations can still be used.</description>
</property>
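
S3SignerType switches the AWS SDK back to the legacy V2 signature. As a quick sketch, the property can also be passed ad hoc on the command line before editing core-site.xml; the etc/hadoop-has config dir below is hypothetical, and buc-ld is the bucket that appears in the Hive trace later:

bin/hadoop --config etc/hadoop-has fs \
  -D fs.s3a.signing-algorithm=S3SignerType \
  -ls s3a://buc-ld/

Hadoop 2.8+ also allows scoping this to a single bucket via fs.s3a.bucket.BUCKET.signing-algorithm.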

2.3 Hadoop with Aliyun OSS

[root@node7131 hadoop-2.10.0]# cat etc/hadoop-aliyun/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>fs.defaultFS</name>
        <value>oss://buc-pic</value>
</property>

<property>
  <name>fs.oss.endpoint</name>
  <value>oss-cn-hangzhou.aliyuncs.com</value>
  <description>Aliyun OSS endpoint to connect to. An up-to-date list is
    provided in the Aliyun OSS Documentation.
   </description>
</property>

<property>
   <name>fs.oss.impl</name>
   <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
</property>

<property>
  <name>fs.oss.accessKeyId</name>
  <value>LTAI4FgP65SyfaP5BmaV1x5C</value>
  <description>Aliyun access key ID</description>
</property>

<property>
  <name>fs.oss.accessKeySecret</name>
  <value>zRRfmJuGbuyz4laXbkRfzDKq8ugeTw</value>
  <description>Aliyun access key secret</description>
</property>

<property>
  <name>fs.oss.credentials.provider</name>
  <description>
    Class name of a credentials provider that implements
    com.aliyun.oss.common.auth.CredentialsProvider. Omit if using access/secret keys
    or another authentication mechanism. The specified class must provide an
    accessible constructor accepting java.net.URI and
    org.apache.hadoop.conf.Configuration, or an accessible default constructor.
  </description>
</property>

</configuration>
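
Usage mirrors the S3A case. A minimal sketch, assuming the hadoop-aliyun module and its Aliyun SDK dependencies (shipped under share/hadoop/tools/lib in the 2.10.0 binary tarball) are on the classpath; photo.jpg is just an illustrative local file:

# run from the hadoop-2.10.0 directory
export HADOOP_CLASSPATH="$(pwd)/share/hadoop/tools/lib/*"
bin/hadoop --config etc/hadoop-aliyun fs -ls oss://buc-pic/
bin/hadoop --config etc/hadoop-aliyun fs -put photo.jpg oss://buc-pic/pics/
bin/hadoop --config etc/hadoop-aliyun fs -ls oss://buc-pic/pics/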

3. Hive

References:

Hive's configuration for S3 is the same as for HDFS; it simply picks up the S3A settings from core-site.xml above.
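
For reference, a minimal session that would produce the trace analyzed below; the column names of users are an assumption reconstructed from the inserted row:

hive> create database ld;
hive> use ld;
hive> create table users(id int, name string, pass string, ak string, sk string);
hive> insert into users values(1, "liudong", "pass", "ak", "sk");

The database and table show up in S3 as the prefixes user/hive/warehouse/ld.db/ and user/hive/warehouse/ld.db/users/ under the buc-ld bucket, which is what the trace below operates on.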

Known issues:

  1. With the Hive CLI against S3, the corresponding database/table directories are created in S3, but the insert path still has problems and is being debugged.
    The problem: inserting data creates some temporary directories, and the resulting object keys exceed 127 characters, which causes the upload to fail.

The steps Hive goes through when writing (uploading) data:

hive> insert into users values(1, "liudong", "pass", "ak", "sk");
  1. Create the directory: PUT /buc-ld/tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/
    At the same time, the following parent keys are deleted:
    tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/
    tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/
    tmp/hive/root/
    tmp/hive/
    tmp/

  2. HEAD /buc-ld/tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/data_file

  3. HEAD /buc-ld/tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/data_file/

  4. HEAD /buc-ld/tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4
    NOT FOUND

  5. HEAD /buc-ld/tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/
    200 OK

  6. PUT /buc-ld/tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/data_file HTTP/1.1
    Host: 10.192.71.31
    Authorization: AWS HIK3xk48642XJ88h7d70nW6613us4x28:RF0GxG7CrW6ohifnoZKItxCXZfs=
    User-Agent: Hadoop 2.10.0, aws-sdk-java/1.11.271 Linux/3.10.0-123.el7.x86_64 OpenJDK_64-Bit_Server_VM/11.0.3+7 java/11.0.3 groovy/2.4.4 com.amazonaws.services.s3.transfer.TransferManager/1.11.271
    amz-sdk-invocation-id: a6433047-696f-0143-2216-475c1ce77bb0
    amz-sdk-retry: 0/0/500
    Date: Sun, 19 Jan 2020 02:42:13 GMT
    Content-MD5: QlifiGByqjSTxcfTVojBcg==
    Content-Type: application/octet-stream
    Content-Length: 21
    Connection: Keep-Alive
    Expect: 100-continue

HTTP/1.1 100 Continue

1.liudong.pass.ak.sk
HTTP/1.1 200 OK
ETag: 42589f886072aa3493c5c7d35688c172
Date: Sun, 19 Jan 2020 02:42:13 GMT
Connection: keep-alive
Content-Length: 0

  7. Delete tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/ and all of its parent directories

  8. HEAD /buc-ld/user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1
    NOT FOUND

HEAD /buc-ld/user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1/
NOT FOUND

  9. GET /buc-ld/?prefix=user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1/&delimiter=/&max-keys=1&encoding-type=url

HEAD /buc-ld/user/hive/warehouse/ld.db/users
NOT FOUND

HEAD /buc-ld/user/hive/warehouse/ld.db/users/
NOT FOUND

GET /buc-ld/?prefix=user/hive/warehouse/ld.db/users/&delimiter=/&max-keys=1&encoding-type=url
OK

HEAD /buc-ld/user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1
NOT FOUND

HEAD /buc-ld/user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1/
NOT FOUND

PUT /buc-ld/user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1/ HTTP/1.1

  10. PUT /buc-ld/user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1/_tmp.-ext-10002/

  11. GET /buc-ld/?prefix=tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/&delimiter=/&max-keys=1&encoding-type=url HTTP/1.1
    OK

HEAD /buc-ld/tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/data_file HTTP/1.1
OK

GET /buc-ld/tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__4/data_file HTTP/1.1
1.liudong.pass.ak.sk

HEAD /buc-ld/user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1/_task_tmp.-ext-10002/_tmp.000000_0 HTTP/1.1
Host: 10.192.71.31
Authorization: AWS HIK3xk48642XJ88h7d70nW6613us4x28:wFAXYdh95S9PDQAIepKB5X/jrkU=
User-Agent: Hadoop 2.10.0, aws-sdk-java/1.11.271 Linux/3.10.0-123.el7.x86_64 OpenJDK_64-Bit_Server_VM/11.0.3+7 java/11.0.3 groovy/2.4.4
amz-sdk-invocation-id: 70ecf423-2f71-d1ae-9138-60ebf6864c38
amz-sdk-retry: 0/0/500
Date: Sun, 19 Jan 2020 02:42:14 GMT
Content-Type: application/octet-stream
Connection: Keep-Alive

HTTP/1.1 400 Bad Request
Content-Length: 171
Date: Sun, 19 Jan 2020 02:42:14 GMT
Connection: close

<?xml version="1.0" encoding="utf-8"?>
<Error>
	<Code>InvalidArgument</Code>
	<Message>Invalid Argument</Message>
	<Resource></Resource>
	<RequestId></RequestId>
</Error>
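
This 400 appears to be the key-length problem noted under "Known issues": the object key of the staging file (copied from the HEAD request above) is 131 characters, over the 127-character limit of the backing store. A quick check:

echo -n "user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1/_task_tmp.-ext-10002/_tmp.000000_0" | wc -c
# prints 131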

DELETE /buc-ld/user/hive/warehouse/ld.db/users/.hive-staging_hive_2020-01-19_10-42-12_891_8904992031068082033-1/ HTTP/1.1

  In short:
  1. The data is uploaded to the temporary file /tmp/hive/root/92f872b5-0c96-4208-907a-747133d4522c/_tmp_space.db/Values__Tmp__Table__2/data_file/
  2. The temporary file is then deleted

Configuration for Hive to support delete operations

  1. Configuration file:
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
<property>
  <name>hive.in.test</name>
  <value>true</value>
</property>
<property>
  <name>hive.auto.convert.join.noconditionaltask.size</name>
  <value>10000000</value>
</property>

  2. Create a table that supports update/delete:

create table test(id int, name string)
  clustered by (id) into 5 buckets
  row format delimited fields terminated by ',' lines terminated by '\n'
  stored as orc
  tblproperties('transactional'='true');
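
With those settings and a transactional table, update and delete statements should be accepted; a minimal sketch (row values are arbitrary):

hive> insert into test values (1, 'liudong'), (2, 'tmp');
hive> update test set name = 'ld' where id = 1;
hive> delete from test where id = 2;
hive> select * from test;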

4. HBase

References:

HBase configuration files. First, hbase-site.xml:

<configuration>

  <property>
    <name>hbase.rootdir</name>
    <!--value>s3a://10.192.71.31:80/</value-->
    <value>s3a://hbase-1/</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data/hbase-2.2.2/data/zookeeper</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
    <description>
      Controls whether HBase will check for stream capabilities (hflush/hsync).

      Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
      with the 'file://' scheme, but be mindful of the NOTE below.

      WARNING: Setting this to false blinds you to potential data loss and
      inconsistent system state in the event of process and/or node failures. If
      HBase is complaining of an inability to use hsync or hflush it's most
      likely not a false positive.
    </description>
  </property>


</configuration>

The matching core-site.xml (S3A pointed at the HAS S3 endpoint used above):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>fs.defaultFS</name>
        <!--value>hdfs://localhost:9000</value-->
        <value>s3a://hbase-1</value>
</property>

<property>
  <name>fs.s3a.endpoint</name>
  <value>http://10.192.71.31:80</value>
  <description>AWS S3 endpoint to connect to. An up-to-date list is
    provided in the AWS Documentation: regions and endpoints. Without this
    property, the standard region (s3.amazonaws.com) is assumed.
  </description>
</property>

<property>
  <name>fs.s3a.access.key</name>
  <value>HIK3xk48642XJ88h7d70nW6613us4x28</value>
  <description>AWS access key ID.
   Omit for IAM role-based or provider-based authentication.</description>
</property>

<property>
  <name>fs.s3a.secret.key</name>
  <value>HIKco547Q032JB34Q200J16xt0Q5UKE4</value>
  <description>AWS secret key.
   Omit for IAM role-based or provider-based authentication.</description>
</property>

<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <description>
    Comma-separated class names of credential provider classes which implement
    com.amazonaws.auth.AWSCredentialsProvider.

    These are loaded and queried in sequence for a valid set of credentials.
    Each listed class must implement one of the following means of
    construction, which are attempted in order:
    1. a public constructor accepting java.net.URI and
        org.apache.hadoop.conf.Configuration,
    2. a public static method named getInstance that accepts no
       arguments and returns an instance of
       com.amazonaws.auth.AWSCredentialsProvider, or
    3. a public default constructor.

    Specifying org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider allows
    anonymous access to a publicly accessible S3 bucket without any credentials.
    Please note that allowing anonymous access to an S3 bucket compromises
    security and therefore is unsuitable for most use cases. It can be useful
    for accessing public data sets without requiring AWS credentials.

    If unspecified, then the default list of credential provider classes,
    queried in sequence, is:
    1. org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider:
       Uses the values of fs.s3a.access.key and fs.s3a.secret.key.
    2. com.amazonaws.auth.EnvironmentVariableCredentialsProvider: supports
        configuration of AWS access key ID and secret access key in
        environment variables named AWS_ACCESS_KEY_ID and
        AWS_SECRET_ACCESS_KEY, as documented in the AWS SDK.
    3. com.amazonaws.auth.InstanceProfileCredentialsProvider: supports use
        of instance profile credentials if running in an EC2 VM.
  </description>
</property>

<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The implementation class of the S3A Filesystem</description>
</property>

<property>
  <name>fs.s3a.attempts.maximum</name>
  <value>1</value>
  <description>How many times we should retry commands on transient errors.</description>
</property>

<property>
  <name>fs.s3a.signing-algorithm</name>
  <value>S3SignerType</value>
  <description>Override the default signing algorithm so legacy
    S3 implementations can still be used.</description>
</property>

<property>
  <name>fs.s3a.paging.maximum</name>
  <value>999</value>
  <description>How many keys to request from S3 when doing
    directory listings at a time.</description>
</property>

</configuration>
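
With both files in place, a short smoke test from the HBase directory. This is a sketch, assuming a standalone HBase 2.2.2 started against this configuration with the S3A jars (hadoop-aws plus the AWS SDK bundle) on its classpath; the table and column family names are illustrative:

bin/start-hbase.sh
echo "create 't1', 'cf'"            | bin/hbase shell
echo "put 't1', 'r1', 'cf:a', 'v1'" | bin/hbase shell
echo "scan 't1'"                    | bin/hbase shell
# the table data should then be visible under the S3A root directory
# (assuming the hadoop client shares the S3A settings above)
hadoop fs -ls s3a://hbase-1/data/default/t1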

Original post: https://www.cnblogs.com/walkinginthesun/p/12382074.html