
Monday, 29 January 2018

How to list blocks of file in HDFS

In this article I will explain how HDFS stores a file as blocks. As per the HDFS architecture, every file is split into blocks whose size is defined by dfs.blocksize, and each block is replicated dfs.replication times to achieve fault tolerance and data locality. We can use Hadoop's fsck command to check the status of a file. Let's go through the steps.

Throughout this article I'll list the HDFS blocks of the file /user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar.

Step 1: Check that the file exists

Let's confirm that the file exists at the given path.

[root@sandbox ~]# hadoop fs -ls /user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar
-rw-r--r--   1 oozie hdfs  190868107 2017-05-05 12:59 /user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar

Step 2: Check the block size configured for the cluster

In HDFS, the dfs.blocksize property configures the block size of files. Let's grep for this property in the HDFS configuration file; in the sandbox environment it lives at /etc/hadoop/conf/hdfs-site.xml.

[root@sandbox ~]# grep -C2 'dfs.blocksize' /etc/hadoop/conf/hdfs-site.xml 

    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>
    </property>
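
As an alternative to grepping the XML by hand, the value resolved from the local client configuration can also be printed with the hdfs getconf utility (shown here as an optional cross-check; it should print the same 134217728 seen above):

[root@sandbox ~]# hdfs getconf -confKey dfs.blocksize
134217728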

Step 3: Check the size of the file

Let's confirm that the file is larger than one block: the listing above showed 190868107 bytes (about 182 MB), which is more than the 134217728-byte (128 MB) block size, so we expect the file to occupy two blocks.

[root@sandbox ~]# hadoop fs -du -s -h /user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar
182.0 M  /user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar

Step 4: Check the replication factor configured for the cluster

In HDFS, the dfs.replication property configures the number of replicas kept for each block. Let's grep for this property in the HDFS configuration file; in the sandbox environment it lives at /etc/hadoop/conf/hdfs-site.xml.

[root@sandbox ~]# grep -C2 'dfs.replication<' /etc/hadoop/conf/hdfs-site.xml 
    
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
[root@sandbox ~]# 
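
Besides the cluster-wide default, the block size and replication actually recorded for this particular file can be printed with hadoop fs -stat; %o and %r are the block size and replication format specifiers (Hadoop 2.7+). On this sandbox it should print 134217728 1:

[root@sandbox ~]# hadoop fs -stat "%o %r" /user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar
134217728 1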

Step 5: Use fsck to check blocks and datanodes

Now we can use the fsck command to check the status of the file, including the number of blocks and the datanodes where they are stored.

[root@sandbox ~]# hdfs fsck /user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar -files -blocks -locations
Connecting to namenode via http://sandbox.hortonworks.com:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fuser%2Foozie%2Fshare%2Flib%2Flib_20170505125744%2Fspark%2Fspark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar
FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path /user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar at Mon Jan 29 05:32:44 UTC 2018
/user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar 190868107 bytes, 2 block(s):  OK
0. BP-1875268269-127.0.0.1-1493988757398:blk_1073742550_1726 len=134217728 repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-3fd6f5d7-12ac-4a3c-8890-77034935b5e6,DISK]]
1. BP-1875268269-127.0.0.1-1493988757398:blk_1073742551_1727 len=56650379 repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-3fd6f5d7-12ac-4a3c-8890-77034935b5e6,DISK]]

Status: HEALTHY
 Total size:    190868107 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  2 (avg. block size 95434053 B)
 Minimally replicated blocks:   2 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    1
 Average block replication: 1.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      1
 Number of racks:       1
FSCK ended at Mon Jan 29 05:32:44 UTC 2018 in 3 milliseconds


The filesystem under path '/user/oozie/share/lib/lib_20170505125744/spark/spark-assembly-1.6.3.2.6.0.3-8-hadoop2.7.3.2.6.0.3-8.jar' is HEALTHY
[root@sandbox ~]# 
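
As a side note, fsck can also work in the opposite direction: starting from a block id (for example one taken from a datanode log), the -blockId option (available from Hadoop 2.7, HDFS-6663) reports which file the block belongs to, along with its health and locations. A minimal sketch, using the first block listed above (exact syntax may vary slightly by version):

[root@sandbox ~]# hdfs fsck -blockId blk_1073742550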

Sunday, 28 January 2018

HDFS Metadata: Namenode

In this article I will explain how the namenode maintains its metadata in the directory configured by dfs.namenode.name.dir in hdfs-site.xml. In my case dfs.namenode.name.dir is set to /hadoop/hdfs/namenode, so let's start by listing that directory.

ls -1 /hadoop/hdfs/namenode
current
in_use.lock

There are two entries:

in_use.lock:

This is a lock file held by the namenode process. It prevents the storage directory from being used concurrently by more than one namenode process.

current: This is a directory. Let's list its contents.

ls -1 /hadoop/hdfs/namenode/current
VERSION
edits_0000000000000138313-0000000000000138314
edits_0000000000000138315-0000000000000138316
edits_0000000000000138317-0000000000000138318
edits_0000000000000138319-0000000000000138320
edits_0000000000000138321-0000000000000138322
edits_0000000000000138323-0000000000000138324
edits_inprogress_0000000000000138325
fsimage_0000000000000137650
fsimage_0000000000000137650.md5
fsimage_0000000000000138010
fsimage_0000000000000138010.md5
seen_txid

There are quite a few files here; let's explore them one by one.

VERSION:

This is a storage information file with the following content:

#Wed Dec 02 13:16:31 IST 2015
namespaceID=2109784471
clusterID=CID-59abe9cc-89c7-4cf8-ada2-6c6409c98c97
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1469059006-127.0.0.1-1449042391563
layoutVersion=-63

You can refer to org.apache.hadoop.hdfs.server.common.StorageInfo.java and org.apache.hadoop.hdfs.server.common.Storage.java for more information.

namespaceID:

A unique namespace identifier assigned to the file system when HDFS is formatted. It is stored on all nodes of the cluster and is essential for membership: a datanode with a different namespaceID is not allowed to join the cluster.

clusterID:

It identifies the cluster and must stay unique for the lifetime of the cluster. This is important for federated deployments. Introduced in HDFS-1365.

cTime:

Creation time of the file system; this field is updated during HDFS upgrades.

storageType:

For a namenode storage directory, storageType is NAME_NODE; journal nodes use JOURNAL_NODE (it is one of the org.apache.hadoop.hdfs.server.common.NodeType values, everything except DATA_NODE on the namenode side).

blockpoolID:

Unique identifier of the storage block pool. This is important for federated deployments. Introduced in HDFS-1365.

layoutVersion:

Layout version of the storage data. Whenever a new feature that changes the metadata format is added to HDFS, this version is changed.

edits_0000000000000abcdef-0000000000000uvwxyz (edits_startTransactionID-endTransactionID):

This file contains all edit log transactions from startTransactionID to endTransactionID. It is a log of every file system change, such as a file creation, deletion or modification.

edits_inprogress_0000000000000138325 (edits_inprogress_startTransactionID):

The current edit log file. It contains edits starting from startTransactionID, and all new transactions are appended to it.
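
Finalized edit segments are stored in a binary format, but they can be dumped to XML with the offline edits viewer (hdfs oev) for inspection. A minimal sketch, using one of the segments listed above on this node:

hdfs oev -i /hadoop/hdfs/namenode/current/edits_0000000000000138323-0000000000000138324 -o /tmp/edits.xml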

fsimage_0000000000000abcdef (fsimage_endTransactionID):

This file contains the complete state of the file system namespace at a point in time, in this case up to and including endTransactionID.
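
Like the edit segments, the fsimage file is binary; the offline image viewer (hdfs oiv) can convert it, for example to XML, so the captured namespace can be inspected. A sketch, using the newer image listed above on this node:

hdfs oiv -p XML -i /hadoop/hdfs/namenode/current/fsimage_0000000000000138010 -o /tmp/fsimage.xml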

fsimage_0000000000000abcdef.md5 (fsimage_endTransactionID.md5):

This is the MD5 checksum of the corresponding fsimage_endTransactionID file, used to detect corruption of the image on disk.

seen_txid:

The transaction ID recorded at the last checkpoint or edit log roll. This file is updated whenever the fsimage is merged with the edits files or a new edits file is started, and it is used at startup to verify that no edits are missing.
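
seen_txid is a small plain-text file, so it can simply be printed; on this node it should contain 138325, matching the edits_inprogress_0000000000000138325 segment listed above:

cat /hadoop/hdfs/namenode/current/seen_txid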

HDFS Metadata: Datanode

In this article I will explain how the datanode maintains its metadata in the directory configured by dfs.datanode.data.dir in hdfs-site.xml. In my case dfs.datanode.data.dir is set to /hadoop/hdfs/datanode, so let's start by listing that directory.

ls -1 /hadoop/hdfs/datanode
current
in_use.lock

There are two entries:

in_use.lock:

This is a lock file held by the datanode process. It prevents the storage directory from being used concurrently by more than one datanode process.

current: This is a directory. Let's do a tree listing on it.

tree current/
current/
|-- BP-1469059006-127.0.0.1-1449042391563
|   |-- current
|   |   |-- VERSION
|   |   |-- finalized
|   |   |   `-- subdir0
|   |   |       `-- subdir0
|   |   |           |-- blk_1073741825
|   |   |           `-- blk_1073741825_1001.meta
|   |   |-- rbw
|   |-- dncp_block_verification.log.curr
|   |-- dncp_block_verification.log.prev
|   `-- tmp
`-- VERSION

There are quite a few files and directories here; let's explore them one by one.

VERSION:

This is a storage information file with the following content:

#Wed Dec 02 13:16:39 IST 2015
storageID=DS-c25c62e1-a512-451e-87b2-e9175afca9f4
clusterID=CID-59abe9cc-89c7-4cf8-ada2-6c6409c98c97
cTime=0
datanodeUuid=ad7ecbe4-b4a2-4b52-8146-5240ec849119
storageType=DATA_NODE
layoutVersion=-56

You can refer to org.apache.hadoop.hdfs.server.common.StorageInfo.java and org.apache.hadoop.hdfs.server.common.Storage.java for more information.

storageID:

It is unique to the datanode and the same across all of its storage directories. The namenode uses this ID to uniquely identify the datanode.

clusterID:

It identifies the cluster and must stay unique for the lifetime of the cluster. This is important for federated deployments. Introduced in HDFS-1365.

cTime:

Creation time of the file system; this field is updated during HDFS upgrades.

datanodeUuid:

Unique identifier of a datanode, introduced in HDFS-5233

storageType:

It'll be DATA_NODE.

layoutVersion:

Layout version of storage data. Whenever new features related to metadata are added to HDFS project, this version is changed.

BP-randomInteger-NameNodeIpAddress-creationTime:

This is the unique block pool ID: BP stands for Block Pool, followed by a random integer, the IP address of the namenode and the block pool creation time. A block pool is the set of blocks that belongs to one namespace.

finalized:

This directory contains blocks that are finalized, i.e. completely written. Each blk_* file holds raw HDFS block data, and the accompanying .meta file stores its checksums.
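
Given a block id (for example blk_1073741825 from the tree above), the block file and its .meta checksum file can be located on the datanode with a plain find against the data directory used here:

find /hadoop/hdfs/datanode -name 'blk_1073741825*'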

rbw:

This directory contains blocks that are still being written by an HDFS client. Here rbw stands for replica being written.

dncp_block_verification.log.*:

This file tracks the last time each block was verified by comparing its contents against its checksum. The file is rolled periodically, so dncp_block_verification.log.curr is the current file and dncp_block_verification.log.prev is the previous, already-rolled file. Background block verification proceeds in ascending order of last verification time.

Saturday, 27 January 2018

HDFS Metadata Configuration Properties

In this article we will look at the configuration properties that control the behaviour of the HDFS metadata directories. All of these properties belong in hdfs-site.xml; the descriptions and default values below are taken from hdfs-default.xml.
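
Rather than grepping hdfs-site.xml, the value of any of these keys, as resolved from the local configuration (including defaults), can be printed with hdfs getconf; for example:

hdfs getconf -confKey dfs.namenode.checkpoint.period
hdfs getconf -confKey dfs.namenode.name.dir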

dfs.datanode.data.dir

Description:

Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.

Default Value:

file://${hadoop.tmp.dir}/dfs/data

dfs.namenode.checkpoint.check.period

Description:

The SecondaryNameNode and CheckpointNode will poll the NameNode every dfs.namenode.checkpoint.check.period seconds to query the number of uncheckpointed transactions.

Default Value:

60

dfs.namenode.checkpoint.period

Description:

The number of seconds between two periodic checkpoints.

Default Value:

3600

dfs.namenode.checkpoint.txns

Description:

The Secondary NameNode or CheckpointNode will create a checkpoint of the namespace every dfs.namenode.checkpoint.txns transactions, regardless of whether dfs.namenode.checkpoint.period has expired.

Default Value:

1000000

dfs.namenode.edit.log.autoroll.check.interval.ms

Description:

How often an active namenode will check if it needs to roll its edit log, in milliseconds.

Default Value:

300000

dfs.namenode.edit.log.autoroll.multiplier.threshold

Description:

Determines when an active namenode will roll its own edit log. The actual threshold (in number of edits) is determined by multiplying this value by dfs.namenode.checkpoint.txns. This prevents extremely large edit files from accumulating on the active namenode, which can cause timeouts during namenode startup and pose an administrative hassle. This behavior is intended as a failsafe for when the standby or secondary namenode fail to roll the edit log by the normal checkpoint threshold.

Default Value:

2.0

dfs.namenode.edits.dir

Description:

Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. Default value is same as dfs.namenode.name.dir

Default Value:

${dfs.namenode.name.dir}

dfs.namenode.name.dir

Description:

Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.

Default Value:

file://${hadoop.tmp.dir}/dfs/name

dfs.namenode.num.checkpoints.retained

Description:

The number of image checkpoint files (fsimage_*) that will be retained by the NameNode and Secondary NameNode in their storage directories. All edit logs (stored on edits_* files) necessary to recover an up-to-date namespace from the oldest retained checkpoint will also be retained.

Default Value:

2

dfs.namenode.num.extra.edits.retained

Description:

The number of extra transactions which should be retained beyond what is minimally necessary for a NN restart. It does not translate directly to file's age, or the number of files kept, but to the number of transactions (here "edits" means transactions). One edit file may contain several transactions (edits). During checkpoint, NameNode will identify the total number of edits to retain as extra by checking the latest checkpoint transaction value, subtracted by the value of this property. Then, it scans edits files to identify the older ones that don't include the computed range of retained transactions that are to be kept around, and purges them subsequently. The retainment can be useful for audit purposes or for an HA setup where a remote Standby Node may have been offline for some time and need to have a longer backlog of retained edits in order to start again. Typically each edit is on the order of a few hundred bytes, so the default of 1 million edits should be on the order of hundreds of MBs or low GBs.
NOTE: Fewer extra edits may be retained than value specified for this setting if doing so would mean that more segments would be retained than the number configured by dfs.namenode.max.extra.edits.segments.retained.

Default Value:

1000000

dfs.namenode.checkpoint.dir

Description:

Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.

Default Value:

file://${hadoop.tmp.dir}/dfs/namesecondary

dfs.namenode.checkpoint.edits.dir

Description:

Determines where on the local filesystem the DFS secondary name node should store the temporary edits to merge. If this is a comma-delimited list of directories then the edits is replicated in all of the directories for redundancy. Default value is same as dfs.namenode.checkpoint.dir

Default Value:

${dfs.namenode.checkpoint.dir}

dfs.namenode.checkpoint.max-retries

Description:

The SecondaryNameNode retries failed checkpointing. If the failure occurs while loading fsimage or replaying edits, the number of retries is limited by this variable.

Default Value:

3
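
To illustrate how these fit together, here is a small hdfs-site.xml sketch overriding a few of them (the values are only examples, not recommendations):

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///hadoop/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <value>21600</value>
    </property>
    <property>
      <name>dfs.namenode.num.checkpoints.retained</name>
      <value>2</value>
    </property>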

Tuesday, 7 April 2015

HDFS - Quota Management

HDFS Quotas:

You can set two types of quotas in HDFS:

a. Space Quota: A limit on the amount of space consumed by files under a given directory.

b. Name Quota: A limit on the number of file and directory names used under a given directory.

Notes:

  • Quotas for space and names are independent of each other
  • File and directory creation fails if creation would cause the quota to be exceeded.
  • Block allocations fail if the quota would not allow a full block to be written.
  • Each replica counts against the space quota. For example, if a user writes a 3 GB file with a replication factor of 3, 9 GB is consumed from the quota.
  • The largest quota is Long.MAX_VALUE.

HDFS Quota Operations:

a. Set a Name quota:

ACL: Only admin can perform this operation.

Command: Hadoop admin can use following command to set name quota.

hadoop dfsadmin -setQuota number_of_files path

eg.

hadoop dfsadmin -setQuota 100 /grid/landing

Explanation: It sets the name quota to 100, which means the directory tree rooted at /grid/landing can contain at most 100 file and directory names (the directory itself counts against its own quota).
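
To see the quota in action, here is a small hypothetical session (the path /tmp/quota_demo is only for illustration). Because the directory itself counts against its own quota, a name quota of 3 leaves room for just two entries, and the third create is expected to fail with a quota-exceeded error (NSQuotaExceededException):

hadoop fs -mkdir /tmp/quota_demo
hadoop dfsadmin -setQuota 3 /tmp/quota_demo
hadoop fs -touchz /tmp/quota_demo/f1
hadoop fs -touchz /tmp/quota_demo/f2
hadoop fs -touchz /tmp/quota_demo/f3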

b. Clear a Name quota:

ACL: Only admin can perform this operation.

Command: Hadoop admin can use following command to clear name quota.

hadoop dfsadmin -clearQuota path

eg.

hadoop dfsadmin -clearQuota /grid/landing 

c. Set Space quota:

ACL: Only admin can perform this operation.

Command: To set space quota, hadoop admin can use following command,

hadoop dfsadmin -setSpaceQuota size path

eg.

hadoop dfsadmin -setSpaceQuota 15G /grid/landing

Explanation: It means a user can write up to 5 GB of data under /grid/landing (5 GB x 3 replicas = 15 GB), assuming a replication factor of 3. Also note that quota checks at block allocation time assume a whole block will be written. For example, if /projects/ingestion has a space quota of 50 MB, which is less than one block, then writing even a 10 MB file fails: HDFS charges a full block per replica at allocation time, i.e. 384 MB (128 MB x 3), rather than 30 MB (10 MB x 3).

d. Clear Space quota:

ACL: Only admin can perform this operation.

Command: To clear space quota, admin can use following command,

hadoop dfsadmin -clearSpaceQuota path

eg.

hadoop dfsadmin -clearSpaceQuota /grid/landing

e. Get quota allocation of Path:

ACL: Any user can check the quota allocation of a path.

Command: Any user can use the following command to check the quota allocation of a path,

hadoop fs -count -q path

eg.

hadoop fs -count -q /grid

Explanation: The above command gives output in the following format,

hadoop fs -count -q /grid
9223372036854775807 9223372036854775333 none  inf  141  333 655855032 /grid

where,

column1 --> Name quota: a total of 9223372036854775807 names (files and directories) may be created under the path
column2 --> Remaining name quota: 9223372036854775333 more names can still be added
column3 --> Space quota (none means no space quota is set)
column4 --> Remaining space quota (inf when no space quota is set)
column5 --> Number of directories
column6 --> Number of files
column7 --> Total size of the content in bytes
column8 --> Path
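
Putting it together, a quick hypothetical check on a directory with both quotas set (values and path are only illustrative) would look like this; column1 should then show 100, column3 should show 16106127360 (15 GB in bytes), and columns 2 and 4 shrink as names and replicated bytes are consumed:

hadoop dfsadmin -setQuota 100 /grid/landing
hadoop dfsadmin -setSpaceQuota 15G /grid/landing
hadoop fs -count -q /grid/landing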