Monday, May 30, 2016

[Big Data] Purge Leftovers on Hadoop - High Block Count warnings & Some Practices

Hi everyone,

Today, I will talk about leftovers on a Hadoop system. We all know that Hadoop cleans up its remainders after operations finish successfully! :) But you should still always check block counts. A high block count is usually caused by a large number of small files, and it leads to poor performance.

You will quite likely see High Block Count warnings on the Cloudera Manager main page.
You can check block counts per DataNode in CM -> HDFS service -> Active NameNode Web UI -> Live Nodes

Or use this link directly: http://#ACTIVE_NAME_NODE_IP#:50070/dfshealth.html#tab-datanode
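
If you prefer a script over the web UI, the same numbers can be read from the NameNode's JMX endpoint. Here is a minimal Python sketch, assuming the `NameNodeInfo` bean exposes a `LiveNodes` attribute (a JSON-encoded string) with a `numBlocks` field per DataNode, and that the web port is reachable without authentication. On a Kerberos secured cluster you would need SPNEGO, which this sketch does not handle. The function names `block_counts` and `fetch_block_counts` are my own, not part of any Hadoop API.

```python
import json
from urllib.request import urlopen


def block_counts(jmx_payload):
    """Return {datanode: block count} from a NameNodeInfo JMX payload."""
    counts = {}
    for bean in jmx_payload.get("beans", []):
        live = bean.get("LiveNodes")
        if not live:
            continue
        # Assumption: LiveNodes is itself a JSON-encoded string,
        # keyed by DataNode, with a "numBlocks" entry per node.
        for node, info in json.loads(live).items():
            counts[node] = int(info.get("numBlocks", 0))
    return counts


def fetch_block_counts(namenode_host, port=50070):
    """Query the active NameNode over HTTP (unauthenticated) and parse the result."""
    url = ("http://%s:%d/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
           % (namenode_host, port))
    with urlopen(url, timeout=10) as resp:
        return block_counts(json.load(resp))
```

Calling `fetch_block_counts("#ACTIVE_NAME_NODE_IP#")` with your own NameNode address should give one entry per live DataNode, which you can compare against the warning threshold configured in Cloudera Manager.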

In this post, I will point out some practices for getting rid of small files (most of them :) ).
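
Before cleaning up, it helps to know where the small files live. Here is a rough Python sketch that counts small files per directory from the text output of `hdfs dfs -ls -R`. It assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path), and the 1 MB threshold is an arbitrary choice of mine, so adjust it to your block size. The function name `small_file_dirs` is hypothetical.

```python
from collections import Counter
import posixpath

SMALL_FILE_BYTES = 1024 * 1024  # assumption: "small" means under 1 MB


def small_file_dirs(ls_lines, threshold=SMALL_FILE_BYTES):
    """Count files smaller than `threshold` bytes per parent directory."""
    counts = Counter()
    for line in ls_lines:
        parts = line.strip().split(None, 7)
        # Skip "Found N items" headers, malformed lines, and directories.
        if len(parts) < 8 or parts[0].startswith("d"):
            continue
        if not parts[4].isdigit():
            continue
        if int(parts[4]) < threshold:
            counts[posixpath.dirname(parts[7])] += 1
    return dict(counts)
```

You could feed it the lines of a saved listing, for example `small_file_dirs(open("listing.txt"))`, and then look at the directories with the highest counts first.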


Sunday, May 29, 2016

[Big Data] FLAFKA & 2 topics - 2 hdfs sinks on a Kerberos Secured Cluster

Hi All,

After a long time, I have started finishing my drafts :)

Here I will show an example of Flafka, which uses Kafka to receive messages on topics and writes the data to HDFS via Flume. After that, I will query the logs via a Hive external table.