Optimizing Hadoop for Small File Management
Journal Title: Transactions on Machine Learning and Artificial Intelligence - Year 2017, Vol 5, Issue 4
Abstract
HDFS is one of the most widely used distributed file systems, offering high availability and scalability on low-cost hardware. It is delivered as the storage component of the Hadoop framework. Coupled with MapReduce, the processing component, it has become the de facto platform for managing big data. However, HDFS was designed specifically to handle a huge number of large files; when it comes to a large number of small files, Hadoop deployments may be inefficient. In this paper, we propose a new strategy for managing small files. Our approach consists of two principal phases. The first phase consolidates the small-file inputs of more than one client and stores them contiguously in the first allocated block, in SequenceFile format, continuing into the next blocks as each one fills. This avoids multiple block allocations for different streams, reduces the number of requests for available blocks, and reduces the metadata memory used on the NameNode, because a group of small files packaged in a SequenceFile on the same block requires one entry instead of one entry per small file. The second phase analyzes the attributes of the stored small files and distributes them so that the most frequently accessed files are referenced by an additional index, in MapFile format, to reduce read overhead during random access.
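The two phases can be illustrated with a small, library-free sketch. It is not the authors' implementation and does not use the Hadoop API: the names `pack_small_files` and `build_hot_index`, the tiny block size, and the rule that a record never spans two blocks are all assumptions made for illustration. The first function appends small files back to back and opens a new block only when the current one is full; the second builds a sorted index over the most frequently accessed files, loosely mirroring the role the abstract gives to MapFile.

```python
def pack_small_files(files, block_size):
    """Phase 1 (sketch): pack (name, size) records contiguously into blocks.

    Returns (blocks, index), where blocks is a list of lists of file names
    and index maps each name to (block_number, offset_in_block).
    Assumption: a record is never split across two blocks.
    """
    blocks = []   # one inner list of file names per allocated block
    used = 0      # bytes used in the current (last) block
    index = {}
    for name, size in files:
        if size > block_size:
            raise ValueError(f"{name} is not a small file for this block size")
        # Allocate a new block only when the current one cannot hold the record.
        if not blocks or used + size > block_size:
            blocks.append([])
            used = 0
        index[name] = (len(blocks) - 1, used)
        blocks[-1].append(name)
        used += size
    return blocks, index


def build_hot_index(index, access_counts, top_k):
    """Phase 2 (sketch): sorted index over the top_k most accessed files."""
    hottest = sorted(access_counts, key=lambda n: (-access_counts[n], n))[:top_k]
    # Keys are kept in sorted order, as a MapFile-style index would require.
    return {name: index[name] for name in sorted(hottest)}


# Hypothetical input: three small files and a 100-byte block.
files = [("a", 40), ("b", 50), ("c", 30)]
blocks, index = pack_small_files(files, block_size=100)
hot = build_hot_index(index, {"a": 5, "b": 1, "c": 9}, top_k=2)
```

In this toy run the three files occupy two blocks instead of three, which is the effect the first phase relies on to reduce the number of entries tracked for block storage.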
Authors and Affiliations
O. Achandair, M. Elmahouti, S. Khoulji, M. L. Kerkeb