Data Integration in Big Data Environment
Journal Title: Bonfring International Journal of Data Mining - Year 2015, Vol 5, Issue 1
Abstract
Data integration is the process of converting data from a source format into a destination format. Many data warehousing and data management approaches have been supported by integration tools that migrate and transport data using the Extract-Transform-Load (ETL) approach. These tools are well suited to handling large volumes of data but are not flexible enough to handle semi-structured or unstructured data. To overcome these challenges in the big data world, programmatically driven parallel techniques such as map-reduce models were introduced. Data integration as a process is highly cumbersome and iterative, especially when new data sources are added. Adding these new data sources is time consuming, which results in delays, loss of data, irrelevance of the data, and improper utilization of useful information. Traditionally, the waterfall approach is used in the EDW (Enterprise Data Warehouse), where one cannot move to the next phase before completing the earlier one. This approach has its merits: it ensures that the right data sources are picked and the right data integration processes are developed to sustain the usefulness of the EDW. In a big data environment the situation is completely different, so the traditional approaches to integration are inefficient. New approaches to integration are therefore needed. In this paper, the importance of data integration in the big data world is identified, and the open problems of big data integration are outlined to guide future research in big data environments.
Authors and Affiliations
B. Arputhamary, L. Arockiam