Big Data Pipeline: An Overview of Ingestion and Preparation Tools
Abstract
As the digital landscape evolves, the exponential growth of data across domains such as healthcare, smart cities, and the Internet of Things (IoT) necessitates advanced tools for efficient data ingestion and preparation. Big Data ingestion involves collecting and transferring data from diverse sources into centralized systems, while preparation ensures that data is cleaned, transformed, and made ready for analysis. This paper presents a comprehensive review of recent research and technologies in Big Data ingestion and preparation, emphasizing the importance of selecting appropriate tools based on project-specific requirements such as data volume, format, and latency. Tools including Apache Kafka, NiFi, Flume, Sqoop, and Spark are critically analyzed for their roles in batch and stream ingestion, real-time processing, and data transformation. The study further explores architectural frameworks, performance metrics, and challenges such as unstructured data handling, real-time governance, and integration complexity. The paper concludes with emerging trends and research directions, contributing to a better understanding of scalable and adaptive Big Data pipelines in modern data-intensive environments.
DOI: 10.24897/acn.64.68.aasrj720253
