Distributed processing project that analyzes weather data for Medellín (2023-2024) using MapReduce on Hadoop. It computes the average temperature and total precipitation per month.
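As a rough illustration of the monthly aggregation this project summary describes, the sketch below simulates the map, shuffle, and reduce phases in plain Python. It is not the project's actual code: the input format (`YYYY-MM-DD,temperature,precipitation`), the sample values, and the function names `map_record`, `reduce_month`, and `run_job` are all assumptions made for the example.

```python
from collections import defaultdict


def map_record(line):
    """Map step: emit (month, (temperature, precipitation)) for one CSV line.

    Assumes a hypothetical 'YYYY-MM-DD,temp,precip' record layout.
    """
    date, temp, precip = line.strip().split(",")
    month = date[:7]  # keep only "YYYY-MM" as the grouping key
    return month, (float(temp), float(precip))


def reduce_month(values):
    """Reduce step: average temperature and total precipitation for one month."""
    temps = [t for t, _ in values]
    total_precip = sum(p for _, p in values)
    return sum(temps) / len(temps), total_precip


def run_job(lines):
    """Simulate the shuffle: group mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, value = map_record(line)
        groups[key].append(value)
    return {month: reduce_month(vals) for month, vals in sorted(groups.items())}


# Made-up sample records, for illustration only.
sample = [
    "2023-01-01,22.0,5.0",
    "2023-01-02,24.0,0.0",
    "2023-02-01,23.0,12.5",
]

results = run_job(sample)
for month, (avg_temp, total_precip) in results.items():
    print(f"{month}\t{avg_temp:.1f}\t{total_precip:.1f}")
```

On a real cluster the grouping done in `run_job` would be handled by Hadoop's shuffle phase, and the map and reduce functions would typically run as separate tasks (for example, as Hadoop Streaming scripts reading from standard input).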
Python, R, Data Modeling, Data Warehousing, Athena, Talend, JSON, XML, YAML, Kubernetes, Docker, Snowflake, Tableau, Power BI, JIRA, Agile Methodologies, Data ...
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame ...
ABSTRACT: Distributed Systems (DS) comprise a collection of heterogeneous computing resources that process user tasks. Task scheduling in DS has become a prime research topic, not only because of finding an ...
Abstract: Hadoop is a framework for processing large amounts of data in parallel with the help of the Hadoop Distributed File System (HDFS) and the MapReduce framework. Job scheduling is an important process ...
Abstract: The MapReduce parallel programming model is designed for large-scale data processing, but its benefits, such as fault tolerance and automatic message routing, are also helpful for ...
Sybase is hoping its IQ analytic database can make its mark in the burgeoning “Big Data” market with an array of new features, including native integration with the open-source MapReduce and Hadoop ...