Moe
Moeredshift92@gmail.com
347-468-0742
New York, NY 10001
Hadoop Developer
7 years of experience (W2)
Summary

  • Over 6 years of experience in development, design, integration, and presentation with Java, along with extensive Big Data/Hadoop experience across the Hadoop ecosystem, including Hive, Pig, Flume, Sqoop, ZooKeeper, HBase, Spark, Kafka, Python, and AWS.
  • Experience implementing big data projects using Cloudera.
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Hands-on experience in designing and implementing solutions using Apache Hadoop 2.4.0, HDFS 2.7, MapReduce2, HBase 1.1, Hive 1.2, Oozie 4.2.0, Tez 0.7.0, Yarn 2.7.0, Sqoop 1.4.6, MongoDB.
  • Set up and integrated Hadoop ecosystem tools such as HBase, Hive, Pig, and Sqoop.
  • Expertise in Big Data architecture, including Hadoop distributions (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
  • Hands-on experience loading data into Spark RDDs and performing in-memory computation (see the Scala sketch after this list).
  • Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data analysis.
  • Good knowledge of Amazon AWS concepts such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Extensive experience in Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Expertise in Apache Spark Development (Spark SQL, Spark Streaming, MLlib, GraphX, Zeppelin, HDFS, YARN and NoSQL).
  • Experience in analyzing data using Hive, Pig Latin and custom MR programs in Java.
  • Hands-on experience writing Spark SQL scripts.
  • Sound knowledge in programming Spark using Scala.
  • Good understanding of real-time data processing using Spark.
  • Experienced with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Strong experience in front-end technologies like JSP, HTML5, jQuery, JavaScript, and CSS3.
  • Improved the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Configured Hadoop clusters in OpenStack and Amazon Web Services (AWS).
  • Experience in ETL (DataStage) analysis, design, development, testing, and implementation of ETL processes, including performance tuning and query optimization of databases.
  • Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the target data warehouse.
  • Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, JSON
  • Experience developing iterative algorithms using Spark Streaming in Scala and Python to build near real-time dashboards.
  • Gained optimum performance with data compression, region splits, and manually managed compaction in HBase.
  • Upgraded from HDP 2.1 to HDP 2.2 and then to HDP 2.3.
  • Working experience with the MapReduce programming model and the Hadoop Distributed File System.
  • Hands on experience on Unix/Linux environments, which included software installations/ upgrades, shell scripting for job automation and other maintenance activities.
  • Hands on experience in installing, configuring and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
  • Thorough knowledge and experience in SQL and PL/SQL concepts.
  • Expertise in setting up standards and processes for Hadoop based application design and implementation.
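Below is a minimal Scala sketch of the RDD loading and in-memory computation claimed in the summary above. The input path, record layout, and all names are illustrative assumptions, not details from any project listed here.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: load a text file into an RDD, cache it in memory,
// and run aggregations over the cached data. Path and record layout
// are hypothetical ("userId,amount" per line).
object RddComputeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-in-memory-sketch")
      .master("local[*]")            // assumption: local run for illustration
      .getOrCreate()

    val lines = spark.sparkContext.textFile("hdfs:///data/events/*.csv")

    val amountsByUser = lines
      .map(_.split(","))
      .filter(_.length == 2)
      .map(f => (f(0), f(1).toDouble))
      .cache()                       // keep the RDD in memory for reuse

    // Both actions below reuse the cached RDD without re-reading HDFS.
    val totals = amountsByUser.reduceByKey(_ + _)
    println(s"distinct users: ${totals.count()}")
    totals.take(10).foreach(println)

    spark.stop()
  }
}
```

The cache() call is the point of the sketch: the second action reuses the in-memory dataset instead of re-scanning the source files.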

Experience
HADOOP DEVELOPER
Banking/Financial
Jun 2018 - present
  • Developed architecture documents, process documentation, server diagrams, and requisition documents.
  • Developed ETL data pipelines using Spark, Spark Streaming, and Scala.
  • Loaded data from RDBMS to Hadoop using Sqoop
  • Worked collaboratively to manage build outs of large data clusters and real time streaming with Spark.
  • Responsible for loading data pipelines from web servers using Sqoop, Kafka, and the Spark Streaming API.
  • Developed Kafka producers, broker partitions, and consumer groups.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Processed data using MapReduce and YARN.
  • Worked on Kafka as a proof of concept for log processing.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Monitored the Hive metastore and the cluster nodes with the help of Hue.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Created AWS EC2 instances and used JIT servers.
  • Developed various UDFs in MapReduce and Python for Pig and Hive.
  • Handled data integrity checks using Hive queries, Hadoop, and Spark.
  • Worked on performing transformations & actions on RDDs and Spark Streaming data with Scala.
  • Implemented the Machine learning algorithms using Spark with Python.
  • Defined job flows and developed simple to complex Map Reduce jobs as per the requirement.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
  • Responsible for handling streaming data from web server console logs.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Worked on developing ETL processes (DataStage Open Studio) to load data from multiple data sources to HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
  • Involved in NoSQL database design, integration and implementation
  • Loaded data into NoSQL database HBase.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch after this list).
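A sketch of the external-table and dynamic-partitioning pattern referenced in the last bullet, issued through Spark SQL in Scala. The database objects (txns, staging_txns), columns, and paths are hypothetical, invented only to show the shape of the DDL and insert.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: create a partitioned external Hive table, then populate it
// with dynamic partitioning. Requires a Hive metastore.
object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table: dropping it leaves the HDFS data in place.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS txns (
        |  txn_id STRING,
        |  amount DOUBLE
        |) PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///warehouse/external/txns'""".stripMargin)

    // Dynamic partitioning: Hive routes each row to its load_date partition.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT INTO TABLE txns PARTITION (load_date)
        |SELECT txn_id, amount, load_date FROM staging_txns""".stripMargin)

    spark.stop()
  }
}
```

Partition pruning is the payoff: queries filtered on load_date read only the matching directories instead of the whole table.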

Environment: Spark, Spark Streaming, Apache Kafka, Hive, Tez, AWS, ETL, Pig, UNIX, Linux, Tableau, Teradata, Sqoop, HDFS, MapReduce, Flume, Informatica 9.1/8.1/7.1/6.1, Oracle 11g, Hadoop 2.x, NoSQL, flat files, Eclipse

Skills: Apache, AWS, Data Integration, Eclipse, ETL, Flume, Hadoop, Hadoop Developer, HBase, HDFS, Hive, Linux, Machine Learning, MapReduce, MongoDB, Oozie, Oracle, Pig, Python, Spark, Sqoop, Tableau, Teradata, UNIX
HADOOP DEVELOPER
Information Technology
Apr 2017 - Apr 2018
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Wrote Python scripts to analyze customer data.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
  • Captured the data logs from web server into HDFS using Flume for analysis.
  • Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and the DataFrames API to load structured data into Spark clusters.
  • Used DataFrames for data transformations over RDDs.
  • Designed and Developed Spark workflows using Scala for data pull from cloud-based systems and applying transformations on it.
  • Used Spark Streaming to consume topics from the distributed messaging source Event Hub and periodically pushed batches of data to Spark for real-time processing.
  • Tuned Cassandra and MySQL for optimizing the data.
  • Implemented monitoring and established best practices around the usage of Elasticsearch.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Hands-on experience with Hortonworks tools like Tez and Ambari.
  • Worked on Apache NiFi as an ETL tool for batch processing and real-time processing.
  • Fetched and generated monthly reports and visualized them using Tableau.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
  • Strong working experience with Cassandra, retrieving data from Cassandra clusters to run queries.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
  • Worked with BI (Business Intelligence) teams in generating reports and designing ETL workflows on Tableau.
  • Deployed data from various sources into HDFS and built reports using Tableau.
  • Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Managed Hadoop jobs by DAG using Oozie workflow scheduler.
  • Involved in developing code to write canonical-model JSON records from numerous input sources to Kafka queues.
  • Involved in loading data from Linux file systems, servers, and Java web services using Kafka producers and consumers (see the sketch after this list).
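A sketch of the Kafka consumption side described in the bullets above, written with Spark Structured Streaming, the DataFrame-based successor to the DStream API named here. The broker address, topic name, and paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: read JSON records from a Kafka topic as a streaming DataFrame
// and land them on HDFS as Parquet with checkpointed progress.
object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker-1:9092") // hypothetical broker
      .option("subscribe", "canonical-events")            // hypothetical topic
      .load()

    // Kafka delivers key/value as binary; decode the JSON payload.
    val events = raw.selectExpr("CAST(value AS STRING) AS json")

    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```

The checkpoint location is what gives the job at-least-once recovery: on restart, Spark resumes from the last committed Kafka offsets.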

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hive, Cassandra

Skills: Apache, Data Warehousing, Elasticsearch, ETL, Flume, Hadoop, Hadoop Developer, HDFS, Hive, Java, JSON, Linux, MapReduce, MySQL, Oozie, OpenShift, Pig, Python, Shell Scripts, Spark, Sqoop, Tableau, Ubuntu, WebServices, Tableau Desktop
HADOOP DEVELOPER
Banking/Financial
Apr 2016 - Apr 2017

Responsibilities

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and Spark SQL APIs.
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
  • Worked on Big Data infrastructure for batch processing as well as real-time processing.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed real-time data processing applications using Scala and Python and implemented Apache Spark Streaming from various streaming sources like Kafka.
  • Developed Spark jobs and Hive jobs to summarize and transform data.
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Experienced in developing Spark scripts for data analysis in Scala.
  • Used Spark Streaming APIs to perform the necessary transformations.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
  • Worked with Spark to consume data from Kafka and convert it to a common format using Scala.
  • Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Wrote new Spark jobs in Scala to analyze customer and sales-history data.
  • Involved in the requirement analysis, design, coding, and implementation phases of the project.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Experience with both SQLContext and SparkSession.
  • Developed Scala based Spark applications for performing data cleansing, data aggregation, de-normalization and data preparation needed for machine learning and reporting teams to consume.
  • Worked on troubleshooting Spark applications to make them more error-tolerant.
  • Involved in HDFS maintenance and the loading of structured and unstructured data; imported data from mainframe datasets to HDFS using Sqoop and wrote PySpark scripts to process the HDFS data.
  • Extensively worked on the core and Spark SQL modules of Spark.
  • Involved in Spark and Spark Streaming, creating RDDs and applying operations, transformations and actions (see the sketch after this list).
  • Created partitioned tables and loaded data using both the static and dynamic partition methods.
  • Implemented POCs on migrating to Spark Streaming to process live data.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet the business requirements.
  • Ingested data from RDBMS, performed data transformations, and then exported the transformed data to HDFS as per the business requirements.
  • Used Impala to read, write and query the data in HDFS.
  • Stored the output files for export on HDFS; these files are later picked up by downstream systems.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
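A small Scala sketch of the transformation/action split referenced above: transformations build a lazy lineage, and actions trigger the in-memory computation. The sample data and names are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: transformations (filter, reduceByKey) are lazy; actions
// (count, collect) are what actually execute the lineage.
object TransformActionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transform-action-sketch")
      .master("local[*]")            // assumption: local run for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    val sales = sc.parallelize(Seq(
      ("books", 12.0), ("games", 30.0), ("books", 8.5)))

    // Transformations: nothing runs yet.
    val perCategory = sales
      .filter { case (_, amount) => amount > 0 }
      .reduceByKey(_ + _)

    // Actions: the lineage above executes here.
    println(s"categories: ${perCategory.count()}")
    perCategory.collect().foreach { case (cat, total) =>
      println(s"$cat -> $total")
    }

    spark.stop()
  }
}
```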

Environment: Hadoop 2.x, Spark Core, Spark SQL, Spark API, Spark Streaming, PySpark, Hive, Oozie, Amazon EMR, Tableau, Impala, RDBMS, YARN, JIRA, MapReduce

Skills: Apache, Big Data, Data Analysis, Data Cleansing, Hadoop, Hadoop Developer, HDFS, Hive, Impala, Machine Learning, MapReduce, Metadata, Oozie, Pig, Python, Spark, SQL, Sqoop, Tableau
Software Developer
Information Technology
Apr 2014 - Apr 2016

Responsibilities

  • Launched Amazon EC2 instances using AWS (Linux/Ubuntu/RHEL) and configured the instances with respect to specific applications.
  • Conducted functional and regression testing using Java Selenium WebDriver with a data-driven framework and a keyword-driven framework using the Page Factory model.
  • Experience with Selenium Grid for cross-platform, cross-browser, and parallel tests using TestNG and Maven.
  • Experienced in working with Protractor.
  • Used Jenkins to execute the test scripts periodically on Selenium Grid for different platforms.
  • Expertise in grouping test suites, test cases, and test methods for regression and functional testing using TestNG annotations.
  • Experienced in writing test cases and conducting sanity, regression, integration, unit, black-box, and white-box tests.
  • Integrated Jenkins with Git version control to schedule automatic builds using predefined Maven commands.
  • Developed a BDD framework from scratch using Cucumber and defined steps, scenarios, and features.
  • Utilized the Apache POI jar to read test data from Excel spreadsheets and load it into test cases.
  • Administered and Engineered Jenkins for managing weekly Build, Test, and Deploy chain, SVN/GIT with Dev/Test/Prod Branching Model for weekly releases.
  • Handled Selenium synchronization problems using explicit and implicit waits during regression testing (see the sketch after this list).
  • Experienced in writing complex and dynamic XPaths.
  • Executed test cases on real devices for both the mobile app and the mobile website.
  • Thorough experience in implementing the automation tools Selenium WebDriver, JUnit, TestNG, Eclipse, Git/GitHub, Jenkins, SOAP UI, and REST with Postman.
  • Used Cucumber to automate services using REST APIs.
  • Created profiles in Maven to launch specific TestNG suites from Jenkins jobs.
  • Implemented the SOAP UI tool to test SOAP-based architecture applications, testing SOAP services and REST APIs.
  • Used the Groovy language to verify web services through SOAP UI.
  • Experience in testing the cloud platform.
  • Shared daily status reports with all team members, team leads, and managers.
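A sketch of the explicit-wait synchronization fix mentioned above, calling the standard selenium-java bindings from Scala (Selenium 4 WebDriverWait signature). The URL and element locator are hypothetical.

```scala
import java.time.Duration
import org.openqa.selenium.{By, WebDriver}
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.support.ui.{ExpectedConditions, WebDriverWait}

// Sketch: wait explicitly for one specific element rather than setting
// a blanket implicit wait that slows down every lookup.
object ExplicitWaitSketch {
  def main(args: Array[String]): Unit = {
    val driver: WebDriver = new ChromeDriver() // requires chromedriver on PATH
    try {
      driver.get("https://example.com/login")  // hypothetical page

      // Poll up to 10 seconds for the button to become clickable.
      val wait = new WebDriverWait(driver, Duration.ofSeconds(10))
      val submit = wait.until(
        ExpectedConditions.elementToBeClickable(By.id("submit")))
      submit.click()
    } finally {
      driver.quit()
    }
  }
}
```

Explicit waits target the one element a step depends on, which is what resolves the flaky-timing failures the bullet describes.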

Environment: Selenium IDE, Groovy, Selenium RC, WebDriver, Cucumber, HPQC, MyEclipse, JIRA, MySQL, Oracle, Java, JavaScript, .NET, Python, Microservices, RESTful API Testing, JMeter, VBScript, JUnit, TestNG, Firebug, XPath, Windows

Skills: .NET, Apache, AWS, Eclipse, Git, Groovy, Java, JavaScript, Jenkins, JUnit, Linux, Maven, MySQL, Oracle, Python, REST, Selenium, SOAP, SVN, Ubuntu, UI, WebServices
Education
Master's in Business Administration
University of Wales
Skills
Skill | Last Used | Years
Apache | 2021 | 5
Python | 2021 | 5
Linux | 2021 | 4
AWS | 2021 | 3
Eclipse | 2021 | 3
Hadoop | 2021 | 3
Hadoop Developer | 2021 | 3
HDFS | 2021 | 3
Hive | 2021 | 3
Java | 2018 | 3
MapReduce | 2021 | 3
MySQL | 2018 | 3
Oozie | 2021 | 3
Oracle | 2021 | 3
Pig | 2021 | 3
Spark | 2021 | 3
Sqoop | 2021 | 3
Tableau | 2021 | 3
Ubuntu | 2018 | 3
WebServices | 2018 | 3
.NET | 2016 | 2
ETL | 2021 | 2
Flume | 2021 | 2
Git | 2016 | 2
Groovy | 2016 | 2
JavaScript | 2016 | 2
Jenkins | 2016 | 2
JUnit | 2016 | 2
Machine Learning | 2021 | 2
Maven | 2016 | 2
REST | 2016 | 2
Selenium | 2016 | 2
SOAP | 2016 | 2
SVN | 2016 | 2
UI | 2016 | 2
Big Data | 2017 | 1
Data Analysis | 2017 | 1
Data Cleansing | 2017 | 1
Data Integration | 2021 | 1
Data Warehousing | 2018 | 1
Elasticsearch | 2018 | 1
HBase | 2021 | 1
Impala | 2017 | 1
JSON | 2018 | 1
Metadata | 2017 | 1
MongoDB | 2021 | 1
OpenShift | 2018 | 1
Shell Scripts | 2018 | 1
SQL | 2017 | 1
Tableau Desktop | 2018 | 1
Teradata | 2021 | 1
UNIX | 2021 | 1
Data Architecture | n/a | 1
Data Integrity | n/a | 1
Data Modeling | n/a | 1
Data Profiling | n/a | 1
HTML | n/a | 1
J2EE | n/a | 1
JDBC | n/a | 1
jQuery | n/a | 1
JSP | n/a | 1
MS Azure | n/a | 1
node.js | n/a | 1
XML | n/a | 1