cloudteam profile

Mohd

opukarimjob@gmail.com

929-436-7306

New York City, NY 10259

HADOOP DEVELOPER

9 years experience W2

Summary

5 years of IT experience in analysis, design, development, and implementation of large-scale applications using Big Data and Spark/J2EE technologies such as Apache Spark, Hadoop, Hive, Pig, Sqoop, Oozie, HBase, Zookeeper, Python & Scala.
Strong experience writing Spark Core, Spark SQL, Spark Streaming, and MapReduce applications.
Experienced in the analytical functions of Apache Spark, Hive, and Pig, and in extending their functionality by writing custom UDFs and hooking those UDFs into larger Spark applications as in-line functions.
Experience with installation, backup, recovery, configuration, and development on multiple Hadoop distributions (Cloudera and Hortonworks) and on cloud platforms (Amazon AWS and Google Cloud).
Performed batch processing of data sources using Apache Spark and Elasticsearch.
Experienced in implementing Spark RDD transformations and actions for business analysis.
Migrated HiveQL queries on structured data to Spark SQL to improve performance.
Configured, deployed, and maintained a single-node Storm cluster in the DEV environment.
Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text files, Avro data files, Sequence files, XML and JSON files, ORC, and Parquet).
Handled importing of data from RDBMS into HDFS using Sqoop.
Developed Spark scripts to import large files from Amazon S3 buckets.
Developed Spark Core and Spark SQL scripts using Scala for faster data processing.
Experienced in writing Hive scripts for analyzing data in the Hive warehouse using HiveQL.
Involved in creating Hive tables, loading them with data, and writing Hive queries to process the data.
Developed Spark code using Spark SQL and Spark Streaming for faster testing and processing of data.
Created scripts to automate the process of Data Ingestion.
Experience using big data testing frameworks (MRUnit, PigUnit) to test raw data and executing performance scripts.
Extract, transform, and load (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark.
Extend the capabilities of DataFrames using user-defined functions in Python and Scala.
Collaborate with key stakeholders to translate business requirements into technical requirements and implement solutions under the guidance of technical leads.
Responsible for building data pipelines to ingest data, integrate data from multiple data sources (On-Premises & Cloud) and create aggregated data sets for reporting needs.
Designs new software and web applications, supports applications under development and customizes current applications. Develops software update process for existing applications. Assists in the roll-out of software releases.
Displays expertise in process design and redesign. Presents and defends architectural, design, and technical choices to internal audiences.
Set up and managed Windows servers on AWS using EC2, EBS, ELB, SSL, Security Groups, RDS, and IAM.
Managed VPCs and subnets, connected different zones, and blocked suspicious IPs/subnets via ACLs.
Managed CDN on Amazon CloudFront (origin path: server / S3) to improve site performance.
Created and managed S3 buckets to store database and log backups and to upload images for the CDN server.
Set up databases on Amazon RDS or EC2 instances as required.
Developed an equities trading system in Core Java. Experience working as a Core Java developer within the banking industry.
Strong Java development skills, with a good understanding of Core Java.
Worked on Airflow performance tuning of DAGs and task instances. Tuned Airflow scheduler (Celery) and worker settings in the Airflow configuration file.
Worked with Docker images to maintain application versions. Kept Docker images up to date with the latest changes. Maintained image versioning on the Docker repository.
Deployed workloads to Kubernetes clusters. Ensured application pods were healthy after deployment. Monitored cluster health and orchestration.
Worked with Datadog during migration of a legacy system to a cloud-native solution. Leveraged Datadog monitoring to ensure migration workflows were passing.
Worked with Kafka to process streaming data. Managed Kafka configuration to maintain system decoupling. Monitored application log gathering to ensure Kafka workflows remained error-free.
Environment: HDFS, Apache Spark, Kafka, Cassandra, Hive, Scala, Sqoop, Shell scripting, PySpark, AWS, Core Java.

AMERICAN EXPRESS, NEW YORK, NY
BIG DATA DEVELOPER
April 2020 to August 2021
Developed NiFi workflows to automate the data movement between different Hadoop systems.
Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
Developed Spark scripts using Scala shell commands as required.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Imported large datasets from DB2 into Hive tables using Sqoop.
Implemented Apache Pig scripts to load data from and store data into Hive.
Partitioned and bucketed Hive tables and compressed data with Snappy to load data into Parquet Hive tables from Avro Hive tables.
Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
Worked with AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
Responsible for implementing ETL process through Kafka-Spark-HBase Integration as per the requ

Experience

Education

Bachelor's in Marketing

STANFORD UNIVERSITY BANGLADESH 2008

Skills

Spark

2022

Big Data

2021

Scripting

2022

Shell Scripts

2022

SQL

2022

Apache

2022

AWS

2021

Hadoop

2022

HDFS

2022

Hive

2022

Kafka

2022

Linux

2021

MongoDB

2021

MySQL

2020

Pig

2022

Python

2022

Sqoop

2022

XML

2022

AJAX

2020

AngularJS

2020

Backbone.js

2020

Bootstrap

2020

CSS

2020

Database Backups

2022

Django

2020

Git

2020

HTML

2020

IBM Websphere MQ

2020

Impala

2021

Jenkins

2020

jQuery

2020

MapReduce

2021

MVC

2020

Oozie

2021

PostgreSQL

2020

Requirements Gathering

2020

Selenium

2020

SQL Server

2020

Triggers

2020

Data Integration

2021

Database Upgrades

2018

DB2

2021

Docker Containers

2018

ETL

2021

Flume

2018

Hadoop Admin

2018

Hadoop Developer

2022

HBase

2021

Java

2022

JSON

2022

OLTP

2021

Performance Tuning

2018

PySpark

2022

Windows

2022

J2EE