Mohd
opukarimjob@gmail.com
929-436-7306
New York City, NY 10259
HADOOP DEVELOPER
9 years experience W2
Summary

  • 5 years of IT experience in analysis, design, development, and implementation of large-scale applications using Big Data and Spark/J2EE technologies such as Apache Spark, Hadoop, Hive, Pig, Sqoop, Oozie, HBase, Zookeeper, Python & Scala.
  • Strong experience writing Spark Core, Spark SQL, Spark Streaming, and MapReduce applications.
  • Experienced in the analytical functions of Apache Spark, Hive, and Pig, and in extending their functionality by writing custom UDFs and hooking those UDFs into larger Spark applications as in-line functions.
  • Experience with installation, backup, recovery, configuration, and development on multiple Hadoop distribution platforms (Cloudera and Hortonworks), including the cloud platforms Amazon AWS and Google Cloud.
  • Batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in implementing Spark RDD transformations and actions for business analysis.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Configured, deployed, and maintained a single-node Storm cluster in the DEV environment.
  • Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text files, Avro data files, Sequence files, XML and JSON files, ORC, and Parquet).
  • Handled importing of data from RDBMS into HDFS using Sqoop.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Experienced in writing Hive scripts for analyzing data in the Hive warehouse using Hive Query Language.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries to process the data.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Created scripts to automate the process of Data Ingestion.
  • Experience using Big Data testing frameworks (MRUnit, PigUnit) for testing raw data, and executed performance scripts.
  • Extract, transform, and load (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark.
  • Extend the capabilities of DataFrames using User Defined Functions in Python and Scala.
  • Collaborate with key stakeholders, translate business requirements into technical requirements, and implement solutions under the guidance of technical leads.
  • Responsible for building data pipelines to ingest data, integrate data from multiple data sources (On-Premises & Cloud) and create aggregated data sets for reporting needs.
  • Designs new software and web applications, supports applications under development and customizes current applications. Develops software update process for existing applications. Assists in the roll-out of software releases.
  • Displays expertise in process design and redesign skills. Presents and defends architectural, design and technical choices to internal audiences.
  • Set up and managed Windows servers on Amazon using EC2, EBS, ELB, SSL, Security Groups, RDS, and IAM.
  • Managed VPCs and subnets, connected different zones, and blocked suspicious IPs/subnets via ACLs.
  • Managed a CDN on Amazon CloudFront (Origin Path: Server / S3) to improve site performance.
  • Created and managed S3 buckets to store database and log backups and to upload images for the CDN server.
  • Set up databases on Amazon RDS or EC2 instances as required.
  • Developed the equities trading system in Core Java. Experience working as a Core Java Developer within the banking industry.
  • Strong Java development skills, with a good understanding of Core Java.
  • Worked on Airflow performance tuning of DAGs and task instances. Worked on Airflow scheduler (Celery) and worker settings in the Airflow configuration file.
  • Worked with Docker images to maintain application versions. Kept Docker images up to date with the latest changes. Maintained image versioning on docker repository.
  • Worked with deploying workloads to Kubernetes clusters. Ensured application pods were healthy after deployment. Monitored cluster health and orchestration.
  • Worked with Datadog for migration of a legacy system to a cloud-native solution. Leveraged Datadog monitoring to ensure migration workflows were passing.
  • Worked with Kafka to process streaming data. Managed Kafka configuration to maintain system decoupling. Monitored application log gathering to ensure the Kafka workflow remained error free.

Environment: HDFS, Apache Spark, Kafka, Cassandra, Hive, Scala, Sqoop, Shell scripting, PySpark, AWS, Core Java.

AMERICAN EXPRESS, NEW YORK, NY
BIG DATA DEVELOPER, April 2020 to August 2021

  • Developed NiFi workflows to automate the data movement between different Hadoop systems.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and for writing data back into the OLTP system through Sqoop.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Imported large datasets from DB2 to Hive Table using Sqoop
  • Implemented Apache PIG scripts to load data from and to store data into Hive.
  • Partitioned and bucketed Hive tables and compressed data with Snappy to load data from Avro Hive tables into Parquet Hive tables.
  • Involved in running all the Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Worked with AWS Cloud services such as EC2, S3, EBS, RDS, and VPC.
  • Responsible for implementing ETL process through Kafka-Spark-HBase Integration as per the requ

Experience
Education
Bachelor's in Marketing
STANFORD UNIVERSITY BANGLADESH 2008
Skills
Skill                    Last used   Years
Spark                    2022        5
Big Data                 2021        4
Scripting                2022        4
Shell Scripts            2022        4
SQL                      2022        4
Apache                   2022        3
AWS                      2021        3
Hadoop                   2022        3
HDFS                     2022        3
Hive                     2022        3
Kafka                    2022        3
Linux                    2021        3
MongoDB                  2021        3
MySQL                    2020        3
Pig                      2022        3
Python                   2022        3
Sqoop                    2022        3
XML                      2022        3
AJAX                     2020        2
AngularJS                2020        2
Backbone.js              2020        2
Bootstrap                2020        2
CSS                      2020        2
Database Backups         2022        2
Django                   2020        2
Git                      2020        2
HTML                     2020        2
IBM WebSphere MQ         2020        2
Impala                   2021        2
Jenkins                  2020        2
jQuery                   2020        2
MapReduce                2021        2
MVC                      2020        2
Oozie                    2021        2
PostgreSQL               2020        2
Requirements Gathering   2020        2
Selenium                 2020        2
SQL Server               2020        2
Triggers                 2020        2
Data Integration         2021        1
Database Upgrades        2018        1
DB2                      2021        1
Docker Containers        2018        1
ETL                      2021        1
Flume                    2018        1
Hadoop Admin             2018        1
Hadoop Developer         2022        1
HBase                    2021        1
Java                     2022        1
JSON                     2022        1
OLTP                     2021        1
Performance Tuning       2018        1
PySpark                  2022        1
Windows                  2022        1
J2EE                     0           1