Shiromani

shiromani.neupane27@gmail.com

neupaneshiro@yahoo.com

410-302-4946

Ellicott City, MD 21041

Hadoop/Spark Developer (Big Data Engineer)

12 years experience W2

Summary

9+ years of extensive hands-on experience with the Big Data engineering stack, including Spark, HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Oozie, Flume, Kafka, Zookeeper, Cloudera, and Databricks.
7+ years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework; currently working extensively with Spark and Spark Streaming, using Scala as the main programming language. Comfortable working with various facets of the big data ecosystem: real-time or batch, structured or unstructured data processing.
Developed robust RESTful APIs using Facets API framework to facilitate seamless communication between different software systems.
Designed and implemented API endpoints to retrieve, create, update, and delete data from Facets databases, ensuring data integrity and security.
Hands-on experience with Azure Data Factory, Azure Databricks, Azure DevOps, Azure Blob Storage, Azure Data Lake, Azure Functions, and Azure SQL DB, including integration with various cloud data storage systems.
Utilized Azure DevOps to deploy Azure Machine Learning Studio, Databricks notebooks, Synapse notebooks, and Azure Data Factory pipelines, leveraging GitHub, PowerShell scripts, and Visual Studio.
Integrated third-party systems with Facets API, payment gateways, and customer relationship management (CRM) tools.
Utilized an orchestration strategy to deploy and manage applications, employing tools such as ARM templates, Azure Automation, Azure Pipelines, Logic Apps, or Azure Functions.
Designed, implemented, and maintained scalable data pipelines on AWS using services such as AWS Glue, Lambda, and Kinesis.
Developed ETL processes to ingest, transform, and store data in Amazon Redshift and Amazon S3.
Collaborated with data scientists and analysts to understand data requirements and ensure data quality and accessibility.
Managed and optimized data storage solutions using AWS S3, Redshift, DynamoDB, and RDS.
Skilled in working with big data technologies such as Hadoop, Apache Spark, Apache Kafka, and HBase within the Azure ecosystem, enabling efficient storage, processing, and analysis of large-scale data sets.
Proficient in SQL and NoSQL databases on Azure, including Azure SQL Database, Azure Cosmos DB, and Azure Database for PostgreSQL, with expertise in designing and optimizing database schemas for performance and scalability.
Experienced in building and optimizing data pipelines using Azure services like Azure Data Factory, Azure Stream Analytics, and Azure Event Hubs for real-time data processing and analytics.
Proficient in designing, developing, and deploying big data solutions on the Azure cloud platform, leveraging services such as Azure Data Lake Storage, Azure Databricks, Azure HDInsight, and Azure Synapse Analytics.
Optimized API performance by fine-tuning queries, implementing caching mechanisms, and leveraging asynchronous processing techniques.
Collaborated with cross-functional teams, including software engineers, QA testers, and product managers, to gather requirements and deliver high-quality API solutions.
Conducted code reviews and provided constructive feedback to peers to maintain code quality, consistency, and adherence to best practices.
5+ years of experience with Cloudera and CDP technologies, including familiarity with Cloudera Manager, Cloudera Data Hub, Cloudera Data Warehouse, and related tools.
Experience with NoSQL databases like HBase as well as other ecosystem tools like Zookeeper, Oozie, Impala, Storm, Spark Streaming/SQL, Kafka, and Flume.
Used Delta Lake to manage and process large-scale data pipelines, bringing reliability, performance, and data integrity to big data workloads.
Orchestrated data workflows in AWS, utilizing services like S3 for data storage, Glue for ETL jobs, and Athena for ad-hoc querying of data stored in S3.
Leveraged AWS data services such as S3, Glue, and Athena for building cloud-based data pipelines, enabling serverless ETL and ad-hoc querying capabilities.
Configured and optimized Amazon EMR clusters to handle large-scale data processing workloads, ensuring scalability, reliability, and cost-effectiveness.
Developed and optimized complex SQL queries for data extraction, transformation, and loading, ensuring accurate and efficient data retrieval.
Implemented and maintained real-time data streaming solutions using Apache Kafka and AWS Kinesis, enabling timely processing and analysis of streaming data.
Utilized AWS Glue for data cataloging and metadata management, ensuring consistency and accessibility of metadata across various AWS services. Used Spark on Databricks to manage and process large-scale data pipelines, bringing reliability, performance, and data integrity to big data workloads.
Experience in converting Hive/SQL queries into Spark transformations using Java and experience in ETL development using Kafka, Flume and Sqoop.
Designed, developed, and deployed data pipelines using Cloudera and CDP tools and technologies.
Expertise in handling analytics projects using Big Data technologies. Hands-on experience in ingesting data from external servers to Hadoop.
Experience in moving large amounts of logs, streaming event data, and transactional data using Flume.
Hands-on experience developing workflows that execute Sqoop, Pig, Hive, and shell scripts using Oozie.
Good experience with Hive data warehousing concepts such as static/dynamic partitioning, bucketing, managed and external tables, and join operations on tables.
Expert knowledge in creating, updating, maintaining, scheduling (through calendars), and troubleshooting Control-M jobs and batch flows.
Collaborated with business analysts and stakeholders to understand data requirements and translate th

Experience

Education

Bachelor's in Finance

Tribhuvan University 2007

Attended, Electrical Engineering

Tribhuvan University 2003

Skills

Big Data

2024

Data Engineering

2024

Hadoop

2024

HDFS

2024

MapReduce

2024

Spark

2024

ETL

2024

Java

2024

Flume

2021

HBase

2021

Hive

2021

Oozie

2021

Oracle

2021

Pig

2021

Python

2021

Sqoop

2021

Apache

2024

AWS

2024

PySpark

2021

Data Warehousing

2024

Kafka

2024

MongoDB

2021

SQL

2021

Scripting

2021

Shell Scripts

2022

Eclipse

2021

Hadoop Developer

2021

Data Cleansing

2020

Data Lakes

2020

Data Management

2020

Docker Containers

2020

MapR

2020

Microsoft Excel

2021

Performance Tuning

2024

UNIX

2020

Windows

2020

Data Integration

2024

Data Modeling

2024

Impala

2021

SQL Server

2021

AWS S3

2021

Cassandra

2021

Data Validation

2024

NetBeans

2021

API Development

2024

Compliance

2024

Data Governance

2020

Data Integrity

2024

Data Security

2024

Design Patterns

2024

Informatica

2024

Informatica PowerCenter

2024

Linux

2016

Maven

2016

MS Azure

2024

MySQL

2016

OAuth

2024

Pipeline

2022

PL/SQL

2022

Salesforce

2024

Talend Studio

2024

Triggers

2024

Agile Methodology

2021

AWS EC2

2021

Cloud Computing

2021

Data Access

2016

Data Analysis

2016

Data Architecture

2021

Erwin Data Modeler

2021

Git

2021

JDBC

2021

Metadata

2016

MS Power BI

2021

Netezza

2021

OLTP

2021

PostgreSQL

2016

Scrum

2021

Snowflake

2021

SSIS

2021

SSRS

2021

Stored Procedure

2021

Bash

2021

Data Marts

2021

Analytics

Apache Tomcat

ARM

AWS EMR

AWS Lambda

BEA WebLogic

2013

BMC Control-M

2021

Business Analysis

C++

CentOS

Continuous Deployment

2021

Continuous Integration

2021

CSS

Data Mining

Data Science

EJB

2013

Hibernate

2013

HTML

J2EE

2013

Java Servlet

2013

JavaScript

2013

JSP

2013

JUnit

2013

Machine Learning

OpenShift

PowerShell

Requirements Gathering

REST

RHadoop

Scala

2021

Shipping

SOAP

2013

Software Engineer

Spark Core

2021

Spark Streaming

2021

SQL Developer

Struts

2013

SVN

Tableau

Ubuntu

Visual Studio

VMware

WSDL

2013

XML

2013