Shiromani
shiromani.neupane27@gmail.com
neupaneshiro@yahoo.com
410-302-4946
Ellicott City, MD 21041
Hadoop/Spark Developer (Big Data Engineer)
11 years experience
W2
Summary
9+ years of extensive hands-on experience with the Big Data engineering stack, including Spark, HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Oozie, Flume, Kafka, Zookeeper, Cloudera, and Databricks.
7+ years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework; currently working extensively with Spark and Spark Streaming, using Scala as the main programming language. Comfortable working with various facets of the big data ecosystem, real-time or batch, structured or unstructured data processing.
Developed robust RESTful APIs using Facets API framework to facilitate seamless communication between different software systems.
Designed and implemented API endpoints to retrieve, create, update, and delete data from Facets databases, ensuring data integrity and security.
Hands-on experience with Azure Data Factory, Azure Databricks, Azure DevOps, Azure Blob Storage, Azure Data Lake, Azure Functions, and Azure SQL DB, including integration with various cloud data storage systems.
Utilized Azure DevOps to deploy Azure Machine Learning Studio, Databricks notebooks, Synapse notebooks, and Azure Data Factory pipelines, leveraging GitHub, PowerShell scripts, and Visual Studio.
Integrated third-party systems with Facets API, payment gateways, and customer relationship management (CRM) tools.
Utilized an orchestration strategy to deploy and manage applications, employing tools such as ARM templates, Azure Automation, Azure Pipelines, Logic Apps, or Azure Functions.
Designed, implemented, and maintained scalable data pipelines on AWS using services such as AWS Glue, Lambda, and Kinesis.
Developed ETL processes to ingest, transform, and store data in Amazon Redshift and Amazon S3.
Collaborated with data scientists and analysts to understand data requirements and ensure data quality and accessibility.
Managed and optimized data storage solutions using AWS S3, Redshift, DynamoDB, and RDS.
Skilled in working with big data technologies such as Hadoop, Apache Spark, Apache Kafka, and HBase within the Azure ecosystem, enabling efficient storage, processing, and analysis of large-scale data sets.
Proficient in SQL and NoSQL databases on Azure, including Azure SQL Database, Azure Cosmos DB, and Azure Database for PostgreSQL, with expertise in designing and optimizing database schemas for performance and scalability.
Experienced in building and optimizing data pipelines using Azure services like Azure Data Factory, Azure Stream Analytics, and Azure Event Hubs for real-time data processing and analytics.
Proficient in designing, developing, and deploying big data solutions on the Azure cloud platform, leveraging services such as Azure Data Lake Storage, Azure Databricks, Azure HDInsight, and Azure Synapse Analytics.
Optimized API performance by fine-tuning queries, implementing caching mechanisms, and leveraging asynchronous processing techniques.
Collaborated with cross-functional teams, including software engineers, QA testers, and product managers, to gather requirements and deliver high-quality API solutions.
Conducted code reviews and provided constructive feedback to peers to maintain code quality, consistency, and adherence to best practices.
5+ years of experience with Cloudera and CDP technologies, including familiarity with Cloudera Manager, Cloudera Data Hub, Cloudera Data Warehouse, and related tools.
Experience with NoSQL databases like HBase as well as other ecosystem tools like Zookeeper, Oozie, Impala, Storm, Spark Streaming/SQL, Kafka, and Flume.
Experience with Delta Lake for managing and processing large-scale data pipelines, bringing reliability, performance, and data integrity to big data workloads.
Orchestrated data workflows in AWS, utilizing services like S3 for data storage, Glue for ETL jobs, and Athena for ad-hoc querying of data stored in S3.
Leveraged AWS data services such as S3, Glue, and Athena for building cloud-based data pipelines, enabling serverless ETL and ad-hoc querying capabilities.
Configured and optimized Amazon EMR clusters to handle large-scale data processing workloads, ensuring scalability, reliability, and cost-effectiveness.
Developed and optimized complex SQL queries for data extraction, transformation, and loading, ensuring accurate and efficient data retrieval.
Implemented and maintained real-time data streaming solutions using Apache Kafka and AWS Kinesis, enabling timely processing and analysis of streaming data.
Utilized AWS Glue for data cataloging and metadata management, ensuring consistency and accessibility of metadata across various AWS services.
Used Spark on Databricks for managing and processing large-scale data pipelines, bringing reliability, performance, and data integrity to big data workloads.
Experience in converting Hive/SQL queries into Spark transformations using Java and experience in ETL development using Kafka, Flume and Sqoop.
Experience designing, developing, and deploying data pipelines using Cloudera and CDP tools and technologies.
Expertise in handling analytics projects using Big Data technologies. Hands-on experience ingesting data from external servers into Hadoop.
Experience in moving large amounts of logs, streaming event data, and transactional data using Flume.
Hands-on experience developing workflows that execute Sqoop, Pig, Hive, and shell scripts using Oozie.
Good experience with Hive data warehousing concepts like static/dynamic partitioning, bucketing, managed and external tables, and join operations on tables.
Expert knowledge in creating, updating, maintaining, scheduling through calendars, and troubleshooting Control-M jobs and batch flows.
Collaborated with business analysts and stakeholders to understand data requirements and translate th
Experience
Education
Bachelor's in Finance
Tribhuvan University 2007
Record has not been verified.
Attended in Electrical Engineering
Tribhuvan University 2003
Record has not been verified.
Skills
Big Data, 2024, 18
Data Engineering, 2024, 18
Hadoop, 2024, 18
HDFS, 2024, 18
MapReduce, 2024, 18
Spark, 2024, 18
ETL, 2024, 16
Java, 2024, 16
Flume, 2021, 15
HBase, 2021, 15
Hive, 2021, 15
Oozie, 2021, 15
Oracle, 2021, 15
Pig, 2021, 15
Python, 2021, 15
Sqoop, 2021, 15
Apache, 2024, 14
AWS, 2024, 13
PySpark, 2021, 12
Data Warehousing, 2024, 10
Kafka, 2024, 10
MongoDB, 2021, 9
SQL, 2021, 9
Scripting, 2021, 8
Shell Scripts, 2022, 8
Eclipse, 2021, 7
Hadoop Developer, 2021, 7
Data Cleansing, 2020, 6
Data Lakes, 2020, 6
Data Management, 2020, 6
Docker Containers, 2020, 6
MapR, 2020, 6
Microsoft Excel, 2021, 6
Performance Tuning, 2024, 6
UNIX, 2020, 6
Windows, 2020, 6
Data Integration, 2024, 5
Data Modeling, 2024, 5
Impala, 2021, 5
SQL Server, 2021, 5
AWS S3, 2021, 4
Cassandra, 2021, 4
Data Validation, 2024, 4
NetBeans, 2021, 4
API Development, 2024, 3
Compliance, 2024, 3
Data Governance, 2020, 3
Data Integrity, 2024, 3
Data Security, 2024, 3
Design Patterns, 2024, 3
Informatica, 2024, 3
Informatica PowerCenter, 2024, 3
Linux, 2016, 3
Maven, 2016, 3
MS Azure, 2024, 3
MySQL, 2016, 3
OAuth, 2024, 3
Pipeline, 2022, 3
PL/SQL, 2022, 3
Salesforce, 2024, 3
Talend Studio, 2024, 3
Triggers, 2024, 3
Agile Methodology, 2021, 2
AWS EC2, 2021, 2
Cloud Computing, 2021, 2
Data Access, 2016, 2
Data Analysis, 2016, 2
Data Architecture, 2021, 2
Erwin Data Modeler, 2021, 2
Git, 2021, 2
JDBC, 2021, 2
Metadata, 2016, 2
MS Power BI, 2021, 2
Netezza, 2021, 2
OLTP, 2021, 2
PostgreSQL, 2016, 2
Scrum, 2021, 2
Snowflake, 2021, 2
SSIS, 2021, 2
SSRS, 2021, 2
Stored Procedure, 2021, 2
Bash, 2021, 1
Data Marts, 2021, 1
Analytics, 0, 1
Apache Tomcat, 0, 1
ARM, 0, 1
AWS EMR, 0, 1
AWS Lambda, 0, 1
BEA WebLogic, 2013, 1
BMC Control-M, 2021, 1
Business Analysis, 0, 1
C, 0, 1
C++, 0, 1
CentOS, 0, 1
Continuous Deployment, 2021, 1
Continuous Integration, 2021, 1
CSS, 0, 1
Data Mining, 0, 1
Data Science, 0, 1
EJB, 2013, 1
Hibernate, 2013, 1
HTML, 0, 1
J2EE, 2013, 1
Java Servlet, 2013, 1
JavaScript, 2013, 1
JSP, 2013, 1
JUnit, 2013, 1
Machine Learning, 0, 1
OpenShift, 0, 1
PowerShell, 0, 1
Requirements Gathering, 0, 1
REST, 0, 1
RHadoop, 0, 1
Scala, 2021, 1
Shipping, 0, 1
SOAP, 2013, 1
Software Engineer, 0, 1
Spark Core, 2021, 1
Spark Streaming, 2021, 1
SQL Developer, 0, 1
Struts, 2013, 1
SVN, 0, 1
Tableau, 0, 1
Ubuntu, 0, 1
Visual Studio, 0, 1
VMWare, 0, 1
WSDL, 2013, 1
XML, 2013, 1