cloudteam profile

Vishnu

vardh3282@gmail.com

469-312-6820

Dallas, TX 75398

Big Data Engineer

11 years of experience (W2)


Summary

7 years of experience with Big Data stacks, including AWS cloud, PySpark, Python, SQL, MPP databases and Dell Boomi.
Designed, developed and deployed multiple high-throughput, scalable and complex big data ETL pipelines in the healthcare, e-commerce and finance domains.
Ran Spark jobs using AWS services such as EC2, S3, EMR, Lambda, Step Functions, Redshift, AWS Glue, DataSync and ECS, as well as Dell Boomi.
Experienced in creating IAM roles and policies for different AWS services to secure the tasks they perform.
Familiar with creating and managing infrastructure stacks using AWS CloudFormation.
Developed data pipelines that ingest data from both relational and non-relational databases into AWS S3.
Experienced in production support, including performance tuning, job optimization, job monitoring and job automation.
Experience developing generic frameworks for data ingestion, data curation, data migration and analytics using Spark with Scala and Python.
Skilled in handling structured and semi-structured data formats such as CSV, Parquet, Avro, XML and JSON.
IDEs and notebooks used: Databricks, IntelliJ, PyCharm, Sublime Text, Jupyter Notebook, RStudio, Atom.
Strong knowledge of the Spark ecosystem, including Spark Core, Datasets/DataFrames, Spark SQL and Spark Streaming, and of writing UDFs.
Experience consuming data from sources such as Kafka, S3 and SFTP servers, and from data stores such as HBase, Hive, Athena and DynamoDB.
Good knowledge of using Jenkins and GitHub for continuous integration and deployment (CI/CD).
Experience tuning the performance of Spark applications.
Extensive knowledge of developing Spark Streaming jobs, and good knowledge of Kafka.
Good understanding of databases and storage systems such as MySQL, PostgreSQL, MongoDB, Cassandra, HBase and HDFS.
Experienced in data ingestion, data curation and data manipulation using Scala and Python.
Capable of performing data analysis and other numerical computations using Python libraries such as Pandas and NumPy.
Experienced in writing optimized Spark jobs, bulk loading into and extracting from Redshift tables, writing UDFs, and submitting Spark jobs to EMR, in both Scala and Python.

Experience

Education

not provided

Skills

Data Engineering (2020)
Apache (2020)
AWS (2020)
AWS S3 (2020)
Big Data (2020)
ETL (2020)
MongoDB (2020)
Pipeline (2020)
Spark (2020)
SQL (2020)
AWS Redshift (2020)
Data Migration (2020)
Data Warehousing (2020)
MySQL (2018)
Python (2020)
AWS EC2 (2018)
Cassandra (2018)
Data Analysis (2020)
Data Cleansing (2016)
Data Science (2018)
HBase (2018)
JSON (2018)
Machine Learning (2018)
Microsoft Excel (2016)
MS Azure (2018)
PostgreSQL (2018)
Snowflake (2018)
SQL Server (2018)
Teradata (2018)
AWS CloudFormation (2020)
Business Analysis (2020)
Cloud Infrastructure (2020)
Data Integration (2016)
Hive (2020)
Jenkins
Kafka
MapReduce (2016)
Performance Tuning
Production Support (2020)
PySpark (2020)
Shell Scripts (2020)
XML