Puneeth
rpuneeth253@gmail.com
732-640-6562
Atlanta, GA 30301
Machine Learning/Data Scientist
6 years experience W2
Summary

Over 6 years of experience in Data Science/Data Analysis, ETL Development, and Project Management.

  • Experienced in all phases of diverse technology projects, specializing in Data Science and Machine Learning.
  • Proven expertise in Supervised and Unsupervised learning techniques (Clustering, Classification, PCA, Decision Trees, KNN, SVM), Predictive Analytics, Optimization Methods, Natural Language Processing (NLP), and Time Series Analysis.
  • Experienced in Machine Learning regression algorithms such as Simple, Multiple, and Polynomial Regression, SVR (Support Vector Regression), Decision Tree Regression, and Random Forest Regression.
  • Experienced in advanced statistical analysis and predictive modeling in structured and unstructured data environments.
  • Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark, Spark SQL, Spark Streaming, and Hive for scalability, distributed computing, and high-performance computing.
  • Strong knowledge of NoSQL column-oriented databases such as HBase, Cassandra, MongoDB, and MarkLogic, and their integration with the Hadoop cluster.
  • Strong expertise in Business and Data Analysis, Data Profiling, Data Migration, Data Conversion, Data Quality, Data Governance, Data Lineage, Data Integration, Master Data Management (MDM), Metadata Management Services, and Reference Data Management (RDM).
  • Hands-on experience with Data Science libraries in Python such as Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn, Beautiful Soup, Orange, Rpy2, LibSVM, Neurolab, and NLTK.
  • Solid understanding of AWS (Amazon Web Services) S3, EC2, RDS, and IAM, Azure ML, Apache Spark, and Scala processes and concepts.
  • Good understanding of Artificial Neural Networks and Deep Learning models using the Theano and TensorFlow packages in Python.
  • Experienced in Machine Learning classification algorithms such as Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree, and Random Forest classification.
  • Experience in various phases of the Software Development Life Cycle (Analysis, Requirements Gathering, Design), with expertise in writing/documenting Technical Design Documents (TDD), Functional Specification Documents (FSD), Test Plans, GAP Analysis, and Source-to-Target mapping documents.
  • Strong understanding of project life cycle and SDLC methodologies including RUP, RAD, Waterfall, and Agile.
  • Very good knowledge and understanding of Microsoft SQL Server, Oracle, Teradata, Hadoop/Hive.
  • Strong expertise in ETL, Data warehousing, Operational Data Store (ODS), Data Marts, OLAP and OLTP technologies.
  • Analytical, performance-focused, and detail-oriented professional offering in-depth knowledge of data analysis and statistics; utilized complex SQL queries for data manipulation.
  • Expertise in using Linear and Logistic Regression, Classification Modeling, Decision Trees, Principal Component Analysis (PCA), and Cluster and Segmentation analyses; have authored and co-authored several scholarly articles applying these techniques.
  • Assisted in determining the full domain of the MVP, created and implemented its data model for the App, and worked with App developers to integrate the MVP into the App and any backend domains.
  • Ensured that REST-based APIs, including all CRUD operations, integrate with the App and other service domains.
  • Installed and configured additional services on appropriate AWS EC2, RDS, S3, and/or other AWS service instances.
  • Integrated these services with each other, ensuring user access to data, data storage, and communication between the various services.
  • Excellent team player and self-starter with good communication skills.

Experience
Machine Learning/Data Scientist
Sep 2018 - present
Atlanta, GA
GreenSky, LLC is a third-party service provider and program administrator to federally insured, federal and state-chartered banks that provide consumer loans under the GreenSky LLC programs. GreenSky LLC helps businesses grow by giving them the ability to offer credit to their customers. Responsibilities:
  • Design and develop state-of-the-art deep-learning/machine-learning algorithms for analyzing image and video data, among others.
  • Develop and implement innovative AI and machine learning tools that will be used in the Risk
  • Experience with TensorFlow, Caffe, and other Deep Learning frameworks.
  • Develop project requirements and deliverable timelines, and execute efficiently to meet the planned timelines.
  • Analyze large data sets, apply machine learning techniques to develop predictive and statistical models, and enhance statistical models by leveraging best-in-class modeling techniques.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLlib, and Python, with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
  • Involved in Data Analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
  • Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Understood requirements, including the significance of weld point data and energy efficiency, using large datasets.
  • Develop necessary connectors to plug ML software into wider data pipeline architectures.
  • Creating and supporting a data management workflow from data collection, storage, and analysis to training and validation.
  • Wrangled data and worked on large datasets (acquired and cleaned the data); analyzed trends by making visualizations using Matplotlib and Python.
  • Experience with TensorFlow, Theano, Keras, and other Deep Learning frameworks.
  • Built an Artificial Neural Network using TensorFlow in Python to identify a customer's probability of canceling their connection (churn rate prediction).
  • Understanding the business problems and analyzing the data by using appropriate Statistical models to generate insights.
  • Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning.
  • Developed NLP models for Topic Extraction and Sentiment Analysis.
  • Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Design and build scalable software architecture to enable real-time / big-data processing.
  • Used Teradata utilities such as FastExport and MultiLoad for handling various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Performed data analysis by using Hive to retrieve the data from the Hadoop cluster, SQL to retrieve data from Oracle database and used ETL for data transformation.
  • Performed a deep ML analysis of the HTPD/RTPD/LTPD test data to define a model of FBC growth rate across temperature.
  • Built ML models for projections of pre-production SLC, MLC, and TLC single- and multi-die package ICC memory.
  • Used the TensorFlow library in a dual-GPU environment for training and testing of the Neural Networks.
  • Taking responsibility for technical problem solving, creatively meeting product objectives and developing best practices.
  • Have a high sense of urgency to deliver projects as well as troubleshoot and fix data queries/ issues.
  • Work independently with R&D partners to understand requirements.
  • Interaction with Business Analyst, SMEs, and other Data Architects to understand Business needs and functionality for various project solutions
  • Identifying and executing process improvements; hands-on in various technologies such as Oracle, Informatica, and Business Objects. Environment: R 9.0, R Studio, Machine Learning, Informatica 9.0, Scala, Spark, Cassandra, ML, DL, Scikit-learn, Shogun, Data Warehouse, MLlib, Cloudera Oryx, Apache.
  • Skills: Business Analysis, Data Analysis, Data Architecture, Data Warehousing, ETL, Hadoop, HBase, Hive, Informatica, Machine Learning, Oracle, Spark, SQL, Statistical Analysis, Teradata, Data Science
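The churn-prediction bullet above (a neural network estimating a customer's probability of canceling) can be sketched in miniature. This illustration stands in for the TensorFlow model with scikit-learn's MLPClassifier, and the features (tenure, monthly charge, support calls) and data are invented for illustration only:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Hypothetical customer features: tenure (months), monthly charge, support calls
X = np.column_stack([
    rng.integers(1, 72, n),
    rng.uniform(20.0, 120.0, n),
    rng.poisson(2, n),
])
# Toy label: short tenure plus many support calls -> more likely to churn
y = ((X[:, 0] < 12) & (X[:, 2] > 2)).astype(int)

# Scale features, then fit a small feed-forward network
X_scaled = StandardScaler().fit_transform(X)
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0)
clf.fit(X_scaled, y)

# Per-customer probability of canceling (the "churn rate" score)
churn_prob = clf.predict_proba(X_scaled)[:, 1]
```

In practice the network architecture, features, and training/validation split would come from the actual project; this only shows the shape of the workflow.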
    Data Scientist/Data Analyst
    Jan 2017 - Aug 2018
    Michaels Stores Inc., doing business as Michaels, is the largest American arts and crafts retail chain that currently operates more than 1,262 stores as of May 31, 2014. Responsibilities:
  • Involved in the design, development and testing phases of application using AGILE methodology.
  • Performed data wrangling to clean, transform and reshape the data utilizing pandas library. Analyzed data using SQL, R, Java, Scala, Python, Apache Spark and presented analytical reports to management and technical teams.
  • Worked with different datasets which includes both structured and unstructured data and Participated in all phases of Data mining, Data cleaning, Data collection, variable selection, feature engineering, developing models, Validation and Visualization.
  • Developed predictive models on large scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning and deep learning.
  • Implemented segmentation using unsupervised machine learning, implementing the K-means algorithm in PySpark with data munging.
  • Experience in machine learning with NLP text classification using Python.
  • Worked on different Machine Learning models like Logistic Regression, Multi-layer perceptron classifier and K-means clustering.
  • Lead discussions with users to gather business processes requirements and data requirements to develop a variety of conceptual, logical and Physical Data models.
  • Expertise in Business intelligence and Data Visualization tools like Tableau.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce and loaded data into HDFS.
  • Good knowledge in Azure cloud services, Azure Storage to manage and configure the data.
  • Used R and Python for Exploratory Data Analysis to compare and identify the effectiveness of the data.
  • Created clusters to classify control and test groups.
  • Analyzed and calculated the lifetime cost of each individual in a welfare system using 20 years of historical data.
  • Used Python, R, SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, SVM for estimating and identifying the risks of welfare dependency.
  • Designed and implemented a recommendation system which leveraged Google Analytics data and the machine learning models and utilized Collaborative filtering techniques to recommend policies for different customers.
  • Performed analysis such as Regression analysis, Logistic Regression, Discriminant Analysis, Cluster analysis using SAS programming.
  • Worked on NoSQL databases including Cassandra, MongoDB, and HBase to assess their advantages and disadvantages for the goals of a project. Environment: Hadoop, HDFS, Python 3.x (Scikit-learn/ Keras/ SciPy/ NumPy/ Pandas/ Matplotlib/ NLTK/ Seaborn), R (ggplot2/ caret/ trees/ arules), Tableau (9.x/10.x), Machine Learning (Logistic Regression/ Random Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), GitHub, Agile/SCRUM
  • Skills: Business Intelligence, Data Analysis, Data Cleansing, Data Mining, Data Visualization, Hadoop, HBase, HDFS, Hive, Linear Regression, Logistic Regression, Machine Learning, MapReduce, MS Azure, SAS, Spark, SQL, Tableau
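The K-means segmentation mentioned above was done in PySpark; a minimal single-machine sketch of the same idea uses scikit-learn instead, with invented customer features (annual spend, visit frequency) for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two synthetic customer groups: low-spend/infrequent vs high-spend/frequent
spend = np.concatenate([rng.normal(200, 30, 100), rng.normal(800, 80, 100)])
visits = np.concatenate([rng.normal(4, 1, 100), rng.normal(20, 3, 100)])

# Standardize before clustering so both features contribute equally
X = StandardScaler().fit_transform(np.column_stack([spend, visits]))

# Fit K-means with two segments; labels_ assigns each customer to a cluster
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

Choosing k (here fixed at 2) would normally be driven by an elbow or silhouette analysis on the real data.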
    Data Scientist
    Jan 2015 - Nov 2016
    Fremont, CA
    Wells Fargo & Company, a diversified financial services company, provides retail, commercial, and corporate banking services to individuals, businesses, and institutions. Its Community Banking segment offers checking, savings, market rate, and individual retirement accounts, as well as time deposits and remittances. Responsibilities:
  • Collaborated with data engineers and operation team to implement the ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Performed data analysis by retrieving the data from the Hadoop cluster.
  • Analyzed frequently failing Informatica jobs for Data Quality Issues generated through an Informatica upgrade, duplicates and active indicator updates.
  • Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
  • Explored and analyzed the customer specific features by using Matplotlib in Python and ggplot2 in R.
  • Performed data imputation using Scikit-learn package in Python.
  • Participated in features engineering such as feature generating, PCA, feature normalization and label encoding with Scikit-learn preprocessing.
  • Used Python (NumPy, SciPy, pandas, Scikit-learn, seaborn) and R to develop a variety of models and algorithms for analytic purposes.
  • Worked on Natural Language Processing with NLTK module of python and developed NLP models for sentiment analysis
  • Experimented and built predictive models including ensemble models using machine learning algorithms such as Logistic regression, Random Forests, and KNN to predict customer churn.
  • Conducted analysis of customer behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering, Gaussian Mixture Models, and Hierarchical Clustering.
  • Used F-score, AUC/ROC, Confusion Matrix, Precision, and Recall to evaluate different models' performance.
  • Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
  • Designed and implemented a recommendation system which leveraged Google Analytics data and the machine learning models and utilized Collaborative filtering techniques to recommend courses for different customers.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Environment: Hadoop, HDFS, Python, R, Tableau, Machine Learning (Logistic Regression/ Random Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), JIRA, GitHub, Agile/SCRUM, GCP.
  • Skills: Data Analysis, Data Engineering, Data Visualization, ETL, Hadoop, HDFS, Informatica, Logistic Regression, Machine Learning, Spark, SQL, Tableau
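The evaluation metrics cited above (F-score, AUC/ROC, confusion matrix, precision, recall) all live in scikit-learn's metrics module; a small sketch with toy labels and scores (invented here) shows how they fit together:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy ground truth and model scores, invented for illustration
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.8, 0.4, 0.6, 0.7, 0.9])
y_pred = (y_score >= 0.5).astype(int)      # threshold the scores at 0.5

cm = confusion_matrix(y_true, y_pred)       # rows: actual, cols: predicted
precision = precision_score(y_true, y_pred) # TP / (TP + FP)
recall = recall_score(y_true, y_pred)       # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)               # harmonic mean of precision/recall
auc = roc_auc_score(y_true, y_score)        # AUC uses raw scores, not labels
```

Note that AUC/ROC is computed from the continuous scores while precision, recall, and the confusion matrix depend on the chosen threshold.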
    Data Analyst
    Information Technology
    Dec 2014 - Apr 2015
    Aspect Software, Inc. is an American multinational call center technology and customer experience company. Responsibilities:
  • Used Microsoft Visio and Rational Rose for designing the Use Case diagrams, Class model, Sequence diagrams, and Activity diagrams for the SDLC process of the application.
  • Worked with other teams to analyze customers and marketing parameters.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project, from requirements to production support.
  • Created test plan documents for all back-end database modules
  • Used MS Excel, MS Access and SQL to write and run various queries.
  • Used traceability matrix to trace the requirements of the organization.
  • Recommended structural changes and enhancements to systems and databases.
  • Provided maintenance in the testing team for System/Integration/UAT testing.
  • Guaranteed quality in the deliverables. Environment: UNIX, SQL, Oracle 10g, MS Office, MS Visio.
  • Skills: Data Analysis, Oracle, SQL
    Education
    Skills
    Skill | Last Used | Years
    Data Analysis | 2021 | 4
    SQL | 2021 | 4
    Hadoop | 2021 | 3
    Machine Learning | 2021 | 3
    Spark | 2021 | 3
    Data Visualization | 2018 | 2
    ETL | 2021 | 2
    HBase | 2021 | 2
    HDFS | 2018 | 2
    Hive | 2021 | 2
    Informatica | 2021 | 2
    Logistic Regression | 2018 | 2
    Oracle | 2021 | 2
    Tableau | 2018 | 2
    Business Analysis | 2021 | 1
    Business Intelligence | 2018 | 1
    Data Architecture | 2021 | 1
    Data Cleansing | 2018 | 1
    Data Engineering | 2016 | 1
    Data Mining | 2018 | 1
    Data Science | 2021 | 1
    Data Warehousing | 2021 | 1
    Linear Regression | 2018 | 1
    MapReduce | 2018 | 1
    MS Azure | 2018 | 1
    SAS | 2018 | 1
    Statistical Analysis | 2021 | 1
    Teradata | 2021 | 1
    AWS | 0 | 1
    Data Conversion | 0 | 1
    Data Governance | 0 | 1
    Data Integration | 0 | 1
    Data Marts | 0 | 1
    Data Migration | 0 | 1
    Data Profiling | 0 | 1
    Gap Analysis | 0 | 1
    Metadata | 0 | 1
    Pig | 0 | 1
    Requirements Gathering | 0 | 1
    SQL Server | 0 | 1
    Sqoop | 0 | 1