cloudteam profile

Anddy

anddycabrera@gmail.com

407-508-8940

Tampa, FL 33601

Principal Machine Learning Engineer

20 years experience W2

Summary

PROFESSIONAL SUMMARY:

Around 15 years of experience in Machine Learning and Data Mining with large sets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, and discovering meaningful business insights.
Proficient in applying Statistical Modeling and Machine Learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) to Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Factor Analysis, PCA, and Ensembles, with good knowledge of Recommendation Systems.
Expertise in the Scrapy and Beautiful Soup libraries for designing web crawlers.
Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources, and scheduling.
Excellent knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
Experience working with web languages such as HTML and CSS.
Experience working with Weka and Meka (Multi-label classification).
Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
Experience with visualization tools such as Tableau for creating dashboards.
Hands-on experience in building Recommendation Engines and in Natural Language Processing.
Expertise in designing web crawlers for data gathering and in applying LDA.
Strong experience in using Excel and MS Access to extract data and analyze it based on business needs.
Knowledge of cloud services such as Amazon Web Services (AWS).
Experience using Agile methodology to develop projects when working on a team.
Expert in Python libraries such as NumPy and SciPy for mathematical calculations, Pandas for data preprocessing/wrangling, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning, Theano, TensorFlow, and Keras for Deep Learning, and NLTK for NLP.
Expert in using Model Pipelines to automate the tasks and put models into production quickly.
Expertise in Dimensionality Reduction techniques such as PCA, LDA, and Singular Value Decomposition (SVD).
Expertise in k-Fold Cross Validation and Grid Search for Model Selection.
Knowledge of relational databases such as Oracle SQL and SQLite.
Experience in writing SQL queries and subqueries.
Good knowledge of the five stages of the Design Thinking methodology.
Strong experience with Big Data ecosystems such as Apache Hadoop, MapReduce, Spark, HDFS Architecture, Cassandra, HBase, Sqoop, Hive, Pig, MLlib, ELT.
Well experienced in Core Java: asynchronous programming, multithreading, collections, and a few design patterns.
Good knowledge of version control systems and hosting platforms such as Git, SVN, GitHub, and Bitbucket.
Proficient in writing SQL queries for various RDBMS such as Microsoft SQL Server, MySQL, PostgreSQL, Teradata, and Oracle, and in working with NoSQL databases such as MongoDB, HBase, and Cassandra to handle unstructured data.
Practical experience in evaluating model performance using A/B Testing, K-fold Cross Validation, R-Squared, CAP Curve, Confusion Matrix, ROC plot, Gini Coefficient, and Grid Search.
Works effectively in fast-paced, multi-tasking environments, both independently and as part of a collaborative team. Comfortable with challenging projects and with working through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.

Technical Skills

Libraries: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2, Scrapy, BeautifulSoup, Seaborn, Bokeh, NetworkX, Statsmodels, Theano.
Programming Languages: Python, R, SQL, Scala, Pig, C, MATLAB, Java, C++, C
Querying languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL.
Machine Learning: Data Preprocessing, Weighted Least Square, PCR, PLS, Piecewise, Spline, Quadratic Discriminant Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means, Polynomial Regression, Azure, Perceptron, Backpropagation, PCA, LDA, UML, RDF, SPARQL
Big Data: Spark, AWS, EMR, S3, Kinesis, Lambda, State Machines, Step Functions, SageMaker, IAM, DynamoDB, Hive, Genie, SQS, SNS, EC2, Spark SQL, Kafka, Sqoop, YARN, HDP, Machine Learning, AI, Cassandra, Druid, Redshift, Hadoop, Azure
Visualization: Tableau, Python (Matplotlib, Seaborn), Zoho
Databases: SQLite, Hive, Oracle, MySQL, Postgres, DynamoDB, Cassandra, Druid, Redshift, HBase, MS SQL Server, Aurora, PI Server (OSIsoft)
Tools: AWS Athena, SageMaker, Jupyter, Splunk, Informatica, Oracle OBIEE, MATLAB, Weka, SPSS, Visio, TIBCO Spotfire, SQL Developer, CloudWatch, Airflow, Genie
IDE Tools: PyCharm, Spyder, Eclipse, Visual Studio, NetBeans, Amazon SageMaker.
Project Management: JIRA, SharePoint
SDLC Methodologies: Agile, Scrum, Waterfall
Anaconda Enterprise, RStudio, AWS Lambda, Azure Machine Learning Studio, Oozie 4.2.

Experience

Education

Master's in Data Science

University of California, San Diego, 2017 - 2018

Certifications

Machine Learning Engineer

Harvard University

Inference And Modeling

Harvard University

Data Science: R Programming

Harvard University

Probability

Harvard University

AWS SageMaker Machine Learning Application Development

AWS

AWS Developer: Building On AWS

AWS

Amazon DynamoDB: Building NoSQL Database-Driven Application

AWS

Analyzing Data With Python

IBM Data Science

Machine Learning

UCSanDiego

Python For Data Science

UCSanDiego

Big Data Analytics Using Spark

UCSanDiego

Probability And Statistics in Data Science Using Python

UCSanDiego

Skills

Python

2021

SQL

2021

Data Profiling

2021

Machine Learning

2021

Data Cleansing

2021

Agile Methodology

2021

Eclipse

2021

Oracle

2019

Scrum

2021

Cassandra

2021

Data Mining

2019

Hadoop

2021

HDFS

2021

Hive

2021

HTML

2021

Java

2019

MapReduce

2021

MongoDB

2019

MS Power BI

2019

Pig

2021

SAS

2021

Sqoop

2021

Tableau

2019

UNIX

2021

CUBE

2019

Data Science

2019

Gap Analysis

2019

JSON

2019

Korn Shell

2019

Perl

2019

Scripting

2019

Shell Scripts

2019

SQL Server

2019

TOAD

2019

Data Analysis

2021

Weka

2011

Data Mapping

2008

PL/SQL

2021

Statistical Modeling

2008

Data Architecture

2010

Data Warehousing

2010

Informatica

2010

PostgreSQL

2010

2021

Spark

2021

Teradata

2010

Triggers

2010

Web Developer

2010

Apache

2011

AWS

2021

Box

2011

Clustering

2011

Data Integrity

2021

Data Migration

2021

Database Design

2011

Elasticsearch

2011

ETL

2021

Kafka

2021

Kibana

2011

Linux

2021

MySQL

2021

Predictive Analytics

2011

Prototyping

2011

REST

2011

Scala

2021

Sequent

2011

Spark Core

2021

Stored Procedure

2011

Windows

2021

Yarn

2021

AWS EC2

AWS EMR

AWS Lambda

AWS S3

Azure Machine Learning

C++

CloudWatch

CSS

Design Patterns

Git

JIRA

MS Azure

MS SharePoint

NetBeans

OBIEE

Pipeline

Splunk

SQL Developer

SVN

UML

Visual Studio