Anddy
anddycabrera@gmail.com
407-508-8940
Tampa, FL 33601
Principal Machine Learning Engineer
20 years experience W2
Summary

PROFESSIONAL SUMMARY:

  • Around 15 years of experience in Machine Learning and Data Mining with large sets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, and discovering meaningful business insights.
  • Proficient in applying Statistical Modeling and Machine Learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) to Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Factor Analysis, PCA, and Ensembles, with good knowledge of Recommendation Systems.
  • Expertise in the Scrapy and Beautiful Soup libraries for designing web crawlers.
  • Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources, and scheduling.
  • Excellent knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
  • Experience working with web languages such as HTML and CSS.
  • Experience working with Weka and Meka (Multi-label classification).
  • Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
  • Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
  • Experience with visualization tools such as Tableau for creating dashboards.
  • Hands-on experience building Recommendation Engines and working with Natural Language Processing.
  • Expertise in designing web crawlers for data gathering and in applying LDA.
  • Strong experience in using Excel and MS Access to export data and analyze it based on business needs.
  • Knowledge of cloud services such as Amazon Web Services (AWS).
  • Experience using Agile methodology to develop projects as part of a team.
  • Expert in Python libraries such as NumPy and SciPy for mathematical calculations, Pandas for data preprocessing/wrangling, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning, Theano, TensorFlow, and Keras for deep learning, and NLTK for NLP.
  • Expert in using Model Pipelines to automate the tasks and put models into production quickly.
  • Expertise in Dimensionality Reduction techniques such as PCA, LDA, and Singular Value Decomposition (SVD).
  • Expertise in k-Fold Cross Validation and Grid Search for Model Selection.
  • Knowledge of relational databases such as Oracle SQL and SQLite.
  • Experience in writing SQL queries and subqueries.
  • Good knowledge of the five stages of the Design Thinking methodology.
  • Excellent at working with Big Data ecosystems such as Apache Hadoop, MapReduce, Spark, HDFS architecture, Cassandra, HBase, Sqoop, Hive, Pig, MLlib, and ELT.
  • Well experienced in Core Java: asynchronous programming, multithreading, collections, and several design patterns.
  • Good knowledge of version control systems and hosting platforms such as Git, SVN, GitHub, and Bitbucket.
  • Proficient in writing SQL queries for various RDBMS such as SQL Server, MySQL, PostgreSQL, Teradata, and Oracle, and in working with NoSQL databases such as MongoDB, HBase, and Cassandra to handle unstructured data.
  • Hands-on experience evaluating model performance using A/B testing, k-fold cross-validation, R-squared, CAP curve, confusion matrix, ROC plot, Gini coefficient, and grid search.
  • Works effectively in fast-paced, multitasking environments, both independently and in collaborative teams. Comfortable with challenging projects and with working through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.

Technical Skills

  • Libraries: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2, Scrapy, BeautifulSoup, Seaborn, Bokeh, NetworkX, Statsmodels, Theano.
  • Programming Languages: Python, R, SQL, Scala, Pig, C, MATLAB, Java, C++, C
  • Querying languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL.
  • Machine Learning: Data Preprocessing, Weighted Least Squares, PCR, PLS, Piecewise, Spline, Quadratic Discriminant Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means, Polynomial Regression, Azure, Perceptron, Backpropagation, PCA, LDA, UML, RDF, SPARQL
  • Big Data: Spark, AWS, EMR, S3, Kinesis, Lambdas, State-machines, Step-functions, SageMaker, IAM, DynamoDB, Hive, Genie, SQS, SNS, EC2, Spark SQL, Kafka, Sqoop, YARN, HDP, Machine Learning, AI, Cassandra, Druid, Redshift, Hadoop, Azure
  • Visualization: Tableau, Python (Matplotlib, Seaborn), Zoho
  • Databases: SQLite, Hive, Oracle, MySQL, PostgreSQL, DynamoDB, Cassandra, Druid, Redshift, HBase, MS SQL Server, Aurora, PI Server (OSIsoft)
  • Tools: AWS Athena, SageMaker, Jupyter, Splunk, Informatica, Oracle OBIEE, MATLAB, Weka, SPSS, Visio, TIBCO Spotfire, SQL Developer, CloudWatch, Airflow, Genie
  • IDE Tools: PyCharm, Spyder, Eclipse, Visual Studio, NetBeans, Amazon SageMaker.
  • Project Management: JIRA, SharePoint
  • SDLC Methodologies: Agile, Scrum, Waterfall
  • Anaconda Enterprise, RStudio, AWS Lambda, Azure Machine Learning Studio, Oozie 4.2.

Experience
Education
Master's in Data Science
University Of California - San Diego, 2017 - 2018
Certifications
Machine Learning Engineer
Harvard University
Inference And Modeling
Harvard University
Data Science: R Programming
Harvard University
Probability
Harvard University
AWS SageMaker Machine Learning Application Development
AWS
AWS Developer: Building On AWS
AWS
Amazon DynamoDB: Building NoSQL Database-Driven Application
AWS
Analyzing Data With Python
IBM Data Science
Machine Learning
UCSanDiego
Python For Data Science
UCSanDiego
Big Data Analytics Using Spark
UCSanDiego
Probability And Statistics in Data Science Python
UCSanDiego
Skills
Skill | Last used | Years
Python | 2021 | 15
SQL | 2021 | 15
Data Profiling | 2021 | 14
Machine Learning | 2021 | 13
Data Cleansing | 2021 | 10
Agile Methodology | 2021 | 9
Eclipse | 2021 | 9
Oracle | 2019 | 9
Scrum | 2021 | 9
Cassandra | 2021 | 8
Data Mining | 2019 | 8
Hadoop | 2021 | 8
HDFS | 2021 | 8
Hive | 2021 | 8
HTML | 2021 | 8
Java | 2019 | 8
MapReduce | 2021 | 8
MongoDB | 2019 | 8
MS Power BI | 2019 | 8
Pig | 2021 | 8
SAS | 2021 | 8
Sqoop | 2021 | 8
Tableau | 2019 | 8
UNIX | 2021 | 8
CUBE | 2019 | 7
Data Science | 2019 | 7
Gap Analysis | 2019 | 7
JSON | 2019 | 7
Korn Shell | 2019 | 7
Perl | 2019 | 7
Scripting | 2019 | 7
Shell Scripts | 2019 | 7
SQL Server | 2019 | 7
TOAD | 2019 | 7
Data Analysis | 2021 | 5
Weka | 2011 | 5
Data Mapping | 2008 | 4
PL/SQL | 2021 | 4
Statistical Modeling | 2008 | 4
Data Architecture | 2010 | 2
Data Warehousing | 2010 | 2
Informatica | 2010 | 2
PostgreSQL | 2010 | 2
QA | 2021 | 2
Spark | 2021 | 2
Teradata | 2010 | 2
Triggers | 2010 | 2
Web Developer | 2010 | 2
Apache | 2011 | 1
AWS | 2021 | 1
Box | 2011 | 1
Clustering | 2011 | 1
Data Integrity | 2021 | 1
Data Migration | 2021 | 1
Database Design | 2011 | 1
Elasticsearch | 2011 | 1
ETL | 2021 | 1
Kafka | 2021 | 1
Kibana | 2011 | 1
Linux | 2021 | 1
MySQL | 2021 | 1
Predictive Analytics | 2011 | 1
Prototyping | 2011 | 1
REST | 2011 | 1
Scala | 2021 | 1
Sequent | 2011 | 1
Spark Core | 2021 | 1
Stored Procedure | 2011 | 1
Windows | 2021 | 1
Yarn | 2021 | 1
AWS EC2 | 0 | 1
AWS EMR | 0 | 1
AWS Lambda | 0 | 1
AWS S3 | 0 | 1
Azure Machine Learning | 0 | 1
C | 0 | 1
C++ | 0 | 1
Cloudwatch | 0 | 1
CSS | 0 | 1
Design Patterns | 0 | 1
Git | 0 | 1
JIRA | 0 | 1
MS Azure | 0 | 1
MS SharePoint | 0 | 1
Netbeans | 0 | 1
OBIEE | 0 | 1
Pipeline | 0 | 1
Splunk | 0 | 1
SQL Developer | 0 | 1
SVN | 0 | 1
UML | 0 | 1
Visual Studio | 0 | 1