Anddy
anddycabrera@gmail.com
407-508-8940
Tampa, FL 33601
Principal Machine Learning Engineer
17 years experience W2
Summary

PROFESSIONAL SUMMARY:

  • Around 15 years of experience in Machine Learning, Data Mining with large data sets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization and discovering meaningful business insights.
  • Proficient in applying Statistical Modeling and Machine Learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Factor Analysis, PCA and Ensembles, with good knowledge of Recommendation Systems.
  • Expertise in the Scrapy and BeautifulSoup libraries for designing web crawlers.
  • Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources and scheduling.
  • Excellent knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
  • Experience working with web languages such as HTML and CSS.
  • Experience working with Weka and Meka (Multi-label classification).
  • Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
  • Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
  • Experience with visualization tools such as Tableau for creating dashboards.
  • Hands on experience on building Recommendation Engines and Natural Language Processing.
  • Expertise in designing web crawlers for data gathering and applying LDA.
  • Strong experience using Excel and MS Access to load data and analyze it based on business needs.
  • Knowledge in Cloud services such as Amazon AWS.
  • Use Agile methodology to develop projects when working on a team.
  • Expert in Python libraries such as NumPy and SciPy for mathematical computation, Pandas for data preprocessing/wrangling, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning, Theano, TensorFlow and Keras for deep learning, and NLTK for NLP.
  • Expert in using model pipelines to automate tasks and move models into production quickly.
  • Expertise in Dimensionality Reduction techniques like PCA, LDA, Singular Value Decomposition technique.
  • Expertise in k-fold cross validation and grid search for model selection (see the pipeline sketch after this list).
  • Knowledge of relational databases such as Oracle and SQLite.
  • Experience in writing SQL queries and subqueries.
  • Good knowledge on the five stages of Design Thinking Methodology.
  • Experienced in handling Big Data ecosystems such as Apache Hadoop, MapReduce, Spark, HDFS architecture, Cassandra, HBase, Sqoop, Hive, Pig, MLlib and ELT.
  • Well experienced in Core Java: asynchronous programming, multithreading, collections and several design patterns.
  • Good knowledge of version control systems such as Git and SVN, and hosting services such as GitHub and Bitbucket.
  • Skilled in writing SQL queries for various RDBMSs such as Microsoft SQL Server, MySQL, PostgreSQL, Teradata and Oracle, and NoSQL databases such as MongoDB, HBase and Cassandra to handle unstructured data.
  • Hands-on experience evaluating model performance using A/B testing, k-fold cross validation, R-squared, CAP curves, confusion matrices, ROC plots, the Gini coefficient and grid search.
  • Comfortable in fast-paced, multi-tasking environments, working both independently and in collaborative teams. At home on challenging projects and able to work through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.
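
As a concrete illustration of the pipeline and model-selection points above, here is a minimal sketch using scikit-learn; the dataset and parameter grid are illustrative, not taken from any specific project:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Keeping the scaler inside the pipeline means it is re-fit on each
    # training fold, avoiding data leakage during cross validation.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

    # 5-fold CV over a small grid; pipeline step names prefix the parameters.
    grid = GridSearchCV(
        pipe,
        param_grid={"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]},
        cv=5,
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)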

Technical Skills

  • Python Libraries: scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2, Scrapy, BeautifulSoup, Seaborn, Bokeh, NetworkX, Statsmodels, Theano (see the crawler sketch after this list).
  • Programming Languages: Python, R, SQL, Scala, Pig, C, C++, MATLAB, Java.
  • Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server.
  • Machine Learning: Data Preprocessing, Weighted Least Squares, PCR, PLS, Piecewise and Spline Regression, Quadratic Discriminant Analysis, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means, Polynomial Regression, Azure, Perceptron, Backpropagation, PCA, LDA, UML, RDF, SPARQL.
  • Big Data: Spark, AWS, EMR, S3, Kinesis, Lambda, State Machines, Step Functions, SageMaker, IAM, DynamoDB, Hive, Genie, SQS, SNS, EC2, Spark SQL, Kafka, Sqoop, YARN, HDP, Machine Learning, AI, Cassandra, Druid, Redshift, Hadoop, Azure.
  • Visualization: Tableau, Matplotlib, Seaborn, Zoho.
  • Databases: SQLite, Hive, Oracle, MySQL, Postgres, DynamoDB, Cassandra, Druid, Redshift, HBase, MS SQL Server, Aurora, OSIsoft PI Server.
  • Tools: AWS Athena, SageMaker, Jupyter, Splunk, Informatica, Oracle OBIEE, MATLAB, Weka, SPSS, Visio, TIBCO Spotfire, SQL Developer, CloudWatch, Airflow, Genie.
  • IDE Tools: PyCharm, Spyder, Eclipse, Visual Studio, NetBeans, Amazon SageMaker.
  • Project Management: JIRA, SharePoint.
  • SDLC Methodologies: Agile, Scrum, Waterfall
  • Other: Anaconda Enterprise, RStudio, AWS Lambda, Azure Machine Learning Studio, Oozie 4.2.
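
A minimal sketch of the kind of Scrapy crawler referenced above, pointed at the public scraping sandbox quotes.toscrape.com; the selectors and field names are illustrative:

    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            # Extract one record per quote block on the page.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }
            # Follow the pagination link, if present.
            next_page = response.css("li.next a::attr(href)").get()
            if next_page:
                yield response.follow(next_page, self.parse)

Running scrapy runspider quotes_spider.py -o quotes.json crawls all pages and writes the results as JSON.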

Experience
Principal Machine Learning Engineer
Information Technology
Jun 2019 - Jan 2021
Tampa, FL
  • Analyzed trading mechanisms for real-time transactions and built collateral management tools.
  • Compiled data from various sources to perform complex analysis for actionable results.
  • Utilized machine learning algorithms such as Linear Regression, Multivariate Regression, Naive Bayes, Random Forests, K-means and KNN for data analysis.
  • Measured efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
  • Developed Spark code using Python/Scala and Spark-SQL/Streaming for faster processing of data.
  • Prepared the Spark build from source code and ran Pig scripts using Spark rather than MapReduce jobs for better performance.
  • Analyzed the system for new enhancements/functionality and performed impact analysis of the application for implementing ETL changes.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs. Used TensorFlow to train the model on insightful data across thousands of examples.
  • Designed, developed and optimized SQL code (DDL/DML).
  • Built performant, scalable ETL processes to load, cleanse and validate data.
  • Expertise in data archival and data migration, ad-hoc reporting and coding using SAS in UNIX and Windows environments.
  • Tested and debugged SAS programs against the test data.
  • Processed the data in SAS for the given requirement using SAS programming concepts.
  • Imported and Exported data files to and from SAS using Proc Import and Proc Export from Excel and various delimited text-based data files such as .TXT (tab delimited) and .CSV (comma delimited) files into SAS datasets for analysis.
  • Expertise in producing RTF, PDF and HTML files using the SAS ODS facility.
  • Provided support for data processes, involving monitoring data, profiling database usage, troubleshooting, tuning and ensuring data integrity.
  • Participated in the full software development lifecycle (requirements, solution design, development, QA implementation and product support) using Scrum and other Agile methodologies.
  • Collaborated with team members and stakeholders on the design and development of the data environment.
  • Learned new tools and skillsets as needs arose.
  • Prepared associated documentation for specifications, requirements and testing.
  • Optimized the TensorFlow model for efficiency.
  • Used TensorFlow for text summarization.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed Kafka producers and consumers for message handling (see the sketch after this list).
  • Responsible for analyzing multi-platform applications using Python.
  • Used Storm as an automatic mechanism to analyze large volumes of non-unique data points with low latency and high throughput.
  • Developed MapReduce jobs in Python for data cleaning and data processing.
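
A minimal sketch of the Kafka producer/consumer pattern mentioned above, assuming the kafka-python client and a broker on localhost:9092; the topic name and message fields are illustrative:

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer: serialize dicts to JSON and publish to a topic.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("trades", {"symbol": "ABC", "qty": 100, "price": 42.5})
    producer.flush()

    # Consumer: read the same topic from the beginning and handle each message.
    consumer = KafkaConsumer(
        "trades",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)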

Environment: Machine learning, AWS, Cassandra, SAS, Spark, HDFS, Hive, Pig, Linux, Anaconda Python, MySQL, Eclipse, PL/SQL, SQL connector, SparkML.

Principal Machine Learning Engineer
Information Technology
Apr 2012 - Apr 2019
Tampa, FL
  • Performed Data Profiling to learn about user behavior and merged data from multiple data sources.
  • Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.
  • Performed K-means clustering, multivariate analysis and Support Vector Machines in Python and R (see the segmentation sketch after this list).
  • Developed clustering algorithms and Support Vector Machines that improved customer segmentation and market expansion.
  • Professional Tableau user (Desktop, Online, and Server).
  • Data storyteller; mined data from different data sources such as SQL Server, Oracle, Cube databases, web analytics, Business Objects and Hadoop.
  • Provided ad hoc analysis and reports to the executive-level management team.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.
  • In a Unix development environment, built batch processes and models for financial application reports using Perl and Korn shell scripts, with partitions and sub-partitions on an Oracle database.
  • Developed analytics and strategy to integrate B2B analytics in outbound calling operations.
  • Implemented analytics delivery on cloud-based visualization using the Shiny tool for the Business Objects and Google Analytics platforms.
  • Served as single point of contact (SPOC) data scientist and predictive analyst, creating annual and quarterly business forecast reports.
  • Primary owner of the business regression report.
  • Created various B2B predictive and descriptive analytics using R and Tableau.
  • Created and automated ad hoc reports.
  • Responsible for planning & scheduling new product releases and promotional offers.
  • Worked on NoSQL databases such as Cassandra.
  • Experienced in Agile methodologies and the Scrum process.
  • Parsed data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
  • Worked on text analytics, Naive Bayes, sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms. Extensive experience using SAS ODS to create output files in a variety of formats, including RTF, HTML and PDF.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in R and Python.
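
A minimal sketch of K-means customer segmentation as described above, assuming scikit-learn; the feature names and values are illustrative, not project data:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Toy customer features: [annual_spend, visits_per_month, tenure_years]
    X = np.array([
        [5200, 4, 2.0],
        [310, 1, 0.5],
        [8900, 9, 6.0],
        [450, 2, 1.0],
        [7700, 8, 5.5],
        [290, 1, 0.3],
    ])

    # Scale first: K-means is distance-based, so raw dollar amounts
    # would otherwise dominate the smaller-valued features.
    X_scaled = StandardScaler().fit_transform(X)

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_scaled)
    print(kmeans.labels_)           # cluster id per customer
    print(kmeans.cluster_centers_)  # centroids in scaled feature space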

Environment: R, Python, UNIX Scripting, SAS, Cassandra, Java, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Oracle, Eclipse.

Sr. Machine Learning Engineer
Information Technology
Jun 2010 - Jun 2011
Peoria, IL
  • Gathered, retrieved and organized data and used it to reach meaningful conclusions.
  • Developed a system for collecting data and turning the findings into reports that improved the company.
  • Set up the analytics system to provide insights.
  • Initially stored the data in MongoDB; later moved it to Elasticsearch.
  • Used Kibana to visualize the data collected from Twitter using Twitter REST APIs.
  • Developed a multi-class, multi-label two-stage classification model to identify depression-related tweets and classify depression-indicative symptoms. Utilized the model to calculate the severity of depression in a patient using Python, scikit-learn, Weka and Meka (see the sketch after this list).
  • Conceptualized and created a knowledge graph database of news events extracted from tweets using Java, Virtuoso, Stanford CoreNLP, Apache Jena, RDF.
  • Produced and maintained internal and client-based reports.
  • Created stories with data that a non-technical team could also understand.
  • Worked on Descriptive, Diagnostic, Predictive and Prescriptive analytics.
  • Implemented character recognition using Support Vector Machines for performance optimization.
  • Monitored data quality and maintained data integrity to ensure effective functioning of the department.
  • Managed database design and implemented a comprehensive Star-Schema with shared dimensions.
  • Implemented normalization techniques and built tables per the requirements given by business users.
  • Developed and maintained stored procedures, implemented changes to database design including tables and views, and documented source-to-target mappings per the business rules.
  • Analyzed end-user requirements, communicating and modeling them for the development team.
  • Took responsibility to bridge between technologists and business stakeholders to drive innovation from conception to production.
  • Built machine learning that automatically scores user assignments based on a few manually scored assignments.
  • Utilized various techniques such as histograms, bar plots, pie charts, scatter plots and box plots to assess the condition of the data.
  • Researched and developed predictive analytics solutions for business needs.
  • Worked on data processing for very large datasets, handling missing values, creating dummy variables and addressing various kinds of noise in the data.
  • Mined large data sets using sophisticated analytical techniques to generate insights and inform business decisions.
  • Built and tested hypotheses, ensured statistical significance and built statistical models for business applications.
  • Developed machine learning algorithms with standalone Spark MLlib and Python.
  • Designed and developed analytics, machine learning models and visualizations that drive performance and provide insights, from prototyping through production deployment, product recommendation and allocation planning.
  • Performed data preprocessing tasks such as merging, sorting, outlier detection, missing-value imputation and data normalization, making data ready for statistical analysis.
  • Implemented various machine learning models such as regression, classification, Tree based and Ensemble models.
  • Performed model tuning by adjusting hyperparameters, raising model accuracy.
  • Validated the models developed by applying appropriate measures such as k-fold cross validation, AUC and ROC to identify the best performing model.
  • Created machine learning and statistical methods (SVM, CRF, HMM, sequential tagging).
  • Built data platforms for analytics and advanced analytics in R.
  • Managed tickets using basic SQL queries.
  • Segmented customers based on demographics using K-means clustering.
  • Implemented various machine learning algorithms in Spark using MLlib.
  • Performed segmentation on customer data to identify target groups for new loans using clustering techniques such as K-means, with further processing using Support Vector Regression.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring.
  • Accomplished multiple tasks from collecting data to organizing and interpreting statistical information.
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
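
A minimal sketch of the two-stage tweet classifier described above, shown with scikit-learn (the original work also used Weka and Meka); the toy tweets, labels and symptom names are illustrative:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    # Toy training data; the real labels came from annotated tweets.
    tweets = ["cant sleep again, everything feels pointless",
              "great game last night!",
              "so tired all the time, no energy for anything",
              "new phone arrived today"]
    is_related = [1, 0, 1, 0]                      # stage 1: depression-related?
    symptom_sets = [{"insomnia", "hopelessness"},  # stage 2: symptoms for the
                    {"fatigue"}]                   # related tweets only

    # Stage 1: binary filter over all tweets.
    stage1 = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    stage1.fit(tweets, is_related)

    # Stage 2: multi-label symptom tagging, trained on the related tweets only.
    related_tweets = [t for t, r in zip(tweets, is_related) if r == 1]
    mlb = MultiLabelBinarizer()
    Y = mlb.fit_transform(symptom_sets)
    stage2 = make_pipeline(TfidfVectorizer(),
                           OneVsRestClassifier(LogisticRegression(max_iter=1000)))
    stage2.fit(related_tweets, Y)

    # Prediction: only tweets passing stage 1 reach stage 2.
    new = ["awake at 4am again, whats the point"]
    if stage1.predict(new)[0] == 1:
        print(mlb.inverse_transform(stage2.predict(new)))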
Machine Learning Engineer
Information Technology
Feb 2008 - Apr 2010
Tampa, FL
  • Streamlined information by integrating data from multiple data sets into one database system.
  • Created database triggers and designed tables.
  • Produced statistics from the data by analyzing it and generating reports.
  • Cleaned the database by removing data files and unnecessary information.
  • Ran SQL queries to provide solutions to customer-generated tickets.
  • Performed specific data queries and wrote scripts.
  • Collected data from multiple sources and added it to the database.
  • Researched and reconciled data discrepancies occurring among various information systems and reports.
  • Identified new sources of data and methods to improve data collection, analysis and reporting.
  • Tested prototype software and participated in approval of new software.
  • Identified areas with data inaccuracies as well as trends in growing data inaccuracies.
  • Contributed to methods using large data sets and complex processes.
  • Found trends and patterns to make recommendations to clients.
  • Recorded the patterns weekly, monthly and quarterly.
  • Collaborated with marketers, salespeople, data architects and database developers.
  • Worked with web developers to collect data and streamline data reuse.
  • Imported and cleansed high volumes of data from various sources such as Teradata, Oracle and flat files.
  • Utilized Python to cluster credit card holders and implemented predictive analysis.
  • Wrote and executed unit, system, integration and UAT scripts in data warehouse projects.
  • Developed Informatica mappings using various transformations, and PL/SQL packages for the extraction, transformation and loading of data.
  • Wrote a Python program to parse and upload CSV files into a PostgreSQL database (see the sketch after this list); the HTTP Requests library was used for Web API calls.
  • Wrote SQL for data profiling and developed data quality reports.
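
A minimal sketch of the CSV-to-PostgreSQL loader described above, assuming psycopg2; the connection string, table name and column names are illustrative:

    import csv
    import psycopg2

    conn = psycopg2.connect("dbname=analytics user=loader password=secret host=localhost")
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS transactions (
            txn_id   TEXT PRIMARY KEY,
            amount   NUMERIC,
            txn_date DATE
        )
    """)

    # Parse the CSV and insert row by row, skipping duplicates on re-runs.
    with open("transactions.csv", newline="") as f:
        for row in csv.DictReader(f):
            cur.execute(
                "INSERT INTO transactions (txn_id, amount, txn_date) "
                "VALUES (%s, %s, %s) ON CONFLICT (txn_id) DO NOTHING",
                (row["txn_id"], row["amount"], row["txn_date"]),
            )

    conn.commit()
    cur.close()
    conn.close()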
Machine Learning Intern
Information Technology
Mar 2004 - Mar 2008
Arlington, VA
  • Successfully completed a Junior Data Analyst internship at Confidential.
  • Built an Expense Tracker and Zonal Desk.
  • Identified inconsistencies, correcting them or escalating problems to the next level.
  • Assisted in development of interface testing and implementation plans.
  • Analyzed data for data quality and validation issues.
  • Analyzed the websites regularly to ensure site traffic and conversion funnels performed well.
  • Collaborated with sales and marketing teams to optimize processes that communicate insights effectively.
  • Created and maintained automated reports using SQL.
  • Conducted safety checks to make sure the team felt safe during retrospectives.
  • Aided in data profiling by examining the source data.
  • Extracted features from the given data set and used them to train and evaluate different classifiers available in the WEKA tool; these features were used to differentiate spam messages from legitimate messages (see the sketch after this list).
  • Created numerous SQL queries to modify data based on data requirements and added enhancements to existing procedures.
  • Implemented statistical modeling techniques in Python.
  • Performed data mappings to map the source data to the destination data.
  • Developed use case diagrams to identify the users involved. Created activity diagrams and sequence diagrams to depict the process flows.
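
A minimal sketch of the spam-vs-legitimate classification idea, shown with scikit-learn as a stand-in for the WEKA workflow; the messages and labels are toy data:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB

    messages = [
        "WIN a FREE prize now, click here",
        "Lunch at noon tomorrow?",
        "URGENT: claim your reward today",
        "Can you send the report draft?",
    ]
    labels = [1, 0, 1, 0]  # 1 = spam, 0 = legitimate

    # Bag-of-words counts stand in for the hand-extracted WEKA attributes.
    vec = CountVectorizer()
    X = vec.fit_transform(messages)

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.5, random_state=0, stratify=labels
    )
    clf = MultinomialNB().fit(X_tr, y_tr)
    print(clf.score(X_te, y_te))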
Education
Master's in Data Science
University of California, San Diego, 2017 - 2018
Certifications
Machine Learning Engineer
Harvard University
Inference And Modeling
Harvard University
Data Science: R Programming
Harvard University
Probability
Harvard University
AWS SageMaker Machine Learning Application Development
AWS
AWS Developer: Building on AWS
AWS
Amazon DynamoDB: Building NoSQL Database-Driven Applications
AWS
Analyzing Data with Python
IBM Data Science
Machine Learning
UC San Diego
Python for Data Science
UC San Diego
Big Data Analytics Using Spark
UC San Diego
Probability and Statistics in Data Science Using Python
UC San Diego
Skills
Skill                        Last Used   Years
Python                       2021        15
SQL                          2021        15
Data Profiling               2021        14
Machine Learning             2021        13
Data Cleansing               2021        10
Oracle                       2019        9
Cassandra                    2021        8
Data Mining                  2019        8
Hadoop                       2021        8
HDFS                         2021        8
Hive                         2021        8
HTML                         2021        8
Java                         2019        8
MapReduce                    2021        8
MongoDB                      2019        8
MS Power BI                  2019        8
Pig                          2021        8
SAS                          2021        8
Sqoop                        2021        8
Tableau                      2019        8
UNIX                         2021        8
CUBE                         2019        7
Data Science                 2019        7
JSON                         2019        7
Korn Shell                   2019        7
Perl                         2019        7
Scripting                    2019        7
Shell Scripts                2019        7
SQL Server                   2019        7
TOAD                         2019        7
Data Analysis                2021        5
Weka                         2011        5
Data Mapping                 2008        4
Statistical Modeling         2008        4
Data Architecture            2010        2
Data Warehousing             2010        2
Informatica                  2010        2
PostgreSQL                   2010        2
Spark                        2021        2
Teradata                     2010        2
Triggers                     2010        2
Web Developer                2010        2
AWS                          2021        1
Box                          2011        1
Clustering                   2011        1
Data Integrity               2021        1
Data Migration               2021        1
Database Design              2011        1
Elasticsearch                2011        1
ETL                          2021        1
Kafka                        2021        1
Kibana                       2011        1
Linux                        2021        1
MySQL                        2021        1
Predictive Analytics         2011        1
Prototyping                  2011        1
REST                         2011        1
Scala                        2021        1
Sequent                      2011        1
Spark Core                   2021        1
Stored Procedure             2011        1
Windows                      2021        1
Yarn                         2021        1
AWS EC2                      n/a         1
AWS Lambda                   n/a         1
AWS S3                       n/a         1
Azure Machine Learning       n/a         1
C                            n/a         1
C++                          n/a         1
CloudWatch                   n/a         1
CSS                          n/a         1
JIRA                         n/a         1
MS Azure                     n/a         1
MS SharePoint                n/a         1
OBIEE                        n/a         1
Splunk                       n/a         1
SQL Developer                n/a         1
SVN                          n/a         1