Brady
blmcmcn@gmail.com
775-283-8489
Lake Forest, CA 92630
Physicist and Data Scientist
11 years experience W2
Summary

Data Scientist and Physicist with expertise in using advanced mathematical techniques to create actionable solutions to business and scientific problems. Extensive experience using object-oriented programming to perform statistical modeling, data mining, machine learning, artificial neural networks, and various optimization algorithms. A highly responsible data scientist who is vigilant in sustaining effective communication, writing efficient and well-documented code, creating intuitive visualizations, and providing practical results.

Summary of Technical Skills

  • Programming Languages and Software: Python, R, MATLAB, C#, LaTeX
  • Python Libraries: NumPy, Pandas, SciPy, Matplotlib, scikit-learn, Keras, PyTorch, TensorFlow, PyBrain, Caffe, NLTK, Statsmodels, Seaborn, Selenium
  • Data Systems: SQL, NoSQL, AWS (RDS, Redshift, Kinesis, EC2, EMR, S3), MS Azure
  • Development Tools: GitHub, Git, Jupyter Notebook, Trello, SVN
  • IDEs: Spyder, Jupyter, PyCharm, RStudio, Eclipse
  • Statistical Methods: Bayesian Statistics, Hypothesis Testing, Factor Analysis, Stochastic Modeling, Factorial Design, ANOVA
  • Machine Learning Frameworks: TensorFlow, PyTorch, Torch, Keras, Caffe
  • Supervised Learning: Naive Bayes, Time Series Analysis, Survival Analysis, Linear Regression, Logistic Regression, Elastic Net Regression, Multivariate Regression, Support Vector Machines (SVM), k-Nearest Neighbors (KNN), Decision Trees, Random Forests, Natural Language Processing (NLP)
  • Unsupervised Learning: K-means Clustering, Hierarchical Clustering, Centroid Clustering, Principal Component Analysis, Gaussian Mixture Models, Singular Value Decomposition (SVD)
  • Deep Learning: Artificial Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Transfer Learning, LSTM Networks, Segmentation, Autoencoding/decoding
  • Optimization Techniques: Linear Programming, Dynamic Programming, Convex Optimization, Non-Convex Optimization, Monte Carlo Methods, Network Flows

Tinder Swiper Predictor

  • Predict whether or not a user will swipe right given their swipe history
  • Tried a content-based recommender system to tailor a model to each individual’s preferences
  • Used a convolutional neural network to extract features from images and represent the images in an embedding space
  • Fed the embedding features and swipe history data into XGBoost to build a prediction model; this model had a 20% accuracy improvement over guessing based on average swipe rate
  • Also tried a collaborative filtering approach, which can leverage swipe information from the whole user base
  • Generated synthetic data consisting of synthetic users with randomly weighted preferences over high-level features and a random threshold for how attractive they must find a profile to swipe right
  • Using the synthetic data, applied Singular Value Decomposition to predict the swipes of organic users (who had an average swipe rate of 45%) with an accuracy of 81%
  • When predicting synthetic users, the accuracy of the model grew to 89% with 1000 users and 200 swipes per user
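
The synthetic-user generation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the number of features, the uniform distributions, the 0.3 to 0.7 threshold range, and all function names are assumptions made for the example.

```python
import random

def make_profiles(n_profiles, n_features, rng):
    """Each profile is a vector of high-level feature scores in [0, 1]."""
    return [[rng.random() for _ in range(n_features)] for _ in range(n_profiles)]

def make_user(n_features, rng):
    """A synthetic user: normalized random preference weights plus a swipe threshold."""
    raw = [rng.random() for _ in range(n_features)]
    total = sum(raw)
    weights = [w / total for w in raw]   # weights sum to 1
    threshold = rng.uniform(0.3, 0.7)    # assumed range, chosen only for illustration
    return weights, threshold

def swipes_right(user, profile):
    """Swipe right when the weighted attractiveness score reaches the threshold."""
    weights, threshold = user
    score = sum(w * x for w, x in zip(weights, profile))
    return score >= threshold

def swipe_matrix(users, profiles):
    """Rows are users, columns are profiles; 1 = right swipe, 0 = left swipe."""
    return [[int(swipes_right(u, p)) for p in profiles] for u in users]

rng = random.Random(0)                   # fixed seed so the sketch is reproducible
profiles = make_profiles(200, 5, rng)    # 200 profiles, 5 hypothetical features
users = [make_user(5, rng) for _ in range(1000)]
matrix = swipe_matrix(users, profiles)
```

A user-by-profile matrix of this shape is the kind of input a matrix-factorization method such as SVD would operate on.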

Forecasting President Trump’s Tweeting Behavior

  • Created an ensemble of ARIMAX, Facebook Prophet, and Hidden Markov Models to predict the number of tweets Trump will post in one week
  • Used Natural Language Processing (Named Entity Recognition and Sentiment Analysis) on Presidential tweet data to determine the relationship between negative/positive sentiment and the quantity of tweets the following day
  • Scraped Trump references and data from various news websites to determine the correlation between the number of Trump references in different news categories and the number of tweets by Trump
  • Used the Twitter and Reddit APIs from Python to gather news data on the economy and foreign affairs
  • So far have won $800 on betting markets (PredictIt.org) using this prediction model (and modest wagers)

Housing Price Prediction in Pima County

  • The Pima County Assessor website has useful housing datasets such as Affidavit of Sale, Notice of Value, Housing Details, and geographical information
  • Only considered sales that represent a fair market exchange, so certain sales had to be removed; missing house properties were filled in using K-nearest neighbors methods
  • Modeled the price of homes with various features using random forest (R² = 0.893), feed-forward neural network with L1 regularization (R² = 0.887), K-nearest neighbors (R² = 0.859), and Linear Regression with Partial Gaussian Radial Basis and normalization (R² = 0.630)
  • The median absolute error for the best model was 8.7% (Zillow ranges from 6% to 8%)
  • Ranked the feature importance in determining house price (year, location, square feet, etc.)

Honors

First student of the University of Texas at San Antonio to win the Consortium Research Fellowship awarded by the Air Force Research Laboratory.

Selected Publications

  1. Brady McMicken, Robert J. Thomas, and Lorenzo Brancaleon. Photoinduced partial unfolding of tubulin bound to meso-tetrakis(sulfonatophenyl) porphyrin leads to inhibition of microtubule formation in vitro. J. Biophotonics. 7 (11-12), 874-888. DOI: 10.1002/jbio.201300066
  2. Brady McMicken, James Parker, Robert Thomas, and Lorenzo Brancaleon. Resonance Raman and vibrational mode analysis used to predict ligand geometry for docking simulations of a water-soluble porphyrin and tubulin. Journal of Biomolecular Structure and Dynamics. 34 (9), 1998-2010. DOI: 10.1080/07391102.2015.1102082
  3. Brady McMicken, Robert J. Thomas, and Lorenzo Brancaleon. Partial Unfolding of Tubulin Heterodimers Induced by Two-Photon Excitation of Bound meso-tetrakis (sulfonatophenyl) porphyrin. The Journal of Physical Chemistry B. 120 (15), 3653-3665. DOI: 10.1021/acs.jpcb.6b02055

Experience
Data Scientist
Chemicals/Pharmaceuticals
Aug 2018 - Jan 2020
Lake Forest, CA
  • Used SQL to create a development database from the production database and anonymized data for HIPAA compliance.
  • Built gradient boosted tree and random forest models to predict the appropriate power of the lens to be implanted during cataract surgery
  • Utilized TensorFlow to design neural network models and Amazon Web Services (AWS) EC2 for model training. Neural networks were used as a benchmark.
  • Used various clustering algorithms (K-Means, Gaussian Mixture Models, DBSCAN, and Hierarchical Agglomerative) to discover refractive surgery patient groups, and applied multivariate fit models to each patient group type.
  • Developed a polynomial regression model of the prediction error made by theoretical formulas. The model was fit using a combination of gradient descent and Random Sample Consensus (RANSAC), which ignores outliers and is analogous to a cross-validation process
  • Used iterative grid search and the Newton-Raphson method to minimize the average absolute error of predictions when optimizing models for specific lens types
  • Provided data visualizations for multiple metrics of project success.
  • Documented solutions and gave presentations to stakeholders explaining project progress, goals, and blocks.
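
The RANSAC idea mentioned above (fit on random minimal samples, keep the largest consensus set, then refit on the inliers only) can be illustrated with a short sketch. It is simplified on purpose: it fits a straight line rather than a higher-degree polynomial, it uses closed-form least squares in place of gradient descent, and the data, tolerance, and function names are hypothetical.

```python
import random

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b (closed form)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def ransac_line(points, n_iter=200, tol=1.0, seed=0):
    """Return ((slope, intercept), inliers) for the largest consensus set found."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        # A line is defined by two points, so the minimal sample size is 2.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical candidate line, skip it
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus set found")
    # Final model: least squares on the consensus set only, outliers ignored.
    return fit_line(best_inliers), best_inliers

# Hypothetical data: y = 2x + 1 with small alternating noise, plus three outliers.
points = [(x, 2 * x + 1 + 0.1 * (-1) ** x) for x in range(20)]
points += [(3, 40.0), (10, -30.0), (15, 90.0)]
(slope, intercept), inliers = ransac_line(points)
```

In this toy data set the three gross outliers fall far outside the tolerance band of any line through two clean points, so they are excluded from the final fit.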
Skills: Compliance, Data Visualization, SQL, Data Science, HIPAA, Database Design
Data Scientist
Utilities/Energy
Mar 2017 - Jul 2018
Houston, TX

  • Worked with oil drilling subject matter experts to investigate the drilling process and the most likely points of failure, performing hypothesis testing on survival curves with bootstrapping and Monte Carlo permutation test for confidence intervals.
  • Applied Extreme Learning Machines (ELM) and gradient boosted decision trees to predict the optimal drilling rate, with a Mean Absolute Error (MAE) of approximately 0.06
  • Used ARIMA time series modeling, including variants such as seasonal (SARIMA) and explanatory (ARIMAX) models, to predict future values of machine parameters
  • Used a Cox proportional hazards model as a regression algorithm to estimate time to failure; the model can handle censored observations
  • Used various algorithms, including autoregressive models, partial autocorrelation analysis, survival forests, and supervised hidden Markov models, to forecast machine parameters
  • Utilized AWS EMR and EC2 for on-demand computation and data analysis
  • Worked with Amazon RDS and Redshift relational databases to query data for statistical analysis
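
The bootstrapping and Monte Carlo permutation testing mentioned above can be sketched in a few lines. This is a simplified illustration only: it compares mean time to failure between two groups and ignores censoring, whereas the work described here tested survival curves. The failure times and function names are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(a, b, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resampled_a = [rng.choice(a) for _ in a]  # resample with replacement
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(mean(resampled_a) - mean(resampled_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical hours-to-failure for two equipment groups.
group_a = [120, 135, 150, 160, 170, 180, 190, 200, 210, 220]
group_b = [60, 70, 75, 80, 90, 95, 100, 110, 115, 125]
p_value = permutation_test(group_a, group_b)
ci_low, ci_high = bootstrap_ci(group_a, group_b)
```

A small p-value together with a confidence interval that excludes zero would suggest the two groups fail at genuinely different times.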
Skills: Data Analysis, Statistical Analysis
Air Force Research Lab JBSA-Fort Sam
Information Technology
Feb 2016 - Feb 2017
Houston, TX

  • Performed molecular dynamics simulations with NAMD to calculate the vibrational absorption spectrum of microtubules and find their resonance frequency
  • Studied the effects of terahertz irradiation (which contains the resonance frequency) on microtubule polymerization and found the microtubules to be less stable
  • Built Hidden Markov Models for gene sequence analysis to find correlations/relationships between various types of tubulin
  • Utilized clustering algorithms to find patterns in the tubulin gene expression for different states of the cell cycle
  • Developed X-ray diffraction image processing algorithms for crystallized protein structure reconstruction (specialized for tubulin).
Air Force Research Lab JBSA-Fort Sam
Education
Oct 2010 - Jan 2016
Houston, TX

  • Designed the optical path, built and operated a resonance Raman system to study the vibrational modes of porphyrin when bound and unbound to tubulin
  • Collected the scanning Raman spectra data and removed the autofluorescence background signal using the Lieber algorithm, then fit a series of Voigt functions to define the characteristic Raman peaks and study the binding-triggered vibrational state changes (Python automation scripts).
  • Performed density functional theory (DFT) calculations on a high-performance computing (HPC) cluster to predict the minimum energy structure of small molecules and their vibrational modes
  • Used Singular Value Decomposition to correlate Raman spectra to theoretical vibrational modes.
  • Used the Kramers-Kronig transformation to predict the conformation of a small molecule bound to a protein by analyzing the change in its Raman spectra upon binding
  • Performed docking simulations with a genetic search algorithm to predict the most likely binding site between the distorted ligand and the protein
  • Utilized femtosecond, 1-gigawatt pulsed lasers to achieve two-photon absorption by porphyrins to induce protein unfolding

Education
Doctoral in Physics, Department of Physics & Astronomy
The University of Texas at San Antonio 2015
Bachelor's in Physics, Department of Physics & Astronomy
The University of Texas at El Paso 2009
Bachelor's in Applied and Computational Mathematics
The University of Texas at El Paso 2009
Skills
Compliance, 2020, 2
Data Science, 2020, 2
Data Visualization, 2020, 2
Database Design, 2020, 2
HIPAA, 2020, 2
SQL, 2020, 2
Data Analysis, 2018, 1
Statistical Analysis, 2018, 1
Big Data, 2020, 1
AWS, 0, 1
Data Mining, 0, 1
Machine Learning, 0, 1
MS Azure, 0, 1