Data Scientist and Physicist with expertise in applying advanced mathematical techniques to create actionable solutions to business and scientific problems. Extensive experience using object-oriented programming for statistical modeling, data mining, machine learning, artificial neural networks, and various optimization algorithms. A highly responsible data scientist who is vigilant about sustaining effective communication, writing efficient and well-documented code, creating intuitive visualizations, and providing practical results.

**Summary of Technical Skills**

- Programming Languages and Software: Python, R, MATLAB, C#, LaTeX
- Python Libraries: NumPy, Pandas, SciPy, Matplotlib, scikit-learn, Keras, PyTorch, TensorFlow, PyBrain, Caffe, NLTK, Statsmodels, Seaborn, Selenium
- Data Systems: SQL, NoSQL, AWS (RDS, Redshift, Kinesis, EC2, EMR, S3), MS Azure
- Development Tools: GitHub, Git, Jupyter Notebook, Trello, SVN
- IDEs: Spyder, Jupyter, PyCharm, RStudio, Eclipse
- Statistical Methods:
- Machine Learning Frameworks: TensorFlow, PyTorch, Torch, Keras, Caffe
- Supervised Learning:
- Unsupervised Learning:
- Deep Learning:
- Optimization Techniques:

**Tinder Swiper Predictor**

- Predict whether or not a user will swipe right given their swipe history
- Tried a content-based recommender system to tailor a model to each individual's preferences
- Used a convolutional neural network to extract features from images and represent the images in an embedding space
- Fed the embedding features and swipe history data into XGBoost to build the prediction model; this model had a 20% accuracy improvement over guessing based on average swipe rate
- Also tried a collaborative filtering approach, which can leverage swipe information from the whole user base
- Generated synthetic data consisting of synthetic users with randomly weighted preferences over high-level features and a random threshold for how attractive they must find a profile to swipe right
- Using the synthetic data, applied Singular Value Decomposition to predict the swipes of organic users (who had an average swipe rate of 45%) with an accuracy of 81%
- When predicting the swipes of synthetic users, the accuracy of the model grew to 89% with 1,000 users and 200 swipes per user
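
The synthetic-user and SVD steps above can be illustrated with a minimal sketch. It is not the project's code: the function names (`make_synthetic_swipes`, `svd_predict`), the normal distributions for preferences and thresholds, the mean-fill of unobserved swipes, and the matrix sizes are all assumptions chosen for illustration.

```python
import numpy as np

def make_synthetic_swipes(n_users, n_profiles, n_features, rng):
    """Hypothetical generator: each user has random feature weights and a
    random attractiveness threshold; a swipe is right (1.0) when the weighted
    score of a profile exceeds that user's threshold."""
    profiles = rng.normal(size=(n_profiles, n_features))
    weights = rng.normal(size=(n_users, n_features))
    scores = weights @ profiles.T
    thresholds = rng.normal(scale=0.5, size=(n_users, 1))
    return (scores > thresholds).astype(float)

def svd_predict(swipes, observed_mask, rank):
    """Fill unobserved entries with the observed mean swipe rate, take a
    rank-limited SVD reconstruction, and threshold it at 0.5."""
    fill = swipes[observed_mask].mean()
    filled = np.where(observed_mask, swipes, fill)
    u, s, vt = np.linalg.svd(filled - fill, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank] + fill
    return approx > 0.5

rng = np.random.default_rng(0)
swipes = make_synthetic_swipes(200, 100, 5, rng)
mask = rng.random(swipes.shape) < 0.8          # 80% of swipes are "observed"
pred = svd_predict(swipes, mask, rank=5)

held_out = ~mask                               # score only the hidden swipes
accuracy = (pred[held_out] == (swipes[held_out] > 0.5)).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Scoring only the held-out entries keeps the reported accuracy from being inflated by swipes the factorization has already seen.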

**Forecasting President Trump’s Tweeting Behavior**

- Created an ensemble of ARIMAX, Facebook Prophet, and Hidden Markov Models to predict the number of tweets Trump will post in one week
- Used Natural Language Processing (Named Entity Recognition and Sentiment Analysis) on Presidential tweet data to determine the relationship between negative/positive sentiment and the quantity of tweets the following day
- Scraped Trump references and data from various news websites to determine the correlation between the number of Trump references in different news categories and the number of tweets by Trump
- Used the Twitter and Reddit APIs from Python to gather news data on the economy and foreign affairs
- So far have won $800 on betting markets (PredictIt.org) using this prediction model (and modest wagers)

**Housing Price Prediction in Pima County**

- The Pima County Assessor website provides useful housing datasets such as Affidavit of Sale, Notice of Value, Housing Details, and geographical information
- Only considered sales that represent fair market exchange, so certain sales had to be removed; some missing house properties were filled in using K-nearest neighbors methods
- Modeled the price of homes with various features using random forest (R² = 0.893), feed-forward neural network with L1 regularization (R² = 0.887), K-nearest neighbors (R² = 0.859), and Linear Regression with Partial Gaussian Radial Basis and normalization (R² = 0.630)
- The median absolute error for the best model was 8.7% (Zillow's ranges from 6% to 8%)
- Ranked the feature importance in determining house price (year, location, square footage, etc.)
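
For reference, the two metrics quoted above can be computed as in the sketch below. It assumes the 8.7% figure is the median of per-sale absolute percentage errors, which the text does not spell out; the function names and the three example prices are made up for illustration.

```python
from statistics import mean, median

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def median_absolute_percentage_error(actual, predicted):
    """Median of |predicted - actual| / actual, expressed in percent."""
    return 100.0 * median(abs(p - a) / a for a, p in zip(actual, predicted))

# Made-up sale prices and model predictions, for illustration only.
actual = [200_000, 250_000, 300_000]
predicted = [210_000, 240_000, 330_000]

r2 = r_squared(actual, predicted)
mape = median_absolute_percentage_error(actual, predicted)
print(f"R^2 = {r2:.3f}, median absolute error = {mape:.1f}%")
```

The median is less sensitive than the mean to a handful of badly mispriced homes, which is one reason it is commonly used for this kind of comparison.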

**Honors**

First student of the University of Texas at San Antonio to win the Consortium Research Fellowship awarded by the Air Force Research Laboratory.

**Selected Publications**

- **Brady McMicken**, Robert J. Thomas, and Lorenzo Brancaleon. *Photoinduced partial unfolding of tubulin bound to meso-tetrakis(sulfonatophenyl) porphyrin leads to inhibition of microtubule formation in vitro*. *J. Biophotonics*, 7 (11-12), 874-888. DOI: 10.1002/jbio.201300066
- **Brady McMicken**, James Parker, Robert Thomas, and Lorenzo Brancaleon. *Resonance Raman and vibrational mode analysis used to predict ligand geometry for docking simulations of a water-soluble porphyrin and tubulin*. *Journal of Biomolecular Structure and Dynamics*, 34 (9), 1998-2010. DOI: 10.1080/07391102.2015.1102082
- **Brady McMicken**, Robert J. Thomas, and Lorenzo Brancaleon. *Partial Unfolding of Tubulin Heterodimers Induced by Two-Photon Excitation of Bound meso-tetrakis (sulfonatophenyl) porphyrin*. *The Journal of Physical Chemistry B*, 120 (15), 3653-3665. DOI: 10.1021/acs.jpcb.6b02055

**Data Scientist**

- Used SQL to create a development database from the production database and anonymized the data for HIPAA compliance.
- Built gradient boosted tree and random forest models to predict the appropriate power of the lens to be implanted during cataract surgery
- Utilized TensorFlow to design neural network models and Amazon Web Services (AWS) EC2 for model training. Neural networks were used as a benchmark.
- Used various clustering algorithms (K-Means, Gaussian Mixture Models, DBSCAN, and Hierarchical Agglomerative) to discover refractive surgery patient groups, and applied multivariate fit models to each patient group type.
- Developed a polynomial regression model of the prediction error made by theoretical formulas. The model was developed using a combination of gradient descent and Random Sample Consensus (RANSAC), which is used to ignore outliers and is analogous to a cross-validation process
- Used iterative grid search and the Newton-Raphson method to minimize the average absolute error of predictions when optimizing models for specific lens types
- Provided data visualizations for multiple metrics of project success.
- Documented solutions and gave presentations to stakeholders explaining project progress, goals, and blockers.
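
The outlier-robust polynomial fit mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the production model: it swaps ordinary least squares (`np.polyfit`) in for gradient descent, the function name `ransac_polyfit` and its parameters are hypothetical, and the data are synthetic.

```python
import numpy as np

def ransac_polyfit(x, y, degree, n_iter, inlier_tol, rng):
    """Basic RANSAC loop: fit a polynomial to a random minimal sample,
    count points within inlier_tol of that fit, keep the largest inlier
    set, then refit on those inliers by least squares."""
    best_inliers = None
    n_sample = degree + 1                      # minimal points for an exact fit
    for _ in range(n_iter):
        idx = rng.choice(len(x), size=n_sample, replace=False)
        coeffs = np.polyfit(x[idx], y[idx], degree)
        residuals = np.abs(np.polyval(coeffs, x) - y)
        inliers = residuals < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    final = np.polyfit(x[best_inliers], y[best_inliers], degree)
    return final, best_inliers

# Synthetic data: a known quadratic plus small noise, with every sixth
# point shifted upward to act as an outlier.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
y = 0.5 * x**2 - 0.2 * x + 0.1 + rng.normal(scale=0.01, size=x.size)
y[::6] += 1.0

coeffs, inliers = ransac_polyfit(x, y, degree=2, n_iter=200,
                                 inlier_tol=0.05, rng=rng)
print("fitted coefficients (highest degree first):", coeffs)
print("points kept as inliers:", int(inliers.sum()))
```

Because each candidate fit is scored on points it was not fitted to, the loop resembles repeated hold-out validation, which is the analogy to cross-validation drawn above.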

**Data Scientist**

- Worked with oil drilling subject matter experts to investigate the drilling process and the most likely points of failure, performing hypothesis testing on survival curves with bootstrapping and Monte Carlo permutation tests for confidence intervals.
- Applied Extreme Learning Machines (ELM) and gradient boosted decision trees to predict the optimal drilling rate, with a Mean Absolute Error (MAE) of about 0.06
- Used ARIMA time series modeling, including variants such as seasonal (SARIMA) and explanatory (ARIMAX) models, to predict future values of machine parameters
- Used the Cox proportional hazards model as a regression algorithm to estimate "time to event of failure"; the model is capable of handling censored observations
- Used various methods, including autoregressive models, partial autocorrelation, survival forests, and supervised hidden Markov models, to forecast machine parameters
- Utilized AWS EMR and EC2 for on-demand computation and data analysis
- Worked with Amazon RDS and Redshift relational databases to query data for statistical analysis

**Air Force Research Lab JBSA-Fort Sam**

- Performed molecular dynamics simulations with NAMD to calculate the vibrational absorption spectrum of microtubules and find their resonance frequency
- Studied the effects of terahertz irradiation (which contains the resonance frequency) on microtubule polymerization and found the irradiated microtubules to be less stable
- Built Hidden Markov Models for gene sequence analysis to find correlations/relationships between various types of tubulin
- Utilized clustering algorithms to find patterns in tubulin gene expression for different states of the cell cycle
- Developed an X-ray diffraction image processing algorithm for crystallized protein structure reconstruction (specialized for tubulin).

**Air Force Research Lab JBSA-Fort Sam**

- Designed the optical path, then built and operated a resonance Raman system to study the vibrational modes of porphyrin when bound and unbound to tubulin
- Collected scanning Raman spectra and removed the autofluorescence background signal using the Lieber algorithm, then fit a series of Voigt functions to define the characteristic Raman peaks and study the binding-triggered vibrational state changes (Python automation scripts).
- Performed density functional theory (DFT) calculations on a high-performance computing (HPC) cluster to predict the minimum energy structure of small molecules and their vibrational modes
- Used Singular Value Decomposition to correlate Raman spectra to theoretical vibrational modes.
- Used the Kramers-Kronig transformation to predict the conformation of a small molecule bound to a protein by analyzing the change in its Raman spectra upon binding
- Performed docking simulations with a genetic search algorithm to predict the most likely binding site between the distorted ligand and the protein
- Utilized femtosecond, 1-gigawatt pulsed lasers to achieve two-photon absorption by porphyrins and induce protein unfolding