Summary
Data Scientist and Physicist with expertise in using advanced mathematical techniques to create actionable solutions to business and scientific problems. Extensive experience using object-oriented programming to perform statistical modeling, data mining, machine learning, artificial neural networks, and various optimization algorithms. A highly responsible data scientist who is vigilant in sustaining effective communication, writing efficient and well-documented code, creating intuitive visualizations, and providing practical results.
Summary of Technical Skills
- Programming Languages and Software: Python, R, MATLAB, C#, LaTeX
- Python Libraries: NumPy, Pandas, SciPy, Matplotlib, scikit-learn, Keras, PyTorch, TensorFlow, PyBrain, Caffe, NLTK, Statsmodels, Seaborn, Selenium
- Data Systems: SQL, NoSQL, AWS (RDS, Redshift, Kinesis, EC2, EMR, S3), MS Azure
- Development Tools: GitHub, Git, Jupyter Notebook, Trello, SVN
- IDEs: Spyder, Jupyter, PyCharm, RStudio, Eclipse
- Statistical Methods: Bayesian Statistics, Hypothesis Testing, Factor Analysis, Stochastic Modeling, Factorial Design, ANOVA
- Machine Learning Frameworks: TensorFlow, PyTorch, Torch, Keras, Caffe
- Supervised Learning: Naive Bayes, Time Series Analysis, Survival Analysis, Linear Regression, Logistic Regression, Elastic Net Regression, Multivariate Regression, Support Vector Machines (SVM), k-Nearest Neighbors (KNN), Decision Trees, Random Forests, Natural Language Processing (NLP)
- Unsupervised Learning: K-means Clustering, Hierarchical Clustering, Centroid Clustering, Principal Component Analysis, Gaussian Mixture Models, Singular Value Decomposition (SVD)
- Deep Learning: Artificial Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Transfer Learning, LSTM Networks, Segmentation, Autoencoders
- Optimization Techniques: Linear Programming, Dynamic Programming, Convex Optimization, Non-Convex Optimization, Monte Carlo Methods, Network Flows
Tinder Swiper Predictor
- Predicted whether or not a user will swipe right given their swipe history
- Tried a content-based recommender system to tailor a model to each individual’s preferences
- Used a convolutional neural network to extract features from images and represent the images in an embedding space
- Fed the embedding features and swipe history data into an XGBoost model; this model had a 20% accuracy improvement over guessing based on average swipe rate
- Also tried a collaborative filtering approach, which can leverage swipe information from the whole user base
- Generated synthetic data consisting of synthetic users with random weighted preferences based on high-level features and a random threshold for how attractive they must find a profile to swipe right
- Used Singular Value Decomposition with the synthetic data to predict the swipes of organic users (average swipe rate of 45%) with an accuracy of 81%
- When predicting synthetic users, the accuracy of the model grew to 89% with 1000 users and 200 swipes per user
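The synthetic-user and SVD steps above can be sketched roughly as follows. This is a minimal illustration with NumPy, not the project's actual code: the function names, the feature count, the threshold range, the rank, and the 80/20 split are all assumptions chosen for the example.

```python
import numpy as np


def make_synthetic_swipes(n_users, n_profiles, n_features=8, seed=0):
    """Hypothetical generator: 1.0 = swipe right, 0.0 = swipe left."""
    rng = np.random.default_rng(seed)
    # High-level features for each profile (values are arbitrary, in [0, 1)).
    profile_features = rng.random((n_profiles, n_features))
    # Random weighted preferences per user, normalized to sum to 1.
    weights = rng.random((n_users, n_features))
    weights /= weights.sum(axis=1, keepdims=True)
    # Random per-user threshold a profile's score must exceed (assumed range).
    thresholds = rng.uniform(0.4, 0.6, size=(n_users, 1))
    scores = weights @ profile_features.T
    return (scores > thresholds).astype(float)


def svd_predict(swipes, observed, rank=5):
    """Predict right swipes from a rank-limited SVD of the observed swipes."""
    observed = observed.astype(bool)
    counts = np.maximum(observed.sum(axis=1, keepdims=True), 1)
    # Each user's swipe rate, computed from observed entries only.
    user_rate = np.where(observed, swipes, 0.0).sum(axis=1, keepdims=True) / counts
    # Center on the user's rate; hidden entries become 0 (no information).
    centered = np.where(observed, swipes - user_rate, 0.0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank] + user_rate
    return approx >= 0.5


swipes = make_synthetic_swipes(n_users=200, n_profiles=100)
rng = np.random.default_rng(1)
observed = rng.random(swipes.shape) < 0.8  # hide roughly 20% of swipes
predicted = svd_predict(swipes, observed, rank=5)
held_out_accuracy = (predicted[~observed] == (swipes[~observed] == 1.0)).mean()
print(f"held-out accuracy: {held_out_accuracy:.2f}")
```

The accuracy printed here depends entirely on the assumed generator and is not comparable to the figures reported above.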
Forecasting President Trump’s Tweeting Behavior
- Created an ensemble of ARIMAX, Facebook Prophet, and Hidden Markov Models to predict the number of tweets Trump will write in one week
- Used Natural Language Processing (Named Entity Recognition and Sentiment Analysis) on Presidential tweet data to determine the relationship between negative/positive sentiment and the quantity of tweets the following day
- Scraped Trump references and data from various news websites to determine the correlation between the number of Trump references in different news categories and the number of tweets by Trump
- Used the Twitter and Reddit APIs from Python to gather news data on the economy and foreign affairs
- So far have won $800 on betting markets (PredictIt.org) using this prediction model (and modest wagers)
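One simple way to examine the sentiment-versus-next-day-volume relationship mentioned above is a lagged Pearson correlation. The sketch below uses only the standard library. The function name and the toy numbers are hypothetical, for illustration only; they are not the project's data or method.

```python
import math


def lagged_correlation(sentiment, tweet_counts, lag=1):
    """Pearson correlation between sentiment on day t and tweet count on day t + lag."""
    if len(sentiment) != len(tweet_counts):
        raise ValueError("series must have the same length")
    n = len(sentiment) - lag
    if lag < 0 or n < 2:
        raise ValueError("need at least two overlapping days")
    x = sentiment[:n]        # days 0 .. n-1
    y = tweet_counts[lag:]   # days lag .. end
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up daily values purely for illustration (not real tweet data).
daily_sentiment = [0.2, -0.5, 0.1, -0.8, 0.4, -0.3]
daily_tweet_counts = [5, 6, 12, 7, 15, 6]
print(lagged_correlation(daily_sentiment, daily_tweet_counts, lag=1))
```

A negative value would suggest that more negative sentiment tends to precede a busier day, although a correlation on its own says nothing about cause.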
Housing Price Prediction in Pima County
- Pima County Assessor website has useful housing datasets such as Affidavit of Sale, Notice of Value, Housing Details, and geographical information
- Only considered sales that represent fair market exchange, so certain sales had to be removed; missing house properties were filled in using K-nearest neighbors methods
- Modeled the price of homes with various features using random forest (R² = 0.893), feed-forward neural network with L1 regularization (R² = 0.887), K-nearest neighbors (R² = 0.859), and Linear Regression with Partial Gaussian Radial Basis and normalization (R² = 0.630)
- The median absolute error for the best model was 8.7% (Zillow ranges from 6%-8%)
- Ranked the feature importance in determining house price (year, location, square feet etc.)
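For reference, the two evaluation metrics quoted above can be computed as in the sketch below. It assumes the median absolute error is taken as a percentage of the actual sale price, which the text does not state explicitly. The function names and sample prices are hypothetical.

```python
import statistics


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


def median_abs_pct_error(actual, predicted):
    """Median of |predicted - actual| / actual, returned as a fraction."""
    return statistics.median(abs(p - a) / a for a, p in zip(actual, predicted))


# Made-up sale prices and predictions, for illustration only.
actual_prices = [150_000, 220_000, 310_000, 185_000]
predicted_prices = [160_000, 210_000, 330_000, 180_000]
print(f"R^2: {r_squared(actual_prices, predicted_prices):.3f}")
print(f"median abs error: {median_abs_pct_error(actual_prices, predicted_prices):.1%}")
```

The median is less sensitive than the mean to a handful of badly mispriced homes, which is one reason it is a common headline figure for price models.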
Honors
First student of the University of Texas at San Antonio to win the Consortium Research Fellowship awarded by the Air Force Research Laboratory.
Selected Publications
- Brady McMicken, Robert J. Thomas, and Lorenzo Brancaleon. Photoinduced partial unfolding of tubulin bound to meso-tetrakis(sulfonatophenyl) porphyrin leads to inhibition of microtubule formation in vitro. J. Biophotonics. 7 (11-12), 874-888. DOI: 10.1002/jbio.201300066
- Brady McMicken, James Parker, Robert Thomas, and Lorenzo Brancaleon. Resonance Raman and vibrational mode analysis used to predict ligand geometry for docking simulations of a water-soluble porphyrin and tubulin, Journal of Biomolecular Structure and Dynamics, 34 (9), 1998-2010. DOI: 10.1080/07391102.2015.1102082
- Brady McMicken, Robert J. Thomas, and Lorenzo Brancaleon. Partial Unfolding of Tubulin Heterodimers Induced by Two-Photon Excitation of Bound meso-tetrakis (sulfonatophenyl) porphyrin. The Journal of Physical Chemistry B. 120 (15), 3653-3665. DOI: 10.1021/acs.jpcb.6b02055