Senior Big Data Engineer
Apr 2018 - present
Kansas City, MO
Accomplishments:
- Wrote Apache Spark data processing pipelines on Sprint's on-premises clusters and on AWS/Azure clusters exceeding 1,000 nodes, with datasets producing up to 1/4 TB per hour.
- Migrated services from AWS to Azure.
- Dockerized services into Kubernetes deployments in the cloud.
Technologies used: PySpark, Scala Spark, SparkSQL, Apache Airflow, Zookeeper, Ambari, YARN 2, JVM 8, Hive, AWS EMR, AWS EC2, Python's Fabric Framework, Python's Mamba + Expects, Docker, Docker Compose, Kubernetes, Helm, AKS, InfluxDB, Grafana, Jenkins, Azure DevOps.
About Pinsight: Pinsight transforms cellular and mobile device data using advanced data science models, machine learning, and AI to deliver actionable insights for businesses spanning a multitude of industries. While respecting individual privacy and conforming to government compliance requirements, Pinsight ingests and processes over 100 TB of data daily.
Hadoop Architect
May 2017 - Apr 2018
Kansas City, MO
Played a lead role in getting AMC Theatres' Big Data initiative off the ground.
Responsibilities and Accomplishments:
- Extended a Spark Sentiment Analyzer written in Scala using Stanford CoreNLP to analyze complex customer feedback.
- Wrote a custom Flume Source plugin in Java and Scala + Cats for ingesting a vendor's real-time HTTPS event stream
- Used Scala, Akka, Scalatra, and Cats to develop an HTTP-based custom Flume client
- Co-Administrator of a CDH5 (Cloudera) cluster
- Trained peers and engineers in Hadoop software development and Scala programming
- Development process and workflow advisor
- Exploratory research and project idea generation
- Developed new solutions and apps leveraging Hadoop technologies including Flume, Spark, Impala, Hive, and HBase
- Deployed new Hadoop apps and plugins to a Kerberized CDH 5 cluster
- Rigged applications to execute through SysVinit, Upstart, or systemd
- Rigged system-initiated applications to auto-authenticate to Kerberos using keytabs
- Automation Engineer and advisor
- Haskell-style functional programming in Scala using Cats
- Imported deeply nested JSON files into Hive and Impala and flattened them into a traditional SQL table structure.
- Wrote apps for real-time data ingestion into HDFS using Linux shell scripting, Python, Java, and Scala
- Created a Docker CDH 5 development sandbox for prototyping.
Hadoop Architect
Oct 2016 - Apr 2017
Kansas City, MO
- Automated full SQL translations of databases containing thousands of schemas and over 100 thousand queries using Python. The Python tooling translated SQL between multiple data stores such as Teradata, Hive LLAP, and HAWQ/Greenplum.
- Ran performance profiling, tuning, and analysis on Hive LLAP and HAWQ/HDP/Greenplum databases.
- Wrote a data comparator using Python and Pandas that reported the differences between data sets and the reason for each difference (precision error vs. incorrect value vs. NULL value, etc.).
- Wrote a concurrent Python script for automating the transfer of data from Hive into both Netezza and HBase. Data extraction and data loading were both done using dynamic concurrent processing in Python.
- Built a proof of concept of a Spark SQL application on top of Hive LLAP.
Data Scientist - Software Engineer
Aug 2014 - Oct 2016
Kansas City, MO
- Wrote Predictive Analytics MapReduce jobs for Hadoop using Java, translating algorithms provided by data scientists into code.
- Wrote research-oriented internal web applications in Elm (a functional language based on Haskell and ML). (www.elm-lang.org)
- Developed new features, fixed bugs, introduced unit testing, and refactored web applications written using Ruby on Rails, Node.js & Express.js, and Hadoop MapReduce. Other technologies included Redis, RabbitMQ, MySQL, DB2, and Greenplum database.
- Conducted applied research on BaconJS, Async, and 'q' for simplifying asynchronous and event-driven JavaScript.
- Prototyped replacing one of our Express.js servers with LinkedIn's job scheduler, Azkaban, and scripted custom jobs in Go.
- Used Docker to 'Dockerize' system components, making it quicker and easier to automate the construction of development environments for software projects.
- Researched Spark with Scala for near real-time big data processing.
- Researched Matrix Algebra and Probability Theory as they apply to Big Data applications.
- Researched the use of Algebraic Topological type systems for proving distributed calculations over data sets in order to provide better code stability and more dynamic algorithms.
- Collaborated in an Agile team environment.
- Built a prototype Data Audit application using Python, NumPy, and Pandas.
- Wrote a collaborative filter in Spark as a research project.
- Wrote alerting scripts in Greenplum SQL and maintained Analytics jobs written in Greenplum SQL.
Software Engineer
Jan 2012 - Aug 2014
Kansas City, MO
Debugger and troubleshooter for enterprise Java applications, legacy VB6 applications, C++ servers, and Oracle 11g and 10g databases. Software developer for a very large Ruby on Rails application with a Java service layer. The Ruby on Rails team used Agile methodologies and behavior-driven development. In this role, I used HTML5, JavaScript, AJAX, PJAX, Ruby, and Rails regularly. Software developer for the Java service layer behind the Rails server, which acted as an ETL store to Hadoop/HBase using Solr and exposed a REST API to provide data to Rails. Unit and integration testing was done in Scala. The data pipelines were written in Apache Crunch (an abstraction over Hadoop MapReduce).
Jr. Software Engineer
Oct 2010 - Dec 2011
Reston, VA
Programmed fingerprint identification systems using C# .NET 2010, C, C++, COM, VB6.0, and XML. The role entailed writing desktop applications that used device wrappers and connected to databases of various sizes, some of which required overriding Windows environment defaults and settings. I also wrote a .NET wrapper and a COM wrapper for some low-level C device APIs.
Software Engineering Associate
Sep 2008 - Oct 2010
Software Engineering - jack of all trades. I developed a financial database from scratch, prototyped a distributed system with C++, Python, and CORBA, wrote part of an embedded C driver/API, and did some work on a large-scale Java web application.
Software Engineering Intern
Dec 2006 - Dec 2007
State College, PA
Researched WPF technology and refactored a C++ code base.