Garrett
josiah.berkebile@protonmail.com
814-442-8281
Kansas City, MO 64101
Senior Big Data Engineer
13 years experience W2
Summary

I'm a polyglot software engineer with a focus on data science, distributed computing, and machine learning. Since graduating from college, I have built my skill set with one key goal: to be as flexible and adaptable a developer as possible. I am productive in procedural and object-oriented languages, functional languages, statically and dynamically typed languages, and both interpreted and compiled languages. My career has exposed me to database programming, embedded C programming, desktop programming, distributed systems (Hadoop, Spark, etc.), and full-stack web development, and that breadth lets me pick up new programming languages quickly. I thrive in highly collaborative team environments and often reason through discussion and teaching. This makes Agile a natural fit for my working style, and I love to pass on what I know to other developers.

Experience
Senior Big Data Engineer
Apr 2018 - present
Kansas City, MO
Accomplishments:
  • Wrote Apache Spark data processing pipelines for Sprint's on-premises clusters and AWS/Azure clusters that exceed 1,000 nodes, handling datasets that produce up to 1/4 TB per hour.
  • Migrated services from AWS to Azure.
  • Dockerized services into Kubernetes deployments in the cloud.
Technologies used: PySpark, Scala Spark, SparkSQL, Apache Airflow, Zookeeper, Ambari, YARN 2, JVM 8, Hive, AWS EMR, AWS EC2, Python's Fabric Framework, Python's Mamba + Expects, Docker, Docker Compose, Kubernetes, Helm, AKS, InfluxDB, Grafana, Jenkins, Azure DevOps.
About Pinsight: Pinsight transforms cellular and mobile device data using advanced data science models, machine learning, and AI to deliver actionable insights for businesses spanning a multitude of industries. While respecting individual privacy and meeting government compliance requirements, Pinsight ingests and processes over 100 TB of data daily.
AWS EC2 Big Data Compliance Data Engineering Devops Docker Containers Jenkins Machine Learning Mobile Devices MS Azure Python Spark Hadoop
Hadoop Architect
May 2017 - Apr 2018
Kansas City, MO
Played a lead role in getting AMC Theatres' Big Data initiative off the ground. Responsibilities and Accomplishments:
  • Extended a Spark Sentiment Analyzer written in Scala using Stanford CoreNLP to analyze complex customer feedback.
  • Wrote a custom Flume Source plugin in Java and Scala + Cats for ingesting a vendor's real-time HTTPS event stream.
  • Used Scala, Akka, Scalatra, and Cats to develop an HTTP-based custom Flume client.
  • Co-administered a CDH 5 (Cloudera) cluster.
  • Trained peers and engineers in Hadoop software development and Scala programming.
  • Advised on development process and workflow.
  • Conducted exploratory research and generated project ideas.
  • Developed new solutions and apps leveraging Hadoop technologies including Flume, Spark, Impala, Hive, and HBase.
  • Deployed new Hadoop apps and plugins to a Kerberized CDH 5 cluster.
  • Rigged applications to execute through SysVinit, Upstart, or systemd.
  • Rigged system-initiated applications to auto-authenticate to Kerberos using keytabs.
  • Served as automation engineer and advisor.
  • Practiced Haskell-style functional programming in Scala using Cats.
  • Imported deeply nested JSON files into Hive and Impala and flattened them into a traditional SQL table structure.
  • Wrote apps for real-time data ingestion into HDFS using Linux shell scripting, Python, Java, and Scala.
  • Created a Docker CDH 5 development sandbox for prototyping.
Flume Hadoop Hadoop Architect Hbase HDFS Hive impala Scripting Spark SQL
Hadoop Architect
Oct 2016 - Apr 2017
Kansas City, MO
Automated full SQL translations, using Python, of databases containing thousands of schemas and over 100 thousand queries. The Python tooling translated SQL between multiple data stores such as Teradata, Hive LLAP, and HAWQ/Greenplum. Ran performance profiling, tuning, and analysis on Hive LLAP and HAWQ/HDP/Greenplum databases. Wrote a data comparator using Python and Pandas that reported the differences between data sets and the reason for each difference (precision error vs. incorrect value vs. NULL value, etc.). Wrote a concurrent Python script to automate the transfer of data from Hive into both Netezza and HBase; data extraction and data loading both used dynamic concurrent processing in Python. Built a proof of concept of a Spark SQL application on top of Hive LLAP.
Hadoop Hadoop Architect Hbase Hive Netezza Spark SQL Teradata
Data Scientist - Software Engineer
Aug 2014 - Oct 2016
Kansas City, MO
Wrote Predictive Analytics MapReduce jobs for Hadoop using Java. Translated algorithms provided by data scientists into code. Wrote research-oriented internal web applications in Elm (www.elm-lang.org), a functional language based on Haskell and ML. Developed new features, fixed bugs, introduced unit testing, and refactored a web application written using Ruby on Rails, Node.js and Express.js, and Hadoop MapReduce. Other technologies included Redis, RabbitMQ, MySQL, DB2, and Greenplum Database. Conducted applied research on BaconJS, Async, and 'q' for simplifying asynchronous and event-driven JavaScript. Prototyped replacing one of our Express.js servers with LinkedIn's job scheduler, Azkaban, and scripted custom jobs in Go. Used Docker to containerize system components, making it quicker and easier to automate the construction of development environments for software projects. Researched Spark with Scala for near-real-time big data processing. Researched matrix algebra and probability theory as they apply to Big Data applications. Researched the use of algebraic topological type systems for proving distributed calculations over data sets, with the aim of better code stability and more dynamic algorithms. Collaborated in an Agile team environment. Built a prototype Data Audit application using Python, NumPy, and Pandas. Wrote a collaborative filter in Spark as a research project. Wrote alerting scripts in Greenplum SQL and maintained analytics jobs written in Greenplum SQL.
Agile Methodology Big Data DB2 Docker Containers Hadoop JavaScript MapReduce MySQL node.js Python Rabbitmq Ruby Ruby on Rails Software Engineer Spark SQL
Software Engineer
Jan 2012 - Aug 2014
Kansas City, MO
Debugged and troubleshot enterprise Java applications, legacy VB6 applications, C++ servers, and Oracle 11g and 10g databases. Developed software for a very large Ruby on Rails application with a Java service layer. The Ruby on Rails team used Agile methodologies and behavior-driven development. In this role, I used HTML5, JavaScript, AJAX, PJAX, Ruby, and Rails regularly. Developed the Java service layer behind the Rails server, which acted as an ETL store to Hadoop/HBase using Solr and exposed a REST API to provide data to Rails. Unit and integration tests were written in Scala. The data pipelines were written in Apache Crunch (an abstraction over Hadoop MapReduce).
Agile Methodology AJAX Apache C++ ETL Java JavaScript Oracle REST Ruby Ruby on Rails Software Engineer
Jr. Software Engineer
Oct 2010 - Dec 2011
Reston, VA
Programmed fingerprint identification systems using C# .NET 2010, C, C++, COM, VB6.0, and XML. The role entailed writing desktop applications that used device wrappers and connected to databases of various sizes; some of this work required overriding Windows environment defaults and settings. I also wrote a .NET wrapper and a COM wrapper for some low-level C device APIs.
.NET C C# C++ Software Engineer Windows
Software Engineering Associate
Sep 2008 - Oct 2010
A jack-of-all-trades software engineering role. I developed a financial database from scratch, prototyped a distributed system with C++, Python, and CORBA, wrote part of an embedded C driver/API, and worked on a large-scale Java web application.
Java Python Software Engineer
Software Engineering Intern
Dec 2006 - Dec 2007
State College, PA
Researched WPF technology and refactored a C++ code base.
Software Engineer WPF
Education
Computer Engineering
Penn State University, 2003 - 2008
Somerset Area High School
Skills
Software Engineer, 2016, 8
Hadoop, 2021, 6
Python, 2021, 6
Spark, 2021, 6
Agile Methodology, 2016, 4
Big Data, 2021, 4
Docker Containers, 2021, 4
Java, 2014, 4
JavaScript, 2016, 4
Ruby, 2016, 4
Ruby on Rails, 2016, 4
SQL, 2018, 4
C++, 2014, 3
AJAX, 2014, 2
Apache, 2014, 2
AWS EC2, 2021, 2
Compliance, 2021, 2
Data Engineering, 2021, 2
DB2, 2016, 2
Devops, 2021, 2
ETL, 2014, 2
Hadoop Architect, 2018, 2
Hbase, 2018, 2
Hive, 2018, 2
Jenkins, 2021, 2
Machine Learning, 2021, 2
MapReduce, 2016, 2
Mobile Devices, 2021, 2
MS Azure, 2021, 2
MySQL, 2016, 2
node.js, 2016, 2
Oracle, 2014, 2
Rabbitmq, 2016, 2
REST, 2014, 2
.NET, 2011, 1
C, 2011, 1
C#, 2011, 1
Flume, 2018, 1
HDFS, 2018, 1
impala, 2018, 1
Netezza, 2017, 1
Scripting, 2018, 1
Teradata, 2017, 1
Windows, 2011, 1
WPF, 2007, 1
Akka, 0, 1
Apache SOLR, 0, 1
Bootstrap, 0, 1
CSS, 0, 1
Git, 0, 1
Greenplum, 0, 1
Linux, 0, 1
Mac OS, 0, 1
Maven, 0, 1
MVC, 0, 1
RHadoop, 0, 1
SASS, 0, 1
Scala, 0, 1