Description: AT&T is an American multinational telecommunications corporation. It is the largest provider of both mobile and landline telephone service and also provides broadband subscription television services. As one of the largest telecommunications providers, AT&T holds a large volume of customer data that can be analyzed for business value. Data about mobile network users is highly valuable to consumer marketing professionals, so the US-based network operator is turning access to, and collaboration on, its data into a new business service. Sharing data securely while keeping it easy to access and use requires good data management, including aggregation of data from multiple sources. AT&T has created programmable interfaces to each of its data sets that ensure read-only access to the data.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Set up and benchmarked Hadoop/HBase clusters for internal use
- Developed simple to complex MapReduce jobs using Hive and Pig
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms
- Imported data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop
- Analyzed the data by running Hive queries and Pig scripts to study customer behavior
- Implemented business logic in Hadoop by writing UDFs in Java and using UDFs from Piggybank and other sources
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team