Big Data Engineer
Big Data Engineer with Bachelor’s Degree preferably in Computer Science, Information Technology or related area of study.
Job Duties and Responsibilities:
- Participate in the full project life cycle, from analysis to production implementation, with emphasis on identifying sources, validating source data, developing transformation logic as per requirements, creating mappings, and loading data into different targets.
- Design and develop data ingestion frameworks, as well as data processing/transformation frameworks, leveraging open source tools such as Hive, Java and Python.
- Partner with multiple management teams to ensure appropriate integration of functions to meet goals as well as identify and define necessary system enhancements to deploy new products and process improvements.
- Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions.
- Automate the extraction of data from various sources like Oracle, MySQL, Teradata and SQL Server into Hive tables by developing workflows and coordinator jobs in Autosys.
- Work with the Artificial Intelligence (AI) platform team and other Enterprise Information Technology (EIT) teams to provision data computing environments to evaluate/validate various approaches for integrating the model pipeline with application systems before it is released to production.
- Create aggregated tables on data using PySpark.
- Participate in regular status meetings to track progress, resolve issues, mitigate risks and escalate concerns in a timely manner.
- Maintain and assist in data model documentation, data dictionary, data flow, data mapping and other MDM and Data Governance documentation.
- Work closely with IT application teams, Enterprise architecture, infrastructure, information security, and LOB stakeholders to translate business and technical strategies into data-driven solutions for the company.
- Contribute to the development, review, and maintenance of requirements documents, technical design documents and functional specifications.
- Optimize data ingestion using various Big Data technologies like Hive, Pig, Flume, MongoDB, Sqoop, ZooKeeper, Spark, MapReduce2, YARN, HBase, Kafka and Storm.
- Translate, load and present disparate data sets in various formats and from various sources like JSON, text files, Kafka queues and log data.
- Develop and deploy Chef scripts on centralized DEV, QA and PROD servers for installing Java, Apache Spark/Flink/Apex, Stunnel, nginx HTTP server, InfluxData and Apache ZooKeeper.
- Use Jenkins for release process.
- Use JIRA for incident creation, bug tracking and change management process.
- Work with Git repositories for all development and code maintenance.
- Provide support and troubleshooting for data platforms whenever required. Provide escalated on-call support for complicated and/or critical incidents.
- Continuously support business users in resolving the errors they encounter in Hive and Hue.
Skills / Knowledge required:
- Advanced working SQL knowledge and experience working with relational databases, query authoring (SQL) as well as working familiarity with a variety of databases.
- Experience building and optimizing big data pipelines, architectures and data sets.
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- Experience building processes that support data transformation, data structures, metadata, dependency and workload management.
- A successful history of manipulating, processing and extracting value from large disconnected datasets.
- Working knowledge of message queuing, stream processing, and highly scalable NoSQL data stores.
- Strong project management and organizational skills.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- At least 2 years of experience in a Data Engineer or similar role, with a Bachelor’s degree in Engineering, Computer Science, Information Technology, or a related field, or equivalent work experience.
- Experience with big data tools: Hadoop, Spark, Kafka, Hive, MapReduce2, YARN, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with data pipeline and workflow management tools: Oozie, Autosys, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift.
- Experience with stream-processing systems: Storm, Spark Streaming, etc.
- Experience with object-oriented/object function scripting languages: Python, Java, C++, Scala, etc.
CDH, HDFS, Hadoop, Hive, Pig, Flume, MongoDB, ZooKeeper, MapReduce2, YARN, HBase, Storm, Sqoop, Spark, Kafka, SQL Server, Oracle, Teradata, WinSCP, PuTTY, UNIX Shell Scripting, Microsoft Visio, MS Word, MS Excel, Git, JIRA, Confluence, etc.
Work location is Portland, ME with required travel to client locations throughout the USA.
Rite Pros is an equal opportunity employer (EOE).
Please mail resumes to:
Rite Pros, Inc.
415 Congress St, Suite # 201 & 202
Portland, ME 04101