Big Data Developer

Job Location :- 15040 Capital One Drive, Richmond, VA 23238

Job Description :-

  • As a Big Data Developer, you will work on open-source, distributed computing systems such as Spark.
  • Design, develop & deploy real-time streaming data pipelines using Spark on AWS EMR, Java, message queues & Kafka.
  • Continuously improve data pipeline performance, security practices & fine-grained access controls in our data platform while adhering to the applicable compliance rules & regulations.
  • Use Big Data tools such as Hadoop, Spark, Kafka & AWS to analyze petabytes of complex numerical data in storage systems such as S3 & HDFS, and perform data modeling & ETL activities on relational & NoSQL databases.
  • Develop Spring integrations for message queues & write complex serialized data structures to streaming systems, following secure enterprise integration practices with reliable credentials management to prevent security leaks.
  • Tools such as Spring Boot, Spring Kafka, Spring Spark, Spring Integration & Spring REST will be used to build data streaming pipelines; development, unit testing & deployment are all part of working within the Spring ecosystem.
  • Source code management will be accomplished using Git for version control & GitHub Enterprise for code repositories. CI/CD depends on these repositories, and separate branches are maintained for each release.
  • Agile deployment practices such as CI/CD (Continuous Integration & Continuous Deployment) will be strictly followed, using internal deployment tools to create containerized builds & to initiate testing & deployments in QA & production through ChatOps practices.
  • Handle AWS infrastructure management responsibilities such as periodically rehydrating EC2 instances & deploying Elastic Load Balancers to keep systems highly available & fault tolerant.
  • Agile tools used include ServiceNow for ticketing, incident records & change orders, and JIRA for sprint management & for creating and fulfilling user stories against timelines.
  • Google Suite will be used for cloud document management & Slack for internal communication.
  • Implement cybersecurity practices when deploying CI/CD builds to production: every integration between data pipeline components, such as client-to-server & database connections, must use SSL, and plaintext transmission of data is strictly avoided.
  • Jobs deployed in production are continuously monitored using AWS CloudWatch & Datadog for metrics, logs & traces; apply these observability principles to ensure application performance is monitored in real time, troubleshoot any issues & meet SLAs.
  • Develop REST API client tools in Python & the Spring framework to communicate securely with each endpoint, retrieve data, convert it into readable formats & ingest it into relational databases for correlation with high-volume streaming data.
  • Implement the core data source for streaming & REST API data as an internal enterprise IoT server, use this source to develop data engineering pipelines, and communicate with internal teams for knowledge sharing. The IoT server streams data from every controller, router & sensor device. Masking sensitive data is a core part of our development.
  • Use AWS Athena as the end data reference over S3 buckets to create an optimized data model, handling complex data formats such as JSON & Parquet for optimal performance.
  • Develop complex SQL queries against data stores such as Athena, RDS, DynamoDB, Snowflake, HBase & Hive for ETL extraction, data cleansing & aggregation.
  • Generate Business Intelligence (BI) reports from data sources such as S3, Athena & Snowflake using the AWS QuickSight service; design dashboards, plot graphs & produce reports for presentation to teams & clients.
  • Programming languages such as Python, Java (Spring framework) & Scala are part of code development. Spark Core, Spark Streaming, PySpark & Spark SQL are crucial for data pipelines. Kafka Core, Kafka Streams, the Kafka REST API & the schema registry are core components of our streaming data pipelines.
  • Use AWS Lambda for scheduling serverless jobs, AWS Kinesis for high-throughput real-time data streaming from IoT servers, AWS S3 for data storage & AWS IAM for security & identity management.
  • Work within and across agile teams and lead design discussions around the availability, resilience & scalability of solutions in a highly collaborative & agile environment.

Minimum Education Required: This position requires a candidate with a minimum of a Bachelor’s degree in computer science, computer information systems, information technology, a relevant engineering field (computer engineering, software engineering, electronic engineering, or related), or a combination of education and experience equating to the U.S. equivalent of a Bachelor’s degree in one of the aforementioned subjects.