
Friday, 7 February 2025

Resolving SFTP Import Failures on Amazon AWS: Best Practices for Cloud Development and SRE Engineering

In the fast-paced world of Cloud Development and SRE Engineering, encountering transient errors can disrupt your workflow. Recently, our team noticed an error during an SFTP import job that failed on 2020-02-20.

The error log indicated an AccessDenied issue when the Lambda function attempted to perform an s3:ListBucket operation on an S3 bucket (e.g. arn:aws:s3:::example-bucket). In this article, we will explore how to diagnose and resolve this issue using Amazon AWS–native techniques, offering insights and practical examples for engineers working in cloud environments.


Understanding the Issue

The Error Scenario

During an SFTP import job, our logs reported the following error:

AccessDenied: User: arn:aws:sts::111111xxxxxx:assumed-role/AxLambdaRole-user-envname/EventBusListenerFunction-envname-user is not authorized to perform: s3:ListBucket on resource: "arn:aws:s3:::example-bucket" because no identity-based policy allows the s3:ListBucket action
Status Code: 403
Request ID: RTDG456GRHUUO1

Our investigation revealed that the job was triggered twice:

  1. First Invocation (Failure): The job started when the SFTP stream began writing to the S3 bucket. At this point, the file was incomplete, and the Lambda function did not have permission to list the bucket.
  2. Second Invocation (Success): When the file upload finished, a subsequent trigger led to a successful run.

Team Discussion and Client Feedback

Our internal discussion concluded that the SFTP file was being streamed into the bucket. One of our SRE engineers proposed that the initial trigger occurred when the file stream began, and the second one when the stream completed. However, this double-triggering results in false-positive alerts, as the email notifications indicate.

The client explained:

"Synthetic Client writes files directly to the SFTP server without first creating a local copy. Changing this behavior would require additional infrastructure and significant time to propose and approve an architectural change in the Data Platform."

They are exploring alternatives, but in the meantime, we need a solution that minimizes false alerts using existing Amazon AWS features.


Amazon AWS–Native Strategies to Resolve the Issue

Here are several strategies that leverage AWS-native services and configurations to address the issue without needing major infrastructure changes.

1. Refine S3 Event Notification Triggers

Use Specific Event Types:

  • Instead of triggering on every object creation event, configure your S3 bucket to send notifications only when the file upload is complete.
  • If your process uses multipart uploads, configure the event to trigger on the s3:ObjectCreated:CompleteMultipartUpload event.
  • Refer to the Amazon S3 Event Notifications documentation for more details.

Apply Object Key Filters:

  • Encourage your partners to add a specific suffix (e.g., .complete) to file names when the file is fully uploaded.
  • Update the event configuration to trigger only on objects that match this pattern.
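
The two refinements above can be expressed as a single S3 notification configuration. Below is a minimal sketch in Python, written as the dictionary you would pass to boto3's `put_bucket_notification_configuration`; the queue ARN, configuration ID, and `.complete` suffix are illustrative placeholders, not values from our environment.

```python
# Hypothetical notification configuration: fire only when a multipart
# upload completes AND the object key ends in ".complete".
notification_configuration = {
    "QueueConfigurations": [
        {
            "Id": "sftp-import-complete",  # placeholder ID
            "QueueArn": "arn:aws:sqs:eu-west-1:111111111111:sftp-import-queue",  # placeholder ARN
            "Events": ["s3:ObjectCreated:CompleteMultipartUpload"],
            "Filter": {
                "Key": {"FilterRules": [{"Name": "suffix", "Value": ".complete"}]}
            },
        }
    ]
}

def apply_notification_configuration(bucket_name: str) -> None:
    """Push the configuration to the bucket (requires AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=notification_configuration,
    )
```

With this in place, an object written as a plain stream (no multipart completion, no `.complete` suffix) generates no event at all, so the premature first invocation never happens.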

2. Utilize SQS with Delayed Processing

Route Events Through an SQS Queue:

  • Instead of triggering the Lambda function directly from the S3 event, configure the bucket to send events to an Amazon SQS queue.
  • This allows you to introduce a delay or buffer time before processing the event.

Implement Delay or Validation in Lambda:

  • Within your Lambda function, add logic to check whether the file upload is complete by verifying object metadata (e.g., size or custom tags).
  • If the file is incomplete, the function can ignore the event or re-queue it for later processing.
  • Learn more about integrating SQS with Lambda in the AWS Lambda Developer Guide.
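
As a sketch of the queue-based approach, the snippet below sets a delivery delay on the SQS queue so each S3 event is held back before the Lambda function sees it. The 300-second delay and the attribute values are assumptions to adapt, not values from our setup.

```python
# Sketch: hold each S3 event in SQS for a buffer period before the
# Lambda function receives it, giving the SFTP stream time to finish.
QUEUE_ATTRIBUTES = {
    "DelaySeconds": "300",       # assumed 5-minute buffer (SQS allows up to 900)
    "VisibilityTimeout": "900",  # give the Lambda time to process or re-queue
}

def configure_delay_queue(queue_url: str) -> None:
    """Apply the delay attributes to an existing queue (requires AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    sqs = boto3.client("sqs")
    sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=QUEUE_ATTRIBUTES)
```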

3. Adjust the IAM Policy for the Lambda Role (Temporary Workaround)

Grant s3:ListBucket Permission:

  • As a short-term measure, update the IAM policy attached to the Lambda’s role to allow s3:ListBucket on the S3 bucket.
  • This can prevent the Lambda function from failing the initial check.
  • However, note that even with the permission granted, if the file is incomplete, your processing logic may still encounter issues.
  • For more on IAM policies, refer to the AWS IAM User Guide.
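
As a sketch, the inline policy below grants `s3:ListBucket` on the bucket named in the error message; the role and policy names are placeholders you would replace with your own.

```python
import json

# Hypothetical inline policy granting the Lambda role s3:ListBucket on the
# bucket named in the AccessDenied error.
LIST_BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::example-bucket",
        }
    ],
}

def attach_policy(role_name: str) -> None:
    """Attach the inline policy to the Lambda's role (requires AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="AllowListExampleBucket",  # placeholder policy name
        PolicyDocument=json.dumps(LIST_BUCKET_POLICY),
    )
```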

4. Work with Your SFTP Partner

Discuss Alternative Upload Approaches:

  • Engage with your SFTP provider (in our case, Synthetic Client) and ask if they could temporarily stream the file to a staging area or use a naming convention to indicate when the file is fully uploaded.
  • Although the client mentioned that changing this process requires additional infrastructure and lengthy approvals, a dialogue might uncover interim solutions.

Example: Implementing an SQS-Delayed Lambda Trigger

Below is an example configuration for routing S3 events through SQS with a delay, minimizing false alerts.

  1. Configure S3 to Send Notifications to SQS:

  2. Set a Delay on the SQS Queue:

  3. Update Your Lambda Function:

    • Modify the Lambda code to check if the object exists in its complete form. For example, use the HeadObject API to verify file size or metadata.
    • Process the event only if the file meets the completeness criteria.
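
Putting step 3 together, here is a minimal sketch of the handler. It is written for a direct S3 trigger (with SQS in front, the S3 event arrives JSON-encoded inside the SQS message body and must be parsed first); the non-zero-size check and the `upload-complete` tag are assumptions, so use whatever completeness marker your partner can actually provide.

```python
def is_complete(size: int, tags: dict) -> bool:
    """Assumed completeness check: non-zero size plus an explicit marker tag."""
    return size > 0 and tags.get("upload-complete") == "true"

def handler(event, context):
    import boto3  # imported lazily so the sketch loads without boto3 installed
    s3 = boto3.client("s3")
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        head = s3.head_object(Bucket=bucket, Key=key)  # size and metadata only
        tag_set = s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"]
        tags = {t["Key"]: t["Value"] for t in tag_set}
        if not is_complete(head["ContentLength"], tags):
            print(f"Skipping incomplete object s3://{bucket}/{key}")
            continue  # or re-queue the event for a later attempt
        # ... process the completed file here ...
```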

Conclusion

By refining your S3 event triggers, leveraging Amazon AWS services like SQS for delayed processing, and, if necessary, temporarily adjusting IAM policies, you can reduce false alarms caused by premature Lambda invocations during SFTP streaming uploads. This approach aligns with best practices in Cloud Development and SRE Engineering, ensuring your systems remain robust and secure without incurring significant additional costs.


Implementing these solutions can help your team manage SFTP imports more reliably while reducing unwanted alerts. This method exemplifies how strategic configuration in Amazon AWS can streamline Cloud Development processes and enhance the operational efficiency expected in modern SRE Engineering practices.

Happy cloud developing!

Monday, 9 September 2024

Can you explain the difference between MTTR and MTTD as used in Amazon AWS?

Amazon AWS MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detect) are key metrics used to measure and improve system reliability and incident management. Here's a breakdown of each: 


  1. MTTR (Mean Time to Recovery):

    • What it is: MTTR represents the average time it takes to recover from a system failure or incident. This metric is critical in evaluating the efficiency and speed of your recovery process after something goes wrong.
    • Use in AWS: AWS services and infrastructure are built with high availability, but incidents like configuration issues, downtime, or hardware failures can still occur. MTTR helps DevOps teams understand how quickly they can restore normal operations after an incident.
    • Example: If your EC2 instance crashes, MTTR measures how long it takes to identify the issue, apply a fix, and restore the service to full functionality.
  2. MTTD (Mean Time to Detect):

    • What it is: MTTD measures the average time it takes to detect an issue or incident from the moment it occurs. It shows how responsive your monitoring systems are at catching problems early.
    • Use in AWS: In AWS, MTTD can be improved by using services like CloudWatch, AWS X-Ray, and GuardDuty, which help detect performance degradation, security threats, or failures in your system. The sooner you detect a problem, the faster you can work on fixing it.
    • Example: MTTD would measure how long your monitoring systems take to detect a spike in error rates in an application hosted on AWS Lambda.

Key Benefits:

  • Lower MTTR means quicker recovery from incidents, minimising downtime and reducing impact on end users.
  • Lower MTTD means quicker detection, allowing teams to act before incidents escalate into bigger problems.
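
To make the distinction concrete, here is a toy calculation over two hypothetical incident records (the timestamps are invented for illustration): MTTD averages occurrence-to-detection, while MTTR averages occurrence-to-recovery.

```python
from datetime import datetime

# Hypothetical incident records: when each incident occurred, was
# detected, and was fully resolved.
incidents = [
    {"occurred": datetime(2024, 9, 1, 10, 0),
     "detected": datetime(2024, 9, 1, 10, 5),
     "resolved": datetime(2024, 9, 1, 10, 45)},
    {"occurred": datetime(2024, 9, 2, 14, 0),
     "detected": datetime(2024, 9, 2, 14, 15),
     "resolved": datetime(2024, 9, 2, 15, 0)},
]

def mean_minutes(deltas):
    """Average a sequence of timedeltas, expressed in minutes."""
    deltas = list(deltas)
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60

mttd = mean_minutes(i["detected"] - i["occurred"] for i in incidents)  # 10.0 minutes
mttr = mean_minutes(i["resolved"] - i["occurred"] for i in incidents)  # 52.5 minutes
```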


Both metrics are crucial in assessing and improving the resilience and reliability of your AWS-based infrastructure. 💡

Monday, 8 July 2024

Kubernetes CKA Exam Preparation Links

Here is a list of links to material that helped me prepare for the Certified Kubernetes Administrator (CKA) exam.








- How to Prepare for Certified Kubernetes Administration (CKA) Exam - https://vmtechie.blog/2019/01/12/how-to-prepare-for-certified-kubernetes-administration-cka-exam/
- Passing CKA Is Not Hard, but Preparation Is - https://thecloudcaptain.com/blog/passing_cka_is_not_hard/
- Studying for the Certified Kubernetes Administrator Exam - http://www.kubernet.io/

Thursday, 4 July 2024

Azure tutorials with Terraform, Packer, and Ansible

Here are some links for setting up an automated Packer image build, provisioning it with Ansible, and deploying the infrastructure with Terraform.


Links:

- Microsoft tutorial to implement and publish as a video tutorial - https://cloudblogs.microsoft.com/opensource/2018/05/23/immutable-infrastructure-azure-vsts-terraform-packer-ansible/ (This article may have a few bugs and incompatibilities, as it was written some time ago and the tools have since moved on to newer versions.)


- Creating VM Images with Packer in Microsoft Azure - https://medium.com/slalom-build/azure-packer-592c4dc0e23a

- This tutorial is oriented toward Amazon AWS rather than Azure - https://devopscube.com/packer-tutorial-for-beginners/

- Another tutorial that builds Terraform and Packer image-build automation for Amazon AWS - https://www.bogotobogo.com/DevOps/Terraform/Terraform-state-tfstate-import.php





4 videos to be published on the Jobudo DevTeam channel, or as a new playlist on my existing TDLM channel.



Friday, 4 August 2023

From Silos to Success: How DevOps Transforms Development and Operations


In the rapidly evolving landscape of software development, the term "DevOps" has gained significant prominence.

DevOps, a portmanteau of Development and Operations, represents a collaborative and holistic approach to software development and deployment.


It aims to break down traditional silos between development and IT operations teams, fostering a culture of seamless communication, continuous integration, and rapid delivery. This article provides an introduction to the concept of DevOps, its principles, benefits, and its role in modern software development.

**Understanding DevOps:**

DevOps is a methodology that emphasises the collaboration and cooperation of software development (Dev) and IT operations (Ops) teams throughout the entire software development lifecycle. 

Traditionally, these two functions worked in isolation, leading to communication gaps, slower release cycles, and a lack of accountability in case of issues. DevOps seeks to bridge this gap by promoting shared responsibilities and a more streamlined approach.

**Key Principles of DevOps:**

1. **Collaboration:** DevOps encourages open communication and cooperation between developers, testers, and operations teams. This helps in identifying and addressing potential problems early in the development process.


2. **Automation:** Automation is a core principle of DevOps. By automating tasks like testing, deployment, and infrastructure provisioning, teams can reduce human errors, improve efficiency, and ensure consistent processes.



(Figure: example of the DevOps lifecycle - planning your platform and mapping out what you need to accomplish at each step.)


3. **Continuous Integration (CI):** CI involves integrating code changes from multiple developers into a shared repository several times a day. This ensures that new code is regularly tested and merged, reducing integration issues and improving software quality.


4. **Continuous Delivery (CD):** CD builds upon CI by automating the deployment process. It allows for the rapid and reliable release of software updates to production environments, minimising manual interventions and reducing deployment risks.


5. **Monitoring and Feedback:** DevOps emphasises real-time monitoring of applications and infrastructure. This helps teams identify performance bottlenecks, security vulnerabilities, and other issues, enabling quick remediation.


(Figure: the DevOps lifecycle.)

"While talking to customers, we found that while automating the continuous delivery pipeline was important, the missing part was enabling the feedback loop." Monitoring and logging software packages are rapidly converging on the notion of becoming "DevOps hubs".

**Benefits of DevOps:**

1. **Faster Time to Market:** DevOps practices enable quicker development cycles and faster release of features or updates, allowing businesses to respond to market demands more effectively.


2. **Improved Collaboration:** DevOps breaks down barriers between teams, fostering better understanding and cooperation, which ultimately leads to improved software quality.


3. **Enhanced Reliability:** Automation and continuous testing ensure that changes are thoroughly tested and consistently deployed, reducing the likelihood of failures in production environments.


4. **Scalability:** DevOps practices, combined with cloud technologies, allow applications to scale seamlessly according to demand.


5. **Higher Quality Software:** Continuous testing and feedback loops lead to higher software quality, as issues are identified and addressed early in the development process.


**Conclusion:**

DevOps represents a paradigm shift in software development, moving away from traditional, siloed approaches towards a collaborative, automated, and customer-focused methodology.

By promoting a culture of collaboration, automation, and continuous improvement, DevOps has become an essential framework for organisations looking to accelerate their software development lifecycle, enhance software quality, and meet the ever-changing demands of the modern market. Embracing DevOps principles can lead to more efficient, reliable, and successful software development projects.
