Sunday, May 31, 2020

S3 Service

·       The number of objects you can store in an Amazon S3 bucket is virtually unlimited.
·       Allows you to write, read, and delete objects ranging from 1 byte to 5 terabytes of data each.
·       Provides data lifecycle management capabilities, allowing users to define rules to automatically archive Amazon S3 data to Amazon Glacier, or to delete data at end of life.
·       In S3, objects are encrypted using server-side encryption with either Amazon S3-managed keys (SSE-S3) or AWS KMS-managed keys (SSE-KMS).
·       Amazon S3 stores the archived objects in Amazon Glacier. However, these are still Amazon S3 objects, and you can access them only by using the Amazon S3 console or the Amazon S3 API. You cannot access the archived objects through the Amazon Glacier console or the Amazon Glacier API.
·       By default, 100 buckets per account can be created.
·       For objects larger than 5 GB (up to 5 TB), you must use the Multipart Upload API; a single PUT is limited to 5 GB.
·       S3 Cross-Region Replication (CRR) is configured on a source S3 bucket and replicates objects into a destination bucket in another AWS Region.
·       Amazon S3 Same-Region Replication (SRR) replicates objects between buckets in the same Region.
·       SSE data encryption: within Amazon S3, Server-Side Encryption (SSE) is the simplest data-encryption option available.
·       SSE manages the heavy lifting of encryption on the AWS side and comes in three types: SSE-S3, SSE-KMS, and SSE-C. The SSE-S3 option lets AWS manage the key for you, which requires that you trust them with that information.
·       While Amazon S3 is ideal for hosting static websites, dynamic websites requiring server-side interaction, scripting, or database interaction cannot be hosted on S3 and should instead be hosted on Amazon EC2.
·       S3 also regularly verifies the integrity of data stored using checksums. If Amazon S3 detects data corruption, it is repaired using redundant data.
·       In addition, S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data.
·       Data protection against accidental overwrites and deletions can be added by enabling Versioning to preserve, retrieve, and restore every version of the object stored.
·       S3 also provides the ability to protect data in-transit (as it travels to and from S3) and at rest.
·       S3 object lifecycle management allows two types of actions (see the example after this list):
·       Transition, in which the storage class of the objects changes.
·       Expiration, in which the objects are permanently deleted.
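
A minimal sketch of configuring these lifecycle actions with the AWS CLI (the bucket name my-bucket, the logs/ prefix, and the day counts are illustrative placeholders):

$ cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF

$ aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json

With this rule, objects under logs/ transition to Glacier after 30 days and are permanently deleted after 365 days.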

Friday, May 29, 2020

Edge Computing

Edge computing is a networking philosophy focused on bringing computing as close to the source of data as possible in order to reduce latency and bandwidth use.

 In simpler terms, edge computing means running fewer processes in the cloud and moving those processes to local places, such as on a user’s computer, an IoT device, or an edge server. Bringing computation to the network’s edge minimizes the amount of long-distance communication that has to happen between a client and server.


The growth of IoT devices at the edge of the network is producing a massive amount of data to be computed at data centers, pushing network bandwidth requirements to the limit. Despite the improvements in network technology, data centers cannot guarantee acceptable transfer rates and response times, which can be a critical requirement for many applications. Furthermore, devices at the edge constantly consume data coming from the cloud, forcing companies to build content delivery networks to decentralize data and service provisioning, leveraging physical proximity to the end user.

 In a similar way, the aim of Edge Computing is to move the computation away from data centers towards the edge of the network, exploiting smart objects, mobile phones or network gateways to perform tasks and provide services on behalf of the cloud. 

By moving services to the edge, it is possible to provide content caching, service delivery, storage, and IoT management, resulting in better response times and transfer rates.


Thursday, May 28, 2020

Docker Volumes

A Docker image is a collection of read-only layers. When you launch a container from an image, Docker adds a read-write layer to the top of that stack of read-only layers. Docker calls this the Union File System.

Any time a file is changed, Docker makes a copy of the file from the read-only layers up into the top read-write layer. This leaves the original (read-only) file unchanged.

When a container is deleted, that top read-write layer is lost. This means that any changes made after the container was launched are now gone.

 A volume allows data to persist, even when a container is deleted. Volumes are also a convenient way to share data between the host and the container.

Mounting a volume is a good solution if you want to:

·       Push data to a container.

·       Pull data from a container.

·       Share data between containers.

A Docker volume "lives" outside the container, on the host machine.

From the container, the volume acts like a folder which you can use to store and retrieve data. It is simply a mount point to a directory on the host.

To create a volume, use the command:

sudo docker volume create [volume name]

 

List Volumes

To list all Docker volumes on the system, use the command:

sudo docker volume ls

This will return a list of all of the Docker volumes which have been created on the host.

Inspect a Volume

To inspect a named volume, use the command:

sudo docker volume inspect [volume name]

 

 Remove a Volume

To remove a named volume, use the command:

sudo docker volume rm [volume name]
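
Use a Volume in a Container

A minimal sketch of mounting a named volume into a container (the volume name my-data and the nginx image are illustrative):

sudo docker volume create my-data

# Mount the volume at /usr/share/nginx/html inside the container
sudo docker run -d --name web -v my-data:/usr/share/nginx/html nginx

# Data written to the mount point lives in the volume, so it survives container removal:
sudo docker rm -f web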

 


Monday, May 25, 2020

What is Ansible

 Ansible is an open-source automation tool, or platform, used for IT tasks such as configuration management, application deployment, intraservice orchestration, and provisioning. Automation is crucial these days, with IT environments that are too complex and often need to scale too quickly for system administrators and developers to keep up if they had to do everything manually. 
 Automation simplifies complex tasks, not just making developers’ jobs more manageable but allowing them to focus attention on other tasks that add value to an organization.

Here are some important reasons for using Ansible:
  • Ansible is free.
  • Ansible is very consistent and lightweight, and it imposes no constraints regarding the operating system or underlying hardware.
  • It is very secure due to its agentless architecture and use of OpenSSH security features.
  • No special system-administrator skills are needed to install and use it.
  • Its modularity regarding plugins, inventories, modules, and playbooks makes Ansible a perfect companion for orchestrating large environments.
Installing Ansible (the commands below use yum, so they apply to a RHEL/CentOS/Amazon Linux control node rather than Ubuntu):

>sudo yum update -y

>sudo yum install ansible -y

>Add the remote servers' IPs that you want to manage to the Ansible inventory file.
The Ansible inventory is managed by the file /etc/ansible/hosts.
sudo nano /etc/ansible/hosts
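
A minimal inventory sketch (the group name WebServer matches the ping example further below; the IP addresses are placeholders):

[WebServer]
192.0.2.10
192.0.2.11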

>ansible-inventory --list -y

>Establish SSH connection:
 ssh-keygen -t rsa

The above command will create two files, id_rsa and id_rsa.pub, inside the .ssh folder. Copy the content of the public key (id_rsa.pub).

>ls  -al ~/.ssh

>cat  ~/.ssh/id_rsa.pub

>Create a .ssh folder in the home directory of a user such as ec2-user on the managed node (remote server). Create a file named authorized_keys inside the .ssh folder.

$ mkdir  /home/ec2-user/.ssh

$ touch  /home/ec2-user/.ssh/authorized_keys

>Copy the content of id_rsa.pub into the file authorized_keys on the managed node. (See the ssh-copy-id shortcut below for a one-command alternative.)
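
Where it is available, ssh-copy-id performs the key-copying steps above in one command (the IP address is a placeholder, and it assumes the managed node currently accepts password or key authentication):

$ ssh-copy-id ec2-user@192.0.2.10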

>Check connectivity to all managed nodes using the following command.
$ ansible all -m ping -u ec2-user

>Check connectivity to a host group using the following command.
$ ansible WebServer -m ping -u ec2-user

Sample Playbook:
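
A minimal playbook sketch (the WebServer group comes from the inventory above; installing Apache with the yum module is an illustrative task, not a prescribed one):

$ cat > webserver.yml <<'EOF'
- hosts: WebServer
  become: yes
  tasks:
    - name: Install Apache
      yum:
        name: httpd
        state: present
    - name: Start and enable Apache
      service:
        name: httpd
        state: started
        enabled: yes
EOF

$ ansible-playbook webserver.yml -u ec2-user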

Sunday, May 24, 2020

Container Orchestration

Container orchestration is all about managing the lifecycles of containers, especially in large, dynamic environments. 

Software teams use container orchestration to control and automate many tasks:
  >Provisioning and deployment of containers
  >Redundancy and availability of containers
  >Scaling up or removing containers to spread application load evenly across host infrastructure
  >Movement of containers from one host to another if there is a shortage of resources in a host, or if a host dies
  >Allocation of resources between containers
  >Exposure of services running in a container to the outside world
  >Load balancing and service discovery between containers
  >Health monitoring of containers and hosts
  >Configuration of an application in relation to the containers running it
 
How does container orchestration work?

When you use a container orchestration tool, like Kubernetes or Docker Swarm (more on these shortly), you typically describe the configuration of your application in a YAML or JSON file, depending on the orchestration tool. These configuration files (for example, docker-compose.yml, sketched below) are where you tell the orchestration tool where to gather container images (for example, from Docker Hub), how to establish networking between containers, how to mount storage volumes, and where to store logs for that container.
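
For illustration, a minimal docker-compose.yml might look like the following (the service name, image, port, and volume are placeholders, not a prescribed layout):

$ cat > docker-compose.yml <<'EOF'
version: "3"
services:
  web:
    image: nginx
    ports:
      - "8080:80"       # host port 8080 -> container port 80
    volumes:
      - web-data:/usr/share/nginx/html
volumes:
  web-data:
EOF

$ docker-compose up -d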

Containers are deployed onto hosts, usually in replicated groups. When it’s time to deploy a new container into a cluster, the container orchestration tool schedules the deployment and looks for the most appropriate host to place the container based on predefined constraints (for example, CPU or memory availability). You can even place containers according to labels or metadata, or according to their proximity to other hosts; all kinds of constraints can be used (see the sketch below).
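
A minimal sketch of label-based placement with Docker Swarm (assumes a Swarm cluster is already initialized; the node name node-1 and the label zone=east are made up):

# Label a node, then constrain a service to nodes carrying that label
$ docker node update --label-add zone=east node-1
$ docker service create --name web --constraint 'node.labels.zone==east' nginx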

Once the container is running on the host, the orchestration tool manages its lifecycle according to the specifications you laid out in the service’s definition file (for example, the Compose file or a Kubernetes manifest).

The beauty of container orchestration tools is that you can use them in any environment in which you can run containers. And containers are supported in just about any kind of environment these days, from traditional on-premise servers to public cloud instances running in Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure.

Chaos Monkey


Chaos Monkey is a software tool that was developed by Netflix engineers to test the resiliency and recoverability of their Amazon Web Services (AWS) infrastructure.

The software simulates failures of instances of services running within Auto Scaling Groups (ASG) by shutting down one or more of the virtual machines. According to the developers, Chaos Monkey was named for the way it wreaks havoc like a wild and armed monkey set loose in a data center.

Chaos Monkey works on the principle that the best way to avoid major failures is to fail constantly. However, unlike unexpected failures, which seem to occur at the worst possible times, these failures are deliberate: the software is opt-out by default, and it can also be configured for opt-in.

Chaos Monkey has a configurable schedule that allows simulated failures to occur at times when they can be closely monitored.  In this way, it’s possible to prepare for major unexpected errors rather than just waiting for catastrophe to strike and seeing how well you can manage.

Chaos Monkey was the original member of Netflix’s Simian Army, a collection of software tools designed to test the AWS infrastructure.

Other Simian Army members have been added to create failures and check for abnormal conditions, configurations and security issues.  Chaos Gorilla, another member of the Simian Army, simulates outages for entire regions. 

Netflix engineers plan to add more monkeys to the army, some based on community suggestions.  

Saturday, May 23, 2020

Apache Kafka- Docker Installation


Pull the image:

$ docker pull ches/kafka

# A non-default bridge network enables convenient name-to-hostname discovery

$ docker network create kafka-net

$ docker run -d --name zookeeper --network kafka-net zookeeper:3.4

$ docker run -d --name kafka --network kafka-net --env ZOOKEEPER_IP=zookeeper ches/kafka

$ docker run --rm --network kafka-net ches/kafka \
> kafka-topics.sh --create --topic test --replication-factor 1 --partitions 1 --zookeeper zookeeper:2181
Created topic "test".

# In separate terminals:

$ docker run --rm --interactive --network kafka-net ches/kafka \
> kafka-console-producer.sh --topic test --broker-list kafka:9092
<type some messages followed by newline>

$ docker run --rm --network kafka-net ches/kafka \
> kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server kafka:9092

Major Challenges in MicroServices



These are ten major challenges of microservices architecture, with proposed solutions:

1) Data Synchronization: An event-sourcing architecture can address this issue using an async messaging platform. The SAGA design pattern can also address this challenge.

2) Security: An API gateway can solve these challenges. Kong is very popular, is open-source, and is used by many companies in production. Custom solutions can also be developed for API security using JWT tokens, Spring Security, and Netflix Zuul/Zuul2.

There are enterprise solutions available too, like Apigee and Okta (two-step authentication). OpenShift is used for public-cloud security for its top features, like Red Hat Linux kernel-based security and namespace-based app-to-app security.

3) Services Communication: It’s complex to communicate between microservices. There are different ways to communicate: point-to-point using an API gateway, and the pub/sub event-driven model.

4) Discovery: This can be addressed by service-discovery tooling like Kubernetes orchestration, Pivotal Application Service (PAS), Pivotal Container Service (PKS), and OpenShift.

It can also be done at the code level using Netflix Eureka. However, doing it in the orchestration layer is better, as it can be managed by these tools rather than built and maintained through code and configuration.

5) Data Staleness: The database should always be updated so that the API fetches recent data.

A timestamp can also be added to each record in the database to check and verify that the data is recent. Caching can be used, customized with an acceptable eviction policy based on business requirements.

6) Distributed Logging, Cyclic Dependencies of Services, and Debugging:

There are multiple solutions for this.

Logging can be externalized by pushing log messages to an async messaging platform like Kafka or Google Pub/Sub. The built-in Kibana dashboard of the OpenShift PaaS reads console logs from the containers and aggregates them in its Elasticsearch store on the server, so logs persist even when a pod crashes and is restarted.

It’s difficult to identify issues between microservices when services depend on each other and have cyclic dependencies. A correlation ID, provided by the client in a header to the REST APIs, can be used to track the relevant logs across all the pods/Docker containers, as in the sketch below.
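
A minimal sketch of the correlation-ID technique (the header name X-Correlation-Id and the service URL are illustrative conventions, not a fixed standard):

# Generate an ID once at the edge and forward it on every downstream call
$ CID=$(uuidgen)
$ curl -H "X-Correlation-Id: $CID" http://orders-service:8080/api/orders/42

Each service logs the header value and forwards it when calling the next service, so a single search across the aggregated logs reconstructs the whole request path.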

7) Testing: This issue can be addressed with unit testing by mocking REST APIs, or integrated/dependent APIs that are not available for testing, using WireMock, BDD with Cucumber, integration testing, performance testing using JMeter, and any good profiling tool like JProfiler, Dynatrace, YourKit, VisualVM, etc.

8) Monitoring & Performance: Monitoring can be done using open-source tools like Prometheus in combination with Grafana by creating gauges and metrics; GCP Stackdriver; Kubernetes/OpenShift; InfluxDB combined with Grafana; Dynatrace; Amazon CloudWatch; VisualVM; JProfiler; YourKit; Graphite; etc.

Tracing can be done with the OpenTracing project or Uber’s open-source Jaeger. These trace all microservices communication and show requests/responses and errors on their dashboards.

9) DevOps Support: Microservices deployment- and support-related challenges can be addressed using state-of-the-art DevOps tools like GCP, Kubernetes, OpenShift, and PCF with Jenkins.

10) Fault Tolerance: Netflix Hystrix can be used to break the circuit if there is no response from an API within the given SLA/ETA, and it provides mechanisms to retry and to shut down services gracefully without data loss.

For Microservices design patterns: https://skolaparthi.com/microservices-design-patterns/



ES12 new Features