Thursday, March 23, 2023

What is AWS EC2 and how do you launch an instance?

Amazon Elastic Compute Cloud (EC2) is a web service provided by Amazon Web Services (AWS) that allows you to rent virtual computing resources in the cloud. It gives you the flexibility to create and manage virtual machines (instances), with complete control over the computing environment.

To launch an EC2 instance, you can follow these general steps:

  1. Log in to your AWS Management Console and navigate to the EC2 service.
  2. Choose the appropriate region for your instance.
  3. Click on the "Launch Instance" button.
  4. Choose an Amazon Machine Image (AMI) that contains the operating system and software you want to use.
  5. Select the instance type, which specifies the computing resources you want to allocate to the instance.
  6. Configure the instance details, including network settings, storage, and security settings.
  7. Review and launch the instance.
  8. Select or create a key pair for secure access to the instance.

After launching the instance, you can access it via SSH or Remote Desktop, depending on the operating system you chose. You can also modify the instance settings, such as changing the instance type, adding storage, or modifying network settings.
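If you prefer to script the launch instead of clicking through the console, the same steps can be expressed with the boto3 SDK for Python. This is a minimal sketch, assuming boto3 is installed and credentials are configured; the AMI ID, instance type, and key pair name are placeholders to replace with your own choices.

import boto3

# Connect to EC2 in the region chosen in step 2
ec2 = boto3.client('ec2', region_name='us-east-1')

# Launch one instance (all values below are placeholders)
response = ec2.run_instances(
    ImageId='ami-0abcdef1234567890',  # the AMI chosen in step 4
    InstanceType='t2.micro',          # the instance type from step 5
    KeyName='my-key-pair',            # the key pair from step 8
    MinCount=1,
    MaxCount=1,
)

instance_id = response['Instances'][0]['InstanceId']
print('Launched instance:', instance_id)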

AWS Regions and Availability Zones

AWS (Amazon Web Services) is a cloud computing platform that provides a variety of services to users. To provide high availability, AWS has divided its infrastructure into regions and availability zones.

AWS Regions are geographical locations where AWS has data centers. Each region is completely independent and isolated from other regions in terms of network, power, and cooling infrastructure.

There are more than 25 regions around the world, with more being added regularly. Some of the regions include US East (N. Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo).

Availability Zones (AZs) are one or more discrete data centers within a region. Each AZ is isolated from the others in terms of power, network, and cooling infrastructure.

These AZs are connected with high-speed, low-latency links to form a region. By spreading resources across multiple AZs, users can ensure their applications remain available even if one or more AZs experience an outage.


For example, if you have an application running in the US East (N. Virginia) region, you could launch instances in one or more availability zones within that region. By doing so, your application will be highly available and fault-tolerant.
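You can also list the Availability Zones in a region programmatically. A minimal boto3 sketch, assuming credentials are already configured:

import boto3

# Ask EC2 in the US East (N. Virginia) region for its Availability Zones
ec2 = boto3.client('ec2', region_name='us-east-1')

for az in ec2.describe_availability_zones()['AvailabilityZones']:
    print(az['ZoneName'], '-', az['State'])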


In summary, AWS Regions and Availability Zones are designed to provide high availability and redundancy for AWS customers. By spreading their resources across multiple regions and AZs, users can ensure their applications are highly available and can withstand unexpected failures. 

AWS Cloud Service & Deployment Models

AWS (Amazon Web Services) provides a wide range of cloud services and deployment models to meet the needs of different customers. Below are the most common AWS cloud services and deployment models:

Cloud Services:

  1. Compute Services: This includes services such as Amazon EC2 (Elastic Compute Cloud), AWS Lambda, and AWS Elastic Beanstalk. These services enable customers to run their applications and workloads on the cloud.

  2. Storage Services: This includes services such as Amazon S3 (Simple Storage Service), Amazon EBS (Elastic Block Store), and Amazon Glacier. These services enable customers to store and retrieve data on the cloud.

  3. Database Services: This includes services such as Amazon RDS (Relational Database Service), Amazon DynamoDB, and Amazon Redshift. These services enable customers to manage their databases on the cloud.

  4. Networking Services: This includes services such as Amazon VPC (Virtual Private Cloud), Amazon Route 53, and Amazon CloudFront. These services enable customers to manage their networking infrastructure on the cloud.

  5. Security and Identity Services: This includes services such as AWS IAM (Identity and Access Management), AWS KMS (Key Management Service), and AWS WAF (Web Application Firewall). These services enable customers to manage their security and identity on the cloud.

  6. Management and Governance Services: This includes services such as AWS CloudFormation, AWS CloudTrail, and AWS Config. These services enable customers to manage and monitor their AWS resources.

  7. Analytics Services: This includes services such as Amazon EMR (Elastic MapReduce), Amazon Kinesis, and Amazon Redshift. These services enable customers to perform data analytics on the cloud.

  8. Application Integration Services: This includes services such as Amazon SNS (Simple Notification Service), Amazon SQS (Simple Queue Service), and Amazon SWF (Simple Workflow Service). These services enable customers to integrate their applications on the cloud.
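As a quick illustration of using one of these services programmatically, here is a minimal boto3 sketch against S3; the bucket and file names are hypothetical placeholders.

import boto3

s3 = boto3.client('s3')

# List the buckets in the account
for bucket in s3.list_buckets()['Buckets']:
    print(bucket['Name'])

# Upload a local file (hypothetical names: replace with your own)
s3.upload_file('report.csv', 'my-example-bucket', 'reports/report.csv')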

Deployment Models:

  1. Public Cloud: In a public cloud deployment model, the cloud infrastructure is owned and operated by a third-party cloud provider, such as AWS. Customers can access the cloud services over the internet.

  2. Private Cloud: In a private cloud deployment model, the cloud infrastructure is owned and operated by an organization, either on-premises or in a third-party data center. The cloud services are accessed by the organization's users only.

  3. Hybrid Cloud: In a hybrid cloud deployment model, the cloud infrastructure is a combination of public and private cloud resources. This model enables customers to leverage the benefits of both public and private clouds.

  4. Multi-Cloud: In a multi-cloud deployment model, an organization uses multiple cloud providers, such as AWS, Azure, and Google Cloud, to meet its specific requirements.

Thursday, January 6, 2022

Create Environment with a default Python version


# Create a SparkSession

import pyspark

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()

# Read a CSV file, using the header row as column names and inferring the schema

df = spark.read.csv('archive/Case.csv', header=True, inferSchema=True)

# Show records

df.show()

# Rename a column

df = df.withColumnRenamed('Existing Column','new Column') 

# Select columns

df.select('col1', 'col2', 'col3').show()  # list as many columns as needed

# Sort

df.sort('col1', 'col2', 'col3').show()

# To specify ascending or descending order

from pyspark.sql import functions as f

df.sort(f.desc('col1'), f.asc('col2')).show()  # wrap each column in f.desc() or f.asc()

Cast

Though we don't face it in this dataset, there are scenarios where PySpark reads a double as an integer or a string. In such cases, you can use the cast function to convert types, as shown below.
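A minimal sketch (the column name 'col1' is just a placeholder):

# Cast a string column to double using a type object
from pyspark.sql.types import DoubleType

df = df.withColumn('col1', df['col1'].cast(DoubleType()))

# Equivalent shorthand using the type name as a string
df = df.withColumn('col1', df['col1'].cast('double'))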


Friday, March 16, 2018

sudo apt install curl

Install Anaconda Python using curl:

curl -O https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh

Then run the downloaded installer and follow the prompts:

bash Anaconda3-5.0.1-Linux-x86_64.sh


Fix for the error "Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?":

sudo rm /var/lib/dpkg/lock
sudo dpkg --configure -a

Monday, February 8, 2016

Teradata Architecture

Teradata mainly contains the following architectural components:
1. Parsing Engine (PE)
2. BYNET
3. AMP (Access Module Processor)
4. Disks

Parsing Engine:
The Parsing Engine is a virtual processor (vproc). It has the following software components:
1. Session control
2. Parser
3. Optimiser
4. Dispatcher




Whenever a SQL request is given to the Parsing Engine, Session Control verifies the session authorisation (user name and password) and, based on that, either processes or rejects the request.

The Parser checks the SQL request for proper syntax and evaluates it. It also checks the data dictionary to confirm that all referenced objects exist and that the user has authority to access them.

The Optimiser develops the least expensive (in time) plan to execute the request. To do this, the Optimiser must know the system configuration, including the available AMPs and PEs. The Optimiser is what enables Teradata to handle complex queries efficiently.

The Dispatcher controls the sequence in which steps are executed and passes the steps to the BYNET. The BYNET is the messaging layer between the Parsing Engine and the Access Module Processors. After the AMPs process the steps, the PE receives their responses over the BYNET. The Dispatcher then builds a response message and sends it back to the user.

Teradata uses a hashing algorithm to distribute rows evenly across the AMPs. Based on the index column, Teradata generates a hash value, and the row is sent to the AMP that owns that value. Rows with the same hash value (duplicate index values) are sent to the same AMP, so with a well-chosen index the data ends up evenly distributed across all the AMPs. When the index is not chosen properly, data is distributed unevenly, leaving more rows on some AMPs and fewer on others; this is called skewness.
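The distribution idea can be sketched in plain Python. This is only an illustration of the concept, not Teradata's actual hash function; it shows how a high-cardinality index spreads rows evenly across AMPs while a low-cardinality index causes skew.

from collections import Counter

NUM_AMPS = 4

def amp_for(index_value):
    # Stand-in for Teradata's row hash: map the hashed index value to an AMP
    return hash(index_value) % NUM_AMPS

# A well-chosen index (many distinct values) spreads rows evenly
good_index = ['customer_%d' % i for i in range(1000)]
print(Counter(amp_for(v) for v in good_index))  # roughly 250 rows per AMP

# A poorly chosen index (few distinct values) piles rows onto a few AMPs: skew
bad_index = ['USA'] * 800 + ['UK'] * 150 + ['DE'] * 50
print(Counter(amp_for(v) for v in bad_index))   # at most 3 AMPs receive any rows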