Thursday, January 6, 2022

Create an environment with a default Python version
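For example, with conda (the environment name and Python version below are illustrative, not fixed requirements):

conda create -n pyspark_env python=3.9
conda activate pyspark_env
pip install pyspark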


//Create a Spark session

import pyspark

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()

//Read a CSV file from a location, using the header row as column names and inferring the schema

df = spark.read.csv('archive/Case.csv', header=True, inferSchema=True)

//To show records

df.show()

//To rename a column

df = df.withColumnRenamed('existing_column', 'new_column')

//Select Columns

df.select('col1', 'col2', 'col3', ...).show()

//Sort

df.sort('col1', 'col2', 'col3', ...).show()

//To specify ascending or descending order

from pyspark.sql import functions as f

df.sort(f.desc('col1'), f.desc('col2'), ...).show()

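Note that f.desc (and f.asc) each wrap a single column, so you can mix orderings per column; the column names here are placeholders:

df.sort(f.asc('col1'), f.desc('col2')).show()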

Cast

Though we don't face it in this dataset, there might be scenarios where PySpark reads a double as an integer or string. In such cases, you can use the cast function to convert types.
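A minimal sketch, assuming a hypothetical column named 'confirmed' that was read as a string but should be numeric:

from pyspark.sql.types import IntegerType

# 'confirmed' is an illustrative column name; cast it from string to integer
df = df.withColumn('confirmed', df['confirmed'].cast(IntegerType()))

# cast also accepts the type name as a string
df = df.withColumn('confirmed', df['confirmed'].cast('int'))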