Create an environment with a default Python version
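A minimal setup sketch, assuming a virtualenv-based workflow; the environment name and the pip install are assumptions, not the only way to set this up:
# Create and activate a virtual environment, then install PySpark
python -m venv spark-env
source spark-env/bin/activate
pip install pyspark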
# Create a SparkSession
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("test").getOrCreate()
# Read a CSV file from a location, using the first row as column names and inferring the schema
df = spark.read.csv('archive/Case.csv', header=True, inferSchema=True)
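To check which types inferSchema picked for each column, you can print the schema:
# Print the inferred column names and types
df.printSchema()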
# To show records
df.show()
# To rename a column
df = df.withColumnRenamed('existing_column', 'new_column')
# Select columns
df.select('col1', 'col2', 'col3', ...).show()
# Sort
df.sort('col1', 'col2', 'col3', ...).show()
# To specify ascending or descending order (desc/asc take one column each)
from pyspark.sql import functions as f
df.sort(f.desc('col1'), f.asc('col2')).show()
Cast
Though we don’t face it in this dataset, there might be scenarios where PySpark reads a double as an integer or a string. In such cases, you can use the cast function to convert types.
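A minimal sketch of such a cast, assuming a column named 'confirmed' that was read as a string; the column name is illustrative:
from pyspark.sql import functions as f
# Convert the string column to an integer type with cast
df = df.withColumn('confirmed', f.col('confirmed').cast('int'))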