Thursday, January 6, 2022

Create an environment with a default Python version


//Create a Spark session

import pyspark

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()

//Read a CSV file from a location, using the first row as column names and inferring the schema

df = spark.read.csv('archive/Case.csv', header=True, inferSchema=True)

//To show records

df.show()

//To rename a column

df = df.withColumnRenamed('Existing Column', 'New Column')

//Select Columns

df.select('col1', 'col2', 'col3', ...).show()

//Sort

df.sort('col1', 'col2', 'col3', ...).show()

//To specify ascending or descending order

from pyspark.sql import functions as f

df.sort(f.desc('col1'), f.desc('col2')).show()

//Cast

Though we don’t face it in this dataset, there might be scenarios where PySpark reads a double as an integer or a string. In such cases, you can use the cast function to convert types.

 

 

Friday, March 16, 2018

sudo apt install curl

Install Anaconda Python using curl

curl -O https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh


Fix for the apt error "Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?"

sudo rm /var/lib/dpkg/lock
sudo dpkg --configure -a
