Thursday, January 6, 2022

 Create Environment with a default python version


//Crate spark context

import pyspark

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("test").getOrCreate()

//Read CSV File from a location including header as column names and schema

df = session.read.csv('archive/Case.csv', header=True,inferSchema = True)

//To show records

df.show()

//To Rename a column name

df = df.withColumnRenamed('Existing Column','new Column') 

//Select Columns

df.select('col1','col2','col3'....).show()

//sort

df.sort(''col1','col2','col3'....).show()

 //to specify asc or desc

from pySpark.sql import function as f

df.sort(f.desc( 'col1','col2','col3'....))show()

Cast

Though we don’t face it in this dataset, there might be scenarios where Pyspark reads a double as integer or string, In such cases, you can use the cast function to convert types.

 

 

Friday, March 16, 2018

sudo apt install curl

install Anaconda Python using curl

curl -O https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh


Unable to lock the administration directory (/var/lib/dpkg/) is another process using it?

sudo rm /var/lib/dpkg/lock
sudo dpkg --configure -a

What is the TRL library

  ⚡ What is the TRL library trl stands for Transformers Reinforcement Learning . It is an open-source library by Hugging Face that lets ...