How to upgrade the Linux kernel

Here are the steps to upgrade any Linux system (Fedora, CentOS, etc.) to a newer kernel version:

  1.  # wget
  2.  # tar xvJf linux-4.4.2.tar.xz
  3. # cd linux-4.4.2
  4. # mkdir -p /home/name/build/kernel
  5. # make O=/home/name/build/kernel defconfig
  6. # vi /home/name/build/kernel/.config
  7.   CONFIG_R8169=y   (for your network card)
  8.   CONFIG_XFS_FS=y
  9.   CONFIG_EFI=y
  10.  # make O=/home/name/build/kernel
  11.  # make O=/home/name/build/kernel modules_install install


Reboot, and you will be running the new kernel version.
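Before you start, it can be worth checking that the target version is actually newer than the running kernel. A minimal sketch, assuming GNU sort is available; the is_newer helper and the 4.4.2 target are illustrative, not part of the original steps:

```shell
# return success if the second version string is strictly newer than the first
is_newer() {
  [ "$1" != "$2" ] &&
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}

if is_newer "$(uname -r | cut -d- -f1)" "4.4.2"; then
  echo "4.4.2 would be an upgrade"
else
  echo "already at 4.4.2 or newer"
fi
```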



How to upgrade the g++ compiler

  1. Download the latest source package from a mirror site
  2. tar zxf gcc-4.9.3.tar.gz
  3. cd  gcc-4.9.3
  4. ./contrib/download_prerequisites
  5. cd ..
  6. mkdir -p objdir
  7. cd objdir
  8. ../gcc-4.9.3/configure --prefix=$HOME/install/gcc-4.9.3 --enable-languages=c,c++,go --disable-multilib
  9. make
  10. make install

You are ready to use the $HOME/install/gcc-4.9.3/bin/g++ compiler.
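To make the new compiler the default for your shell, put its bin directory at the front of PATH. A small sketch; path_prepend is a hypothetical helper, not something gcc installs:

```shell
# prepend a directory to PATH, but only if it is not already there
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

path_prepend "$HOME/install/gcc-4.9.3/bin"
export PATH
# g++ --version should now report 4.9.3
```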

Jaguar and Cassandra

This blog compares Jaguar and Cassandra, a distributed NoSQL database.

1.  Install and set up Cassandra on multiple nodes

Install the package on each host in your Cassandra cluster:

a) Download apache-cassandra-3.0.0-bin.tar.gz from the Apache Cassandra download page

b) tar zxf  apache-cassandra-3.0.0-bin.tar.gz

c) cd  apache-cassandra-3.0.0/conf

d) vi cassandra.yaml

Set listen_address to the IP address of the host you are on.

Then set the seed list:

- seeds: ""

(seeds are the IP addresses of the seed hosts)

You can specify one seed host or multiple seed hosts. To use multiple seed hosts, separate their addresses with a comma:

- seeds: ","
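Putting step d) together, the relevant part of cassandra.yaml looks roughly like this. The addresses and are hypothetical placeholders for your own hosts:

```yaml
# the IP address of the host you are on (example address; replace with your own)

# seed hosts used by new nodes to discover the cluster
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      - seeds: ","
```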

e) vi

In this file, find the line:

if [ "x$LOCAL_JMX" = "x" ]; then


After the configuration has been completed on all hosts in the cluster, log in to the seed host(s) and start the Cassandra server:

$ bin/cassandra

Then start the Cassandra server on all non-seed hosts with the same command.

2.  Execute Cassandra commands

From any host in the Cassandra cluster, run this command:

$ bin/cqlsh
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> use mykeyspace;
cqlsh:mykeyspace> CREATE TABLE users ( user_id int PRIMARY KEY, fname text,lname text );
cqlsh:mykeyspace> INSERT INTO users (user_id,  fname, lname) VALUES (1745, 'john', 'smith');
cqlsh:mykeyspace> INSERT INTO users (user_id, fname, lname)  VALUES (1744, 'john', 'doe'); 
cqlsh:mykeyspace> INSERT INTO users (user_id, fname, lname)  VALUES (1746, 'john', 'smith');
cqlsh:mykeyspace> select * from users where user_id > 100 and user_id < 200;
InvalidRequest: code=2200 [Invalid query] message="Only EQ and IN relation are supported on the partition key (unless you use the token() function)"
cqlsh:mykeyspace> help select

 SELECT <selectExpr>
 FROM [<keyspace>.]<table>
 [WHERE <clause>]
 [ORDER BY <colname> [DESC]]
 [LIMIT m];

 SELECT is used to read one or more records from a CQL table. 

So Cassandra supports neither range queries on the partition key nor multi-table joins.
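The error message above does point at one escape hatch: the token() function. A hedged sketch of that form; note that token() compares partition-key hashes, not the values themselves, so with the default Murmur3Partitioner this scans a slice of the hash ring rather than a true numeric range:

```
cqlsh:mykeyspace> select * from users where token(user_id) > token(100) and token(user_id) < token(200);
```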

3.  Jaguar Supports Range Queries and Table Joins

jaguar> select * from users where user_id > 10000; 
jaguar> select * join ( TABLE, TABLE1, TABLE2, ...) [WHERE];
jaguar> select * starjoin (tab1, tab2, tab3 ) [WHERE];
jaguar> select * indexjoin ( index(idex), tab1, tab2 ) [WHERE];

Jaguar Integration with SparkR

Once you have the R and SparkR packages installed, you can start SparkR by executing the following commands:



export JAVA_HOME=/usr/lib/java/jdk1.7.0_75

sparkR \
--driver-class-path $JDBCJAR \
--driver-library-path $LDLIBPATH \
--conf spark.executor.extraClassPath=$JDBCJAR \
--conf spark.executor.extraLibraryPath=$LDLIBPATH


Then in the SparkR command line prompt, you can execute the following R commands:



sc <- sparkR.init(master="spark://mymaster:7077", appName="MyTest")

sqlContext <- sparkRSQL.init(sc)

drv <- JDBC("", "/home/exeray/jaguar/lib/jaguar-jdbc-2.0.jar", "`")

conn <- dbConnect(drv, "jdbc:jaguar://localhost:8888/test", "test", "test")

df <- dbGetQuery(conn, "select * from int10k where uid > 'anxnfkjj2329' limit 5000;")

head( df )

> cor(df$uid,df$score)
[1] 0.05107418

# build the simple linear regression
> model <- lm(uid ~ score, data = df)
> model

lm(formula = uid ~ score, data = df)

(Intercept)        score
  2.115e+07    1.025e-03

#get the names of all of the attributes
> attributes(model)
[1] "coefficients"  "residuals"     "effects"       "rank"
[5] "fitted.values" "assign"        "qr"            "df.residual"
[9] "xlevels"       "call"          "terms"         "model"

[1] "lm"



Jaguar's successful integration with Spark and SparkR allows a wide range of data analytics over the underlying fast Jaguar data engine.


Jaguar Supports R

R is a powerful language and environment for statistical computing and graphics. Jaguar's JDBC API can integrate with R for extensive data modelling and analysis. To use R with Jaguar, the RJDBC library needs to be installed first:



           $ sudo apt-get install r-cran-rjava

$ sudo R

> install.packages("RJDBC", dep=TRUE)

> q()


$ unset JAVA_HOME

$ R

> library(RJDBC)

> drv <- JDBC("", "/pathtomy/jaguar-jdbc-2.0.jar", "`")

> conn <- dbConnect(drv, "jdbc:jaguar://localhost:8888/test", "test", "test")

> dbListTables(conn)

> dbGetQuery(conn, "select count(*) from mytable;")

> d <- dbReadTable(conn, "mytable")

> q()

Jaguar Supports Spark

Since Jaguar now provides JDBC connectivity, developers can use Apache Spark to load data from Jaguar and perform data analytics and machine learning. The advantage of Jaguar is that Spark can load data from it faster than from other data sources, especially when loading data that satisfies complex conditions. The following code is based on two tables with the following structure:

create table int10k ( key: uid int(16), score float(16.3), value: city char(32) );
create table int10k_2 ( key: uid int(16), score float(16.3), value: city char(32) );

Scala program:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import scala.collection._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.log4j.Logger
import org.apache.log4j.Level

object TestScalaJDBC {
def main(args: Array[String]) {

val sparkConf = new SparkConf().setAppName("TestScalaJDBC")
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

// load table int10k through the Jaguar JDBC driver,
// split into 4 partitions on the uid column
val people ="jdbc").options(
Map( "url" -> "jdbc:jaguar://",
"dbtable" -> "int10k",
"user" -> "test",
"password" -> "test",
"partitionColumn" -> "uid",
"lowerBound" -> "2",
"upperBound" -> "2000000",
"numPartitions" -> "4",
"driver" -> "" )).load()
// the first load works fine; load the second table the same way
val people2 ="jdbc").options(
Map( "url" -> "jdbc:jaguar://",
"dbtable" -> "int10k_2",
"user" -> "test",
"password" -> "test",
"partitionColumn" -> "uid",
"lowerBound" -> "2",
"upperBound" -> "2000000",
"numPartitions" -> "4",
"driver" -> "" )).load()

// sort by columns
people.sort($"score".desc, $"uid".asc).show()
people.orderBy($"score".desc, $"uid".asc).show()

// select by expression
people.selectExpr("score", "uid").show()
people.selectExpr("score", "uid as keyone").show()
people.selectExpr("score", "uid as keyone", "abs(score)").show()

// select a few columns
val uid2 ="uid", "score")

// filter rows
people.filter(people("uid") > 20990397).show()

// group by a column and aggregate (here: by city)
people.groupBy("city").agg(Map(
"score" -> "avg",
"uid" -> "max")).show()

// rollup
people.rollup("city").agg(Map(
"uid" -> "avg",
"score" -> "max")).show()

// cube
people.cube("city").agg(Map(
"uid" -> "avg",
"score" -> "max")).show()

// describe statistics
people.describe("uid", "score").show()

// find frequent items
people.stat.freqItems(Seq("uid")).show()

// join two tables
people.join(people2, "uid").show()
people.join(people2, "score").show()
people.join(people2).where( people("uid") === people2("uid") ).show()
people.join(people2).where( people("city") === people2("city") ).show()
people.join(people2).where( people("uid") === people2("uid") && people("city") === people2("city") ).show()
people.join(people2).where( people("uid") === people2("uid") && people("city") === people2("city") ).limit(3).show()

// union
people.unionAll(people2).show()

// intersection
people.intersect(people2).show()

// except (set difference)
people.except(people2).show()

// take samples
people.sample(true, 0.1, 100).show()

// distinct
people.distinct().show()

// dropDuplicates, same as distinct
people.dropDuplicates().show()

// cache and persist
people.persist()
// SQL on the DataFrame: register it as a temp table first
people.registerTempTable("int10k")
val df = sqlContext.sql("SELECT * FROM int10k where uid < 200000000 and city between 'Alameda' and 'Berkeley' ")

sc.stop()
}
}

The class generated from the above Scala program can be submitted to Spark as follows:

/bin/spark-submit --class TestScalaJDBC \
--master spark://masterhost:7077 \
--driver-class-path /path/to/your/jaguar-jdbc-2.0.jar \
--driver-library-path $HOME/jaguar/lib \
--conf spark.executor.extraClassPath=/path/to/your/jaguar-jdbc-2.0.jar \
--conf spark.executor.extraLibraryPath=$HOME/jaguar/lib

A very useful tool in a cluster environment

Distributed Shell (dsh) is a very powerful tool for system administrators in a cluster environment. Here are some tips for installing and using it:

On Debian/Ubuntu:

sudo apt-get install dsh

On Fedora/CentOS:

sudo yum install dsh

In /etc/dsh/dsh.conf change remoteshell to ssh:

remoteshell = ssh

Here is how to make your public key if you do not have one yet (~/.ssh/

$ ssh-keygen -t rsa -P ""
(no passphrase; ~/.ssh/ will be created)
$ ssh-copy-id -i ~/.ssh/  ALL_OTHER_HOSTS

Then in /etc/dsh/machines.list put all your hosts, one hostname or IP address per line.


Finally you can issue commands to ALL the hosts in your cluster:

$ dsh -aM -c YOUR_COMMAND

For example:
$ dsh -aM -c uptime
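Under the hood, dsh is essentially looping over machines.list and running the command over ssh. A minimal sketch of the same idea in plain shell; run_all is a hypothetical helper, and it assumes passwordless ssh is already set up as described above:

```shell
# run a command on every host in a machines file, prefixing each output
# line with the host name (roughly what dsh -aM -c does, but sequential)
run_all() {
  cmd=$1
  machines=${2:-/etc/dsh/machines.list}
  while IFS= read -r host; do
    case $host in ''|'#'*) continue ;; esac     # skip blanks and comments
    # -n keeps ssh from swallowing the rest of the host list on stdin
    ssh -n -o BatchMode=yes "$host" "$cmd" | sed "s/^/$host: /"
  done < "$machines"
}

# run_all uptime
```

dsh also runs the hosts concurrently when asked; backgrounding the ssh line with & and adding a final wait would mimic that.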