Anant example-cassandra-spark-job-scala
License: No License Provided
Language: Scala
Clone the repository and cd into it:
git clone https://github.com/Anant/example-cassandra-spark-job-scala.git
cd example-cassandra-spark-job-scala
sbt
Run assembly in the sbt server:
assembly
./sbin/start-master.sh
Navigate to localhost:8080 and copy the master URL.
./sbin/start-slave.sh <master-url>
docker run --name cassandra -p 9042:9042 -d cassandra:latest
docker exec -it cassandra cqlsh
Create the demo keyspace:
CREATE KEYSPACE demo WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
In this job, we will load a CSV with 100,000 records into a DataFrame. Once it is read, we will display the first 20 rows.
./bin/spark-submit --class sparkCassandra.Read \
--master <master-url> \
--files /path/to/example-cassandra-spark-job-scala/previous_employees_by_title.csv \
/path/to/example-cassandra-spark-job-scala/target/scala-2.12/example-cassandra-spark-job-scala-assembly-0.1.0-SNAPSHOT.jar
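The Read job described above could look roughly like the sketch below. This is not the repository's code: the object name ReadSketch, the helper readCsv, the header and inferSchema options, and passing the CSV path as a program argument are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object; the repository's own entry point is sparkCassandra.Read.
object ReadSketch {

  // Load a CSV file into a DataFrame.
  // Assumption: the file has a header row and column types can be inferred.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    // Assumption: the CSV path is the first program argument,
    // falling back to the file name used in this README.
    val path = args.headOption.getOrElse("previous_employees_by_title.csv")
    // The master URL is supplied by spark-submit, so it is not set here.
    val spark = SparkSession.builder().appName("ReadSketch").getOrCreate()
    try {
      readCsv(spark, path).show(20) // print the first 20 rows
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the read in its own function means the same helper can be reused by a later job that transforms or writes the data.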
In this job, we will do the same read; however, we will now take the first_day and last_day columns and calculate the absolute difference between them as the number of days worked. We will then display the first 20 rows again.
./bin/spark-submit --class sparkCassandra.Manipulate \
--master <master-url> \
--files /path/to/example-cassandra-spark-job-scala/previous_employees_by_title.csv \
/path/to/example-cassandra-spark-job-scala/target/scala-2.12/example-cassandra-spark-job-scala-assembly-0.1.0-SNAPSHOT.jar
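One way to express the days-worked calculation from the Manipulate job is with Spark's built-in datediff and abs column functions. The sketch below is illustrative only: the object name ManipulateSketch and the output column name days_worked are hypothetical, and it assumes first_day and last_day hold values Spark can parse as dates (for example yyyy-MM-dd strings).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{abs, col, datediff, to_date}

// Hypothetical object; the repository's own entry point is sparkCassandra.Manipulate.
object ManipulateSketch {

  // Add a days_worked column holding |last_day - first_day| in whole days.
  // Assumption: both columns are dates or date-formatted strings.
  def withDaysWorked(df: DataFrame): DataFrame =
    df.withColumn(
      "days_worked", // hypothetical column name
      abs(datediff(to_date(col("last_day")), to_date(col("first_day"))))
    )
}
```

Taking the absolute value keeps the result non-negative even when a row has its two dates in the opposite order. Calling show(20) on the returned DataFrame would display the first 20 rows.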
In this job, we will do the same thing as in the Manipulate job; however, we will now write the resulting DataFrame to Cassandra instead of just displaying it in the console.
./bin/spark-submit --class sparkCassandra.Write \
--master <master-url> \
--conf spark.cassandra.connection.host=127.0.0.1 \
--conf spark.cassandra.connection.port=9042 \
--conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions \
--files /path/to/example-cassandra-spark-job-scala/previous_employees_by_title.csv \
/path/to/example-cassandra-spark-job-scala/target/scala-2.12/example-cassandra-spark-job-scala-assembly-0.1.0-SNAPSHOT.jar
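With the Spark Cassandra Connector on the classpath, a DataFrame write to Cassandra might look like the sketch below. The object and function names are hypothetical, and it assumes the target table already exists in the demo keyspace with columns matching the DataFrame; the table name is left as a parameter because this README does not state it.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical object; the repository's own entry point is sparkCassandra.Write.
object WriteSketch {

  // Connector options that identify the target table.
  def cassandraOptions(keyspace: String, table: String): Map[String, String] =
    Map("keyspace" -> keyspace, "table" -> table)

  // Append the DataFrame's rows to an existing Cassandra table.
  // Assumption: the table was created beforehand with matching columns.
  def writeToCassandra(df: DataFrame, keyspace: String, table: String): Unit =
    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(cassandraOptions(keyspace, table))
      .mode(SaveMode.Append)
      .save()
}
```

The connection host and port are not set in the code here; they come from the spark.cassandra.connection.* settings passed with --conf in the command above.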
In this job, we will write the CSV data into one Cassandra table, then read it back with Spark SQL and transform it in the same step. We will then write the transformed data into a new Cassandra table.
./bin/spark-submit --class sparkCassandra.ETL \
--master <master-url> \
--conf spark.cassandra.connection.host=127.0.0.1 \
--conf spark.cassandra.connection.port=9042 \
--conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions \
--files /path/to/example-cassandra-spark-job-scala/previous_employees_by_title.csv \
/path/to/example-cassandra-spark-job-scala/target/scala-2.12/example-cassandra-spark-job-scala-assembly-0.1.0-SNAPSHOT.jar
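The read-transform-write half of the ETL job can be sketched as follows. This is an assumption-laden outline rather than the repository's implementation: the object name, the temp view name source_view, the days_worked column, and the source and target table parameters are all hypothetical, and both tables are assumed to exist already. It reads the first table through the connector's DataFrame source and registers it as a temp view so the transformation can be written in Spark SQL.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical object; the repository's own entry point is sparkCassandra.ETL.
object EtlSketch {

  private val CassandraFormat = "org.apache.spark.sql.cassandra"

  // Spark SQL statement that derives days_worked from the given view.
  def transformSql(view: String): String =
    s"SELECT *, abs(datediff(last_day, first_day)) AS days_worked FROM $view"

  // Read `source`, transform it with Spark SQL, and append the result to `target`.
  // Assumption: both tables already exist in `keyspace`.
  def run(spark: SparkSession, keyspace: String, source: String, target: String): Unit = {
    val loaded = spark.read
      .format(CassandraFormat)
      .options(Map("keyspace" -> keyspace, "table" -> source))
      .load()

    // Expose the Cassandra data to Spark SQL under a temporary name.
    loaded.createOrReplaceTempView("source_view")
    val transformed = spark.sql(transformSql("source_view"))

    transformed.write
      .format(CassandraFormat)
      .options(Map("keyspace" -> keyspace, "table" -> target))
      .mode(SaveMode.Append)
      .save()
  }
}
```

Loading the CSV into the first table would happen before run is called, for example with a DataFrame write like the one used in the Write job.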