Anant example-cassandra-cql-copy

License: No License Provided

Language: No Language Provided

cql-copy

We cover how we can use CQL Copy for data operations. A step-by-step demo is shown below to walkthrough how you can do data operations such as exporting a Cassandra table as a CSV and then importing data from a CSV into a Cassandra table.

Prerequisites

Demo

  1. We will begin by downloading / cloning this GitHub repository. Once extracted / cloned, we will need to cd into it.
cd cql-copy
  1. Now that we are in the cql-copy directory, we can start our Apache Cassandra docker container. We will also be mounting our cql-copy directory to the docker container.
docker run -d -it --name cassandra -v "$(pwd)":/cql-copy cassandra:latest
  1. We can confirm that the directory was added by opening a new terminal tab / terminal and running the following:
docker exec -it cassandra bash
ls
  1. Now we can open another terminal tab / terminal and run the following to start a CQLSH terminal in our docker container:
docker exec -it cassandra cqlsh
  1. Once loaded, we can use the cql file that is in our cql-copy directory when we mounted it.
source '/cql-copy/spacecraft_journey_catalog.cql'

In this cql file, we create 2 tables: spacecraft_journey_catalog and duration_by_journey_summary. We also insert 1000 records into spacecraft_journey_catalog. As a sneak-peek of the scenario, we will then export this table as CSV, do some “transformations” on it, and load the “transformed” data CSV into the duration_by_journey_summary table.

  1. To confirm that our data is in our Cassandra instance, we can run the following code block. We should get back 1000 rows.
use demo ;
describe tables ;
select count(*) from spacecraft_journey_catalog ;
  1. As mentioned above, we can now export this data as CSV using CQL Copy. We will be using COPY TO and only export 4 of the columns. We want to take the summary and journey_id of each journey and calculate the duration in days of each journey versus just having the end time and start time. We can then load that new data into duration_by_journey_summary, which is partitioned by the summary with a clustering column of journey_id. The use case of this kind of table could be for BI to analyze discrepancies between duration lengths of journies that had the same purpose / summary. We will point the CSV to be created at the root level of the docker container.
copy spacecraft_journey_catalog (summary, journey_id, end, start) to'../spacecraft_journey_catalog.csv' with header = true ;
  1. We can then confirm that the CSV was created by running the following in the terminal / terminal tab that is running Docker bash:
ls
  1. Then, we will need to export the CSV to do some “transformations” on it. We will need to run the following in the first terminal or a new terminal tab to get our Docker container ID:
docker ps
  1. Then we can run the following command with your ID substituted for the placeholder id:
docker cp container_ID:/spacecraft_journey_catalog.csv .

This will export the CSV to our local cql-copy directory. You can open your file manager to visualize this.

That covers how to export your Cassandra data using CQL COPY TO. Now we will cover how to import data using CQL COPY FROM.

As mentioned before, we will “transform” the CSV that we exported. However, to speed up the demo, I have already updated and calculated the duration in days for you. The “transformed” data is in the CSV file called duration_by_journey_summary.csv. You can see this in the cql-copy directory. Technically, we have already imported the file when we mounted it, but we will show you how to import it from the local directory into the root level of the docker container.

  1. Make sure you are in the cql-copy directory on your local machine. Then run the following command using the Docker container ID that we got before:
docker cp duration_by_journey_summary.csv container_ID:/
  1. To confirm that the file was added to the root level of the Docker container, run the following in the terminal / terminal tab running Docker bash:
ls
  1. Once you have confirmed it, we can now move onto importing the data from the CSV into our duration_by_journey_summary table. Use the following command to do so:
copy duration_by_journey_summary (summary, journey_id, duration_in_days) from '../duration_by_journey_summary.csv' WITH HEADER = TRUE ;
  1. We can now move the terminal / terminal tab running CQLSH and confirm that the data was transferred. We should get back 1000 rows.
select count(*) from duration_by_journey_summary ;

And with that we have done CQL Copy for data operations by exporting and importing CSV data from and to our Cassandra tables.

Resources

Related Tools

Java Driver

License : Apache License 2.0

Language : Java

1302

20

829

Fake

License : fake

Language : fake

Cassandra

License : Other

Language : JavaScript

Example Cassandra Alpakka Twitter

License : No License

Language : Scala

Need Cassandra Training?

WE GOT YOU COVERED.

Anant US provides online training for Apache Cassandra that covers all the important skills you need to know in order to work with this high performance, open source NoSQL database.

Join Our Newsletter!

Sign up below to receive email updates and see what's going on with our company.