Anant example-cassandra-alpakka-twitter

License: No License Provided

Language: Scala

Alpakka Cassandra and Twitter

This project is a Scala application that uses Alpakka Cassandra 2.0, Akka Streams, and Twitter4S (a Scala Twitter client) to pull new tweets for a given hashtag (or set of hashtags) from Twitter API v1.1 and write them to a local Cassandra database.

NOTE: The project only saves tweets that are not retweets of another tweet, and it currently saves only the truncated version of each tweet (up to 140 characters).
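The retweet rule in the note above can be sketched in plain Scala. This is a hypothetical model, not the project's code: SimpleTweet is a simplified stand-in for the twitter4s Tweet type (which, as we understand its definition, exposes the original of a retweet as an optional retweeted_status field), and TweetRow mirrors the (id, excerpt) columns of testkeyspace.testtable.

```scala
// Hypothetical, simplified stand-in for twitter4s's Tweet (only the fields used here).
final case class SimpleTweet(id: Long, text: String, retweetedStatus: Option[SimpleTweet])

// One row of testkeyspace.testtable: id bigint PRIMARY KEY, excerpt text.
final case class TweetRow(id: Long, excerpt: String)

object RetweetFilter {
  // A retweet carries the tweet it retweets; an original tweet does not.
  def isRetweet(tweet: SimpleTweet): Boolean = tweet.retweetedStatus.isDefined

  // Drop retweets and map each remaining tweet to the row that would be written.
  def toRows(tweets: Seq[SimpleTweet]): Seq[TweetRow] =
    tweets.filterNot(isRetweet).map(tweet => TweetRow(tweet.id, tweet.text))
}
```

For example, given an original tweet and a retweet of it, toRows returns a single TweetRow for the original.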



Requirements

  • Scala 2.12+
  • JDK 8
  • sbt (this project uses 1.4.9)
  • Docker (with enough RAM to run a Cassandra container)

Table of Contents

  1. Set up and run a local Cassandra instance using Docker
  2. Configure Twitter API keys
  3. Set up hashtags and run the project using sbt
  4. Observe results in Cassandra using cqlsh

1. Cassandra Setup

1.1 - Make sure you have Docker installed on your machine. Run the following command to start a local Cassandra container with port 9042 exposed:

docker run -p 9042:9042 --rm --name my-cassandra -d cassandra

1.2 - Make sure your container is running (you may need to give it a few minutes to boot up):

docker ps -a

Screenshot: output of docker ps -a
The output shows that the container has been running for 3 minutes and that local port 9042 is bound to port 9042 in the container (the default port for Cassandra).
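Instead of re-running docker ps by hand, you could poll until Cassandra actually accepts CQL connections. The sketch below is a hypothetical helper, not part of the project: it shells out to docker exec with cqlsh -e, which should exit with status 0 once cqlsh can connect, and retries with a fixed delay. A plain TCP check against localhost:9042 may succeed too early, because Docker's port forwarding can accept connections before Cassandra itself is listening.

```scala
import scala.annotation.tailrec
import scala.sys.process.{Process, ProcessLogger}
import scala.util.Try

object WaitForCassandra {
  // Runs a trivial CQL command in the container; exit code 0 means cqlsh could connect.
  // The default container name comes from step 1.1 (--name my-cassandra).
  def cqlshReady(container: String = "my-cassandra"): Boolean = {
    val cmd = Seq("docker", "exec", container, "cqlsh", "-e", "DESCRIBE KEYSPACES")
    // Discard stdout and stderr; only the exit code matters here.
    val quiet = ProcessLogger((_: String) => (), (_: String) => ())
    // Try covers the case where the docker binary itself cannot be started.
    Try(Process(cmd).!(quiet)).toOption.contains(0)
  }

  // Calls check up to `attempts` times, sleeping delayMs between failed attempts.
  @tailrec
  def retryUntil(attempts: Int, delayMs: Long)(check: () => Boolean): Boolean =
    if (attempts <= 0) false
    else if (check()) true
    else {
      if (attempts > 1) Thread.sleep(delayMs)
      retryUntil(attempts - 1, delayMs)(check)
    }
}
```

For example, WaitForCassandra.retryUntil(attempts = 30, delayMs = 10000)(() => WaitForCassandra.cqlshReady()) checks every 10 seconds for roughly 5 minutes before giving up.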

1.3 - Next, run cqlsh in the container in interactive terminal mode to set up the keyspace and table:

docker exec -it my-cassandra cqlsh

1.4 - Once cqlsh comes up, create the keyspace and table for this demo, and insert a sample row:

CREATE KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

CREATE TABLE testkeyspace.testtable(id bigint PRIMARY KEY, excerpt text);

INSERT INTO testkeyspace.testtable(id, excerpt)
VALUES (37, 'appletest');

exit

2. Twitter Setup

2.1 - From the root folder of this repository, go to src/main/resources/, copy application.conf.example within that same directory, and name the copy application.conf:

cp src/main/resources/application.conf.example src/main/resources/application.conf

2.2 - Go to the Twitter developer dashboard, register an application, and insert its four Twitter API keys into this portion of application.conf:

twitter {
  consumer {
    key = "consumer-key-here"
    secret = "consumer-secret-here"
  }
  access {
    key = "access-key-here"
    secret = "access-secret-here"
  }
}
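Before running, it can help to confirm that none of the four values still holds a placeholder. The sketch below is hypothetical and not part of the project: it assumes the settings have already been read into a Map keyed by their HOCON paths (twitter4s itself can read these paths from application.conf), and it treats a missing value, an empty value, or a value ending in "-here" as not yet filled in.

```scala
object TwitterKeyCheck {
  // The four HOCON paths from the twitter { ... } block above.
  val requiredPaths: Seq[String] = Seq(
    "twitter.consumer.key",
    "twitter.consumer.secret",
    "twitter.access.key",
    "twitter.access.secret"
  )

  // Returns the paths that are missing, empty, or still set to an "...-here" placeholder.
  def unfilled(settings: Map[String, String]): Seq[String] =
    requiredPaths.filter { path =>
      settings.get(path).forall(value => value.trim.isEmpty || value.endsWith("-here"))
    }
}
```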

3. Running The Project

3.1 - Open src/main/scala/com/alptwitter/AlpakkaTwitter.scala and edit the following line so that it lists the hashtags you want to track new tweets for:

val trackedWords = Seq("#myHashtag")

For example, from the root folder of the project:

vim src/main/scala/com/alptwitter/AlpakkaTwitter.scala

To track more than one hashtag, add more strings to the Seq, separated by commas, for example Seq("#myHashtag", "#anotherHashtag").
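Matching tweets against trackedWords is presumably done by the Twitter API itself (for the v1.1 streaming filter, tracked terms are matched on Twitter's side). If you want to double-check matches locally, for example in a unit test, a hypothetical helper might look like the sketch below. It is a plain case-insensitive substring check, looser than Twitter's own matching rules, and the hashtags in it are placeholders.

```scala
import java.util.Locale

object HashtagMatch {
  // Same shape as trackedWords in AlpakkaTwitter.scala; these hashtags are placeholders.
  val trackedWords: Seq[String] = Seq("#myHashtag", "#cassandra", "#scala")

  // Hypothetical local check: true if the text contains any tracked word, ignoring case.
  // This is a plain substring test, so "#scala" also matches "#scalatest".
  def matchesAny(text: String, tracked: Seq[String]): Boolean = {
    val lowered = text.toLowerCase(Locale.ROOT)
    tracked.exists(word => lowered.contains(word.toLowerCase(Locale.ROOT)))
  }
}
```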

3.2 - Then run the project from its root folder:

sbt run

As new tweets containing any of the hashtags in trackedWords are posted, the console prints a message for each one saying whether it is a retweet or a unique tweet.
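The per-tweet console message can be pictured as a small classification step. The project's exact wording is not shown in this README, so the IncomingTweet type and the message text below are hypothetical; only the retweet-versus-unique decision, and the fact that unique tweets are the ones saved, follow the description in this README.

```scala
// Hypothetical, trimmed-down tweet: retweetOf holds the id of the retweeted tweet, if any.
final case class IncomingTweet(id: Long, text: String, retweetOf: Option[Long])

object ConsoleReport {
  // Builds the one-line console message for a tweet. The wording is illustrative only;
  // the project's actual messages may differ.
  def describe(tweet: IncomingTweet): String = tweet.retweetOf match {
    case Some(originalId) => s"Retweet of $originalId, not saved (id ${tweet.id})"
    case None             => s"Unique tweet, saved (id ${tweet.id})"
  }
}
```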


4. Observe Tables

4.1 - As new tweets (not retweets) containing your hashtags are found, they are saved to Cassandra as (tweet id, tweet text) rows in testkeyspace.testtable. To check that tweets are being saved, run cqlsh on the Cassandra container and query the table. Alongside the sample row (37, 'appletest') from step 1.4, you should see one row per saved tweet:

docker exec -it my-cassandra cqlsh
SELECT * FROM testkeyspace.testtable; 
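Because id is the table's primary key, writing the same tweet id twice still leaves a single row: a plain CQL INSERT for an existing key overwrites that row rather than failing (an upsert). The sketch below models this with an immutable Map purely for illustration; it does not connect to Cassandra, and TestTableModel is a hypothetical name.

```scala
object TestTableModel {
  // In-memory stand-in for testkeyspace.testtable: primary key id -> excerpt.
  type Table = Map[Long, String]

  // Mirrors a plain CQL INSERT: a new id adds a row, an existing id overwrites its excerpt.
  def insert(table: Table, id: Long, excerpt: String): Table = table.updated(id, excerpt)
}
```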

Twitter4S (Twitter for Scala) GitHub Repository

Twitter4S definition of Tweet object

Alpakka Cassandra Documentation
