- Summary
- Gimel Standalone
- Overview
- Install Docker
- Download the Gimel Jar
- Run Gimel Quickstart Script
- Common Imports and Initializations
Summary
- Install Docker
- Note: At any point, if there is any failure, power down Gimel by running -
quickstart/stop-gimel
- Clone Gimel Repo
- Download Gimel Jar
- Run the bootstrap module
- Once the spark-session is ready, play with the Gimel Data API / GSQL
Gimel Standalone
Overview
The Gimel Standalone feature provides the capability for developers and users alike to
- Try Gimel on a local machine/laptop without requiring all the ecosystems of a Hadoop cluster.
- Standalone comprises docker containers spawned for each storage type that the user would like to explore. Storage type examples: Kafka, Elasticsearch.
- Standalone bootstraps these containers (storage types) with sample flights data.
- Once the containers are spawned and the data is bootstrapped, the user can then refer to the connector docs and try the Gimel Data API / Gimel SQL on the local laptop.
- Also, in the future, the standalone feature will be useful to automate regression tests and to run standalone Spark JVMs for container-based solutions.
Install Docker
- Install Docker on your machine
- Mac - Docker Installation
- Start the Docker service
- Increase the memory by navigating to Preferences > Advanced > Memory
- (Optional) Clear existing containers and images
- Check for existing Docker containers running -
docker ps -aq
- Kill existing Docker containers (if any) -
docker kill $(docker ps -aq)
- Remove existing Docker containers (if any) -
docker rm $(docker ps -aq)
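The optional cleanup steps above can be combined into one guarded helper. This is only a sketch, not part of the Gimel repo, and the `gimel_docker_cleanup` name is made up here:

```shell
#!/usr/bin/env bash
# Hypothetical helper combining the cleanup steps above: list all
# containers, and only kill/remove them if any actually exist, so the
# docker commands never run with an empty argument list.
gimel_docker_cleanup() {
  local ids
  ids=$(docker ps -aq)
  if [ -z "$ids" ]; then
    echo "no containers to clean up"
    return 0
  fi
  docker kill $ids   # stop running containers first
  docker rm $ids     # then remove them
}
```

Note that `$ids` is intentionally left unquoted so each container ID is passed to docker as a separate argument.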
Download the Gimel Jar
- Navigate to the cloned gimel repo -
cd gimel
- Navigate to the folder gimel-dataapi/gimel-standalone/ -
cd gimel-dataapi/gimel-standalone/
- Create lib folder in gimel-standalone -
mkdir lib
- Copy the downloaded jar into lib
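The lib-folder steps above can be scripted as below. The `setup_gimel_lib` function name is made up here, and the jar path is an assumption -- point it at wherever your downloaded uber jar actually is:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the lib folder (if missing) and copy the
# downloaded Gimel uber jar into it. The jar filename below matches the
# one used by the spark-shell command later in this doc.
setup_gimel_lib() {
  local jar_src="$1"
  mkdir -p lib
  cp "$jar_src" lib/
  echo "copied $(basename "$jar_src") into lib/"
}
```

Usage (assuming the jar was downloaded into the current directory): `setup_gimel_lib gimel-sql-2.0.0-SNAPSHOT-uber.jar`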
Run Gimel Quickstart Script
- Navigate back to GIMEL_HOME
cd $GIMEL_HOME
- To start all the docker containers and bootstrap the storages, execute the following command -
quickstart/start-gimel {STORAGE_SYSTEM}
- STORAGE_SYSTEM can be either
all
or a comma-separated list, as follows -
quickstart/start-gimel kafka,elasticsearch,hbase-master,hbase-regionserver
Note: This script will do the following
- Start docker containers for each storage
- Bootstrap the physical storages (create Kafka topics and HBase tables)
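The STORAGE_SYSTEM argument handling described above amounts to the following sketch. The storage names and the echo are illustrative stand-ins for what `quickstart/start-gimel` actually does internally:

```shell
#!/usr/bin/env bash
# Illustrative sketch of STORAGE_SYSTEM handling: "all" expands to every
# supported storage, otherwise the comma-separated list is split and each
# entry is handled in turn. The echo stands in for starting and
# bootstrapping the real docker container.
start_gimel() {
  local arg="$1" storages s
  if [ "$arg" = "all" ]; then
    storages="kafka elasticsearch hbase-master hbase-regionserver"
  else
    storages=$(printf '%s' "$arg" | tr ',' ' ')
  fi
  for s in $storages; do
    echo "starting docker container for: $s"
  done
}
```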
- To start the spark shell, run the following command -
docker exec -it spark-master bash -c \
"export USER=an;export SPARK_HOME=/spark/;export PATH=\$PATH:\$SPARK_HOME/bin:\$SPARK_HOME/sbin; \
/spark/bin/spark-shell --jars /root/gimel-sql-2.0.0-SNAPSHOT-uber.jar"
Note: You can view the Spark UI here
Common Imports and Initializations
// Spark SQL imports
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.hive.HiveContext
// Gimel's SQL entry point
import com.paypal.gimel.sql.GimelQueryProcessor
// Partially apply executeBatch against the active spark session,
// so that gsql("<sql>") runs any Gimel SQL / Spark SQL string
val gsql = GimelQueryProcessor.executeBatch(_: String, spark)