Overview

This is the main documentation for the DataStores contained within the gora-core module, which (as its name implies) holds most of the core functionality for the Gora project.

Every module in Gora depends on gora-core, so most of the generic documentation about the project is gathered here, along with the documentation for AvroStore, DataFileAvroStore and MemStore. In addition, gora-core holds all of the core MapReduce, GoraSparkEngine, Persistency, Query, DataStoreBase and Utility functionality.

AvroStore

Description

AvroStore can be used for binary-compatible Avro serializations. It supports Binary and JSON serializations.

gora.properties

Property Key: gora.datastore.default
Property Value: org.apache.gora.avro.store.AvroStore
Required: Yes
Description: Implementation of the persistent Java storage class.

Property Key: gora.avrostore.input.path
Property Value: hdfs://uri/path/to/hdfs/input/path or file:///uri/path/to/local/input/path
Required: Yes
Description: This value should point to the input directory on HDFS (if running Gora in a distributed Hadoop environment) or to an input directory on the local file system (if running Gora locally).

Property Key: gora.avrostore.output.path
Property Value: hdfs://uri/path/to/hdfs/output/path or file:///uri/path/to/local/output/path
Required: Yes
Description: This value should point to the output directory on HDFS (if running Gora in a distributed Hadoop environment) or to an output directory on the local file system (if running Gora locally).

Property Key: gora.avrostore.codec.type
Property Value: BINARY or JSON
Required: No
Description: The Avro encoder/decoder type to use. Can take the value BINARY or JSON; defaults to BINARY if none is supplied.
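Putting these together, a gora.properties for AvroStore might look like the following (the local paths are placeholders to adapt to your environment):

gora.datastore.default=org.apache.gora.avro.store.AvroStore
gora.avrostore.input.path=file:///tmp/gora/input
gora.avrostore.output.path=file:///tmp/gora/output
gora.avrostore.codec.type=BINARY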

AvroStore XML mappings

In the stores covered within the gora-core module, no physical mappings are required.
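As a minimal usage sketch (not a definitive recipe), assuming the gora.properties above and a Gora-compiled persistent class such as Pageview from gora-tutorial:

import org.apache.hadoop.conf.Configuration;
import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;

// Obtain the store declared by gora.datastore.default in gora.properties.
DataStore<Long, Pageview> store =
    DataStoreFactory.getDataStore(Long.class, Pageview.class, new Configuration());

store.put(1L, new Pageview()); // assumed no-arg constructor on the generated class
store.flush();                 // flush pending records to gora.avrostore.output.path
store.close();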

DataFileAvroStore

Description

DataFileAvroStore is a file-based store which extends AvroStore to use Avro's DataFileWriter and DataFileReader as its backend. This datastore supports MapReduce.

gora.properties

DataFileAvroStore would be configured exactly the same as AvroStore above, with the following exception:

Property Key: gora.datastore.default
Property Value: org.apache.gora.avro.store.DataFileAvroStore
Required: Yes
Description: Implementation of the persistent Java storage class.
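In gora.properties terms, only the store class changes from the AvroStore example above (the avrostore path properties are reused, since the store is otherwise configured identically):

gora.datastore.default=org.apache.gora.avro.store.DataFileAvroStore
gora.avrostore.input.path=file:///tmp/gora/input
gora.avrostore.output.path=file:///tmp/gora/output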

DataFileAvroStore XML mappings

In the stores covered within the gora-core module, no physical mappings are required.

MemStore

Description

Essentially this store is a ConcurrentSkipListMap: datastore operations (get, put, delete) run directly against the underlying map, which makes MemStore well suited to tests and prototyping, as sketched below.
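A minimal sketch of typical use, again assuming the Pageview class from gora-tutorial; the class-based DataStoreFactory overload is used here so that MemStore is selected explicitly rather than via gora.properties:

import org.apache.hadoop.conf.Configuration;
import org.apache.gora.memory.store.MemStore;
import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;

DataStore<Long, Pageview> store =
    DataStoreFactory.getDataStore(MemStore.class, Long.class, Pageview.class, new Configuration());

store.put(1L, new Pageview());   // backed by ConcurrentSkipListMap.put
Pageview cached = store.get(1L); // backed by ConcurrentSkipListMap.get
store.delete(1L);                // backed by ConcurrentSkipListMap.remove
store.close();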

gora.properties

MemStore would be configured exactly the same as AvroStore above, with the following exception:

Property Key: gora.datastore.default
Property Value: org.apache.gora.memory.store.MemStore
Required: Yes
Description: Implementation of the Java class used to hold data in memory.

MemStore XML mappings

In the stores covered within the gora-core module, no physical mappings are required.

GoraSparkEngine

Description

GoraSparkEngine is the Spark backend of Gora. Assume that the input and output data stores are:

DataStore<K1, V1> inStore;
DataStore<K2, V2> outStore;

The first step in using GoraSparkEngine is to initialize it:

GoraSparkEngine<K1, V1> goraSparkEngine = new GoraSparkEngine<>(K1.class, V1.class);

Construct a JavaSparkContext and register the input data store's value class as a Kryo class:

SparkConf sparkConf = new SparkConf().setAppName("Gora Spark Integration Application").setMaster("local");
Class[] c = new Class[1];
c[0] = inStore.getPersistentClass();
sparkConf.registerKryoClasses(c);
JavaSparkContext sc = new JavaSparkContext(sparkConf);

A JavaPairRDD can then be retrieved from the input data store:

JavaPairRDD<Long, Pageview> goraRDD = goraSparkEngine.initialize(sc, inStore);

After that, all Spark functionality can be applied. For example, a count can be run as follows:

long count = goraRDD.count();

Map and reduce functions can be run on a JavaPairRDD as well. Assume that this is the resulting variable after a map/reduce step has been applied (a sketch of such a step follows the declaration):

JavaPairRDD<String, MetricDatum> mapReducedGoraRdd;
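For illustration only, such a step could look like the following sketch, which counts page views per URL; the Pageview and MetricDatum accessors are assumed from the gora-tutorial generated classes:

import scala.Tuple2;

JavaPairRDD<String, Long> counts = goraRDD
    .mapToPair(pair -> new Tuple2<>(pair._2().getUrl().toString(), 1L)) // key each record by URL
    .reduceByKey((a, b) -> a + b);                                      // count views per URL

mapReducedGoraRdd = counts.mapToPair(t -> {
  MetricDatum datum = new MetricDatum();
  datum.setMetricDimension(t._1()); // assumed setters on the generated class
  datum.setMetric(t._2());
  return new Tuple2<>(t._1(), datum);
});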

The result can then be written back to the output store as follows:

Configuration sparkHadoopConf = goraSparkEngine.generateOutputConf(outStore);
mapReducedGoraRdd.saveAsNewAPIHadoopDataset(sparkHadoopConf);