Configuring Cassandra
Cassandra can be configured prior to startup by editing the files located in %CASSANDRA_HOME%\conf. The conf directory contains cassandra.yaml, logback.xml, and other files that can be edited to tune Cassandra's performance, customize the cluster settings, or change the logging settings.
Basic Configuration
Prior to creating a Cassandra cluster, it is important to first modify a few core settings in cassandra.yaml:
cluster_name:
Name of the Cassandra cluster. Must be identical for all nodes in the cluster.
num_tokens:
Leave the default value, unless the cluster is being migrated from a version 1.1.x cluster and its data must be preserved. Refer to the comments in the yaml for more information.
initial_token:
Leave the default value.
data_file_directories:
commitlog_directory:
saved_caches_directory:
Ensure that the above are all pointing to valid directories.
seeds: (default: "127.0.0.1")
Specifies a comma-delimited list of IP addresses. New nodes will contact the seed nodes to determine the ring topology and to obtain gossip information about other nodes in the cluster. Every node should have the same list of seeds.
start_native_transport: false (default is true)
start_rpc: true (default is false)
listen_address: (default: localhost)
The IP address that other Cassandra nodes will use to connect to this node. If left blank, uses the hostname configuration of the node.
rpc_address: (default: localhost)
The listen address for remote procedure calls. To listen on all interfaces, set to 0.0.0.0. If left blank, uses the hostname configuration of the node.
rpc_port: (default: 9160)
The port for remote procedure calls and the Thrift service.
NOTE: Orchestration requires the Thrift interface. Ensure that start_native_transport is set to false and that start_rpc is set to true.
storage_port: (default: 7000)
The port for inter-node communication.
endpoint_snitch: (default: SimpleSnitch)
This option determines how Cassandra views the cluster: use SimpleSnitch for a single data center, and PropertyFileSnitch (or another snitch chosen in the yaml) for a multiple data center cluster.
Note that the PropertyFileSnitch requires the cassandra-topology.properties file to describe the multiple data center cluster; for example, that file will need to contain entries such as the following:
# Cassandra Node IP=Data Center:Rack
135.225.58.81=DC1:RAC1
135.225.58.82=DC1:RAC2
135.225.58.83=DC2:RAC1
135.225.58.90=DC2:RAC2
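Taken together, the settings above might look like the following cassandra.yaml fragment. This is an illustrative sketch only: the cluster name, paths, and IP addresses are placeholders, and note that in the actual file seeds is nested under seed_provider rather than being a top-level key.

```yaml
# Illustrative cassandra.yaml fragment -- adjust names, paths, and IPs for your deployment
cluster_name: 'OrchestrationCluster'     # assumption; must match on every node
num_tokens: 256                          # leave the default
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "135.225.58.81,135.225.58.83"   # same list on every node
listen_address: 135.225.58.81
rpc_address: 0.0.0.0
rpc_port: 9160
storage_port: 7000
start_native_transport: false            # Thrift is required for Orchestration
start_rpc: true
endpoint_snitch: PropertyFileSnitch      # multiple data center example
```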
Next, modify %CASSANDRA_HOME%\bin\cassandra.bat (if Windows), or %CASSANDRA_HOME%/conf/cassandra-env.sh (if Unix-based) to configure the JVM. It is important to verify that the JMX port does not conflict with other configured services:
-Dcassandra.jmx.local.port=7199 (in cassandra.bat)
JMX_PORT="7199" (in cassandra-env.sh)
Note that remote access via the JMX port is not recommended, because unintended access to that port could disrupt Cassandra operation.
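As a sketch, the relevant lines in cassandra-env.sh look like the following. The LOCAL_JMX variable is an assumption that applies only to more recent Cassandra versions; check the shipped file.

```shell
# Excerpt from conf/cassandra-env.sh -- change the port only if 7199 conflicts
JMX_PORT="7199"

# Assumed setting (recent Cassandra versions): keeps JMX bound to localhost only.
# If remote JMX is unavoidable, secure it with authentication and SSL.
LOCAL_JMX=yes
```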
Storage Schema
Orchestration creates the schema on startup if it has not already been created manually. In this case, ensure that the Cassandra cluster is started first, then start one Orchestration instance; the schema is created and propagated to all Cassandra instances. The schema can also be created manually with the Cassandra CLI; note that cassandra-cli is not available in Cassandra versions after 2.1.x (see the Useful Tools section for more details). Below is a schema example for Orchestration on Cassandra 2.x, written in cassandra-cli syntax. The replication factor is set to 1 in this sample, which is the only allowed value for a single-node deployment.
For a multiple-node Cassandra cluster, the replication factor should be increased to improve availability. Refer to http://www.datastax.com/docs/1.1/dml/data_consistency for a discussion of consistency and the replication factor (RF) when determining the value required for your deployment, noting that Orchestration performs all operations at consistency level ONE.
Sample Orchestration Schema for Cassandra 2.2.x and Orchestration versions prior to 8.1.3
/*This file contains an example of the Orchestration keyspace, which should be tailored to the deployed cassandra instance capabilities.
This file should be copied to the cassandra install conf directory.
The schema can be loaded using the cassandra-cli command line interface from the cassandra root install directory as follows:
./bin/cassandra-cli -host ip-address-of-cassandra-host --file conf/orchestration-schema.txt
where ip-address-of-cassandra-host is the IP address of the host, e.g. 135.225.58.81.
Note that the above assumes that the Thrift port is the default of 9160.
The cassandra-cli includes online help that explains the statements below. You can access the help without connecting to a running
cassandra instance by starting the client and typing "help;"
NOTE: Please ensure that the replication_factor is set correctly. Use Cassandra version 2.2.x or later.
*/
create keyspace Orchestration
with strategy_options={replication_factor:1}
and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
use Orchestration;
create column family Document
with comparator = UTF8Type
and column_type = Standard
and memtable_throughput = 128
and memtable_operations = 0.29
and read_repair_chance = 1.0
and max_compaction_threshold = 32
and min_compaction_threshold = 4
and gc_grace = 86400
and comment = 'JSON form of the scxml document, keyed by md5 of document';
create column family Session
with comparator = UTF8Type
and column_type = Standard
and memtable_throughput = 128
and memtable_operations = 0.29
and read_repair_chance = 1.0
and max_compaction_threshold = 32
and min_compaction_threshold = 4
and gc_grace = 86400
and comment = 'JSON form of the session, keyed by session GUUID';
create column family ScheduleByTimeInterval
with comparator = UTF8Type
and column_type = Standard
and memtable_throughput = 128
and memtable_operations = 0.29
and read_repair_chance = 1.0
and max_compaction_threshold = 32
and min_compaction_threshold = 4
and gc_grace = 86400
and comment = 'Column names are the concatenation of scheduled ActionGUUID, action type, and idealtime in msecs,
column values are the action content. The keys are in form of time since the epoch in msecs divided by some time increment,
say 60000, for 1 minute intervals.';
create column family ScheduleBySessionID
with comparator = UTF8Type
and column_type = Standard
and memtable_throughput = 128
and memtable_operations = 0.29
and read_repair_chance = 1.0
and max_compaction_threshold = 32
and min_compaction_threshold = 4
and gc_grace = 86400
and comment = 'Column names are the concatenation of scheduled ActionGUUID, action type, and idealtime in msecs,
column values are the idealtime in msecs, keyed by session id';
create column family SessionIDServerInfo
with comparator = UTF8Type
and column_type = Standard
and memtable_throughput = 128
and memtable_operations = 0.29
and read_repair_chance = 1.0
and max_compaction_threshold = 32
and min_compaction_threshold = 4
and gc_grace = 86400
and comment = 'Session id and assigned node, keyed by session id';
create column family SessionIDServerInfoRIndex
with comparator = UTF8Type
and column_type = Standard
and memtable_throughput = 128
and memtable_operations = 0.29
and read_repair_chance = 1.0
and max_compaction_threshold = 32
and min_compaction_threshold = 4
and gc_grace = 86400
and comment = 'Columns are session ids and the column values are also the session id, keyed by the string
form of the server node which owns the session.';
create column family RecoverSessionIDServerInfoRIndex
with comparator = UTF8Type
and column_type = Standard
and memtable_throughput = 128
and memtable_operations = 0.29
and read_repair_chance = 1.0
and max_compaction_threshold = 32
and min_compaction_threshold = 4
and gc_grace = 86400
and comment = 'Columns are session ids and the column values are also the session id, keyed by the string form
of the server node which owns the session. Entries are only those sessions for which recovery is enabled.';
create column family ORS8130000
with comparator = UTF8Type
and column_type = Standard
and memtable_throughput = 128
and memtable_operations = 0.29
and read_repair_chance = 1.0
and max_compaction_threshold = 32
and min_compaction_threshold = 4
and gc_grace = 86400
and comment = 'Dummy column family to designate the Orchestration schema version.';
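Because cassandra-cli was removed after 2.1.x, the keyspace can instead be created through cqlsh on newer versions. The following is an illustrative CQL 3 translation of the keyspace definition above, not an official Orchestration schema; the table definitions would need to be translated in the same way.

```cql
-- Illustrative cqlsh equivalent of the keyspace creation above
CREATE KEYSPACE "Orchestration"
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
```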
In the examples above, note that various attributes can be configured when creating keyspaces and column families. These attributes are described in the tables below:
Keyspace Attributes

| Option | Default | Description |
|---|---|---|
| name | N/A (required) | Name for the keyspace. |
| placement_strategy | org.apache.cassandra.locator.SimpleStrategy | Determines how replicas are distributed among nodes in the Cassandra cluster. SimpleStrategy simply distributes replicas to the next N-1 nodes in the ring for a replication_factor of N. NetworkTopologyStrategy requires the Cassandra cluster to be location-aware (able to determine the rack/data center of each node); in this case, the replication factor is set on a per-data-center basis. |
| strategy_options | N/A | Specifies configuration options for the replication strategy. For SimpleStrategy, specify the replication_factor; for NetworkTopologyStrategy, specify a replication factor for each data center (for example, {DC1:2, DC2:2}). |
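For example, a multi-data-center keyspace matching the cassandra-topology.properties sample earlier (data centers DC1 and DC2) could be created in cassandra-cli as follows; the per-data-center replication factor of 2 is illustrative and should be chosen for your deployment.

```
create keyspace Orchestration
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = {DC1:2, DC2:2};
```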
Column Family Attributes

| Option | Default | Description |
|---|---|---|
| comparator | BytesType | Defines the data type used when validating and sorting column names. The comparator cannot be changed once a column family has been created. |
| column_type | Standard | Determines whether the column family is a regular column family (Standard) or a super column family (Super). |
| read_repair_chance | 0.1 | Specifies the probability that read repair is invoked on non-quorum reads. The value must be between 0 and 1. Lower values improve read throughput but increase the chance of stale values when not using a strong consistency level. |
| min_compaction_threshold | 4 | Sets the minimum number of SSTables required to trigger a minor compaction. |
| gc_grace_seconds | 864000 (10 days) | Specifies the time to wait before garbage-collecting tombstones (items marked for deletion). In a single-node cluster, this can safely be set to zero. |
| comment | N/A | A human-readable comment describing the column family. |
| column_metadata | N/A | Defines the attributes of individual columns, such as the column name and validation class. |
Performance Tuning
Besides configuring keyspaces and column families, it is possible to further tweak the performance of Cassandra by editing cassandra.yaml (Node and Cluster Configuration) or by editing cassandra-env.sh (JVM Configuration).
Descriptions of tunable properties can be found in both cassandra.yaml and cassandra-env.sh. A summary of these properties can be seen in the tables below:
Performance Tuning Properties (cassandra.yaml)
| Option | Default | Description |
|---|---|---|
| column_index_size_in_kb | 64 | The row size at which column indexes are added. Keep this value small if only a select few columns are consistently read from each row, since a higher value means more row data must be deserialized for each read (until the index is added). |
| commitlog_sync | periodic | Allowed values are periodic and batch. In batch mode, Cassandra blocks writes until they have been synced to disk. |
| commitlog_sync_period_in_ms | 10000 (10 seconds) | Determines how often (in milliseconds) the commitlog is synced to disk when commitlog_sync is set to periodic. |
| commitlog_total_space_in_mb | 4096 | When the commitlog reaches the specified size, Cassandra flushes memtables to disk for the oldest commitlog segments. This reduces the amount of data to replay on startup. |
| compaction_throughput_mb_per_sec | 16 | Throttles compaction to the given total throughput across the entire system. The value should be proportional to the rate of write throughput (16 to 32 times). Setting it to 0 disables compaction throttling. |
| concurrent_compactors | 1 (per CPU core) | Maximum number of concurrent compaction processes allowed on a node. |
| concurrent_reads | 32 | Recommended setting is 16 * number_of_drives. This allows enough operations to queue so that the OS and drives can reorder them and minimize disk fetches. |
| concurrent_writes | 32 | The number of concurrent writes should be proportional to the number of CPU cores in the system. Recommended setting is 8 * number_of_cpu_cores. |
| in_memory_compaction_limit_in_mb | 64 | Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass compaction process. Recommended value is 5 to 10 percent of the available Java heap size. |
| index_interval | 128 | Influences the granularity of SSTable indexes in memory. A smaller value means higher sampling of the index files, resulting in more effective indexes at the cost of memory. Recommended value is between 128 and 512 when combined with a large column family key cache: use a larger value for small rows, or a smaller value to increase read performance. |
| memtable_flush_writers | 1 per data directory | Number of memtable flush writer threads. Influences flush performance and can be increased if you have a large Java heap size and many data directories. |
| memtable_total_space_in_mb | 1/3 of heap | Total memory used for all column family memtables on a node. |
| multithreaded_compaction | false | When true, each compaction operation uses one thread per SSTable being merged in addition to one thread per core. Typically only useful on nodes with SSD hardware. |
| reduce_cache_capacity_to | 0.6 | Sets the target maximum cache capacity when Java heap usage reaches the threshold defined by reduce_cache_sizes_at. |
| reduce_cache_sizes_at | 0.85 | When Java heap usage exceeds this percentage (after CMS garbage collection), Cassandra reduces the cache capacity as specified by reduce_cache_capacity_to. Set to 1.0 to disable. |
| sliced_buffer_size_in_kb | 64 | Buffer size to use for reading contiguous columns. Should match the size of columns typically retrieved by query operations involving a slice predicate. |
| stream_throughput_outbound_megabits_per_sec | 400 | Maximum outbound throughput on a node for streaming file transfers. |
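As a worked example of the formulas above, a hypothetical node with 8 CPU cores and 4 data drives (assumed values) would be tuned roughly as follows:

```yaml
# Illustrative cassandra.yaml tuning fragment for an assumed 8-core, 4-drive node
concurrent_reads: 64      # 16 * 4 drives
concurrent_writes: 64     # 8 * 8 cores
compaction_throughput_mb_per_sec: 16   # increase if write throughput is high
```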
JVM Configuration Settings (Linux: conf/cassandra-env.sh; Windows: bin\cassandra.bat)

| Option | Default | Description |
|---|---|---|
| MAX_HEAP_SIZE | Half of available physical memory | Maximum heap size for the JVM. The same value is used for the minimum heap size, allowing the heap to be locked in memory. Should be set in conjunction with HEAP_NEWSIZE. |
| HEAP_NEWSIZE | 100 MB per physical CPU core | Size of the young generation. A larger value leads to longer GC pause times, while a smaller value typically leads to more expensive GC. Set in conjunction with MAX_HEAP_SIZE. |
| com.sun.management.jmxremote.port | 7199 | Port on which Cassandra listens for JMX connections. |
| com.sun.management.jmxremote.ssl | false | Enables/disables SSL for JMX. |
| com.sun.management.jmxremote.authenticate | false | Enables/disables remote authentication for JMX. |
| -Djava.rmi.server.hostname | N/A | Sets the interface hostname or IP address that JMX should use to connect. Set this if you have trouble connecting. |
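The heap defaults in the first two rows can be expressed as arithmetic. The script below mirrors, in simplified form, the sizing logic that cassandra-env.sh applies; the exact formula varies by Cassandra version, and the 8 GB / 4-core inputs are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Simplified sketch of cassandra-env.sh heap sizing (version-dependent; inputs assumed)
system_memory_mb=8192   # assumed: 8 GB of RAM
cpu_cores=4             # assumed: 4 physical cores

# MAX_HEAP_SIZE: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
half_mb=$(( system_memory_mb / 2 ))
quarter_mb=$(( system_memory_mb / 4 ))
a=$(( half_mb < 1024 ? half_mb : 1024 ))
b=$(( quarter_mb < 8192 ? quarter_mb : 8192 ))
max_heap_mb=$(( a > b ? a : b ))

# HEAP_NEWSIZE: min(100 MB per core, 1/4 of MAX_HEAP_SIZE)
by_cores_mb=$(( 100 * cpu_cores ))
cap_mb=$(( max_heap_mb / 4 ))
heap_newsize_mb=$(( by_cores_mb < cap_mb ? by_cores_mb : cap_mb ))

echo "MAX_HEAP_SIZE=${max_heap_mb}M HEAP_NEWSIZE=${heap_newsize_mb}M"
```

For the assumed 8 GB, 4-core node this yields a 2048 MB heap with a 400 MB young generation; on larger machines the heap is capped rather than growing without bound.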
Logging
Changes to logging are made through the log4j-server.properties and log4j-tools.properties files.
Within these files, it is possible to change the default logging level (log4j.rootLogger), the logging handlers, log message templates (ConversionPattern), as well as the default log file path (log4j.appender.R.File).
Example:
# output messages into a rolling log file as well as stdout
log4j.rootLogger=DEBUG,stdout,R
# stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n
# rolling log file
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.maxFileSize=20MB
log4j.appender.R.maxBackupIndex=50
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
# Edit the next line to point to your logs directory. Use forward slashes even on
# Windows, since backslashes are escape characters in Java properties files.
log4j.appender.R.File=C:/Cassandra/logs/system.log
# Application logging options
#log4j.logger.org.apache.cassandra=DEBUG
#log4j.logger.org.apache.cassandra.db=DEBUG
#log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG
# Adding this to avoid thrift logging disconnect errors.
log4j.logger.org.apache.thrift.server.TNonblockingServer=ERROR
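The log4j files above apply to older Cassandra releases; Cassandra 2.x ships with logback instead (the logback.xml mentioned at the start of this section). As a sketch, the equivalent default-level change in conf/logback.xml looks like the following; the appender names are assumptions, so verify them against the file shipped with your version.

```xml
<!-- Fragment of conf/logback.xml (Cassandra 2.x); adjust the level as needed -->
<configuration>
  <!-- the default logging level, equivalent to log4j.rootLogger above -->
  <root level="DEBUG">
    <appender-ref ref="FILE" />   <!-- appender names assumed; check shipped file -->
    <appender-ref ref="STDOUT" />
  </root>
</configuration>
```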
