Revision as of 22:38, September 23, 2016 by Bonniem (talk | contribs) (Step 2.1: Edit cassandra.yaml)

Installation

Note: These instructions may vary for 64-bit versions. The following examples use 32-bit versions of Java and prunsrv.

Step 1: Downloading and Setting Environment Variables

Extract the contents of the Cassandra (and Commons Daemon) archive(s) on each node. On Windows, use WinZip "Extract here..." or an equivalent tool. On Linux, use gunzip or tar -xvf. If following the example below, place the directories as follows:

 Windows: C:\Cassandra\apache-cassandra-2.2.x
 Linux: /cassandra/apache-cassandra-2.2.x

The contents of the Commons Daemon archive are placed in the above installation directories in the bin folder, in a subdirectory named daemon, as below:

 Windows: C:\Cassandra\apache-cassandra-2.2.x\bin\daemon
 Linux: /cassandra/apache-cassandra-2.2.x/bin/daemon

Set the JAVA_HOME environment variable to the Java JRE/JDK root, for example:

 set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_73
 or
 export JAVA_HOME=/usr/java/jdk1.x.x_x

For Linux-like installs, edit JAVA_HOME in $CASSANDRA_HOME/bin/cassandra.in.sh.
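Before moving on, it can help to confirm that JAVA_HOME really points at a Java root. The following is a minimal shell sketch for Linux nodes, not part of the Cassandra distribution. The function name check_java_home is made up for illustration, and the only assumption is that a Java root contains an executable bin/java.

```shell
# Hypothetical helper: succeed only if the given directory looks like a
# Java root, i.e. it is non-empty and contains an executable bin/java.
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}
```

For example, check_java_home "$JAVA_HOME" || echo "JAVA_HOME does not point to a Java root" prints the warning only when the check fails.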

Step 2: Edit configuration files

The Cassandra distribution comes with a number of configuration files that should be edited, located in the %CASSANDRA_HOME%\conf directory.

Step 2.1: Edit cassandra.yaml

The included cassandra.yaml contains default configurations for the Cassandra cluster.

Because Cassandra 2.2.5 implements virtual nodes, the initial_token should be left as is, except on nodes that are being migrated from older 1.x.x versions. In that case, refer to the documentation specified in the yaml.

Ensure that the following options are pointing to the desired paths. Cassandra will create the directories on startup. Paths specified below are examples:

 data_file_directories:
     - C:\Cassandra\apache-cassandra-2.2.x\data
 commitlog_directory: C:\Cassandra\apache-cassandra-2.2.x\commitlog
 saved_caches_directory: C:\Cassandra\apache-cassandra-2.2.x\saved_caches

Also in cassandra.yaml, configure the cluster_name, key_cache_size_in_mb, counter_cache_size_in_mb, seeds, listen_address, start_rpc, and rpc_address.

Follow the instructions in the yaml for the following settings, all of which relate to the memory and the number of processors available on the installation platform:

concurrent_reads
concurrent_writes
concurrent_counter_writes
file_cache_size_in_mb
memtable_heap_space_in_mb
memtable_offheap_space_in_mb
commitlog_total_space_in_mb
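As a rough illustration of how such settings scale with hardware: the comments in the stock cassandra.yaml suggest about 16 times the number of data drives for concurrent_reads and concurrent_counter_writes, and about 8 times the number of CPU cores for concurrent_writes. The sketch below assumes those rules of thumb; the function name suggest_concurrency is hypothetical, and the comments in the yaml shipped with your version remain the authority.

```shell
# Hypothetical helper: print rule-of-thumb values, assuming the guidance
#   concurrent_reads / concurrent_counter_writes = 16 * number of data drives
#   concurrent_writes                            = 8 * number of CPU cores
# Arguments: $1 = number of CPU cores, $2 = number of data drives
suggest_concurrency() {
    cores=$1
    drives=$2
    echo "concurrent_reads: $((16 * drives))"
    echo "concurrent_writes: $((8 * cores))"
    echo "concurrent_counter_writes: $((16 * drives))"
}
```

On Linux, suggest_concurrency "$(getconf _NPROCESSORS_ONLN)" 1 prints starting values for a node with a single data drive. The memory-related settings in the list are not covered by this sketch.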


  • The cluster_name must be identical for all nodes within a Cassandra cluster.
  • key_cache_size_in_mb should be set to 0 to disable key caching.
  • counter_cache_size_in_mb should be set to 0 to disable the counter cache.
  • The seeds must be provided as a comma-delimited list of IP addresses that new nodes can contact for information about the Cassandra cluster. It is recommended that all nodes have the same list of seeds specified, and that all nodes be specified as seed nodes.
  • listen_address is the IP address that other Cassandra nodes use to connect to this node.
  • The rpc_address is the listen address for remote procedure calls (and clients, such as the cassandra-cli).

Make sure that start_rpc is set to true. If authentication is required, leave start_native_transport set to true; the native transport port is needed to set the user names and passwords in Cassandra. If authentication is not required, set start_native_transport to false.

Tip
If authentication is required, set the authenticator to PasswordAuthenticator and the authorizer to CassandraAuthorizer. Refer to Step 5 for further information on setting the username and password.

The addresses default to localhost; it is recommended to set them to the node's IP address.
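After editing, a quick way to review these values on a Linux node is to print just the relevant lines from cassandra.yaml. The sketch below assumes the stock file layout, where these settings are top-level keys and seeds appears as an indented "- seeds:" entry under seed_provider. The function name show_cluster_settings is made up for illustration.

```shell
# Hypothetical helper: print the settings discussed above from the
# cassandra.yaml file given as $1. Commented-out lines are skipped
# because they do not start with the key name.
show_cluster_settings() {
    grep -E '^(cluster_name|key_cache_size_in_mb|counter_cache_size_in_mb|listen_address|rpc_address|start_rpc|start_native_transport|authenticator|authorizer):' "$1"
    # seeds is nested under seed_provider, so it is indented in the file
    grep -E '^[[:space:]]*- seeds:' "$1"
}
```

Running show_cluster_settings conf/cassandra.yaml on each node makes it easy to compare cluster_name and the seed list across the cluster.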

  • Verify that storage_port and rpc_port do not conflict with other configured services. The storage_port, which defaults to 7000, is the port used by Cassandra nodes for inter-node communication. The rpc_port, which defaults to 9160, is used for remote procedure calls (e.g. cassandra-cli) and the Thrift service. This is the port to use when building clients for the Cassandra API.
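One way to check for port conflicts on Linux is to look for an existing listener on the port before starting Cassandra. The sketch below is an illustration only: port_in_use is a hypothetical helper, and it assumes Linux "netstat -ltn" style output, where the local address is the fourth column and the last column is LISTEN.

```shell
# Hypothetical helper: read "netstat -ltn"-style output on stdin and
# succeed if some socket is listening on the port given as $1.
port_in_use() {
    awk -v port="$1" '
        # match listening sockets whose local address ends in ":<port>"
        $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

For example, netstat -ltn | port_in_use 7000 && echo "port 7000 is already in use" reports a conflict on the default storage_port; repeat the check for 9160 or any other port you have configured.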

If Cassandra is being deployed in a multiple data center configuration, the endpoint_snitch should be modified from the default of SimpleSnitch to PropertyFileSnitch, where the snitch then employs the cassandra-topology.properties to determine the nodes in the cluster. There are other snitch types available; please refer to the cassandra.yaml for the descriptions of these types. If a multiple data center deployment is chosen, the schema will require the correct replication information to be provided in the strategy option pairs that represent the cluster. An example for manually loading with the PropertyFileSnitch is described below. For Orchestration loading, the pairs in the strategy option should be the same in the persistence configuration.

Step 2.2: Edit cassandra-env.sh and cassandra.bat

To configure Cassandra's JVM (Java Virtual Machine) and the JMX (Java Management Extensions) interface, edit conf/cassandra-env.sh for Linux, or bin/cassandra.bat for Windows.

The JMX port is used for management connections (such as nodetool). If necessary, edit the following line and ensure that there are no port conflicts with existing services. To enable remote JMX access see: https://wiki.apache.org/cassandra/JmxSecurity.

  • For cassandra-env.sh:

 # Specifies the default port over which Cassandra will be available for
 # local JMX connections.
 JMX_PORT="7199"

  • For cassandra.bat:

 cassandra.jmx.local.port=7199
 ...
 :doInstallOperation
 set SERVICE_JVM="cassandra_gre_01"
 rem location of Prunsrv
 set PATH_PRUNSRV=%CASSANDRA_HOME%\bin\daemon\
 rem For x64 installations (OS and Java), set PATH_PRUNSRV=%CASSANDRA_HOME%\bin\daemon\amd64
 set PR_LOGPATH=%CASSANDRA_HOME%\logs

Also in cassandra.bat:

  • Set SERVICE_JVM to the desired Windows Service name.
  • Set PATH_PRUNSRV to the daemon directory, and PR_LOGPATH to the logs directory.

Step 2.3: Edit logback.xml

Logging options can be found in conf/logback.xml. The default directory for logging is %CASSANDRA_HOME%\logs, with log file name system.log.n, where n is the wrap number. By default, Cassandra versions 2.2.x and later also write debug-level logging to a separate file, debug.log. To disable debug.log, comment out the ASYNCDEBUGLOG appender reference in the root level section.

Step 2B: Set up the Cassandra service (Windows)

Install the Cassandra service if desired.

To install: 'bin\cassandra.bat INSTALL'
To uninstall: 'bin\cassandra.bat UNINSTALL'

Once installed, you will be able to find and start up the Cassandra service from the Windows Services GUI. The name of the service depends on the value of SERVICE_JVM.

Step 3: Start up Cassandra

Linux:

Start up Cassandra by invoking bin/cassandra -f. It will start up in the foreground and log to standard output. If you don't see any error or fatal log messages or Java stack traces, then chances are you've succeeded.
Press Ctrl+C to stop Cassandra.
If you start up Cassandra without the -f option, it runs in the background, so you need to kill the process to stop it.
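To make stopping a background instance less error-prone, the process ID can be recorded at startup. The bin/cassandra script accepts a -p option that writes the PID to a file. The sketch below assumes that option was used; the function name stop_from_pidfile and the pid file path in the usage note are hypothetical.

```shell
# Hypothetical helper: stop the process whose PID is recorded in the
# pid file given as $1, then remove the file.
stop_from_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "no readable pid file: $pidfile" >&2
        return 1
    fi
    # Send the default TERM signal; remove the file only if kill succeeded.
    kill "$(cat "$pidfile")" && rm -f "$pidfile"
}
```

For example, start with bin/cassandra -p /tmp/cassandra.pid and later stop with stop_from_pidfile /tmp/cassandra.pid.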

Windows:

To start the service from Windows, there are two options:
  1. Use the Windows Services GUI
  2. From the command line, in the daemon directory (as set above):
  • start: prunsrv.exe start <Cassandra Service Name>
  • stop: prunsrv.exe stop <Cassandra Service Name>

NOTE: There is currently a bug in prunsrv version 1.0.15.0 that prevents the service from being stopped. To avoid it, use version 1.0.14.0, available from http://archive.apache.org/dist/commons/daemon/binaries/windows/

Step 4: Using nodetool

Once all Cassandra nodes have been started, we can check the status of the Cassandra cluster using nodetool.

On one of the Cassandra nodes, from %CASSANDRA_HOME%, run

 bin/nodetool -h <listen_address> -p <jmx_port> status

The output of this command should be similar to the example found in the Useful Tools section. There should be as many addresses in the list as the number of Cassandra nodes configured. If there are fewer nodes than expected, make sure that all nodes have unique initial tokens.
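Counting the nodes that report as up can be scripted. The sketch below assumes the Cassandra 2.x status layout, in which each node line begins with a two-letter code and UN means Up/Normal. The function name count_up_normal is made up for illustration.

```shell
# Hypothetical helper: read "nodetool status" output on stdin and print
# the number of node lines whose first column is UN (Up/Normal).
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

Pipe the output of the nodetool status command shown above into count_up_normal and compare the result with the number of Cassandra nodes configured.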
