Apache Storm - Installation and Configuration Tutorial

Installation and Configuration

Welcome to the third chapter of the Apache Storm tutorial (part of the Apache Storm course). This chapter will familiarize you with the steps for installing and configuring Storm. Let us start by exploring the objectives of this lesson.

Objectives

By the end of this lesson, you will be able to:

  • Choose proper hardware for Storm installation

  • Install Storm on an Ubuntu system

  • Configure Storm

  • Run Storm on an Ubuntu system

  • Describe the steps to set up a multi-node Storm cluster.

Moving on, let us discuss the Storm versions.

Steps involved in Installation and Configuration of Apache Storm

The following steps are involved in the installation and configuration of Apache Storm.

  1. Choosing the Storm Version

  2. OS Selection

  3. Machine Selection

  4. Preparation for Installation

  5. Downloading Kafka

  6. Downloading Storm

  7. Installing Kafka

Choosing the Storm Version

Storm has multiple versions. First, choose the latest stable version for installation. Zookeeper is a prerequisite for running Storm, so you start by installing Zookeeper.

Since Zookeeper comes as a part of Kafka, and you will learn about the Kafka interface to Storm later in the lesson, let us install Kafka before installing Storm. At the time of writing, the current stable version of Kafka is 0.8.2 and the current stable version of Storm is 0.9.5.

The stable version of Kafka can be downloaded from: http://www.apache.org/dyn/closer.cgi?path=/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz

The stable version of Storm can be downloaded from: http://apache.mirrors.pair.com/storm/apache-storm-0.9.5/apache-storm-0.9.5.tar.gz

Next, let us look at selecting the right Operating System.

OS selection

You can choose any of the following operating systems for installation.

Ubuntu 12.04 or later, which is installed on our virtual machine, is a good choice. You can also choose other Linux systems such as Red Hat Enterprise Linux (RHEL), CentOS (the free version of RHEL), or Debian.

Let’s move on to understand how to select an appropriate machine for Storm installation.

Machine Selection

Storm needs sufficient memory and adequate processing power. The recommended machine configurations are given below.

For development systems,

  • Minimum of 2GB RAM.

  • 1 CPU for Storm.

  • 1 TB hard disk.

For production systems,

  • Minimum of 16GB RAM.

  • Up to 32GB of RAM per machine (recommended).

  • At least 6-core CPUs (recommended).

  • Processors which are 2GHz or more.

  • 4x2TB hard disks.

  • 1 Gigabit Ethernet.

Next, let us look at how to prepare the machine for installation.

Preparing for Installation

The following software is required before installing Storm:

  • Java JRE 1.7 or higher. The Oracle JRE is recommended, but the OpenJDK JRE works as well.

  • Zookeeper, which can be installed from the Kafka distribution.

Now, let us look at the steps to download the software.

Download Kafka

Kafka can be downloaded directly from the Apache Kafka website:

wget http://mirrors.advancedhosters.com/apache/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz

You may choose a different mirror based on your location by checking: http://www.apache.org/dyn/closer.cgi?path=/Kafka/0.8.2.1/Kafka_2.9.1-0.8.2.1.tgz

Do this on each machine where Zookeeper has to be installed. A file with the .tgz or .tar.gz extension is called a tarball. A tarball is a compressed tar archive on Linux.

Next, let us learn how to download Storm.

Download Storm

Storm can be downloaded directly from the Apache Storm website:

wget http://apache.mirrors.pair.com/storm/apache-storm-0.9.5/apache-storm-0.9.5.tar.gz

Do this on each machine where Storm has to be installed.

Now, we will learn how to install Kafka.

Install Kafka

After the download, the archives have to be extracted and moved to the proper location.

Step 1

Extract the package using the tar utility:

tar -xzf Kafka_2.9.1-0.8.2.1.tgz

Step 2

Move the extracted directory to the proper location:

sudo mv Kafka_2.9.1-0.8.2.1 /usr/local/Kafka

Note that sudo may ask for the password.

Now that Kafka is installed, let us install Storm.

Install Storm

Follow the same steps to install Storm from the downloaded archive:

Step 1

Extract the package using the tar utility:

tar -xzf apache-storm-0.9.5.tar.gz

Step 2

Move the extracted directory to the proper location:

sudo mv apache-storm-0.9.5 /usr/local/storm

Next, let us set up a path for Kafka and Storm.

Set up Path for Kafka and Storm

Step 1

Edit the .bashrc file in the home directory to add the Kafka and Storm directories to the path. First, change to the home directory with the cd command:

cd

Step 2

Open the .bashrc file with vi:

vi .bashrc

Press i to enter insert mode, then add the following lines at the end of the file:

export KAFKA_PREFIX=/usr/local/Kafka

export PATH=$PATH:$KAFKA_PREFIX/bin

export STORM_PREFIX=/usr/local/storm

export PATH=$PATH:$STORM_PREFIX/bin

Step 3

Press the Escape key to leave insert mode, and save the file with :wq

Note that all the above commands are case sensitive, so type them exactly as shown. Restart bash for the changes to take effect:

exec bash

This sets up the path to include the Kafka and Storm directories.
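If you prefer not to edit .bashrc by hand, the same four lines can be appended from the shell. The following is a minimal sketch, not part of the original lesson: the function names add_line_once and setup_storm_path are hypothetical helpers, and the sketch assumes the file ends with a newline. It only adds a line when that exact line is missing, so running it twice does not duplicate entries.

```shell
# Append a line to a file only if that exact line is not already present.
# (add_line_once is a hypothetical helper name used for illustration.)
add_line_once() {
    file=$1
    line=$2
    touch "$file"
    # -x: match the whole line, -F: fixed string, -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the Kafka and Storm path settings from this section to the given rc file.
setup_storm_path() {
    rc=$1
    add_line_once "$rc" 'export KAFKA_PREFIX=/usr/local/Kafka'
    add_line_once "$rc" 'export PATH=$PATH:$KAFKA_PREFIX/bin'
    add_line_once "$rc" 'export STORM_PREFIX=/usr/local/storm'
    add_line_once "$rc" 'export PATH=$PATH:$STORM_PREFIX/bin'
}
```

To apply it, call setup_storm_path "$HOME/.bashrc" and then run exec bash as described above.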

Moving on, let us see how to configure memory settings.

Configuring Low Memory Settings

Some development systems have low memory, so the default heap memory settings will not work on them.

The following changes are required for a development cluster with low memory:

Step 1

Change to the bin directory of the Kafka installation:

cd /usr/local/Kafka/bin

Step 2

Edit the zookeeper-server-start.sh file using the vi editor. Press i to enter insert mode and the Escape key (normally located at the top left corner of the keyboard) to leave it.

vi zookeeper-server-start.sh

In this file, replace the line

export KAFKA_HEAP_OPTS="-Xmx512M -Xms512M"

with

export KAFKA_HEAP_OPTS="-Xmx64M -Xms64M"

Press Escape and save the file with :wq.

Next, edit the kafka-server-start.sh file using vi:

vi kafka-server-start.sh

In this file, replace the line

export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"

with

export KAFKA_HEAP_OPTS="-Xmx128M -Xms128M"

Press Escape and save the file with :wq.
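The two heap edits above can also be made without opening an editor. The following is a rough sketch under the assumption that each start script contains a KAFKA_HEAP_OPTS line in the form shown above. The function name set_kafka_heap is hypothetical. It uses sed to rewrite the -Xmx and -Xms values and keeps a .bak copy of the original script.

```shell
# Replace the KAFKA_HEAP_OPTS heap sizes in a Kafka start script.
# $1 = path to the script, $2 = heap size such as 64M or 128M
# (set_kafka_heap is a hypothetical helper name used for illustration.)
set_kafka_heap() {
    script=$1
    size=$2
    # -i.bak edits the file in place and keeps the original as <script>.bak
    sed -i.bak \
        "s/KAFKA_HEAP_OPTS=\"-Xmx[^ ]* -Xms[^\"]*\"/KAFKA_HEAP_OPTS=\"-Xmx${size} -Xms${size}\"/" \
        "$script"
}
```

For the values used in this lesson, that would be set_kafka_heap zookeeper-server-start.sh 64M followed by set_kafka_heap kafka-server-start.sh 128M, run from /usr/local/Kafka/bin.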

Now, let us look at how to configure Zookeeper for Storm.

Configuring Zookeeper for Storm

Since Storm uses Zookeeper for distributed coordination, you need to configure the Zookeeper instance that ships with Kafka.

Modify the zookeeper.properties file in the Kafka configuration directory as shown below:

cd /usr/local/Kafka/config

vi zookeeper.properties

Check the file and add the following lines if they are not already present:

initLimit=5

syncLimit=2

maxClientCnxns=0

server.1=localhost:2888:3888

Press Escape and save the file with :wq, which also exits the editor.

Use the commands below to create a myid file for Zookeeper:

echo 1 > /tmp/myid

sudo cp /tmp/myid /tmp/zookeeper/myid

Next, let us learn how to configure Kafka.

Configuring Kafka

The changes given below are required for Kafka configuration.

cd /usr/local/Kafka/config

vi server.properties

Replace the line

broker.id=0

with

broker.id=1

Check that the default port is set to 9092:

port=9092

Check that Zookeeper is set to connect at port 2181:

zookeeper.connect=localhost:2181

If you have multiple Zookeeper instances, you can specify them in the same host:port form, separated by commas.

Now, you will learn to modify some more Kafka properties.

A few more changes are to be made to the server.properties file.

Add the following two lines at the end of the file:

queued.max.requests=1000

auto.create.topics.enable=false

The last line disables automatic topic creation, so a topic must be created explicitly before messages are sent to it. Press Escape to exit insert mode and save the file with :wq

Moving on, you will learn how to start the Kafka server.

Start the Zookeeper and Kafka servers

First, start the Zookeeper server with the command below:

sudo nohup /usr/local/Kafka/bin/zookeeper-server-start.sh /usr/local/Kafka/config/zookeeper.properties > /tmp/zk.out 2>/tmp/zk.err &

Enter the Simplilearn password if asked for a password.

Note that sudo is used so that you have proper permissions.

The & (ampersand) is added at the end so that the process runs in the background.

For background processes, nohup is added in the beginning so that the background process does not end, even if your session is terminated. The standard output from the server is sent to /tmp/zk.out file and the standard error is sent to /tmp/zk.err file with the 2> option.

Next, start the Kafka server with the below command:

sudo nohup /usr/local/Kafka/bin/kafka-server-start.sh /usr/local/Kafka/config/server.properties > /tmp/kafka.out 2>/tmp/kafka.err &

sudo and nohup are used here in the same way as in the previous command.
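Because both servers run in the background, it can help to confirm that they are actually listening before moving on. The following is a small sketch, assuming bash (it relies on bash's /dev/tcp redirection) and the ports used in this lesson, 2181 for Zookeeper and 9092 for Kafka. The function names port_open and wait_for_port are hypothetical.

```shell
# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
# Relies on bash's /dev/tcp redirection, so run this with bash.
# (port_open and wait_for_port are hypothetical helper names.)
port_open() {
    host=$1
    port=$2
    # The subshell opens the connection and closes it again on exit.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Poll once per second until the port opens or the tries are used up.
# $3 = number of tries (defaults to 10)
wait_for_port() {
    host=$1
    port=$2
    tries=${3:-10}
    while [ "$tries" -gt 0 ]; do
        port_open "$host" "$port" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, wait_for_port localhost 2181 && wait_for_port localhost 9092 returns success only once both servers accept connections. If it fails, the /tmp/zk.err and /tmp/kafka.err files are the first place to look.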

Next, let us look at creating directories for Storm.

Create Directories for Storm

Create the lib directory for Storm:

  • sudo mkdir -p /var/lib/storm

  • sudo chmod 777 /var/lib/storm

You have now learned how to install Storm and create its directories.

Next, you will learn how to configure Storm.

Configuring Storm

The following changes are required for Storm configuration.

Change to the Storm configuration directory:

cd /usr/local/storm/conf

Edit the storm.yaml file:

vi storm.yaml

Replace these lines

#storm.zookeeper.servers:

# - "server1"

with

storm.zookeeper.servers:

    - "localhost"

Specify the address of the Nimbus host:

nimbus.host: "nimbus1"

Change nimbus1 to the IP address of the machine.

Now, you will configure the Storm memory parameters.

Configuring Storm Memory Parameters

Continue modifying the same file to specify the memory for the Java processes. You can start with 128MB for all the processes. Add the lines mentioned below.

If the lines already exist, modify them to change the numbers.

nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

supervisor.childopts: "-Djava.net.preferIPv4Stack=true"

worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

Specify the data directory for Storm.

storm.local.dir: "/var/lib/storm"

Press Escape and save the storm.yaml file with :wq.

Next, let us look at starting the Storm servers.

Start Storm Servers

Let us start the Storm nimbus and supervisor servers. You will also need to start the Storm UI to monitor through a web interface.

Start the Storm Nimbus server on the master node:

nohup storm nimbus > /tmp/nimbus.out 2>/tmp/nimbus.err &

Start the Storm supervisor on each worker node:

nohup storm supervisor > /tmp/supervisor.out 2>/tmp/supervisor.err &

Start the Storm UI for monitoring; you can check it at port 8080 using your favourite browser.

nohup storm ui > /tmp/ui.out 2>/tmp/ui.err &

Next, let us run a sample Storm program.

Run a Sample Storm Program

Here, you will run a sample program created by Simplilearn that processes a log file.

cd /tmp

wget simplilearncdn/logfile

wget simplilearncdn/LogProcessTopology.jar

storm jar LogProcessTopology.jar storm.starter.LogProcessTopology test1

storm list

The storm list command gives the following output:

Topology_name    Status    Num_tasks    Num_workers    Uptime_secs
test1            Active    7            1              23

Next, let us check the output of the sample Storm program.

Check the Output

The output of the sample program is written to the file /tmp/stormoutput.txt. You can check the content of this file with the command:

cat /tmp/stormoutput.txt

The output will be displayed as shown below:

INFO:1

ERROR:1

WARNING:1

ERROR:2

WARNING:2

INFO:2

ERROR:3

WARNING:3

ERROR:4

WARNING:4

Note that the actual output might be different in your case.
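Reading the file line by line gets tedious as it grows. The following is a minimal sketch that reduces the output to one line per log level. It assumes each line has the form LEVEL:count, as in the sample above, and that the number is a running count, which the lesson does not state explicitly. The function name summarize_levels is hypothetical.

```shell
# Print the highest count seen for each log level, sorted by level name.
# Input lines are expected to look like LEVEL:count, for example ERROR:3.
# (summarize_levels is a hypothetical helper name used for illustration.)
summarize_levels() {
    awk -F: 'NF == 2 { n = $2 + 0; if (!($1 in max) || n > max[$1]) max[$1] = n }
             END { for (level in max) print level ":" max[level] }' "$1" | sort
}
```

Called as summarize_levels /tmp/stormoutput.txt, it prints ERROR:4, INFO:2, and WARNING:4 (one per line) for the sample output shown above.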

Next, let us check the Storm UI.

Check the UI

You can check the Storm processes using the Storm UI at port 8080.

Open IP_address:8080 in your browser (Firefox or Chrome), where IP_address is the IP address of your virtual machine.

The diagram shows the Storm UI from the browser at port 8080. It shows the cluster summary, topology summary, supervisor summary as well as Nimbus server configuration parameters. You can see that the topology test1 is currently running.  

[Figure: Storm process using UI]

Now, you will learn how to stop the Storm topology.

Stop the Storm Topology

You can stop the running Storm topology with the following command:

storm kill test1

Verify that the topology is no longer running:

storm list

This will produce the following output:

Topology_name    Status    Num_tasks    Num_workers    Uptime_secs
test1            KILLED    7            1              315

Let us now check the log files.

Check the Log Files

The log files are created in the folder /usr/local/storm/logs

cd /usr/local/storm/logs

This directory will have the following files:

nimbus.log

supervisor.log

worker-pid.log

vi can be used to check the content of the log files.

Setting Up Multi-node Storm Cluster

To set up a multi-node cluster, let us take the example of a three-node cluster whose nodes are referred to as node1, node2, and node3.

First, install Kafka and Storm on each machine as discussed earlier. That is, download the Kafka and Storm tarballs, extract the compressed archives, and move the extracted directories to /usr/local/Kafka and /usr/local/storm respectively.

This has to be done on each of the three nodes.

The second step is to set up Zookeeper on each node:

cd /usr/local/Kafka/config

vi zookeeper.properties

Add the following lines if they are not already present:

initLimit=5

syncLimit=2

maxClientCnxns=0

server.1=node1:2888:3888

server.2=node2:2888:3888

server.3=node3:2888:3888

Press the Escape key and save the file with :wq, which also exits the editor.

Note that node1, node2, and node3 stand for the IP addresses of the three servers.
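For larger clusters, typing the server.N lines by hand is error-prone. The following is a small sketch that prints them from a list of nodes, numbering from 1 in the order given and using the same 2888 and 3888 ports as above. The function name zk_server_lines is hypothetical.

```shell
# Print one server.N line per node, numbering from 1 in argument order.
# (zk_server_lines is a hypothetical helper name used for illustration.)
zk_server_lines() {
    id=1
    for node in "$@"; do
        printf 'server.%d=%s:2888:3888\n' "$id" "$node"
        id=$((id + 1))
    done
}
```

Running zk_server_lines node1 node2 node3 prints the three server lines shown above. The output can be appended to zookeeper.properties with >>, provided the same node order is used on every machine.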

Set Up the myid File for Zookeeper

The third step is to set up the myid file for Zookeeper.

The commands below create the myid file for Zookeeper:

On node1:

echo 1 > /tmp/myid

sudo cp /tmp/myid /tmp/zookeeper/myid

On node2:

echo 2 > /tmp/myid

sudo cp /tmp/myid /tmp/zookeeper/myid

On node3:

echo 3 > /tmp/myid

sudo cp /tmp/myid /tmp/zookeeper/myid

Note that the content of myid file is different on each server.
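The id in each myid file has to match the N of that node's server.N line. The following is a minimal sketch that derives the id from the node's position in the cluster list and writes the file, creating the data directory first in case it does not exist yet. The function names zk_myid_for and write_myid are hypothetical, and /tmp/zookeeper is the data directory used in this lesson.

```shell
# Print the 1-based position of a node in the cluster list, or fail if absent.
# $1 = this node's name or IP, remaining args = all nodes in server.N order
# (zk_myid_for and write_myid are hypothetical helper names.)
zk_myid_for() {
    me=$1
    shift
    id=1
    for node in "$@"; do
        if [ "$node" = "$me" ]; then
            printf '%d\n' "$id"
            return 0
        fi
        id=$((id + 1))
    done
    return 1
}

# Write the id into the myid file inside the given Zookeeper data directory.
write_myid() {
    data_dir=$1
    id=$2
    mkdir -p "$data_dir"
    printf '%s\n' "$id" > "$data_dir/myid"
}
```

On node2, for example, write_myid /tmp/zookeeper "$(zk_myid_for node2 node1 node2 node3)" writes 2 into /tmp/zookeeper/myid. If the directory is owned by root, the call needs to run with root permissions.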

Moving on to the fourth step.

Storm needs to be configured on each machine. Make the following changes to storm.yaml:

cd /usr/local/storm/conf

vi storm.yaml

Replace the following lines:

#storm.zookeeper.servers:

# - "server1"

with

storm.zookeeper.servers:

    - "node1"

    - "node2"

    - "node3"

Specify the address of the Nimbus host. Please note that Nimbus runs only on the master node, which is node1 in our cluster.

nimbus.host: "node1"

Moving on to the fifth step of the setup.

Some more changes are made to the storm.yaml file.

For the childopts settings, specify the memory for the child processes. You can start with 128MB:

nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

supervisor.childopts: "-Djava.net.preferIPv4Stack=true"

worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"

Specify the data directory for Storm:

storm.local.dir: "/var/lib/storm"

Press Escape and save the storm.yaml file with :wq.
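Since the same storm.yaml settings are needed on every node, it can be convenient to generate them once. The following is a rough sketch that prints the settings from the fourth and fifth steps for a given Nimbus host and list of Zookeeper nodes. The function name storm_yaml is hypothetical, and the output contains only the settings discussed in this lesson.

```shell
# Print the storm.yaml settings from this lesson.
# $1 = Nimbus host, remaining args = Zookeeper servers
# (storm_yaml is a hypothetical helper name used for illustration.)
storm_yaml() {
    nimbus=$1
    shift
    echo 'storm.zookeeper.servers:'
    for node in "$@"; do
        printf '    - "%s"\n' "$node"
    done
    printf 'nimbus.host: "%s"\n' "$nimbus"
    # Memory and data directory settings, identical on every node.
    cat <<'EOF'
nimbus.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx128m -Djava.net.preferIPv4Stack=true"
storm.local.dir: "/var/lib/storm"
EOF
}
```

For the example cluster, storm_yaml node1 node1 node2 node3 prints the complete set of lines. Redirecting the output to /usr/local/storm/conf/storm.yaml overwrites that file, so keep a copy of the original if it contains other settings.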

Finally, the last step of the setup is to start the servers. Start the Zookeeper server on each node:

sudo nohup /usr/local/Kafka/bin/zookeeper-server-start.sh /usr/local/Kafka/config/zookeeper.properties > /tmp/zk.out 2>/tmp/zk.err &

Start the Storm Nimbus server on node1:

nohup storm nimbus > /tmp/nimbus.out 2>/tmp/nimbus.err &

Start the Storm supervisor process on each node:

nohup storm supervisor > /tmp/supervisor.out 2>/tmp/supervisor.err &

This completes the setup of the multi-node Storm cluster.

Summary

Here are the key takeaways.

  • Storm has multiple versions and the latest stable version of Storm is 0.9.5.

  • Proper OS and machine configurations should be chosen before starting the installation.

  • Kafka installation is used to install Zookeeper.

  • Storm can be installed by downloading the latest tarball.

  • After the installation of Zookeeper and Storm, both of them need to be configured.

  • After Zookeeper and Storm are configured, the Zookeeper server has to be started before starting Storm.

  • The storm command can be used to submit a topology to Storm.

  • To set up a multi-node Storm cluster, a six-step process needs to be followed.

Conclusion

This concludes the lesson: Introduction to the Installation and Configuration of Storm. In the next lesson, you will learn about the Advanced Storm Concepts.
