1 - Operations Overview

This section is for architects and cloud administrators who plan to deploy Eucalyptus in a production environment. It is not intended for end users or proof-of-concept installations. To run Eucalyptus in a production environment, you must be aware of your hardware and network resources. This guide helps you make decisions about deploying Eucalyptus and keep it running smoothly.

2 - Planning Your Deployment

To decide on your deployment’s scope, determine the use case for your cloud. For example, will this be a small dev-test environment, or a large and scalable web services environment? To help with scoping your deployment, see Plan Your Installation in the Installation Guide. There you will find solution examples and physical resource information.

3 - Testing Your Deployment

This topic details what you should test when you want to make sure your deployment is working. The following suggested test plan contains tasks that ensure DNS, imaging, and storage are working.

DNS

  • Verify that instances can ping their:
  • Verify that instances are pingable on their public DNS names from:

Imaging

  • Verify that an EBS-backed image boots successfully
  • Verify that you can create an image from a running EBS-backed instance
  • Verify that you can install a new Ubuntu image
  • Verify that you can deregister an image
  • Verify that you can import an instance
  • Verify that you can import a volume

Walrus

  • Verify that you can make a basic s3cmd request
  • Verify that you can successfully perform a multi-part upload (use a 1G+ file)
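The two Walrus checks above can be scripted roughly as follows. This is a minimal sketch, assuming s3cmd is already configured (for example in ~/.s3cfg) to point at your cloud's object storage endpoint. The function name, the bucket name, and the 15 MB part size are illustrative choices, not values from this guide.

```shell
# walrus_smoke_test BUCKET: run a basic request, then a multi-part upload.
# Assumption: s3cmd is configured for your cloud; BUCKET is a hypothetical name.
walrus_smoke_test() {
    bucket="$1"
    testfile="${TMPDIR:-/tmp}/walrus-multipart-test.bin"

    # Basic request: list buckets, then create the test bucket.
    s3cmd ls || return 1
    s3cmd mb "s3://$bucket" || return 1

    # Create a 1 GiB file so the upload is split into multiple parts.
    dd if=/dev/zero of="$testfile" bs=1M count=1024 || return 1

    # Upload in 15 MB parts, then confirm that the object is listed.
    s3cmd put --multipart-chunk-size-mb=15 "$testfile" "s3://$bucket/" || return 1
    s3cmd ls "s3://$bucket/" || return 1

    rm -f "$testfile"
}
```

For example, walrus_smoke_test ops-smoke-test runs both checks against a bucket named ops-smoke-test. Remove the bucket afterwards if you do not want to keep the test object.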

4 - Customizing Your Deployment

This section describes the most commonly applied post-install customizations and the issues they pose:

  • Over-subscription
  • Networking changes (EDGE mode)
  • CloudWatch tweaks/customizations
  • Capacity changes

Over-subscription

Over-subscription refers to the practice of committing more virtual resources to instances than the host physically has. Over-subscription applies only to node controllers (NCs). You can modify the disk and core settings to leave enough usage buffer for your instances. Navigate to /etc/eucalyptus/ and locate the eucalyptus.conf file. Edit the following values to define the appropriate size buffers for your instances:

  • NC_WORK_SIZE: Defines the amount of disk space available for running instances. Defaults to 1/3 of the currently available disk space on the NC.
  • NC_CACHE_SIZE: Defines how much disk space is available for caching images. Defaults to the other 2/3 of the currently available disk space on the NC.
  • MAX_CORES: Defines the maximum number of cores that can be provided to VMs on each NC. If it is 0 or not present, the only limit on the number of instances is the number of cores available on the NC. If it is present, any value greater than 256 is treated as 256.

For these changes to take effect, you must restart the NC.
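As an illustration, the relevant lines in eucalyptus.conf on an NC might look like the fragment below. The values are made up for the example, and the unit for the two size settings is assumed to be megabytes; check the comments in your own eucalyptus.conf before copying anything.

```shell
# /etc/eucalyptus/eucalyptus.conf on an NC host (illustrative values only)

# Disk space for running instances (unit assumed to be MB).
NC_WORK_SIZE="200000"

# Disk space for cached images (unit assumed to be MB).
NC_CACHE_SIZE="100000"

# Offer up to 32 cores to VMs, even if the host has fewer physical cores.
MAX_CORES="32"
```

After editing the file, restart the NC (for example with systemctl restart eucalyptus-node.service) so that the new values are read.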

Networking Changes (EDGE mode)

You can modify the default by adding network IPs to your cloud. Adding public IPs does not require shutting down the whole system.

To add network IPs: In EDGE mode, adding or changing IPs involves creating a JSON file and uploading it to the Cloud Controller (CLC). See Configure for Edge Mode for more details. No restart is needed; the changes apply automatically.

Change CloudWatch Properties

You can change the following CloudWatch properties:

| Property | Description |
| --- | --- |
| cloud.monitor.default_poll_interval_mins | How often, in minutes, the CLC requests sensor data from the CC. The default value is 5. A value of 0 disables reporting. More frequent polling puts more load on the system. |
| cloud.monitor.history_size | How many data samples are sent in each sensor data request, that is, the number of samples per poll interval. The default value is 5. |
| cloudwatch.enable_cloudwatch_service | Disables CloudWatch when set to false. |
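Cloud properties such as these are changed with euctl, using the name=value form. The wrapper below is a small sketch that validates the poll interval before applying it; the function name set_poll_interval is hypothetical, and only the euctl call itself reflects the tool.

```shell
# set_poll_interval MINUTES: validate the value, then apply it with euctl.
# The function name is illustrative; it is not part of Eucalyptus.
set_poll_interval() {
    case "$1" in
        ''|*[!0-9]*)
            echo "poll interval must be a non-negative integer" >&2
            return 1
            ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo "warning: 0 disables sensor reporting" >&2
    fi
    euctl "cloud.monitor.default_poll_interval_mins=$1"
}
```

For example, set_poll_interval 10 asks for sensor data every 10 minutes, trading fresher metrics for less polling load.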

Change Capacity

Capacity changes refer to adding another zone or more nodes. To add another zone, install , start , and register . To add more nodes, see Add a Node Controller .

5 - Managing Policies

This topic details best practices for managing your cloud policies.

  • Establish a workflow for account creation, including the initial request for a cloud account and the email containing credentials.
  • Use groups to assign permissions: attach policies to groups and add individual users to those groups. Limit the use of policies attached directly to individual users.

For more information about policy best practices, see IAM Best Practices.

6 - Networking

This topic addresses networking in the Eucalyptus cloud.

Networking Modes

Eucalyptus offers different modes to provide you with a cloud that will fit in your current network. For information about what each networking mode has to offer, see Plan Networking Modes.

EC2-Classic Networking

Eucalyptus EDGE networking mode supports EC2-Classic networking. Your instances run in a single, flat network that you share with others. For more information about EC2-Classic networking, see EC2 Supported Platforms.

EC2-VPC Networking

Eucalyptus VPCMIDO networking mode resembles the Amazon Virtual Private Cloud (VPC) product, in which the network is fully configurable by users. For more information about EC2-VPC networking, see Differences Between Instances in EC2-Classic and EC2-VPC.

7 - Monitoring

This topic includes details about which resources you should monitor.

| Component | Running Processes |
| --- | --- |
| Cloud Controller (CLC) | eucalyptus-cloud, postgres, eucanetd (VPCMIDO mode) |
| User-facing services (UFS) | eucalyptus-cloud |
| Walrus | eucalyptus-cloud |
| Cluster Controller (CC) | eucalyptus-cluster |
| Storage Controller (SC) | eucalyptus-sc, tgtd (for DAS and Overlay) |
| Node Controller (NC) | eucalyptus-node, httpd, dhcpd, eucanetd (EDGE mode), qemu-kvm (1 per instance) |
| Management Console | eucaconsole |
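One way to watch the main services is to ask systemd whether their units are active. The sketch below assumes the hosts use systemd and that the unit names match those used elsewhere in this guide (eucalyptus-cloud.service, eucalyptus-cluster.service, eucalyptus-node.service). The function name is hypothetical, and helper processes such as postgres or qemu-kvm are not covered by this check.

```shell
# check_services UNIT...: print "DOWN: UNIT" for each inactive systemd unit
# and return non-zero if any of the given units is down.
check_services() {
    status=0
    for unit in "$@"; do
        if ! systemctl is-active --quiet "$unit"; then
            echo "DOWN: $unit"
            status=1
        fi
    done
    return "$status"
}
```

On a CLC host you might run check_services eucalyptus-cloud.service, and on an NC host check_services eucalyptus-node.service. The non-zero return code makes the function easy to hook into cron or an existing monitoring agent.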

8 - Backup and Recovery

This section provides details on important files to back up and recover.

8.1 - Back Up Eucalyptus Cloud Data

This section explains what you need to back up to protect your cloud data. We recommend that you back up the following data:

  • The cloud database: see Back Up the Database.
  • Object storage. For objects in Walrus, the frequency depends on current load. Use your own discretion to determine the backup plan and strategy. You must have Walrus running.
  • EBS volumes in each cluster (DAS and Overlay)
  • The configuration file for the cloud is stored on the CLC: .
  • Any configuration file for the cloud stored on any other host (UFS, CC, etc.): .
  • The cloud security credentials on all hosts (you already backed up the CLC keys as part of the database backup). Use the tar command: .
  • The CC and NC configuration files, stored on every CC and NC: .
  • Any Euca2ools (.ini) configuration files, which reside on any Euca2ools host machine. Files can be found in:
  • Management Console config files in should be backed up. Typical files:
  • Ensure you have your instances’ so you can access the instances later.
  • and LVM snapshots Users are responsible for volume backups using EBS snapshots on their defined schedules.

8.1.1 - Back Up the Database

To back up the cloud database, follow the steps listed in this topic. Bucket and object metadata are stored in the Eucalyptus cloud database.

To back up the database:

Log in to the CLC. The cloud database is on the CLC.

Extract the Eucalyptus PostgreSQL database cluster into a script file.

pg_dumpall --oids -c -h/var/lib/eucalyptus/db/data -p8777 -U root -f/root/eucalyptus_pg_dumpall-backup.sql

Back up the cloud security credentials in the keys directory.

tar -czvf ~/eucalyptus-keydir.tgz /var/lib/eucalyptus/keys
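If you run this backup on a schedule, the two commands can be wrapped so that each run writes date-stamped files instead of overwriting the previous backup. The following is a sketch only: the function name backup_cloud, the destination directory, and the file naming scheme are assumptions, while the pg_dumpall and tar options are the ones shown above.

```shell
# backup_cloud DEST_DIR: dump the cloud database and archive the keys
# directory into DEST_DIR, using date-stamped file names.
backup_cloud() {
    dest="$1"
    stamp=$(date +%Y%m%d-%H%M%S)

    mkdir -p "$dest" || return 1

    # Same pg_dumpall options as above, with a date-stamped output file.
    pg_dumpall --oids -c -h/var/lib/eucalyptus/db/data -p8777 -U root \
        -f"$dest/eucalyptus_pg_dumpall-$stamp.sql" || return 1

    # Archive the cloud security credentials.
    tar -czf "$dest/eucalyptus-keydir-$stamp.tgz" /var/lib/eucalyptus/keys || return 1

    echo "backup written to $dest"
}
```

Run it as root on the CLC, for example backup_cloud /root/eucalyptus-backups, and copy the resulting files to a different host.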

8.2 - Recover Eucalyptus Cloud Data

This topic explains what to include when you recover your cloud.

We recommend that you recover the following data:

  • The cloud database: see Restore the Database.
  • Object storage. For objects in Walrus, the frequency depends on current load. Use your own discretion to determine the restore plan and strategy.
  • EBS volumes in each cluster (DAS and Overlay)
  • The configuration file for the cloud is stored on the CLC: .
  • Any configuration file for the cloud stored on any other host (UFS, CC, etc.): .
  • The cloud security credentials on all hosts (you already restored the CLC keys as part of the database restore). Use the tar command: .
  • The CC and NC configuration files, stored on every CC and NC: .
  • Any Euca2ools (.ini) configuration files, which reside on any Euca2ools host machine. Files in these directories:
  • Management Console config files you backed up from should be restored. Typical files:
  • Ensure you have your instances’ so you can access the instances.
  • and LVM snapshots Users are responsible for volume restore using EBS snapshots.

8.2.1 - Restore the Database

To restore the cloud database, follow the steps listed in this topic.

To restore the database:

Stop the CLC service.

systemctl stop eucalyptus-cloud.service

Remove traces of the old database.

rm -rf /var/lib/eucalyptus/db

Restore the cloud security credentials in the keys directory.

tar -xvf ~/eucalyptus-keydir.tgz -C /

Re-initialize the database structure.

clcadmin-initialize-cloud

Start the database manually.

su eucalyptus -s /bin/bash -c "/usr/bin/pg_ctl start -w \
-s -D/var/lib/eucalyptus/db/data -o '-h0.0.0.0/0 -p8777 -i'"

Restore the backup.

psql -U root -d postgres -p 8777 -h /var/lib/eucalyptus/db/data -f/root/eucalyptus_pg_dumpall-backup.sql

Stop the database manually.

su eucalyptus -s /bin/bash -c "/usr/bin/pg_ctl stop -D/var/lib/eucalyptus/db/data"

Start the CLC service.

systemctl start eucalyptus-cloud.service

9 - Troubleshooting

This topic details how to find information you need to troubleshoot most problems in your cloud. To troubleshoot Eucalyptus, you must have the following:

  • knowledge of which machines each Eucalyptus component is installed on
  • root access to each machine hosting Eucalyptus components
  • an understanding of the network mode (EDGE, VPCMIDO)
  • an understanding of eucanetd and the configuration connecting the Eucalyptus components

For most problems, the procedure for tracing problems is the same: start at the bottom to verify the bottom-most component, and then work your way up. If you do this, you can be assured that the base is solid. This applies to virtually all Eucalyptus components and also works for proactive, targeted monitoring.

9.1 - Eucalyptus Log Files

Usually when an issue arises in Eucalyptus, you can find information that points to the nature of the problem either in the Eucalyptus log files or in the system log files. This topic details log file message meanings, location, configuration, and fault log information.

9.2 - Network Information

When you have to troubleshoot, it’s important to understand the elements of the network on your system. Here are some ideas for finding out information about your network:

  • You might want to list bridges to see which devices are enslaved by the bridge. To do this, use the command.
  • You might also want to list network devices and evaluate existing configurations. To do this, use these commands: , , and .
  • You can use to check status, or to force eucanetd to run in the foreground, sending log messages to the terminal.
  • You can get further information if you use the commands with the options. For example, returns all instances running by all users on the system. Other describe commands are:

9.3 - Common Problems

This section describes common problems and workarounds.

9.3.1 - Problem: can't communicate with instance

Use ping from a client (not the CLC). Can you ping it?

Yes: Check the open ports on security groups and retry the connection using SSH or HTTP. Can you connect now?

  • Yes: Your work is done.
  • No: Follow the same procedure as if you could not ping the instance in the first place.

No: Is your cloud running in Edge networking mode?

  • Yes: Run euca-describe-nodes. Is your instance there?

  • No, it is not in Edge networking mode:

9.3.2 - Problem: install-time checks

Eucalyptus offers installation checks for any Eucalyptus component or service (CLC, Walrus, SC, NC, services, and more). When Eucalyptus encounters an error, it presents the problem to the operator. These checks are used for install-time problems. They provide resolutions to some of the fault conditions.

Each problematic condition contains the following information:

| Heading | Description |
| --- | --- |
| Condition | The fault found by Eucalyptus |
| Cause | The cause of the condition |
| Initiator | What is at fault |
| Location | Where to go to fix the fault |
| Resolution | The steps to take to resolve the fault |

For more information about all the faults we support, go to https://github.com/eucalyptus/eucalyptus/tree/master/util/faults/en_US.

9.3.3 - Problem: instance runs but fails

Run euca-describe-nodes to verify whether the instance is there. Is the instance there?

Yes: Go to the NC log for that NC and grep your instance ID. Did you find the instance?

  • Yes: Is there an error message?

No: Go to the CC log and grep the instance ID. Is there an error message?

  • Yes: The error message should give you some helpful information.

  • No: grep the instance ID in cloud-output.log. Is there an error message?

No: Log in as admin and run euca-describe-instances. Is the instance there?

  • Yes:
  • No: Start over: run a new instance, recreate the failure, and repeat these steps.
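The repeated "grep the instance ID, then look for an error" step can be captured in a small helper. This is a sketch under assumptions: the function name is hypothetical, and the log paths in the usage note (/var/log/eucalyptus/nc.log, cc.log, and cloud-output.log) are typical locations that may differ on your hosts.

```shell
# find_instance_errors INSTANCE_ID LOGFILE...: print every line that contains
# the instance ID and the word "error" (any case), prefixed with the file name.
# Unreadable or missing log files are skipped silently.
find_instance_errors() {
    id="$1"
    shift
    for log in "$@"; do
        [ -r "$log" ] || continue
        awk -v f="$log" -v id="$id" \
            'index($0, id) && tolower($0) ~ /error/ { print f ": " $0 }' "$log"
    done
}
```

On an NC you might run find_instance_errors i-12345678 /var/log/eucalyptus/nc.log, and on the CLC the same command against cloud-output.log. The instance ID here is a placeholder.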

9.3.4 - Problem: snapshot creation failed

On the SC, depending on the backend used for storage:

  • For Overlay, use the command to check the disk space in .
  • For DAS, use the command to check the disk space in the DAS volumes.
  • For any other backend, use its specific commands to check the free space for storage allocated for volumes. Is there enough space?

Yes: On the OSG host, depending on the backend used for object storage:

  • For Walrus, use the command to check the disk space in .

  • For RiakCS or Ceph-RGW, use its specific commands to check the free space for storage allocated for buckets and objects. Is there enough space? Yes.

  • Use and note the IP addresses for the OSG and SC.

  • SSH to SC and ping the OSG. Are there error messages?

No: Delete volumes or add disk space. No: Delete volumes or add disk space.

9.3.5 - Problem: volume creation failed

Symptom: The volume went from available to failed. This is typically caused by the CLC and the SC.

On the SC, use df or lvdisplay to check the disk space. Is there enough space?

Yes: Check the SC log and grep the volume ID. Is there an error message?

  • Yes: The error message should give you some helpful information.
  • No: Check cloud-output.log for a volume ID error.

No: Delete volumes or add disk space.
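The disk space check with df can be turned into a yes/no test, which is handy when you want to alert before volume creation starts failing. The helper names below are hypothetical, and the path and threshold in the usage note are assumptions about where your SC stores volumes.

```shell
# free_kb PATH: print the available space, in 1024-byte blocks, on the
# filesystem that holds PATH (POSIX df output, fourth column).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space PATH MIN_KB: succeed if at least MIN_KB blocks are available.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}
```

For example, has_space /var/lib/eucalyptus/volumes 10485760 || echo "less than 10 GiB free" warns when the filesystem that holds the assumed volumes directory drops below 10 GiB.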

9.4 - Component Workarounds

This section contains troubleshooting information for Eucalyptus components and services.

9.4.1 - Access and Identities

This topic contains information about access-related problems and solutions.

Need to verify an existing LIC file

  1. Enter the following command: The output from the example above shows the name of the LIC file and status of the synchronization (set to false).

9.4.2 - Elastic Load Balancing

This topic offers suggestions for problems you might have with Elastic Load Balancing (ELB).

Can’t synchronize with time server

Eucalyptus sets up NTP automatically for any instance that has an internet connection to a public network. If an instance doesn’t have such a connection, set the cloud property loadbalancing.loadbalancer_vm_ntp_server to a valid NTP server IP address. For example:

euctl loadbalancing.loadbalancer_vm_ntp_server=169.254.169.254
PROPERTY	loadbalancing.loadbalancer_vm_ntp_server	169.254.169.254 was {}

Need to debug an ELB instance

To debug an ELB instance, set the loadbalancing.loadbalancer_vm_keyname cloud property to the keypair of the instance you want to debug. For example:

# euctl loadbalancing.loadbalancer_vm_keyname=sshlogin
PROPERTY	loadbalancing.loadbalancer_vm_keyname	sshlogin was {}

9.4.3 - Imaging Worker

This topic contains troubleshooting tips for the Imaging Worker. Some requests that require the Imaging Worker might remain in pending for a long time, for example, an import task or a paravirtual instance run. If a request remains in pending, the Imaging Worker instance might not be able to run because of a lack of resources (for example, instance slots or IP addresses).

You can check for this scenario by listing the latest Auto Scaling activities:

euscale-describe-scaling-activities -g asg-euca-internal-imaging-worker-01

Check for failures that indicate inadequate resources such as:

ACTIVITY        1950c4e5-0db9-4b80-ad3b-5c7c59d9c82e    2014-08-12T21:05:32.699Z        asg-euca-internal-imaging-worker-01    Failed   Not enough resources available: addresses; please stop or terminate unwanted instances or release unassociated elastic IPs and try again, or run with private addressing only

9.4.4 - Instances

This topic contains information to help you troubleshoot your instances.

Inaccurate IP addresses display in the output of euca-describe-addresses

This can occur if you add IPs from the wrong subnet into your public IP pool, do a restart on the CC, swap out the wrong ones for the right ones, and do another restart on the CC. To resolve this issue, run the following commands.

systemctl stop eucalyptus-cloud.service
systemctl stop eucalyptus-cluster.service
iptables -F
systemctl restart eucalyptus-cluster.service
systemctl start eucalyptus-cloud.service

NC does not recalculate disk size correctly

This can occur when trying to add extra disk space for instance ephemeral storage. To resolve this, you need to delete the instance cache and restart the NC.

For example:

rm -rf /var/lib/eucalyptus/instances/*
systemctl restart eucalyptus-node.service

9.4.5 - Walrus and Storage

This topic contains information about Walrus-related problems and solutions.

Walrus decryption failed.

On Ubuntu 10.04 LTS, kernel version 2.6.32-31 includes a bug that prevents Walrus from decrypting images. You can identify this problem by the following line in cloud-output.log:

javax.crypto.BadPaddingException: pad block corrupted

If you are running this kernel:

  1. Update to kernel version 2.6.32-33 or higher.
  2. De-register the failed image ( ).
  3. Re-register the bundle that you uploaded ( ).

Walrus physical disk is not large enough.

  1. Stop the CLC.
  2. Add a disk.
  3. Migrate your data. Make sure you use LVM with your new disk drive(s).