Troubleshooting

This topic details how to find information you need to troubleshoot most problems in your cloud. To troubleshoot Eucalyptus, you must have the following:

  • knowledge of which machines each Eucalyptus component is installed on
  • root access to each machine hosting Eucalyptus components
  • an understanding of the network mode (EDGE, VPCMIDO)
  • an understanding of eucanetd and the configuration connecting the Eucalyptus components

The procedure for tracing most problems is the same: start by verifying the bottom-most component, and then work your way up. That way you know each layer is sound before you examine the one above it. This applies to virtually all Eucalyptus components and also works for proactive, targeted monitoring.
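
The bottom-up order can be sketched as a small shell function. This is only an illustration, not part of Eucalyptus: first_inactive_service is a hypothetical helper, and it assumes the systemd units you pass in (for example, the unit names used later in this guide) are installed on the host where it runs. Components usually run on different machines, so you would run it on each host in turn.

```shell
# Hypothetical helper: walk the given systemd units in order (bottom-most
# first) and print the first one that is not active.
# Returns 0 if every unit is active, 1 otherwise.
first_inactive_service() {
    local svc
    for svc in "$@"; do
        if ! systemctl is-active --quiet "$svc"; then
            echo "$svc"
            return 1
        fi
    done
    return 0
}
```

For example, on a host that runs all three services: first_inactive_service eucalyptus-node.service eucalyptus-cluster.service eucalyptus-cloud.service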

1 - Eucalyptus Log Files

Usually when an issue arises in Eucalyptus, you can find information that points to the nature of the problem either in the Eucalyptus log files or in the system log files. This topic details log file message meanings, location, configuration, and fault log information.

2 - Network Information

When you have to troubleshoot, it’s important to understand the elements of the network on your system. Here are some ways to find information about your network:

  • You might want to list bridges to see which devices are enslaved by the bridge. To do this, use the command.
  • You might also want to list network devices and evaluate existing configurations. To do this, use these commands: , , and .
  • You can use to check status, or to force eucanetd to run in the foreground, sending log messages to the terminal.
  • You can get further information if you use the commands with the options. For example, returns all instances running by all users on the system. Other describe commands are:
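
The command names are missing from the list above, so the following is only an illustration with standard Linux tooling and is not necessarily what the original guide used. It assumes the iproute2 ip command is available; list_bridge_ports is a hypothetical helper that reads the output of ip -o link show and prints each device together with the bridge it is attached to.

```shell
# Hypothetical helper: read `ip -o link show` output on stdin and print
# "<device> <bridge>" for every device that has a master (bridge) set.
list_bridge_ports() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "master") {
                dev = $2
                sub(/:$/, "", dev)    # strip trailing colon from "vnet0:"
                sub(/@.*$/, "", dev)  # strip "@if5" style peer suffix
                print dev, $(i + 1)
                break
            }
        }
    }'
}
```

Usage: ip -o link show | list_bridge_ports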

3 - Common Problems

This section describes common problems and workarounds.

3.1 - Problem: can't communicate with instance

Use ping from a client (not the CLC). Can you ping it?

Yes: Check the open ports on the security groups and retry the connection using SSH or HTTP. Can you connect now?

  • Yes: Your work is done.
  • No: Follow the same procedure as if you could not ping the instance in the first place.

No: Is your cloud running in Edge networking mode?

  • Yes: Run euca-describe-nodes. Is your instance there?

  • No, it is not in Edge networking mode:

3.2 - Problem: install-time checks

Eucalyptus offers installation checks for any Eucalyptus component or service (CLC, Walrus, SC, NC, services, and more). When Eucalyptus encounters an error, it presents the problem to the operator. These checks are used for install-time problems, and they provide resolutions to some of the fault conditions.

Each problematic condition contains the following information:

Heading      Description
Condition    The fault found by Eucalyptus
Cause        The cause of the condition
Initiator    What is at fault
Location     Where to go to fix the fault
Resolution   The steps to take to resolve the fault

For more information about all the faults we support, go to https://github.com/eucalyptus/eucalyptus/tree/master/util/faults/en_US .

3.3 - Problem: instance runs but fails

Run euca-describe-nodes to verify whether the instance is there. Is the instance there?

Yes: Go to the NC log for that NC and grep for your instance ID. Did you find the instance?

  • Yes: Is there an error message?

No: Go to the CC log and grep for the instance ID. Is there an error message?

  • Yes: The error message should give you some helpful information.

  • No: grep for the instance ID in cloud-output.log. Is there an error message?

No: Log in as admin and run euca-describe-instances. Is the instance there?

  • Yes:
  • No: Start over and run a new instance, recreate failure, and start these steps over.
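
The log searches in these steps can be sketched as one helper. This is a minimal sketch under stated assumptions: find_instance_in_logs is a hypothetical function, and the paths in the usage note (the /var/log/eucalyptus directory and the nc.log and cc.log file names) are assumptions about a default installation; only cloud-output.log is named in this guide. The NC, CC, and CLC logs normally live on different hosts, so run it on each host in turn.

```shell
# Hypothetical helper: search the given log files for an instance ID and
# report, per file, how many lines mention it and how many of those lines
# also contain "error" (case-insensitive).
find_instance_in_logs() {
    local instance_id="$1"; shift
    local log total errors
    for log in "$@"; do
        [ -r "$log" ] || continue   # skip logs not present on this host
        total=$(grep -c -- "$instance_id" "$log" || true)
        errors=$(grep -- "$instance_id" "$log" | grep -ci 'error' || true)
        echo "$log: $total matching lines, $errors with errors"
    done
}
```

Usage, with a made-up instance ID: find_instance_in_logs i-12345678 /var/log/eucalyptus/nc.log /var/log/eucalyptus/cc.log /var/log/eucalyptus/cloud-output.log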

3.4 - Problem: snapshot creation failed

On the SC, depending on the backend used for storage:

  • For Overlay, use the command to check the disk space in .
  • For DAS, use the command to check the disk space in the DAS volumes.
  • For any other backend, use its specific commands to check the free space for storage allocated for volumes.

Is there enough space?

Yes: On the OSG host, depending on the backend used for object storage:

  • For Walrus, use the command to check the disk space in .

  • For RiakCS or Ceph-RGW, use its specific commands to check the free space for storage allocated for buckets and objects.

Is there enough space?

Yes:

  • Use and note the IP addresses for the OSG and SC.

  • SSH to SC and ping the OSG. Are there error messages?

No (not enough space on the OSG host): Delete volumes or add disk space.

No (not enough space on the SC): Delete volumes or add disk space.

3.5 - Problem: volume creation failed

Symptom: The volume went from available to failed. This is typically caused by the CLC and the SC. On the SC, use df or lvdisplay to check the disk space. Is there enough space?

Yes: Check the SC log and grep for the volume ID. Is there an error message?

  • Yes: The message provides clues to the cause.
  • No: Check cloud-output.log for an error that mentions the volume ID.

No: Delete volumes or add disk space.
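
The disk-space check can be scripted so that it gives a yes or no answer. This is a minimal sketch, assuming a POSIX-compatible df; has_free_space is a hypothetical helper, and both the path and the threshold you pass in are your own choices, not Eucalyptus requirements.

```shell
# Hypothetical helper: succeed if the filesystem holding PATH has at least
# MIN_KB kilobytes available, according to `df -Pk`.
has_free_space() {
    local path="$1" min_kb="$2" avail_kb
    # -P gives one-line POSIX output; column 4 is available space in KB.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}
```

Usage, with a placeholder path and an arbitrary 10 GiB threshold: has_free_space /path/to/volume/storage 10485760 || echo "Delete volumes or add disk space"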

4 - Component Workarounds

This section contains troubleshooting information for Eucalyptus components and services.

4.1 - Access and Identities

This topic contains information about access-related problems and solutions.

Need to verify an existing LIC file.

  1. Enter the following command: The output from the example above shows the name of the LIC file and status of the synchronization (set to false).

4.2 - Elastic Load Balancing

This topic offers suggestions for problems you might have with Elastic Load Balancing (ELB).

Can’t synchronize with time server

Eucalyptus sets up NTP automatically for any instance that has an internet connection to a public network. If an instance doesn’t have such a connection, set the cloud property loadbalancing.loadbalancer_vm_ntp_server to a valid NTP server IP address. For example:

# euctl loadbalancing.loadbalancer_vm_ntp_server=169.254.169.254
PROPERTY	loadbalancing.loadbalancer_vm_ntp_server	169.254.169.254 was {}

Need to debug an ELB instance

To debug an ELB instance, set the loadbalancing.loadbalancer_vm_keyname cloud property to the keypair of the instance you want to debug. For example:

# euctl loadbalancing.loadbalancer_vm_keyname=sshlogin
PROPERTY	loadbalancing.loadbalancer_vm_keyname	sshlogin was {}

4.3 - Imaging Worker

This topic contains troubleshooting tips for the Imaging Worker. Some requests that require the Imaging Worker might remain in pending for a long time, for example an import task or a paravirtual instance run. If a request remains in pending, the Imaging Worker instance might not be able to run because of a lack of resources (for example, instance slots or IP addresses).

You can check for this scenario by listing the latest Auto Scaling activities:

euscale-describe-scaling-activities -g asg-euca-internal-imaging-worker-01

Check for failures that indicate inadequate resources such as:

ACTIVITY        1950c4e5-0db9-4b80-ad3b-5c7c59d9c82e    2014-08-12T21:05:32.699Z        asg-euca-internal-imaging-worker-01    Failed   Not enough resources available: addresses; please stop or terminate unwanted instances or release unassociated elastic IPs and try again, or run with private addressing only
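
To pick out failed activities from a longer listing, the output can be filtered. This is a minimal sketch that assumes the whitespace-separated column layout of the sample line above, with the status in the fifth column; failed_activities is a hypothetical helper, not a Eucalyptus tool.

```shell
# Hypothetical helper: read euscale-describe-scaling-activities output on
# stdin and print only the ACTIVITY lines whose status column is "Failed".
failed_activities() {
    awk '$1 == "ACTIVITY" && $5 == "Failed"'
}
```

Usage: euscale-describe-scaling-activities -g asg-euca-internal-imaging-worker-01 | failed_activities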

4.4 - Instances

This topic contains information to help you troubleshoot your instances.

Inaccurate IP addresses display in the output of euca-describe-addresses.

This can occur if you add IPs from the wrong subnet into your public IP pool, restart the CC, swap out the wrong IPs for the right ones, and restart the CC again. To resolve this issue, run the following commands:

systemctl stop eucalyptus-cloud.service
systemctl stop eucalyptus-cluster.service
iptables -F
systemctl restart eucalyptus-cluster.service
systemctl start eucalyptus-cloud.service

NC does not recalculate disk size correctly

This can occur when trying to add extra disk space for instance ephemeral storage. To resolve this, you need to delete the instance cache and restart the NC.

For example:

rm -rf /var/lib/eucalyptus/instances/*
systemctl restart eucalyptus-node.service

4.5 - Walrus and Storage

This topic contains information about Walrus-related problems and solutions.

Walrus decryption failed.

On Ubuntu 10.04 LTS, kernel version 2.6.32-31 includes a bug that prevents Walrus from decrypting images. You can identify this problem from the following line in cloud-output.log:

javax.crypto.BadPaddingException: pad block corrupted

If you are running this kernel:

  1. Update to kernel version 2.6.32-33 or higher.
  2. De-register the failed image ( ).
  3. Re-register the bundle that you uploaded ( ).
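
Both conditions, the error in the log and the kernel version, can be checked from the shell. This is a minimal sketch, assuming GNU sort with the -V (version sort) option; version_older_than and has_bad_padding_error are hypothetical helpers, and the path to cloud-output.log depends on your installation.

```shell
# Hypothetical helper: succeed if version A sorts strictly before version B
# under GNU version ordering (sort -V).
version_older_than() {
    local a="$1" b="$2"
    [ "$a" != "$b" ] &&
        [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$a" ]
}

# Hypothetical helper: succeed if the given log file contains the
# decryption failure message.
has_bad_padding_error() {
    grep -q 'BadPaddingException: pad block corrupted' "$1"
}
```

Usage: version_older_than "$(uname -r)" 2.6.32-33 && echo "Update the kernel"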

Walrus physical disk is not large enough.

  1. Stop the CLC.
  2. Add a disk.
  3. Migrate your data. Make sure you use LVM with your new disk drive(s).