Troubleshooting
This topic details how to find information you need to troubleshoot most problems in your cloud. To troubleshoot Eucalyptus, you must have the following:
- knowledge of which machines each Eucalyptus component is installed on
- root access to each machine hosting Eucalyptus components
- an understanding of the network mode (EDGE, VPCMIDO)
- an understanding of eucanetd and the configuration connecting the Eucalyptus components
For most problems, the procedure for tracing problems is the same: start at the bottom to verify the bottom-most component, and then work your way up. If you do this, you can be assured that the base is solid. This applies to virtually all Eucalyptus components and also works for proactive, targeted monitoring.
1 - Eucalyptus Log Files
Usually when an issue arises in Eucalyptus, you can find information that points to the nature of the problem either in the Eucalyptus log files or in the system log files. This topic details log file message meanings, location, configuration, and fault log information.
2 - Network Information
When you have to troubleshoot, it’s important to understand the elements of the network on your system. Here are some ideas for finding out information about your network:
- For example, you might want to list bridges to see which devices are enslaved by the bridge. To do this, use the command.
- You might also want to list network devices and evaluate existing configurations. To do this, use these commands: , , and .
- You can use to check status, or to force eucanetd to run in the foreground, sending log messages to the terminal.
- You can get further information if you use the commands with the options. For example, returns all instances running by all users on the system. Other describe commands are:
3 - Common Problems
This section describes common problems and workarounds.
3.1 - Problem: can't communicate with instance
Use ping from a client (not the CLC). Can you ping it?
- Yes: Check the open ports on the security groups and retry the connection using SSH or HTTP. Can you connect now?
  - Yes: Your work is done.
  - No: Follow the same procedure as if you could not ping the instance in the first place.
- No: Is your cloud running in Edge networking mode?
3.2 - Problem: install-time checks
Eucalyptus offers installation checks for any Eucalyptus component or service (CLC, Walrus, SC, NC, services, and more). When Eucalyptus encounters an error, it presents the problem to the operator. These checks are used for install-time problems, and they provide resolutions to some of the fault conditions.
Each problematic condition contains the following information:
Heading | Description |
---|---|
Condition | The fault found by Eucalyptus |
Cause | The cause of the condition |
Initiator | What is at fault |
Location | Where to go to fix the fault |
Resolution | The steps to take to resolve the fault |

For more information about all the faults we support, go to https://github.com/eucalyptus/eucalyptus/tree/master/util/faults/en_US .
3.3 - Problem: instance runs but fails
Run euca-describe-nodes to verify that the instance is there. Is the instance there?
Yes: Go to the NC log for that NC and grep your instance ID. Did you find the instance?
- Yes: Is there an error message?
No: Go to the CC log and grep the instance ID. Is there an error message?
No: Log in as admin and run euca-describe-instances. Is the instance there?
- Yes:
- No: Start over and run a new instance, recreate failure, and start these steps over.
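The log searches in the steps above can be sketched as a small shell helper. This is a minimal sketch, not part of Eucalyptus: `find_in_logs` is a hypothetical name, and it simply reports how many lines in each given log file mention the instance ID before printing them.

```shell
# Hypothetical helper: search one or more log files for an instance ID.
# Usage: find_in_logs INSTANCE_ID LOGFILE...
find_in_logs() {
    id=$1
    shift
    for log in "$@"; do
        if [ ! -r "$log" ]; then
            echo "skip: $log (not readable)"
            continue
        fi
        # -F matches the ID as a fixed string; -c counts matching lines.
        # grep exits non-zero when nothing matches, so "|| true" keeps going.
        count=$(grep -cF -- "$id" "$log" || true)
        echo "$log: $count matching line(s)"
        grep -F -- "$id" "$log" || true
    done
    return 0
}
```

On an NC you might run something like `find_in_logs i-12345678 /var/log/eucalyptus/nc.log`, and on the CC the same against `cc.log`. The instance ID is made up, and the log paths are assumed default locations that may differ on your installation.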
3.4 - Problem: snapshot creation failed
On the SC, depending on the backend used for storage:
- For Overlay, use the command to check the disk space in .
- For DAS, use the command to check the disk space in the DAS volumes.
- For any other backend, use its specific commands to check the free space for storage allocated for volumes.
Is there enough space?
- Yes: On the OSG host, depending on the backend used for object storage:
  - For Walrus, use the command to check the disk space in .
  - For RiakCS or Ceph-RGW, use its specific commands to check the free space for storage allocated for buckets and objects.
  Is there enough space?
  - Yes: Use and note the IP addresses for the OSG and SC. SSH to the SC and ping the OSG. Are there error messages?
  - No: Delete volumes or add disk space.
- No: Delete volumes or add disk space.
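The "is there enough space?" checks can be scripted with standard `df`. The following is a minimal sketch: `free_kb` and `has_space` are hypothetical helper names, and the directory and threshold you pass in are your own choices, since they depend on the storage backend in use.

```shell
# Print the free space, in kilobytes, of the filesystem holding the given path.
free_kb() {
    # -P forces the portable one-line-per-filesystem layout; -k reports 1K blocks.
    # Line 2 of the output is the filesystem row; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding PATH has at least MIN_KB kilobytes free.
# Usage: has_space PATH MIN_KB
has_space() {
    avail=$(free_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, `has_space "$VOLUME_DIR" 10485760 || echo "less than 10 GB free"`, where `VOLUME_DIR` is a placeholder for the directory that backs your volumes or buckets.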
3.5 - Problem: volume creation failed
Symptom: The volume went from available to failed. This is typically caused by the CLC and the SC.
On the SC, use df or lvdisplay to check the disk space. Is there enough space?
- Yes: Check the SC log and grep the volume ID. Is there an error message?
  - Yes: The message provides clues to the cause of the failure.
  - No: Check cloud-output.log for a volume ID error.
- No: Delete volumes or add disk space.
4 - Component Workarounds
This section contains troubleshooting information for Eucalyptus components and services.
4.1 - Access and Identities
This topic contains information about access-related problems and solutions.
Need to verify an existing LIC file.
- Enter the following command: The output from the example above shows the name of the LIC file and status of the synchronization (set to false).
4.2 - Elastic Load Balancing
This topic explains suggestions for problems you might have with Elastic Load Balancing (ELB).
Can’t synchronize with time server
Eucalyptus sets up NTP automatically for any instance that has an internet connection to a public network. If an instance doesn’t have such a connection, set the cloud property loadbalancing.loadbalancer_vm_ntp_server to a valid NTP server IP address. For example:
euctl loadbalancing.loadbalancer_vm_ntp_server=169.254.169.254
PROPERTY loadbalancing.loadbalancer_vm_ntp_server 169.254.169.254 was {}
Need to debug an ELB instance
To debug an ELB instance, set the loadbalancing.loadbalancer_vm_keyname cloud property to the keypair of the instance you want to debug. For example:
# euctl loadbalancing.loadbalancer_vm_keyname=sshlogin
PROPERTY loadbalancing.loadbalancer_vm_keyname sshlogin was {}
4.3 - Imaging Worker
This topic contains troubleshooting tips for the Imaging Worker. Some requests that require the Imaging Worker might remain in pending for a long time, for example an import task or a paravirtual instance run. If a request remains in pending, the Imaging Worker instance might not be able to run because of a lack of resources (for example, instance slots or IP addresses).
You can check for this scenario by listing latest AutoScaling activities:
euscale-describe-scaling-activities -g asg-euca-internal-imaging-worker-01
Check for failures that indicate inadequate resources such as:
ACTIVITY 1950c4e5-0db9-4b80-ad3b-5c7c59d9c82e 2014-08-12T21:05:32.699Z asg-euca-internal-imaging-worker-01 Failed Not enough resources available: addresses; please stop or terminate unwanted instances or release unassociated elastic IPs and try again, or run with private addressing only
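To pick the failures out of a longer activity list, the output can be filtered by its status column. This is a minimal sketch that assumes every line follows the layout of the sample above (ACTIVITY, ID, timestamp, group name, status, then the message); `failed_activities` is a hypothetical helper name.

```shell
# Read scaling-activity lines on stdin and print the failed ones as
# "TIMESTAMP  MESSAGE". Assumed columns:
#   ACTIVITY <id> <timestamp> <group> <status> <message...>
failed_activities() {
    awk '$1 == "ACTIVITY" && $5 == "Failed" {
        # Rejoin the message, which starts at field 6 and may contain spaces.
        msg = $6
        for (i = 7; i <= NF; i++) msg = msg " " $i
        print $3 "  " msg
    }'
}
```

It is meant to be used at the end of a pipe, for example `euscale-describe-scaling-activities -g asg-euca-internal-imaging-worker-01 | failed_activities`.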
4.4 - Instances
This topic contains information to help you troubleshoot your instances.
Inaccurate IP addresses display in the output of euca-describe-addresses.
This can occur if you add IPs from the wrong subnet into your public IP pool, do a restart on the CC, swap out the wrong ones for the right ones, and do another restart on the CC. To resolve this issue, run the following commands.
Note
A restart should only be performed when no instances are running, or when instance service interruption can be tolerated. A restart causes the CC to reset its networking configuration, regardless of whether or not it is in use.
systemctl stop eucalyptus-cloud.service
systemctl stop eucalyptus-cluster.service
iptables -F
systemctl restart eucalyptus-cluster.service
systemctl start eucalyptus-cloud.service
NC does not recalculate disk size correctly
This can occur when trying to add extra disk space for instance ephemeral storage. To resolve this, you need to delete the instance cache and restart the NC.
For example:
rm -rf /var/lib/eucalyptus/instances/*
systemctl restart eucalyptus-node.service
4.5 - Walrus and Storage
This topic contains information about Walrus-related problems and solutions.
Walrus decryption failed.
On Ubuntu 10.04 LTS, kernel version 2.6.32-31 includes a bug that prevents Walrus from decrypting images. This can be determined from the following line in cloud-output.log:
javax.crypto.BadPaddingException: pad block corrupted
If you are running this kernel:
- Update to kernel version 2.6.32-33 or higher.
- De-register the failed image ( ).
- Re-register the bundle that you uploaded ( ).
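Both conditions for this problem, the error line in the log and a kernel older than 2.6.32-33, can be checked from the shell. This is a minimal sketch: `version_lt` and `has_padding_error` are hypothetical helper names, and the comparison relies on the version sort (`sort -V`) provided by GNU coreutils.

```shell
# Succeed if version A sorts strictly before version B (uses GNU sort -V).
# Usage: version_lt A B
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeed if the given log file contains the decryption failure signature.
# Usage: has_padding_error LOGFILE
has_padding_error() {
    grep -qF 'BadPaddingException: pad block corrupted' "$1"
}
```

For example, `version_lt "$(uname -r)" 2.6.32-33 && echo "kernel update needed"`, combined with `has_padding_error` run against your cloud-output.log file.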
Walrus physical disk is not large enough.
- Stop the CLC.
- Add a disk.
- Migrate your data.
Make sure you use LVM with your new disk drive(s).