Operations
This section contains concepts and tasks associated with operating your Eucalyptus cloud.
This section is for architects and cloud administrators who plan to deploy Eucalyptus in a production environment. It is not intended for end users or proof-of-concept installations.
To run Eucalyptus in a production environment, you must be aware of your hardware and network resources. This guide helps you make decisions about deploying Eucalyptus and keep it running smoothly.
To decide on your deployment’s scope, determine the use case for your cloud. For example, will this be a small dev-test environment, or a large and scalable web services environment?
To help with scoping your deployment, see Plan Your Installation in the Installation Guide. There you will find solution examples and physical resource information.
This topic details what you should test when you want to make sure your deployment is working. The following suggested test plan contains tasks that ensure DNS, imaging, and storage are working.
This section describes the most commonly applied post-install customizations and the issues they pose:
Over-subscription is the practice of allocating more virtual resources to instances than the host physically provides. Over-subscription applies only to node controllers (NCs). You can adjust the disk and core settings to leave enough usage buffer for your instances. Open the eucalyptus.conf file in /etc/eucalyptus/ and edit the following values to define appropriately sized buffers for your instances:
- NC_WORK_SIZE: Defines the amount of disk space available for running instances. Defaults to 1/3 of the currently available disk space on the NC.
- NC_CACHE_SIZE: Defines how much disk space is reserved for caching images. Defaults to the other 2/3 of the currently available disk space on the NC.
- MAX_CORES: Defines the maximum number of cores that can be provided to VMs on each NC. If it is 0 or not present, the only limit on the number of instances is the number of cores available on the NC. If it is present, any value greater than 256 is treated as 256.
For these changes to take effect, you must restart the NC.
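The MAX_CORES rule can be sanity-checked before you edit the file. The following is a minimal sketch only: `effective_max_cores` is a hypothetical helper, not part of Eucalyptus, and it simply mirrors the rule as stated in this section (0 or unset falls back to the host core count, values above 256 are clamped).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Eucalyptus): mirrors the MAX_CORES
# rule described in this section so a planned value can be checked.
# Usage: effective_max_cores <configured MAX_CORES, may be empty> <host cores>
effective_max_cores() {
  local configured="${1:-0}"   # empty or unset is treated like 0
  local host_cores="$2"
  if [ "$configured" -eq 0 ]; then
    # 0 or not present: only the cores available on the NC limit instances
    echo "$host_cores"
  elif [ "$configured" -gt 256 ]; then
    # any value greater than 256 is treated as 256
    echo 256
  else
    echo "$configured"
  fi
}
```

For example, `effective_max_cores 32 "$(nproc)"` would report the value the NC is expected to honor for a planned setting of 32 on the current host.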
You can modify the default by adding network IPs to your cloud. Adding public IPs does not require shutting down the whole system.
To add network IPs: In EDGE mode, adding or changing IPs involves creating a JSON file and uploading it to the Cloud Controller (CLC). See Configure for Edge Mode for more details. No restart is needed; changes apply automatically.
You can change the following CloudWatch properties:
Property | Description |
---|---|
cloud.monitor.default_poll_interval_mins | How often, in minutes, the CLC sends a request to the CC for sensor data. The default value is 5. Setting it to 0 disables reporting. The more often you poll, the greater the impact on system performance. |
cloud.monitor.history_size | How many data value samples are sent in each sensor data request, that is, the number of samples per poll interval. The default value is 5. |
cloudwatch.enable_cloudwatch_service | Disables CloudWatch when set to false. |
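As a sketch of how these properties might be changed, the function below uses the `euctl property=value` form that appears elsewhere in this guide. The function name and the `DRY_RUN` switch are assumptions made for illustration; by default it only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (the name and DRY_RUN switch are illustrative).
# Prints the euctl commands by default; set DRY_RUN=0 to execute them.
# Usage: set_cloudwatch_polling <interval in minutes> <history size>
set_cloudwatch_polling() {
  local interval_mins="$1" history_size="$2"
  local run="echo"
  [ "${DRY_RUN:-1}" = "0" ] && run=""
  # $run is intentionally unquoted so an empty value disappears
  $run euctl "cloud.monitor.default_poll_interval_mins=${interval_mins}"
  $run euctl "cloud.monitor.history_size=${history_size}"
}
```

Running `set_cloudwatch_polling 5 5` prints the two commands for the default values; `DRY_RUN=0 set_cloudwatch_polling 10 5` would apply a ten-minute interval on a host where euctl is configured.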
Capacity changes refer to adding another zone or more nodes. To add another zone, install , start , and register . To add more nodes, see Add a Node Controller .
This topic details best practices for managing your cloud policies.
This topic addresses networking in the Eucalyptus cloud.
Eucalyptus offers different modes to provide you with a cloud that will fit in your current network. For information about what each networking mode has to offer, see Plan Networking Modes.
Eucalyptus EDGE networking mode supports EC2-Classic networking. Your instances run in a single, flat network that you share with others. For more information about EC2-Classic networking, see EC2 Supported Platforms.
Eucalyptus VPCMIDO networking mode resembles the Amazon Virtual Private Cloud (VPC) product, in which the network is fully configurable by users. For more information about EC2-VPC networking, see Differences Between Instances in EC2-Classic and EC2-VPC.
This topic includes details about which resources you should monitor.
Component | Running Processes |
---|---|
Cloud Controller (CLC) | eucalyptus-cloud, postgres, eucanetd (VPCMIDO mode) |
User-facing services (UFS) | eucalyptus-cloud |
Walrus | eucalyptus-cloud |
Cluster Controller (CC) | eucalyptus-cluster |
Storage Controller (SC) | eucalyptus-sc, tgtd (for DAS and Overlay) |
Node Controller (NC) | eucalyptus-node, httpd, dhcpd, eucanetd (EDGE mode), qemu-kvm (one per instance) |
Management Console | eucaconsole |
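The table above can be turned into a simple check. This is a minimal sketch: `missing_processes` is a hypothetical helper that reads a process listing on stdin and prints every expected name that does not appear in it. It matches on substrings because `ps` can truncate short command names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each expected process name (arguments) that
# does not occur anywhere in the process listing read from stdin.
missing_processes() {
  local running name
  running="$(cat)"
  for name in "$@"; do
    # -F: fixed string, -q: quiet; substring match against the listing
    if ! printf '%s\n' "$running" | grep -qF -- "$name"; then
      echo "$name"
    fi
  done
}
```

On a CLC, for example, `ps -eo args= | missing_processes eucalyptus-cloud postgres` prints nothing when both processes from the table are present. Adjust the list to the row that matches the host you are checking.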
This section provides details on important files to back up and recover.
This section explains what you need to back up and protect your cloud data.
We recommend that you back up the following data:
To back up the cloud database, follow the steps listed in this topic. Bucket and object metadata are stored in the Eucalyptus cloud database.
To back up the database:
Log in to the CLC. The cloud database is on the CLC.
Extract the Eucalyptus PostgreSQL database cluster into a script file.
pg_dumpall --oids -c -h/var/lib/eucalyptus/db/data -p8777 -U root -f/root/eucalyptus_pg_dumpall-backup.sql
Back up the cloud security credentials in the keys directory.
tar -czvf ~/eucalyptus-keydir.tgz /var/lib/eucalyptus/keys
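After both commands finish, it can be worth confirming that the two files are usable before you rely on them. The sketch below is an assumption-laden illustration, not an official Eucalyptus tool: `verify_backup` only checks that the SQL dump is non-empty and that the key archive can be listed by tar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: basic readability check for the two backup files.
# Usage: verify_backup <sql dump file> <keys archive (.tgz)>
verify_backup() {
  local dump="$1" keys_tgz="$2"
  # -s: file exists and is larger than zero bytes
  [ -s "$dump" ] || { echo "missing or empty dump: $dump" >&2; return 1; }
  # tar -tzf lists the archive without extracting it
  tar -tzf "$keys_tgz" >/dev/null 2>&1 || { echo "unreadable archive: $keys_tgz" >&2; return 1; }
  echo "backup looks readable"
}
```

With the file names used in this topic, the call would be `verify_backup /root/eucalyptus_pg_dumpall-backup.sql ~/eucalyptus-keydir.tgz`. This does not prove that the dump restores cleanly; only a test restore does that.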
This topic explains what to include when you recover your cloud.
To restore the cloud database, follow the steps listed in this topic.
Stop the CLC service.
systemctl stop eucalyptus-cloud.service
Remove traces of the old database.
rm -rf /var/lib/eucalyptus/db
Restore the cloud security credentials in the keys directory.
tar -xvf ~/eucalyptus-keydir.tgz -C /
Re-initialize the database structure.
clcadmin-initialize-cloud
Start the database manually.
su eucalyptus -s /bin/bash -c "/usr/bin/pg_ctl start -w \
-s -D/var/lib/eucalyptus/db/data -o '-h0.0.0.0/0 -p8777 -i'"
Restore the backup.
psql -U root -d postgres -p 8777 -h /var/lib/eucalyptus/db/data -f/root/eucalyptus_pg_dumpall-backup.sql
Stop the database manually.
su eucalyptus -s /bin/bash -c "/usr/bin/pg_ctl stop -D/var/lib/eucalyptus/db/data"
Start the CLC service.
systemctl start eucalyptus-cloud.service
This topic details how to find information you need to troubleshoot most problems in your cloud. To troubleshoot Eucalyptus, you must have the following:
For most problems, the tracing procedure is the same: verify the bottom-most component first, and then work your way up. This way you know that the base is solid before you look at anything built on it. The approach applies to virtually all Eucalyptus components and also works for proactive, targeted monitoring.
Usually when an issue arises in Eucalyptus, you can find information that points to the nature of the problem either in the Eucalyptus log files or in the system log files. This topic details log file message meanings, location, configuration, and fault log information.
When you have to troubleshoot, it’s important to understand the elements of the network on your system.
Here are some ideas for finding out information about your network:
This section describes common problems and workarounds.
Use ping from a client (not the CLC). Can you ping the instance?
- Yes: Check the open ports on security groups and retry the connection using SSH or HTTP. Can you connect now?
  - Yes: Your work is done.
  - No: Follow the same procedure as when you cannot ping the instance at all.
- No: Is your cloud running in Edge networking mode?
  - Yes: Run euca-describe-nodes. Is your instance there?
  - No, it is not in Edge networking mode:
Eucalyptus offers installation checks for any Eucalyptus component or service (CLC, Walrus, SC, NC, services, and more). When Eucalyptus encounters an error, it presents the problem to the operator. These checks are used for install-time problems. They provide resolutions to some of the fault conditions.
Each problematic condition contains the following information:
Heading | Description |
---|---|
Condition | The fault found by Eucalyptus |
Cause | The cause of the condition |
Initiator | What is at fault |
Location | Where to go to fix the fault |
Resolution | The steps to take to resolve the fault |
For more information about all the faults we support, go to https://github.com/eucalyptus/eucalyptus/tree/master/util/faults/en_US .
Run euca-describe-nodes to verify that the instance is there. Is the instance there?
Yes: Go to the NC log for that NC and grep for your instance ID. Did you find the instance?
No: Go to the CC log and grep for the instance ID. Is there an error message?
Yes: The error message should give you some helpful information.
No: grep for the instance ID in cloud-output.log. Is there an error message?
No: Log in as admin and run euca-describe-instances. Is the instance there?
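The repeated "grep for the instance ID" steps can be sketched as a single helper. This is illustrative only: `find_id_in_logs` is a hypothetical name, and the default directory /var/log/eucalyptus is an assumption about where the NC, CC, and cloud-output logs live on your hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists the *.log files in a directory that mention
# the given instance (or volume) ID.
# Usage: find_id_in_logs <id> [log directory]
find_id_in_logs() {
  local id="$1"
  local log_dir="${2:-/var/log/eucalyptus}"   # assumed default location
  # -l: print matching file names only, -F: treat the ID as a fixed string
  grep -lF -- "$id" "$log_dir"/*.log 2>/dev/null
}
```

Run it on the NC, the CC, and the CLC in turn, for example `find_id_in_logs i-12345678` (a made-up ID), and then open the files it lists to read the surrounding error messages.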
On the SC, depending on the backend used for storage:
Yes: On the OSG host, depending on the backend used for object storage:
For Walrus, use the command to check the disk space in .
For RiakCS or Ceph-RGW, use its specific commands to check the free space for storage allocated for buckets and objects. Is there enough space? Yes.
Use and note the IP addresses for the OSG and SC.
SSH to SC and ping the OSG. Are there error messages?
No: Delete volumes or add disk space. No: Delete volumes or add disk space.
Symptom: The volume went from available to failed. This is typically caused by the CLC and the SC.
On the SC, use df or lvdisplay to check the disk space. Is there enough space?
- Yes: Check the SC log and grep for the volume ID. Is there an error message?
  - Yes: The message provides clues about the cause.
  - No: Check cloud-output.log for an error that mentions the volume ID.
- No: Delete volumes or add disk space.
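The "Is there enough space?" question can be answered with a small df-based check. This is a minimal sketch: both function names are hypothetical, and the threshold is whatever you consider enough for the volumes you plan to create.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on POSIX-format df output.
# avail_kb prints the available space, in KB, on the filesystem holding <path>.
avail_kb() {
  # -P: one line per filesystem, -k: 1024-byte units; field 4 is "Available"
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# has_free_space succeeds when <path> has at least <min_kb> KB available.
has_free_space() {
  [ "$(avail_kb "$1")" -ge "$2" ]
}
```

For example, `has_free_space /var/lib/eucalyptus/volumes 10485760 || echo "less than 10 GB free"`, where the path is an assumption about where your SC stores volumes. For LVM-backed storage, lvdisplay remains the tool to consult.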
This section contains troubleshooting information for Eucalyptus components and services.
This topic contains information about access-related problems and solutions.
Need to verify an existing LIC file.
This topic offers suggestions for problems you might have with Elastic Load Balancing (ELB).
Can’t synchronize with time server
Eucalyptus sets up NTP automatically for any instance that has an internet connection to a public network. If an instance doesn’t have such a connection, set the cloud property loadbalancing.loadbalancer_vm_ntp_server to a valid NTP server IP address. For example:
euctl loadbalancing.loadbalancer_vm_ntp_server=169.254.169.254
PROPERTY loadbalancing.loadbalancer_vm_ntp_server 169.254.169.254 was {}
Need to debug an ELB instance
To debug an ELB instance, set the loadbalancing.loadbalancer_vm_keyname cloud property to the keypair of the instance you want to debug. For example:
euctl loadbalancing.loadbalancer_vm_keyname=sshlogin
PROPERTY loadbalancing.loadbalancer_vm_keyname sshlogin was {}
This topic contains troubleshooting tips for the Imaging Worker.
Some requests that require the Imaging Worker might remain in pending for a long time, for example an import task or a paravirtual instance run. If a request remains in pending, the Imaging Worker instance might not be able to run because of a lack of resources (for example, instance slots or IP addresses).
You can check for this scenario by listing the latest Auto Scaling activities:
euscale-describe-scaling-activities -g asg-euca-internal-imaging-worker-01
Check for failures that indicate inadequate resources such as:
ACTIVITY 1950c4e5-0db9-4b80-ad3b-5c7c59d9c82e 2014-08-12T21:05:32.699Z asg-euca-internal-imaging-worker-01 Failed Not enough resources available: addresses; please stop or terminate unwanted instances or release unassociated elastic IPs and try again, or run with private addressing only
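To pick out only the failures from a longer listing, the output can be filtered. The sketch below assumes the column layout of the sample line above (record type first, status fifth); `failed_activities` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keeps only ACTIVITY records whose status column
# (fifth whitespace-separated field, as in the sample above) is "Failed".
failed_activities() {
  awk '$1 == "ACTIVITY" && $5 == "Failed"'
}
```

Usage would look like `euscale-describe-scaling-activities -g asg-euca-internal-imaging-worker-01 | failed_activities`. If your output has a different column order, adjust the field number accordingly.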
This topic contains information to help you troubleshoot your instances.
Inaccurate IP addresses display in the output of euca-describe-addresses.
This can occur if you add IPs from the wrong subnet into your public IP pool, restart the CC, swap out the wrong IPs for the right ones, and restart the CC again. To resolve this issue, run the following commands.
systemctl stop eucalyptus-cloud.service
systemctl stop eucalyptus-cluster.service
iptables -F
systemctl restart eucalyptus-cluster.service
systemctl start eucalyptus-cloud.service
NC does not recalculate disk size correctly
This can occur when trying to add extra disk space for instance ephemeral storage. To resolve this, delete the instance cache and restart the NC. For example:
rm -rf /var/lib/eucalyptus/instances/*
systemctl restart eucalyptus-node.service
This topic contains information about Walrus-related problems and solutions.
Walrus decryption failed.
On Ubuntu 10.04 LTS, kernel version 2.6.32-31 includes a bug that prevents Walrus from decrypting images. You can identify this problem from the following line in cloud-output.log:
javax.crypto.BadPaddingException: pad block corrupted
If you are running this kernel:
Walrus physical disk is not large enough.