DS8000 Service Documentation Version 7.5

MAP4510 CEC enclosure to CEC enclosure communication failure

About this task

Use this MAP for communication failures between the partner logical partitions (LPARs) in the CEC enclosures. The communication begins after AIX is loaded and the functional code load occurs. For model 961, the CEC enclosures communicate across the PCIe interface. For models 941 and 951, they communicate across the RIO interface.

Because this communication uses redundant paths, no single cable fault should cause a communication failure. Each LPAR periodically sends a communication message (heartbeat) to the partner LPAR and sets a timer while it waits for the response. If the timer expires with no response, the error recovery process causes the non-responding LPAR to fail over its resources to the originating LPAR and then reboot itself to try to restore communication.
  • If communication is restored, a fail-back occurs to return resources, and no serviceable event is created.
  • If communication still fails, the LPAR is fenced and a serviceable event is created to send you to this MAP.
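
The recovery flow described above can be sketched as a small decision function. This is only an illustrative model, not the product's recovery code: the names `Outcome` and `recover`, and the two boolean inputs, are hypothetical and were chosen for this sketch.

```python
from enum import Enum


class Outcome(Enum):
    HEALTHY = "no recovery needed"
    FAILED_BACK = "fail-back performed, no serviceable event"
    FENCED = "LPAR fenced, serviceable event created"


def recover(heartbeat_answered: bool, answered_after_reboot: bool) -> Outcome:
    """Hypothetical model of the heartbeat recovery flow in this MAP."""
    if heartbeat_answered:
        # The partner LPAR replied before the timer expired.
        return Outcome.HEALTHY
    # Timer expired: the non-responding LPAR fails over its resources to the
    # originating LPAR and reboots itself to try to restore communication.
    if answered_after_reboot:
        # Communication is restored: resources are returned by a fail-back
        # and no serviceable event is created.
        return Outcome.FAILED_BACK
    # Communication still fails: the LPAR is fenced and a serviceable event
    # sends the service representative to this MAP.
    return Outcome.FENCED
```

Only the last outcome leads to this MAP; the first two never create a serviceable event.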

Procedure

  1. Display the status of the Servers:
    1. From the navigation area, click Storage Facility Management > storage facility > Server View.
    2. Note the status of each server, and the reference code, if one is displayed.
      Record the serial number of any server that does not show a status of "Operating," or displays a reference code.
  2. Display the status of the LPARs:
    1. From the navigation area, click Storage Facility Management > storage facility > Server View > server.
    2. Note the status of each LPAR, and the reference code, if one is displayed.
      Record the name of any LPAR that does not show a status of "Running," or displays a reference code.
    3. Repeat substeps 1 and 2 for the other server.
  3. If a server does not show a status of "Operating" or is missing from the display, check the input power LEDs of both CEC enclosure power supplies for that server.

    Is at least one input power LED lit?

  4. Do one of the following:
    • If any server or partition that displays an Operator Panel Value is not responding, exit this MAP and go to MAP4360 Codes displayed by the CEC enclosure control panel. After the repair is complete, close the serviceable event that sent you here.
    • If any Server State is not "Operating" or any Partition State is not "Running," display and repair any related serviceable events for that CEC enclosure. After the repair is complete, close the serviceable event that sent you here. If no related serviceable events are found, contact your next level of support.
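
The choice in step 4 can be written out as a minimal sketch, assuming the states recorded in steps 1 and 2 are entered by hand. The `Status` record, its field names, and `next_action` are hypothetical and are not part of any DS8000 tool. The sketch also assumes the first bullet takes precedence when both apply, and it returns `None` when neither condition holds, because the procedure does not state an action for that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Status:
    name: str                       # server serial number or LPAR name
    kind: str                       # "server" or "lpar"
    state: str                      # state shown in Server View
    shows_panel_value: bool = False  # an Operator Panel Value is displayed
    responding: bool = True


# Healthy state for each kind, as quoted in the procedure.
EXPECTED = {"server": "Operating", "lpar": "Running"}


def next_action(statuses: List[Status]) -> Optional[str]:
    """Return the step 4 action, or None if neither bullet applies."""
    # First bullet: a unit displays an Operator Panel Value and is not responding.
    if any(s.shows_panel_value and not s.responding for s in statuses):
        return "Exit this MAP and go to MAP4360"
    # Second bullet: a server is not "Operating" or an LPAR is not "Running".
    if any(s.state != EXPECTED[s.kind] for s in statuses):
        return "Display and repair related serviceable events"
    return None
```

For example, `next_action([Status("lpar1", "lpar", "Fenced")])` selects the second bullet, because the LPAR is not "Running" and no unresponsive Operator Panel Value was recorded. The state name "Fenced" is used here only as sample input.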