DS8000 Service Documentation Version 6.3.3

MAP2220 CEC enclosure to RPC card communication failure

The detail data for one of the RPC interrupt registers cannot be viewed from one or both clusters.

About this task

Only the base LPAR in a CEC enclosure communicates with the RPC cards. The LPAR communicates with the service processor through the RTAS interface. The service processor communicates with both RPC cards through a shared I2C interface. The same I2C interface also communicates with the CEC enclosure control panel.

MAP2220 Section-1

Procedure

Display open serviceable events. Find those with the SRCs listed in Table 1. Use Table 1 to prioritize the order of repair.
Table 1. SRC and associated actions
SRC (in repair order) | SRC definition | Go to
BE190010 | CEC enclosure 1 failed to communicate with RPC1 and RPC2. | MAP2220 Section-2 (One CEC communication failure with both RPC cards)
BE190011 | CEC enclosure 2 failed to communicate with RPC1 and RPC2. | MAP2220 Section-2 (One CEC communication failure with both RPC cards)
BE190012 | CEC enclosure 1 and 2 failed to communicate with RPC1. | MAP2220 Section-3 (Both CECs communication failure with one RPC card)
BE190013 | CEC enclosure 1 and 2 failed to communicate with RPC2. | MAP2220 Section-3 (Both CECs communication failure with one RPC card)
BE190014 | CEC enclosure 1 failed to communicate with RPC1. | MAP2220 Section-6 (One CEC communication failure with one RPC card)
BE190015 | CEC enclosure 2 failed to communicate with RPC1. | MAP2220 Section-6 (One CEC communication failure with one RPC card)
BE190016 | CEC enclosure 1 failed to communicate with RPC2. | MAP2220 Section-6 (One CEC communication failure with one RPC card)
BE190017 | CEC enclosure 2 failed to communicate with RPC2. | MAP2220 Section-6 (One CEC communication failure with one RPC card)
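Table 1 is effectively a lookup keyed on the SRC, with the rows listed in repair order. As a purely illustrative aid, the Python sketch below encodes that table and sorts the open SRCs into the prescribed repair order. It is not part of any DS8000 or management console tool; the names `SrcInfo`, `SRC_TABLE`, and `repair_order` are hypothetical, and the SRC strings are assumed to be copied by hand from the open serviceable events.

```python
from typing import Iterable, List, NamedTuple, Tuple


class SrcInfo(NamedTuple):
    cecs: Tuple[int, ...]  # CEC enclosures that failed to communicate
    rpcs: Tuple[int, ...]  # RPC cards they failed to reach
    section: str           # MAP2220 section to go to


# Transcribed from Table 1; dictionary insertion order is the repair order.
SRC_TABLE = {
    "BE190010": SrcInfo((1,), (1, 2), "Section-2"),
    "BE190011": SrcInfo((2,), (1, 2), "Section-2"),
    "BE190012": SrcInfo((1, 2), (1,), "Section-3"),
    "BE190013": SrcInfo((1, 2), (2,), "Section-3"),
    "BE190014": SrcInfo((1,), (1,), "Section-6"),
    "BE190015": SrcInfo((2,), (1,), "Section-6"),
    "BE190016": SrcInfo((1,), (2,), "Section-6"),
    "BE190017": SrcInfo((2,), (2,), "Section-6"),
}

# Position of each SRC in Table 1 (0 = repair first).
_RANK = {src: rank for rank, src in enumerate(SRC_TABLE)}


def repair_order(open_srcs: Iterable[str]) -> List[str]:
    """Return the Table 1 SRCs found in open_srcs, highest repair priority first.

    SRCs that are not listed in Table 1 are ignored.
    """
    return sorted((s for s in open_srcs if s in SRC_TABLE), key=_RANK.__getitem__)
```

For example, `repair_order(["BE190016", "BE190012"])` places BE190012 first, so MAP2220 Section-3 would be worked before Section-6.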

MAP2220 Section-2 (One CEC communication failure with both RPC cards)

About this task

One CEC enclosure has a communication failure through its service processor to both RPC cards.

Procedure

  1. Use the SRC in the serviceable event that sent you here and refer to Table 2 for an overall description of your failure. The failure could be caused by one of the following FRUs:
    • CEC enclosure control panel
    • CEC enclosure service processor
    • CEC enclosure I/O backplane assembly
    • RPC card
    • Y-Cable, CEC enclosure to both RPC cards
    Table 2. SRC communication-failure resources
    SRC | Communication failure resources
    BE190010 | CEC enclosure 1 failure to communicate with RPC1 card and RPC2 card
    BE190011 | CEC enclosure 2 failure to communicate with RPC1 card and RPC2 card
  2. Display open serviceable events that need repair. Is there any other serviceable event with FRUs listed in step 1?
    • Yes, exit this MAP and attempt to repair that serviceable event first. If that repair does not correct this problem, return here and continue with the next step. If that repair does correct this problem, remember to also close this serviceable event.
    • No, go to the next step.
  3. Observe the CEC enclosure control panel. Does the control panel display a ball icon slowly moving clockwise around the display?
    • Yes, go to MAP2220 Section-4.
    • No, go to MAP2220 Section-5.

MAP2220 Section-3 (Both CECs communication failure with one RPC card)

About this task

Both CEC enclosures have a communication failure through their service processors to the same RPC card. The working RPC card is not reporting any problem with the suspect RPC card. There should be no single point of failure that can cause this failure condition.

Procedure

  1. Use the SRC in the serviceable event that sent you here to determine which RPC card the CEC failed to communicate with. Refer to Table 3.
    Table 3. SRC communication interface-failure resources
    SRC | Communication failure resources
    BE190012 | Both CEC enclosure service processors: failure to communicate with RPC1 card (right, viewed from rear)
    BE190013 | Both CEC enclosure service processors: failure to communicate with RPC2 card (left, viewed from rear)
    Figure 1. Location codes for the RPC cards
  2. Display serviceable events that need repair. Is there any other serviceable event listing the RPC card determined in step 1 or CEC enclosure to RPC card cable that connects to it?
    • Yes, exit this MAP and attempt to repair that serviceable event first. If that repair does not correct this problem, return here and continue with the next step. If that repair does correct this problem, remember to also close this serviceable event.
    • No, go to the next step.
  3. At the suspect RPC card determined in step 1, check whether all the cables are properly connected before you continue. If any cables are not connected, do not connect them now. Instead, go to the next step and do a pseudo repair of the RPC card. When you are directed to replace the RPC card, do not replace it; connect the cables instead.
  4. The possible failing FRUs are the RPC card and CEC enclosure to RPC card cable.
    1. Use the Exchange Parts procedure to select the suspect RPC card determined in step 1:
      1. From the navigation area, click Storage Facility Management > storage facility.
      2. From the bottom Task area, click Exchange Parts > Exchange Rack Components... to open the Show Rack Enclosures window.
      3. Select a rack and click Show FRUs. The Show Rack FRUs window opens.
    2. Select the suspect RPC Card FRU and continue the guided repair.
      1. From the Show Rack FRUs window, select RPC Card and click Exchange FRU.

MAP2220 Section-4

Procedure

  1. The CEC enclosure control panel displays this clockwise moving icon when:
    • The control panel was installed in the CEC enclosure, but the service processor firmware has not logically installed it (see step 2).
    • The control panel is failing and needs to be replaced.
  2. The default setting for the control panel is "installed." If the setting is "not installed" and the control panel is installed in the CEC enclosure with CEC power on, the icon moves slowly clockwise around the display. Use the ASMI Concurrent Maintenance > Control Panel menu option to ensure the control panel is logically installed. Refer to the MAP1221 ASMI menu structure.
  3. If the control panel itself is failing, replace the control panel. Refer to MAP1215 Replace a FRU.

MAP2220 Section-5

Procedure

  1. Determine whether the CEC enclosure to RPC card cable (Y cable) is causing the problem. Ensure the cable from the CEC enclosure to both RPC cards is fully seated by pressing on each connector. The RPC ends of the cable plug into connector J210 or J214 on each RPC card (Figure 2), and the CEC end plugs into the I2C connector U789D.001.sssssss-P1-I2C, to the left of the SPCN connectors (Figure 3).
  2. Determine which RPC is fenced.
    • If an RPC card is listed in the serviceable event FRU list, then it is fenced.
    • If an RPC card is not listed in the serviceable event FRU list, use MAP1100 View storage facility state (end of call) to display fenced resources and then click Details to determine which RPC is fenced.
    • If no RPC shows as fenced, check for codes XE or XF on the status display of both PPSs. Use Table 4 to select which RPC card to use for the next step.
      Note: Any combination of codes may be displayed. You are only concerned with XE and XF.
      Table 4. RPC card to pseudo-repair when none are fenced
      PPS status display | RPC card to pseudo-repair in next step
      Both PPSs show XF | RPC2 card (R1-C2)
      Both PPSs show XE | RPC1 card (R1-C1)
      Any other condition, including XE on one PPS (but not both) or XF on one PPS (but not both) | RPC1 card (R1-C1)
  3. Do a pseudo repair of the fenced RPC card or, if no RPC card is fenced, of the RPC card selected from Table 4. This resets the existing RPC card without replacing it.
  4. Was the FRU verification successful?
    • Yes, go to step 5.
    • No, go to step 6.
  5. Seating the cable corrected the problem. Exit this MAP and close related serviceable events.
  6. Determine whether the control panel is causing the problem. Use the ASM interface to logically and physically remove the CEC enclosure control panel. Refer to MAP4110 Exchange the CEC enclosure control panel (concurrent).
    Important: Do not power cycle or reboot the CEC enclosure while the control panel is removed; otherwise, the load of the functional code might not succeed.
  7. Do a pseudo repair of the same RPC card that you selected in step 3. This resets the existing RPC card without replacing it.
  8. Was the FRU verification successful?
    • Yes, go to step 9.
    • No, go to step 10.
  9. The control panel is causing the failure when it is installed. Replace the control panel.
  10. The CEC enclosure control panel is not failing. Use MAP4110 Exchange the CEC enclosure control panel (concurrent) to reinstall the existing control panel.
  11. Replace the remaining FRUs in the serviceable event FRU list. If there are none, then the possible failing FRUs are listed below.
    • CEC enclosure service processor
    • CEC enclosure I/O backplane
    • RPC1 card (only when it is fenced; use MAP1100 to display fenced resources)
    • RPC2 card (only when it is fenced; use MAP1100 to display fenced resources)
    • Y-Cable, CEC enclosure to both RPC cards (plugs to the J210 or J214 connectors on both RPC cards)
    Figure 2. Location codes for the RPC cards
    Figure 3. Location codes for the CEC enclosure (rear view)
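Two pieces of decision logic in Section-5 lend themselves to a short sketch: the Table 4 choice of which RPC card to pseudo-repair when none is fenced, and the elimination order of steps 3 through 11 (Y cable first, then control panel, then the remaining FRUs). The Python below is a hypothetical illustration only. The function names and the callback parameters, which stand in for the manual pseudo repair and control panel removal actions, are assumptions and not a real DS8000 interface. Table 4 does not cover the case where both PPSs show both XE and XF; this sketch assumes that case falls under "any other condition".

```python
from typing import Callable, Set

RPC1 = "RPC1 card (R1-C1)"
RPC2 = "RPC2 card (R1-C2)"


def rpc_to_pseudo_repair(pps1_codes: Set[str], pps2_codes: Set[str]) -> str:
    """Table 4: choose the RPC card to pseudo-repair when no RPC card is fenced.

    Each argument holds the codes seen on one PPS status display. Only XE
    and XF matter, so any other codes are ignored.
    """
    xf_on_both = "XF" in pps1_codes and "XF" in pps2_codes
    xe_on_both = "XE" in pps1_codes and "XE" in pps2_codes
    if xf_on_both and not xe_on_both:
        return RPC2
    # Both PPSs show XE, or any other condition: XE or XF on only one PPS,
    # or (an assumption, not covered by Table 4) both codes on both PPSs.
    return RPC1


def isolate_one_cec_both_rpcs(pseudo_repair_ok: Callable[[], bool],
                              remove_control_panel: Callable[[], None],
                              reinstall_control_panel: Callable[[], None]) -> str:
    """Steps 3-11, assuming the Y cable was reseated (step 1) and the RPC
    card to pseudo-repair was chosen (step 2) before this is called."""
    if pseudo_repair_ok():                 # steps 3-4
        return "Y cable seating corrected the problem"   # step 5
    remove_control_panel()                 # step 6
    if pseudo_repair_ok():                 # steps 7-8
        return "replace the control panel"               # step 9
    reinstall_control_panel()              # step 10
    return "replace the remaining FRUs in the FRU list"  # step 11
```

Passing callbacks rather than precomputed booleans mirrors the procedure: the second pseudo repair is only done if the first one fails verification.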

MAP2220 Section-6 (One CEC communication failure with one RPC card)

About this task

One CEC enclosure has a communication failure through the service processor to one RPC card. The RPC card is not reporting any communication failures to the service processor.

Procedure

  1. Determine whether an LPAR is fenced (not all resources running) using MAP1100 Section-16, LPARs IMLed (CPSS).
    Note: One of the two RPC cards should already be fenced.

    Is an LPAR fenced?

  2. Use the SRC in the serviceable event that sent you here to determine the failing communication interface using Table 5.
    • Determine whether the Y cable is causing the problem. Ensure the Y cable from the CEC enclosure to both RPC cards is fully seated by pressing on each connector.
    • If a loose cable or connector is found, the cable can be reseated (hot-plugged) without first powering off the RPC card.
    • Most likely one of the RPC cards is fenced. If that RPC card is not replaced for another reason, it must go through a pseudo repair before this service action is complete. A pseudo repair uses the parts exchange process for that RPC card to reset the card without physically replacing it. Use MAP1100 Section-27, Fenced Resources, to display fenced resources.
    Table 5. SRC communication interface-failure resources
    SRC | Communication failure resources | Y-cable connections for this SRC
    BE190014 | CEC enclosure 1 (upper), failure to communicate with RPC1 card (right, viewed from rear)
    BE190016 | CEC enclosure 1 (upper), failure to communicate with RPC2 card (left, viewed from rear)
    BE190015 | CEC enclosure 2 (lower), failure to communicate with RPC1 card (right, viewed from rear)
    BE190017 | CEC enclosure 2 (lower), failure to communicate with RPC2 card (left, viewed from rear)
  3. Display serviceable events that need repair.

    Is there any other serviceable event with one or more of the same FRUs as listed in the serviceable event that sent you to this MAP?

    • Yes, exit this MAP and repair the other serviceable event.
      • If the repair of the other serviceable event is successful, it closes automatically. Manually close the serviceable event that sent you here.
      • If the repair is not successful, go to step 4.
    • No, go to step 4.
  4. Replace hardware FRUs listed in the serviceable event that sent you here until the problem is repaired.
    If no hardware FRUs are listed, the possible failing FRUs are listed here. To exchange a FRU that is not listed in a serviceable event, use the following management console navigation option: Storage Facility Management > storage facility > Exchange Parts.
    • CEC enclosure I/O backplane assembly
    • Either RPC card
    • CEC enclosure service processor
    • CEC enclosure operator panel
    • RPC card to CEC enclosure Y cable (see Table 5)
    • RPC card to base rack PPSs Y cable (see Table 5)
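Step 3 of this section asks whether another open serviceable event shares any FRU with the event that sent you here; step 2 of Section-2 and Section-3 asks a similar question, using the FRUs identified in step 1 of those sections. The check is a set intersection, sketched below in Python. The data model (a mapping from a serviceable event identifier to its FRU names), the identifiers, and the function name are hypothetical; in practice you read the FRU lists from the serviceable events displayed on the management console.

```python
from typing import Dict, Iterable, List


def events_to_repair_first(this_event_frus: Iterable[str],
                           other_events: Dict[str, Iterable[str]]) -> List[str]:
    """Return the IDs of other open serviceable events that share at least
    one FRU with the serviceable event that sent you to this MAP."""
    mine = set(this_event_frus)
    return [event_id for event_id, frus in other_events.items()
            if mine.intersection(frus)]
```

An empty result corresponds to the "No, go to step 4" branch.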