Scalability

A Hadoop cluster is easy to scale: add new servers or whole server racks to the cluster, and increase the memory in the NameNodes to handle the rising load. Adding nodes generates a burst of "rebalancing traffic" at first, but it delivers extra storage and computation capacity. Because the NameNodes are critical to the whole cluster, we recommend paying the premium for those machines.
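
As a rough way to tell when it is time to scale out, you can watch overall HDFS utilization. The following is a minimal sketch using Hadoop's Java FileSystem API; the 70% threshold and the class name are assumptions for illustration, not fixed rules, and should be tuned to your own growth rate.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

// Minimal sketch: report HDFS capacity and flag when utilization is high.
// Assumes core-site.xml/hdfs-site.xml are on the classpath; the 70%
// threshold is a hypothetical trigger, not an official recommendation.
public class ClusterCapacityCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            FsStatus status = fs.getStatus();
            double pctUsed = 100.0 * status.getUsed() / status.getCapacity();
            System.out.printf("Capacity: %d bytes, used: %.1f%%%n",
                    status.getCapacity(), pctUsed);
            if (pctUsed > 70.0) {
                System.out.println("Consider adding DataNodes before "
                        + "utilization climbs further.");
            }
        }
    }
}
```

Once new DataNodes join, HDFS spreads existing blocks across them, which is the source of the rebalancing traffic mentioned above.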

Use the following guidelines to scale your existing Hadoop cluster:

  1. Ensure that free space is available in the data center near the existing Hadoop cluster, with enough power budget to accommodate additional racks.

  2. Plan the network to cope with more servers.

  3. It might be possible to add more disks and RAM to the existing servers, plus extra CPUs if the servers have spare sockets. This can expand an existing cluster without adding more racks or making network changes.

  4. A hardware upgrade in a live cluster can take considerable time and effort. We recommend planning the upgrade one server at a time.

  5. CPU parts do not remain on vendors' price lists forever. If you plan to add a second CPU later, ask your reseller when the CPUs matching your existing parts will be cut from the price list, and buy them while they are still available; parts typically remain available for at least 18 months.

  6. You are likely to need more memory in the master servers, particularly the NameNode; see the heap-sizing sketch after this list.

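For item 6, a commonly cited rule of thumb is roughly 1 GB of NameNode heap per million file system objects (files plus blocks). The sketch below applies that heuristic; the object counts are hypothetical, and the ratio varies with file name length and replication layout, so treat the result as a starting point only.

```java
// Back-of-the-envelope NameNode heap estimate. The ~1 GB per million
// file system objects ratio is a common heuristic, not an exact figure;
// the file and block counts below are hypothetical examples.
public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        long files = 40_000_000L;   // hypothetical file count
        long blocks = 60_000_000L;  // hypothetical block count
        long objects = files + blocks;
        double heapGb = objects / 1_000_000.0; // ~1 GB per million objects
        System.out.printf("Estimated NameNode heap: ~%.0f GB for %d objects%n",
                heapGb, objects);
    }
}
```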