SWY's technical notes

Relevant mostly to OS X admins

Why you can’t always trust the Hardware Compatibility List (HCL)

Late last year, I started a project to replace my workplace’s main AFP/SMB/NFS file storage.  Because I’d already had good experiences with Synology gear, knew DSM well, and knew other macadmins happy with their Synos, I ordered storage and built a system that should have performed quite well:

  • Synology 3614RPXS unit
  • 10 HGST 7K4000 4 TB drives
  • 2 250 GB SSDs
  • 10 GbE SFP+ card, with fiber networking back to the switches
  • An additional 16 GB of ECC RAM

All taken from the Synology HCL, so I was ready to rock.  I built it as a RAID 6, started copying files, and everything was looking good.  Once I’d moved a few TB onto the NAS, I started checking sequential read speeds from the NAS, and found an unexpected behavior: the reads would often saturate a GigE connection as you’d expect, until they didn’t.  Large transfers looked like this:

[chart: network throughput during a large transfer, alternating between sustained full speed and near-zero stalls]
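To make that pattern measurable rather than eyeballing a graph, a small script can read a big file off the mounted share and print throughput once a second.  This is a minimal sketch, not what I ran at the time; the mount point and test file name are hypothetical, and repeated runs will hit the client’s cache unless you vary the file:

```python
import time

TEST_FILE = "/Volumes/nas-share/bigfile.bin"  # hypothetical SMB/AFP mount path
CHUNK = 8 * 1024 * 1024                       # read 8 MiB per call
INTERVAL = 1.0                                # report roughly once per second

with open(TEST_FILE, "rb") as f:
    window_bytes = 0
    window_start = time.monotonic()
    while True:
        data = f.read(CHUNK)
        if not data:
            break  # end of file
        window_bytes += len(data)
        now = time.monotonic()
        if now - window_start >= INTERVAL:
            # throughput over the last window, in MB/s
            mb_per_s = window_bytes / (now - window_start) / 1e6
            print(f"{time.strftime('%H:%M:%S')}  {mb_per_s:8.1f} MB/s")
            window_bytes = 0
            window_start = now
```

On a healthy volume the per-second figures stay pinned near wire speed; on this one, they cratered for long stretches.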

When it’s great, it’s great.  When it wasn’t, it wasn’t.  REALLY wasn’t.  It would run for about 90 seconds at full tilt, then 20 seconds of nearly nothing.

During these file copies, the overall volume utilization would ebb and flow in inverse relation to the speed.  As volume utilization approached and hit 100%, the network speed plummeted.

[screenshot: DSM resource monitor showing volume utilization pegged at 100% during a transfer]
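To put numbers on that correlation, you can watch how busy each member disk is while a copy runs.  A minimal sketch, assuming SSH access to the NAS and a sysstat-style iostat on the box; the sda–sdj device names are placeholders for the volume’s member disks:

```python
import subprocess

# Assumed member disks of the volume; adjust for your chassis.
WATCH = {f"sd{c}" for c in "abcdefghij"}

# iostat -dx 1: extended per-device stats, refreshed every second.
proc = subprocess.Popen(
    ["iostat", "-dx", "1"],
    stdout=subprocess.PIPE, text=True,
)

for line in proc.stdout:
    fields = line.split()
    # Extended-stat rows look like: Device r/s w/s ... %util
    if fields and fields[0] in WATCH:
        device, util = fields[0], fields[-1]
        print(f"{device}: {util}% busy")
```

Logging that alongside the throughput numbers makes the inverse relationship obvious: the stalls line up with the disks sitting at or near 100% utilization.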

So the question became “why does the volume utilization go so high?”  I opened a ticket with Synology on Feb 2.  I ran their tests and made their requested configuration changes: direct GigE connections to take the LAN out of the equation, trying SMB, AFP, and NFS in turn, disabling every service that makes the NAS a compelling product.  This stumped the U.S.-based support, so it became an issue for the .tw engineers.

If your ticket goes off to the Taiwanese engineers, the communication cycles start to rival taking pen to paper and paying the government to deliver the paper.  To Taiwan.  It all runs through the US support staff, and it gets slow.  Eventually, I coordinated a screen-sharing session with an engineer, where I replicated the issue.  They tested more… htop, iostat.  “Can you make a new volume?”  “If you send me disks, I can!”

Meanwhile, I’m asking the storage guys I know on Twitter (and their friends), and scouring the Synology forums for anybody who has an answer.  Eventually, I find not an answer, but someone else who has the same experience.  We start collaborating.  A few days later, I find another forum post from a user with the same issues.  We start exploring ideas… amount of RAM?  The DSM version that built the storage?  RAID level?  Then we find the overlap: we all use Hitachi HUS724040AL[AE]640 drives, at least 10 of them in a volume.  One user was fine with 8 of them in a NAS, but when he expanded to 13, performance changed, which led to his post looking for help.
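If you’re wondering whether your own unit holds the affected model, each bay can be queried with smartctl, which prints a “Device Model” line for ATA disks.  A minimal sketch, run on the NAS itself; the /dev/sd[a-n] device range is an assumption for this chassis:

```python
import re
import string
import subprocess

# The drive model family we converged on in the forums.
SUSPECT = re.compile(r"HUS724040AL[AE]640")

for letter in string.ascii_lowercase[:14]:   # sda..sdn on a 14-bay unit
    dev = f"/dev/sd{letter}"
    try:
        out = subprocess.run(
            ["smartctl", "-i", dev],
            capture_output=True, text=True, check=True,
        ).stdout
    except (subprocess.CalledProcessError, FileNotFoundError):
        continue  # empty bay, or smartctl not available
    model = next((l for l in out.splitlines() if "Device Model" in l), "")
    flag = "  <-- affected model" if SUSPECT.search(model) else ""
    print(f"{dev}: {model.split(':', 1)[-1].strip()}{flag}")
```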

I then brought this information to Synology, and on March 27 they informed me they were trying to replicate the issue with that gear.  On April 16, they’d finally received drives to test.  On April 21, they agreed with my conclusion:

“The software developers have came to a conclusion on this issue. That is, the Hitachi with a HUS724040… suffix indeed has slow performance in a RAID consisted of more than 6 disks.”

Despite the gear being on the list, and despite configuring everything properly, I still ended up with a system that did not perform as expected, because this number of drives in a single volume had never been tested.  Hitachi now tells me they’re working on the issue with Synology, but in the meantime, I’m abandoning the Hitachi drives for WD Red Pros.


4 responses to “Why you can’t always trust the Hardware Compatibility List (HCL)”

  1. Steve Surowiec October 30, 2015 at 1:15 pm

    Were you able to resolve this? I think I have the same problem. I have 8 of these drives in an SHR-type RAID, and network throughput is nearly unusable. If I were to create separate volumes across the 8 disks, would that improve performance? Do you have any links (perhaps from Synology) that describe this problem in more detail?

    • swy October 31, 2015 at 12:47 pm

      There does seem to be a relationship between the number of drives in a volume and the performance degradation, but I haven’t tested or documented exactly where the threshold is hit. Synology replied that the advised limit with these Hitachi drives was 6 per volume, though I found another owner who had no issues at 8.

      My eventual solution was to abandon the 7K4000 drives for WD Red Pro drives, which have met all performance expectations. As of today, the Hitachi drives are still on the HCL, restricted to specific firmware versions, with no notes about a cap on drives per volume.

  2. Patrick Bouex June 21, 2016 at 9:12 am

    I have 8 HGST drives (HDN724040ALE640) in RAID 5 in an rs3614xs+ with 32 GB of RAM and 1 TB of SSD read/write cache.
    For the past three weeks, since the DSM 6.0.1 update, I’ve had the same problem.
    After a downgrade to 5.2 and an upgrade back to DSM 6, it works fine for 5 hours, then at 00:00 the volume utilization rises to 100%…
    Do you have any idea how to solve this?
    Thanks

    • swy June 21, 2016 at 1:13 pm

      Sorry, I don’t know anything better for this than to not use those drives, and to know you’re not alone. Not the answer you want to hear, but I’ve nothing better for this condition.
