At Samsung's Tech Day 2018, the company debuted a collaboration with Xilinx to develop Smart SSDs that would combine storage with FPGA-based compute acceleration. The proof-of-concept prototype, which paired a Samsung SSD and a Xilinx FPGA on a PCIe add-in card, has evolved into a 4TB U.2 drive that has completed customer qualification and reached general availability.

The Samsung SmartSSD CSD includes all the guts of one of Samsung's high-end PCIe Gen3 enterprise SSDs, plus the second-largest FPGA from Xilinx's Kintex UltraScale+ (16nm) family and 4GB of DDR4 dedicated to the FPGA. The SmartSSD CSD uses a portion of the FPGA as a PCIe switch, so the FPGA and SSD each appear to the host system as separate PCIe endpoints, and all PCIe traffic going to the SSD is first routed through the FPGA.
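To make that topology concrete, here is a toy Python model (not a driver or enumeration tool): a host root port with one switch beneath it, standing in for the switch logic in the FPGA fabric, and two endpoints below the switch. All node names are hypothetical labels chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class PcieNode:
    """Toy PCIe topology node; names and structure are illustrative only."""
    name: str
    kind: str  # "root_port", "switch", or "endpoint"
    children: list[PcieNode] = field(default_factory=list)


def endpoints(node: PcieNode, path: tuple[str, ...] = ()) -> Iterator[tuple[str, tuple[str, ...]]]:
    """Yield (endpoint name, path from the root) for every endpoint under node."""
    path = path + (node.name,)
    if node.kind == "endpoint":
        yield node.name, path
    for child in node.children:
        yield from endpoints(child, path)


# Hypothetical model of one SmartSSD CSD behind a single host root port.
fpga_switch = PcieNode("fpga_pcie_switch", "switch", [
    PcieNode("fpga_accelerator", "endpoint"),
    PcieNode("nvme_ssd", "endpoint"),
])
root = PcieNode("host_root_port", "root_port", [fpga_switch])

for name, path in endpoints(root):
    # The host sees two separate endpoints, but every path to the SSD
    # passes through the switch implemented in the FPGA.
    print(f"{name}: {' -> '.join(path)}")
```

A real system discovers this arrangement through PCIe enumeration (on Linux, `lspci -t` prints the device tree), so the model only captures the shape of the hierarchy, not how it is discovered.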

In a server equipped with dozens of large and fast SSDs, actually trying to make use of all that stored data can run into bottlenecks in the CPU's I/O bandwidth or compute power. Putting compute resources on each SSD means compute capacity and bandwidth scale with the number of drives. Classic examples of compute tasks to offload onto storage devices are compression and encryption, but reconfigurable FPGA accelerators can help with a much broader range of tasks.
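The scaling argument can be sketched as back-of-the-envelope arithmetic. The Python below assumes a fixed ceiling on host I/O bandwidth and a fixed per-drive read rate; both figures are made up for illustration and are not SmartSSD or server specifications. It also assumes each on-drive accelerator keeps up with its own flash, and that only a fraction of the scanned data (the selectivity, for example the rows matching a filter) has to cross the host link. It models bandwidth only, not CPU compute.

```python
# Back-of-the-envelope model; every figure here is an assumption for illustration.
HOST_IO_LIMIT_GB_S = 64.0  # assumed ceiling on data the host CPU can ingest
PER_DRIVE_GB_S = 3.0       # assumed sequential read rate of one drive


def host_side_scan_rate(num_drives: int) -> float:
    """Aggregate scan rate when every byte must cross the host's I/O path."""
    return min(num_drives * PER_DRIVE_GB_S, HOST_IO_LIMIT_GB_S)


def in_storage_scan_rate(num_drives: int, selectivity: float) -> float:
    """Aggregate scan rate when each drive filters its own data locally.

    Each drive reads its own flash in parallel, so local scanning grows
    linearly with drive count. Only the matching fraction (selectivity)
    crosses the host link, so the host limit caps the rate at
    HOST_IO_LIMIT_GB_S / selectivity.
    """
    local_rate = num_drives * PER_DRIVE_GB_S
    if selectivity <= 0.0:
        return local_rate  # nothing returned to the host, so no host-side cap
    return min(local_rate, HOST_IO_LIMIT_GB_S / selectivity)


if __name__ == "__main__":
    for n in (4, 16, 24, 48):
        print(f"{n:2d} drives: host-side {host_side_scan_rate(n):6.1f} GB/s, "
              f"in-storage (1% returned) {in_storage_scan_rate(n, 0.01):6.1f} GB/s")
```

With these assumed numbers, 48 drives are capped at 64 GB/s when every byte crosses the host's I/O path, while an on-drive filter that returns 1% of the data lets the aggregate scan rate keep growing with drive count (144 GB/s). If the filter returns everything, the host link becomes the bottleneck again.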

Xilinx has been building up a library of IP for storage accelerators that customers can use with the SmartSSD CSD, as part of its Vitis libraries of building blocks and its Xilinx Storage Services turnkey solutions. Samsung has worked with Bigstream to implement Apache Spark analytics acceleration. Third-party IP developed for Xilinx's Alveo accelerator cards can also be ported to the SmartSSD CSD thanks to the common underlying FPGA platform, so IP like Eideticom's NoLoad CSP is an option.

The Samsung SmartSSD CSD is being manufactured by Samsung and sold by Xilinx, initially with 3.84TB capacity, but other sizes are planned.

25 Comments

  • SaberKOG91 - Friday, November 13, 2020

    Good to see this finally come to fruition on the hardware side. Here's hoping software support catches up.
  • Duncan Macdonald - Friday, November 13, 2020

    One big use may well be searching: telling an SSD to find all documents containing a specified string, for example. This is a job that is CPU- and I/O-intensive on a traditional system.
  • vol.2 - Friday, November 13, 2020

    That's so true. I'm still shocked every time I put a search query into Explorer and Windows takes ten minutes to spit out the results. Feels like it hasn't changed in at least 10 years.
  • ClaudioMP - Sunday, November 22, 2020

    One more hardware implementation I'd like to see is one that would save Windows' ass, with its very obsolete NTFS. macOS searches are insanely fast with minimal overhead for the computer.
  • K_Space - Sunday, November 15, 2020

    I'm a noob, so please correct me if I'm wrong: it sounds like this is a product intended for servers with a typically huge number of drives or SSDs but limited CPU cores, no?
    Isn't the Windows example a limitation of the OS? Most desktop users will probably have more cores than partitions; sorting out a search across a few partitions sounds like a software limitation?
  • TomWomack - Friday, November 13, 2020

    That is a $7000 FPGA which is 3/4 filled with the logic to drive the PCIe and SSD interfaces, which makes the discussion of 'servers with dozens of them in' quite alarmingly expensive; it also leads me to wonder why they've integrated it with $800 worth of flash. I am also quite surprised not to see a QSFP+ port on the SSD. I'm sure there is some really exciting use case that they will write a white paper about, but this is really very esoteric for the front page of AnandTech.
  • AlexDaum - Friday, November 13, 2020

    I'm sure Samsung can get that FPGA cheaper when they buy large quantities, but it's still going to be expensive.
  • MrSpadge - Friday, November 13, 2020

    Well, that $7000 is surely not the fabrication cost, just the end-user price, right?
  • SaberKOG91 - Friday, November 13, 2020

    BittWare make an accelerator called the 250-U2 which seems to retail for ~$2800 with 8GB of DDR4. They also have a PCI-E card version with 3.84TB of storage which goes for ~$6600. Samsung cuts out the overhead by selling its own flash, so I wouldn't be surprised if the MSRP was closer to $5-6k.

    I'm not at all surprised there's no QSFP+. U.2 is a great form-factor for both compatibility with existing storage server designs and handling the thermals.

    That said, people seem to be missing the point of these devices. If you operate on large datasets, being able to optimize access at the device level can significantly reduce the amount of computing power needed to do the same operations with a CPU in RAM, especially when you are using an FPGA to implement parts of the hardware. The cost of one of these easily outweighs the cost of having multiple NVMe drives to satisfy the same I/O requirements and then having to rely on sub-optimal CPU algorithms for processing that data. This will be a huge boon to database and search engine performance. You could also use them to implement rather large content-addressable memories for networking applications and the like.
  • Spunjji - Monday, November 16, 2020

    "That is a $7000 FPGA which is 3/4 filled with the logic to drive the PCIe and SSD interfaces"

    The SSD interface doesn't run on the FPGA - just the PCIe switch. There's no indication that it's "3/4" full, and in fact that seems like a bizarre assumption to make given the obvious downsides that you note.
