06:35PM EDT - Who wants all the RISC-V cores?!?

06:36PM EDT - Ever growing demand for compute

06:36PM EDT - Energy efficiency is critical

06:37PM EDT - lots of CPUs burn power on superfluous elements of out-of-order

06:38PM EDT - Maximise compute datapath with respect to control

06:38PM EDT - Now for Manticore

06:38PM EDT - 220mm2 per chip

06:38PM EDT - (estimated in 22FDX GloFo)

06:38PM EDT - Four chiplets

06:39PM EDT - die-to-die serial link to each other die

06:39PM EDT - 8 GB HBM2 per die private to that die

06:40PM EDT - Four quadrants of 32 clusters per chiplet

06:40PM EDT - Clusters can do 64 TB/s with each other

06:41PM EDT - 4x L1 quadrants share an L1 cache

06:41PM EDT - Bandwidth thinning scheme to optimize bandwidth to HBM without affecting floorplan

06:41PM EDT - Support a lot of cluster-to-cluster traffic

06:42PM EDT - Each compute cluster has 8 RV32G Snitch cores
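
The hierarchy figures noted so far can be multiplied out as a quick sanity check. This is only a sketch: it assumes "four quadrants of 32 clusters" means 32 clusters in each quadrant, which is the reading that matches the 4096-core figure quoted later in the talk. The constant names are illustrative, not from the slides.

```python
# Figures as noted in the live blog; the names are illustrative only.
CHIPLETS = 4
QUADRANTS_PER_CHIPLET = 4
CLUSTERS_PER_QUADRANT = 32  # assumption: 32 clusters in EACH quadrant
CORES_PER_CLUSTER = 8
HBM_GB_PER_CHIPLET = 8

clusters_per_chiplet = QUADRANTS_PER_CHIPLET * CLUSTERS_PER_QUADRANT
cores_per_chiplet = clusters_per_chiplet * CORES_PER_CLUSTER
total_cores = CHIPLETS * cores_per_chiplet
total_hbm_gb = CHIPLETS * HBM_GB_PER_CHIPLET

print("clusters per chiplet:", clusters_per_chiplet)
print("cores per chiplet:", cores_per_chiplet)
print("total cores:", total_cores)
print("total HBM2 (GB):", total_hbm_gb)
```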

06:42PM EDT - Each core has a multi-format SIMD compute unit

06:42PM EDT - supports half-precision bfloat, FP8

06:42PM EDT - Custom ISA extensions

06:44PM EDT - Goal was to maximize compute/control die area ratio

06:44PM EDT - Async with DMA Engine

06:44PM EDT - XSSR - Stream semantic registers

06:44PM EDT - Turn register read/writes into implicit memory load/stores

06:45PM EDT - increases FPU/ALU utilization by 3x-5x

06:46PM EDT - Extension in the core register file

06:47PM EDT - Latency tolerant approach
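
The SSR idea (a register read that implicitly performs a memory load) can be illustrated with a toy software model. This is a sketch under stated assumptions, not the Snitch hardware or its ISA encoding: `StreamRegister`, `dot_explicit` and `dot_ssr` are hypothetical names, the address generator is a simple one-dimensional stride, and the instruction counts ignore loop overhead such as pointer increments and branches.

```python
class StreamRegister:
    """Toy stream register: every read returns the next memory element."""

    def __init__(self, memory, base, stride, length):
        self.memory = memory
        self.addr = base          # next address the stream will load from
        self.stride = stride
        self.remaining = length

    def read(self):
        # Reading the register performs the load implicitly.
        if self.remaining == 0:
            raise RuntimeError("stream exhausted")
        value = self.memory[self.addr]
        self.addr += self.stride
        self.remaining -= 1
        return value


def dot_explicit(memory, a_base, b_base, n):
    """Baseline: two explicit loads plus one multiply-add per element."""
    acc, instructions = 0.0, 0
    for i in range(n):
        a = memory[a_base + i]
        b = memory[b_base + i]
        acc += a * b
        instructions += 3         # load, load, fma
    return acc, instructions


def dot_ssr(memory, a_base, b_base, n):
    """Stream version: the multiply-add reads its operands from two streams."""
    ft0 = StreamRegister(memory, a_base, 1, n)
    ft1 = StreamRegister(memory, b_base, 1, n)
    acc, instructions = 0.0, 0
    for _ in range(n):
        acc += ft0.read() * ft1.read()
        instructions += 1         # fma only; loads are implicit
    return acc, instructions


memory = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
explicit_result, explicit_count = dot_explicit(memory, 0, 4, 4)
ssr_result, ssr_count = dot_ssr(memory, 0, 4, 4)
print(explicit_result, explicit_count)
print(ssr_result, ssr_count)
```

Both versions compute the same dot product; the toy count only shows the trend that every issued instruction in the stream version is useful compute.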

06:47PM EDT - XFREP - Floating Point Repetition Buffer (programmable micro-loop buffer)

06:48PM EDT - custom instruction indicates start of hardware loop block

06:48PM EDT - 'Pseudo-dual issue' as integer core can work at the same time

06:49PM EDT - SSRs only work on float-only hardware loops

06:49PM EDT - FREP marks the loop

06:50PM EDT - For example, reduction!

06:52PM EDT - single-issue core can saturate an FPU

06:52PM EDT - IPC > 1

06:52PM EDT - FREP acts as instruction amplifier
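
The "instruction amplifier" point can be sketched with a toy model of a micro-loop buffer: the integer core issues the loop marker and the loop body once, and the buffer replays the body to the FPU. Everything here is an assumption for illustration (`run_with_frep`, the way the repeat count is passed, and the reduction body); it is not the real FREP encoding or sequencer.

```python
def run_with_frep(body, repeats, state):
    """Replay `body` (a list of FP operations) `repeats` times.

    Returns (issued, executed): instructions issued by the integer core
    versus operations actually executed by the FPU.
    """
    issued = 1 + len(body)        # loop marker + each body instruction once
    executed = 0
    for iteration in range(repeats):
        for op in body:
            op(state, iteration)  # the buffer feeds the FPU
            executed += 1
    return issued, executed


a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]


def fma(state, i):
    # Reduction step: accumulate one product per replay.
    state["acc"] += a[i] * b[i]


state = {"acc": 0.0}
issued, executed = run_with_frep([fma], len(a), state)
amplification = executed / issued
print(state["acc"], issued, executed, amplification)
```

In this model the amplification grows with the repeat count, while the integer core is free to do other work during the replays.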

06:53PM EDT - increased utilization for matmul and dotproduct that might be memory bound

06:54PM EDT - Up to 80 DP GFLOPs/W per cluster

06:55PM EDT - Close tracking of roofline model
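
For reference, the roofline model bounds attainable performance by the smaller of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch follows; the peak and bandwidth numbers are hypothetical placeholders chosen for easy arithmetic, not Manticore measurements.

```python
def roofline(peak_flops, mem_bw_bytes_per_s, intensity_flops_per_byte):
    """Attainable FLOP/s under the basic roofline model."""
    return min(peak_flops, mem_bw_bytes_per_s * intensity_flops_per_byte)


# Hypothetical machine: NOT figures from the talk.
PEAK = 16e9        # 16 GFLOP/s
BANDWIDTH = 32e9   # 32 GB/s

# Double-precision dot product: 2 flops per 16 bytes loaded.
dot_intensity = 2 / 16
dot_bound = roofline(PEAK, BANDWIDTH, dot_intensity)

# A kernel with heavy data reuse, e.g. a blocked matmul (intensity assumed).
matmul_bound = roofline(PEAK, BANDWIDTH, 4.0)

print(dot_bound, matmul_bound)
```

A kernel that "tracks the roofline" achieves close to these bounds on both the bandwidth-limited and the compute-limited side.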

06:56PM EDT - 9mm2 prototype made

06:56PM EDT - 22nm FDX

06:56PM EDT - Forward Body Biasing

06:56PM EDT - This is only a small prototype of the core of a chiplet

06:57PM EDT - Snitch cores used for DVFS and IO management

06:58PM EDT - Full 4096 core system expected 27 DP Flops/sec

07:00PM EDT - In max perf mode, competitive vs A100 FP64

07:02PM EDT - snitch inside

07:05PM EDT - Q&A time

07:06PM EDT - Q: How does the compiler target the new instructions? A: Loop detection to promote loops that have the required characteristics. Might not always hit all cases - so go down QDNN, offer optimized low level kernels that frameworks would support

07:07PM EDT - Q: Productization? A: Concept so far to explore the key components. Wanted lean and mean RISC-V cores. Still missing the key components at SoC level, such as interconnects, which are hard to come by as a university. Looking into generating and taping out a later system as a research concept in the future.

07:08PM EDT - That's a wrap. Short break until the next session, at half-past. Baidu + Alibaba NPUs

Comments

  • jchang6 - Tuesday, August 18, 2020 - link

    any mention of memory latency?
  • evilpaul666 - Wednesday, August 19, 2020 - link

    Typo? "06:58PM EDT - Full 4096 core system expected 27 DP Flops/sec."
  • TomWomack - Wednesday, August 19, 2020 - link

    And around we go - those SSRs look a lot like a classic vector machine from the Cray era, I look forward to version 2 where you can use an SSR as a destination rather than having them used only for reduction. Maybe there is a technology point where using full-strength instruction decoders and out-of-order issue to run simple vector loops is no longer the right answer ...
  • nandnandnand - Wednesday, August 19, 2020 - link

    Some of these bot commenters are becoming self-aware.
  • Spunjji - Thursday, August 20, 2020 - link

    This made me think of Sony's concept behind Cell, only scaled up to an extreme extent (and not a solution being applied to the wrong problem). Kinda cool!
