Manycore processing unit
Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores (from a few tens of cores to thousands or more). Manycore processors are used extensively in embedded computers and high-performance computing.
Contrast with multicore architecture
Manycore processors are distinct from multi-core processors in being optimized from the outset for a higher degree of explicit parallelism, and for higher throughput (or lower power consumption) at the expense of higher latency and lower single-thread performance.
The broader category of multi-core processors, by contrast, is usually designed to run both parallel and serial code efficiently, and therefore places more emphasis on high single-thread performance (e.g. devoting more silicon to out-of-order execution, deeper pipelines, more superscalar execution units, and larger, more general caches) and on shared memory. These techniques devote runtime resources to discovering implicit parallelism in a single thread. Such processors are used in systems that have evolved continuously (with backward compatibility) from single-core processors. They usually have a few cores (e.g. 2, 4, 8) and may be complemented by a manycore accelerator (such as a GPU) in a heterogeneous system.
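The trade-off between the two design styles can be illustrated with a toy timing model. The sketch below is an illustration under stated assumptions, not a measurement: the function name `run_time`, the core counts, and the relative per-core speeds are hypothetical values chosen only to show the direction of the effect. It treats a workload as a serial part that runs on one core and a parallel part that divides evenly across all cores.

```python
def run_time(serial_fraction, cores, per_core_speed):
    """Time for one unit of work under a simple Amdahl-style model (hypothetical)."""
    serial = serial_fraction / per_core_speed
    parallel = (1.0 - serial_fraction) / (cores * per_core_speed)
    return serial + parallel


# Hypothetical designs: a few fast cores versus many slower ones.
FEW_FAST = {"cores": 8, "per_core_speed": 1.0}
MANY_SLOW = {"cores": 256, "per_core_speed": 0.25}

for serial_fraction in (0.01, 0.50):
    few = run_time(serial_fraction, **FEW_FAST)
    many = run_time(serial_fraction, **MANY_SLOW)
    winner = "manycore" if many < few else "multi-core"
    print(f"serial fraction {serial_fraction:.2f}: "
          f"multi-core {few:.4f}, manycore {many:.4f} -> {winner} faster")
```

Under these assumed numbers the many-slow-cores design finishes the almost fully parallel workload sooner, while the few-fast-cores design is faster once half of the work is serial.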
Motivation
Cache coherency is an issue limiting the scaling of multicore processors. Manycore processors may bypass this with methods such as message passing,[1] scratchpad memory, DMA,[2] partitioned global address space,[3] or read-only/non-coherent caches. A manycore processor using a network on a chip and local memories gives software the opportunity to explicitly optimise the spatial layout of tasks (e.g. as seen in tooling developed for TrueNorth).[4]
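As a rough illustration of message passing with local memories, the following sketch simulates a chain of cores in Python. It is a model only, not the programming interface of any chip mentioned here: the names `core` and `chain_sum` are hypothetical, threads stand in for cores, and `queue.Queue` objects stand in for on-chip links. Each simulated core reads only its own local data and exchanges a single value with its neighbours, so no shared, coherent cache is assumed.

```python
import queue
import threading


def core(rank, inbox, outbox, local_data, results):
    """One simulated core: works only on its own local data (its 'scratchpad')."""
    partial = sum(local_data)
    if inbox is not None:
        partial += inbox.get()      # blocking receive from the left neighbour
    if outbox is not None:
        outbox.put(partial)         # send the running total to the right neighbour
    else:
        results.append(partial)     # last core in the chain holds the final total


def chain_sum(chunks):
    """Sum all chunks by passing a running total along a chain of simulated cores."""
    n = len(chunks)
    links = [queue.Queue() for _ in range(n - 1)]   # one link per neighbouring pair
    results = []
    threads = []
    for rank in range(n):
        inbox = links[rank - 1] if rank > 0 else None
        outbox = links[rank] if rank < n - 1 else None
        t = threading.Thread(target=core,
                             args=(rank, inbox, outbox, chunks[rank], results))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results[0]


print(chain_sum([[1, 2], [3, 4], [5, 6], [7, 8]]))
```

The placement of chunks on particular cores is decided explicitly by the caller, which loosely mirrors the opportunity to optimise the spatial layout of tasks described above.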
Manycore processors may have more in common (conceptually) with technologies originating in high-performance computing such as clusters and vector processors.[5]
GPUs may be considered a form of manycore processor: they have multiple shader processing units and are suitable only for highly parallel code (high throughput, but extremely poor single-thread performance).
Suitable programming models
- Message Passing Interface
- OpenCL[6] or other APIs supporting compute kernels
- Partitioned global address space
- Actor model
- OpenMP[7]
- Dataflow
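To give a flavour of the compute-kernel style listed above, the sketch below mimics the idea in plain Python: one small function is applied independently to every index of a data set. This is not the OpenCL API. The names `saxpy_kernel` and `launch` are hypothetical, and a thread pool merely stands in for the many cores of a real device, so the example shows the programming model rather than any actual speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_kernel(global_id, alpha, x, y, out):
    """Kernel body: each work-item handles exactly one element."""
    out[global_id] = alpha * x[global_id] + y[global_id]


def launch(kernel, global_size, *args, workers=4):
    """Run the kernel once per index in range(global_size) (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every work-item and re-raises any exception.
        list(pool.map(lambda gid: kernel(gid, *args), range(global_size)))


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

Because no work-item reads another work-item's output, the indices can be processed in any order, which is what makes this style a good fit for hardware with many simple cores.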
Classes of manycore systems
- GPUs, which can be described as manycore vector processors
- Massively parallel processor array
- Asynchronous array of simple processors
Specific manycore architectures
- ZettaScaler, Japanese PEZY Computing 2,048-core modules
- Xeon Phi coprocessor, which has MIC (Many Integrated Cores) architecture
- Tilera
- Adapteva Epiphany Architecture, a manycore chip using PGAS scratchpad memory
- Coherent Logix hx3100 Processor, a 100-core DSP/GPP processor based on HyperX Architecture
- Movidius Myriad 2, a manycore vision processing unit (VPU)
- Kalray, a manycore PCI-e accelerator for data-intensive tasks
- Teraflops Research Chip, a manycore processor using message passing
- TrueNorth, an AI accelerator with a manycore network on a chip architecture
- Green arrays, a manycore processor using message passing aimed at low power applications
- Sunway SW26010, a 260-core manycore processor used in Sunway TaihuLight, at the time the top-ranked supercomputer
- SW52020, an improved 520-core[8][9] variant of the SW26010 with 512-bit SIMD and added support for half-precision, used in a prototype for a planned exascale system (and, in the future, a 10-exascale system). According to DatacenterDynamics, China is rumored to already operate two separate exascale systems in secret.[citation needed]
- Eyeriss, a manycore processor designed for running convolutional neural nets for embedded vision applications[10]
- Graphcore, a manycore AI accelerator
Specific manycore computers with 1M+ CPU cores
A number of computers built from multicore processors have one million or more individual CPU cores. Examples include:
- Gyoukou (Japanese: 暁光 Hepburn: gyōkō, dawn light), a supercomputer developed by ExaScaler and PEZY Computing, with 20,480,000 processing elements in total, plus 1,250 Intel Xeon D host processors.
- SpiNNaker, a massively parallel, ARM-based manycore system with 1 million CPU cores, built as part of the Human Brain Project.
Specific computers with 5 million or more CPU cores
Quite a few supercomputers have over 5 million CPU cores. These counts exclude the cores of coprocessors such as GPUs; if those cores were included, quite a few more computers would reach this threshold.
- Frontier
- Fugaku, a Japanese supercomputer using Fujitsu A64FX ARM-based cores, 7,630,848 in total.
- Sunway TaihuLight, a massively parallel (10 million CPU cores) Chinese supercomputer, once one of the fastest supercomputers in the world, using a custom manycore architecture.[citation needed] As of November 2018, it was the world's third-fastest supercomputer (as ranked by the TOP500 list), obtaining its performance from 40,960 SW26010 manycore processors, each containing 260 cores.
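The "10 million CPU cores" figure for Sunway TaihuLight can be checked with simple arithmetic. The sketch below uses the processor count given above and the 260-core figure listed for the SW26010. The split of those 260 cores into four groups of 64 compute cores plus one management core is background knowledge not stated in this article, and the constant and function names are illustrative only.

```python
PROCESSORS = 40_960            # SW26010 chips in Sunway TaihuLight (from the text)
CORE_GROUPS_PER_CHIP = 4       # assumption: background detail, not from the text
COMPUTE_CORES_PER_GROUP = 64   # assumption: background detail, not from the text
MGMT_CORES_PER_GROUP = 1       # assumption: background detail, not from the text


def cores_per_chip(include_management=True):
    """Cores on one SW26010 under the assumed core-group layout."""
    per_group = COMPUTE_CORES_PER_GROUP
    if include_management:
        per_group += MGMT_CORES_PER_GROUP
    return CORE_GROUPS_PER_CHIP * per_group


total_all = PROCESSORS * cores_per_chip()
total_compute = PROCESSORS * cores_per_chip(include_management=False)
print(f"cores per chip: {cores_per_chip()}")
print(f"all cores:      {total_all:,}")
print(f"compute cores:  {total_compute:,}")
```

Either way of counting gives a total slightly above ten million, which is consistent with the rounded figure quoted for the machine.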
See also
- Multi-core processor
- Vector processor
- SIMD
- High-performance computing
- Computer cluster
- Multiprocessor system on a chip
- Vision processing unit
- Memory access pattern
- Cache coherency
- Embarrassingly parallel
- Massively parallel
- CUDA
References
- ^ Mattson, Tim (January 2010). "The Future of Many Core Computing: A tale of two processors" (PDF).
- ^ Hendry, Gilbert; Kretschmann, Mark. "IBM Cell Processor" (PDF).
- ^ Olofsson, Andreas; Nordström, Tomas; Ul-Abdin, Zain (2014). "Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany". arXiv:1412.5538 [cs.AR].
- ^ Amir, Arnon (June 11, 2015). "IBM SyNAPSE Deep Dive Part 3". IBM Research. Archived from the original on 2021-12-21.
- ^ "cell architecture". "The Cell architecture is like nothing we have ever seen in commodity microprocessors, it is closer in design to multiprocessor vector supercomputers"
- ^ Rick Merritt (June 20, 2011), "OEMs show systems with Intel MIC chips", www.eetimes.com, EE Times
- ^ Barker, J; Bowden, J (2013). "Manycore Parallelism through OpenMP". OpenMP in the Era of Low Power Devices and Accelerators. IWOMP. Lecture Notes in Computer Science, vol 8122. Springer. doi:10.1007/978-3-642-40698-0_4.
- ^ Morgan, Timothy Prickett (2021-02-10). "A First Peek At China's Sunway Exascale Supercomputer". The Next Platform. Retrieved 2021-11-18.
- ^ Hemsoth, Nicole (2021-04-19). "China's Exascale Prototype Supercomputer Tests AI Workloads". The Next Platform. Retrieved 2021-11-18.
- ^ Chen, Yu-Hsin; Krishna, Tushar; Emer, Joel; Sze, Vivienne (2016). "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks". IEEE International Solid-State Circuits Conference, ISSCC 2016, Digest of Technical Papers. pp. 262–263.
External links
- Architecting solutions for the Manycore future, published on Feb 19, 2010 (more than one dead link in the slide)
- Eyeriss architecture