|
| 1 | +# Title of Project - "CORE-V CV-Mesh" |
| 2 | +# Project Concept Proposal |
| 3 | +## Date of proposal - 2023-07-24 |
| 4 | +## Author(s) - Jonathan Balkind, Assistant Professor, UC Santa Barbara |
| 5 | + |
| 6 | +## High Level Summary of project, project components, and deliverables |
| 7 | + |
| 8 | +OpenPiton is a manycore processor design and research framework, in development since late 2013 and open-source since mid 2015. Its coherence system, known as P-Mesh, enables creation of large meshes of cores and other heterogeneous elements. A number of designs have been taped out, including Piton (25 tiles/cores, using the OpenSPARC T1 core), CIFER (8 tiles: 22 cores, including 4 CVA6 cores, plus an eFPGA), and DECADES (108 tiles: 60 CVA6 cores, 24 accelerators, 23 intelligent storage tiles, plus an eFPGA). |
| 9 | + |
| 10 | +The goal of this project is to bring the P-Mesh coherence system into the OpenHW ecosystem to enable users to build large meshes of OpenHW CORE-V and other cores and accelerators. We propose to name this CV-Mesh and to separate it from OpenPiton itself as an independent IP block. |
| 11 | + |
| 12 | +### Features of P-Mesh (to be adopted as CV-Mesh) |
| 13 | + |
| 14 | +* Directory-based coherence model |
| 15 | +* Three-level cache hierarchy |
| 16 | +* MESI protocol |
| 17 | +* Support for heterogeneous cores via Transaction-Response Interface |
| 18 | +* Support for coherent LLC access from other heterogeneous elements |
| 19 | +* Can connect arbitrary point-to-point ordered NoCs |
| 20 | +* Open source (BSD license) |
| 21 | +* SystemVerilog |
| 22 | + |
| 23 | +### Components |
| 24 | + |
| 25 | +* Component 1: CV-Mesh protocol specification |
| 26 | +* Component 2: CV-Mesh user guide |
| 27 | +* Component 3: RTL implementation of the local private cache (L1.5 cache from OpenPiton P-Mesh) verified to TRL 5 |
| 28 | +* Component 4: RTL implementation of the shared last-level cache (L2 cache from OpenPiton P-Mesh) verified to TRL 5 |
| 29 | +* Component 5: RTL implementation of bridges to/from other data movement protocols (e.g. AXI-Lite, AXI) verified to TRL 3 |
| 30 | +* Component 6: RTL implementation of physical 2D-mesh network-on-chip (dynamic node network from OpenPiton P-Mesh) verified to TRL 3 |
| 31 | + |
| 32 | +## Summary of market or input requirements |
| 33 | +### Known market/project requirements at PC gate |
| 34 | + |
| 35 | +* OpenPiton designs have been taped out by a number of teams, including the following chips: |
| 36 | + * Piton (25 tiles/cores, using the OpenSPARC T1 core) in 32nm technology |
| 37 | + * CIFER (8 tiles: 22 cores, including 4 CVA6 cores, plus an eFPGA) in 12nm technology |
| 38 | + * DECADES (108 tiles: 60 CVA6 cores, 24 accelerators, 23 intelligent storage tiles, plus an eFPGA) in 12nm technology |
| 39 | + * Intel (8 tiles/cores, using the CVA6 core) in Intel 4 technology |
| 40 | + |
| 41 | +### Potential future enhancements |
| 42 | + |
| 43 | + |
| 44 | +## Who would make use of OpenHW output |
| 45 | + |
| 46 | +Those interested in building scalable clusters of OpenHW CORE-V and other cores. |
| 47 | + |
| 48 | +## Initial Estimate of Timeline |
| 49 | + |
| 50 | +* Separating coherence IP from OpenPiton repository (Q4 2023) |
| 51 | +* Connection to CV-HPDC (initial support and validation complete in Q1 2024) |
| 52 | +* Enhancing performance characteristics and parameterisation (Q2 2024) |
| 53 | +* Standalone verification environment (Q4 2024) |
| 54 | +* Improvement of documentation (Q4 2024) |
| 55 | +* User guide (Q4 2024) |
| 56 | + |
| 57 | +## Explanation of why OpenHW should do this project |
| 58 | + |
| 59 | +The P-Mesh coherence system as established in OpenPiton has been in development since late 2013 and open-source since mid 2015. The system already supports the CVA6 core and has seen significant adoption. The IP has been well validated for use with a number of cores and ISAs and for heterogeneous integration of accelerators, FPGAs, and more, including in large chips with hundreds of tiles and billions of transistors. The project will extend OpenHW's move into HPC and, combined with CV-HPDC, will enable connection of higher-performance cores in the near future. |
| 60 | + |
| 61 | +## Industry landscape: description of competing, alternative, or related efforts in the industry |
| 62 | + |
| 63 | +### BedRock |
| 64 | + |
| 65 | +The BedRock coherence protocol was established for creation of coherent clusters of BlackParrot cores. |
| 66 | + |
| 67 | +* Directory-based coherence model |
| 68 | +* Two-level cache hierarchy |
| 69 | +* Capable of coherence protocols like MOESIF (and many subsets) |
| 70 | +* Uses a microcoded coherence engine |
| 71 | +* Open source (BSD license) |
| 72 | +* SystemVerilog |
| 73 | + |
| 74 | +### ESP |
| 75 | + |
| 76 | +ESP has a long history and focuses on accelerator-rich SoCs. It was originally designed for LEON3 but today supports CVA6 and Ibex as host cores. |
| 77 | + |
| 78 | +* Directory-based coherence model |
| 79 | +* Three-level cache hierarchy |
| 80 | +* MESI protocol or Spandex heterogeneous coherence |
| 81 | +* 32-bit physical addresses |
| 82 | +* Open source (Apache license) |
| 83 | +* SystemC & SystemVerilog |
| 84 | + |
| 85 | +### TileLink 2 |
| 86 | + |
| 87 | +TileLink is the primary coherence protocol used among users of Rocket/BOOM, and was specified by SiFive. There are a number of configurable implementations, so we omit the details here. |
| 88 | + |
| 89 | +* Supports both snooping and directory-based coherence models |
| 90 | +* Open source (varies - BSD license for some IP) |
| 91 | +* Chisel (primarily) |
| 92 | + |
| 93 | +### AMBA ACE/CHI/etc |
| 94 | + |
| 95 | +Arm's AMBA protocols include ACE and CHI which enable coherent operation. |
| 96 | + |
| 97 | +* Primarily snoop-based coherence model |
| 98 | +* Commercial protocols with some open source implementations |
| 99 | +* OpenHW ACE implementation in SystemVerilog: "CORE-V tightly-coupled cache coherence mechanism for CVA6" |
| 100 | + |
| 101 | +## OpenHW Members/Participants committed to participate |
| 102 | + |
| 103 | +* Jonathan Balkind, Assistant Professor, UC Santa Barbara |
| 104 | +* Miquel Moretó, Associate Researcher, Barcelona Supercomputing Center & Associate Professor, Universitat Politècnica de Catalunya (UPC) |
| 105 | +* Lluc Alvarez, Established Researcher, Barcelona Supercomputing Center |
| 106 | +* César Fuguet, CEA List, Grenoble |
| 107 | + |
| 108 | +## Project Leader(s) |
| 109 | +### Technical Project Leader(s) |
| 110 | + |
| 111 | +Jonathan Balkind, Assistant Professor, UC Santa Barbara |
| 112 | + |
| 113 | +### Project Manager, if a PM is designated |
| 114 | + |
| 115 | +N/A |
| 116 | + |
| 117 | +<hr/> |
| 118 | + |
| 119 | + |
| 120 | + |
| 121 | +# Title of Project - "CORE-V CV-Mesh" |
| 122 | +# Project Launch Proposal |
| 123 | +## Date of proposal - 2023-07-24 |
| 124 | +## Author(s) - Jonathan Balkind, Assistant Professor, UC Santa Barbara |
| 125 | + |
| 126 | + |
| 127 | +## Summary of project |
| 128 | + |
| 129 | +OpenPiton is a manycore processor design and research framework, in development since late 2013 and open-source since mid 2015. Its coherence system, known as P-Mesh, enables creation of large meshes of cores and other heterogeneous elements. A number of designs have been taped out, including Piton (25 tiles/cores, using the OpenSPARC T1 core), CIFER (8 tiles: 22 cores, including 4 CVA6 cores, plus an eFPGA), and DECADES (108 tiles: 60 CVA6 cores, 24 accelerators, 23 intelligent storage tiles, plus an eFPGA). |
| 130 | + |
| 131 | +The goal of this project is to bring the P-Mesh coherence system into the OpenHW ecosystem to enable users to build large meshes of OpenHW CORE-V and other cores and accelerators. We propose to name this CV-Mesh and to separate it from OpenPiton itself as an independent IP block. |
| 132 | + |
| 133 | +### Components of the Project |
| 134 | + |
| 135 | +* Component 1: CV-Mesh protocol specification |
| 136 | +* Component 2: CV-Mesh user guide |
| 137 | +* Component 3: RTL implementation of the local private cache (L1.5 cache from OpenPiton P-Mesh) verified to TRL 5 |
| 138 | +* Component 4: RTL implementation of the shared last-level cache (L2 cache from OpenPiton P-Mesh) verified to TRL 5 |
| 139 | +* Component 5: RTL implementation of bridges to/from other data movement protocols (e.g. AXI-Lite, AXI) verified to TRL 3 |
| 140 | +* Component 6: RTL implementation of physical 2D-mesh network-on-chip (dynamic node network from OpenPiton P-Mesh) verified to TRL 3 |
| 141 | + |
| 142 | +#### Component 1 Description |
| 143 | + |
| 144 | +OpenPiton provides a microarchitecture specification document which describes the P-Mesh coherence protocol. This document has fallen out of date relative to a number of more recent changes, and its sources are in LaTeX. As part of this component, the specification will be brought up to date and converted to a more accessible open format. |
| 145 | + |
| 146 | +#### Component 2 Description |
| 147 | + |
| 148 | +The CV-Mesh user guide will describe the interfaces provided by the CV-Mesh caches, network, and bridges. It will provide users with the information needed to correctly instantiate these components to build their own system-on-chip, to complement the example design(s) provided in the Polara APU repository. It will also describe what types of requests and responses can be sent to/from the different caches and which protocols (or subsets thereof) are supported by the protocol bridges. |
| 149 | + |
| 150 | +#### Component 3 Description |
| 151 | + |
| 152 | +The local private cache (L1.5 cache in P-Mesh) generally acts as a second layer of cache. It communicates with the shared last-level cache to maintain cache coherence for cores and other agents. To decouple the core from the coherence protocol itself, the cache offers the Transaction-Response Interface (TRI) which is implemented by any core connected into the system. This includes the write-through L1 cache used in CVA6. Support for TRI in CV-HPDC will be developed as part of the project, as will a number of performance enhancements. |
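
The decoupling described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names, the `"load"`/`"store"` transaction kinds, and the `"load_ret"`/`"store_ack"`/`"invalidate"` response kinds are hypothetical and are not taken from the TRI definition. The point it shows is that a write-through L1 only exchanges transactions and responses with the private cache and never sees coherence messages itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Transaction:
    kind: str                   # hypothetical kinds: "load" or "store"
    addr: int
    data: Optional[int] = None


@dataclass
class Response:
    kind: str                   # hypothetical kinds: "load_ret", "store_ack", "invalidate"
    addr: int
    data: Optional[int] = None


class PrivateCacheStub:
    """Stand-in for the local private cache; coherence traffic would live here."""

    def __init__(self) -> None:
        self.lines: Dict[int, int] = {}

    def handle(self, t: Transaction) -> Response:
        if t.kind == "store":
            self.lines[t.addr] = t.data
            return Response("store_ack", t.addr)
        return Response("load_ret", t.addr, self.lines.get(t.addr, 0))


class WriteThroughL1:
    """Toy write-through L1 that talks to the private cache via transactions only."""

    def __init__(self, backend: PrivateCacheStub) -> None:
        self.backend = backend
        self.lines: Dict[int, int] = {}

    def load(self, addr: int) -> int:
        if addr not in self.lines:                      # miss: ask the private cache
            resp = self.backend.handle(Transaction("load", addr))
            self.lines[addr] = resp.data
        return self.lines[addr]

    def store(self, addr: int, data: int) -> None:
        if addr in self.lines:                          # update a present line
            self.lines[addr] = data
        self.backend.handle(Transaction("store", addr, data))  # always write through

    def on_response(self, resp: Response) -> None:
        if resp.kind == "invalidate":                   # drop the line when told to
            self.lines.pop(resp.addr, None)
```

In this sketch, swapping in a different core only requires a new class that issues the same transactions and handles the same responses; the stub behind the interface is unchanged.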
| 153 | + |
| 154 | +#### Component 4 Description |
| 155 | + |
| 156 | +The shared last-level cache (L2 cache in P-Mesh) acts as the coherence directory and supports coherent and non-coherent access over the network-on-chip. Private caches interact with the L2 to maintain the coherence protocol, but other agents may also communicate directly with the last-level cache to perform coherent reads or writes without participating in the entire coherence protocol. |
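
As a rough illustration of the directory role described above, the following minimal sketch tracks, per cache line, which private caches hold a copy. It is a simplified model under assumptions, not the P-Mesh implementation: the class names, the message strings, and the `agent_write` method (standing in for a non-caching agent writing directly to the last-level cache) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DirectoryEntry:
    """Directory state for one cache line (hypothetical layout)."""
    owner: Optional[int] = None                     # cache holding the line exclusively
    sharers: Set[int] = field(default_factory=set)  # caches holding a shared copy


class Directory:
    """Toy directory kept alongside a shared last-level cache."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirectoryEntry] = {}

    def _entry(self, addr: int) -> DirectoryEntry:
        return self.entries.setdefault(addr, DirectoryEntry())

    def _holders(self, e: DirectoryEntry) -> Set[int]:
        holders = set(e.sharers)
        if e.owner is not None:
            holders.add(e.owner)
        return holders

    def read(self, cache_id: int, addr: int) -> List[str]:
        """A private cache requests a readable copy."""
        e = self._entry(addr)
        msgs: List[str] = []
        if e.owner is not None and e.owner != cache_id:
            msgs.append(f"downgrade cache {e.owner}")   # owner becomes a sharer
            e.sharers.add(e.owner)
            e.owner = None
        if e.owner == cache_id:
            return msgs                                  # requester already owns it
        if not e.sharers:
            e.owner = cache_id                           # sole copy: grant exclusively
        else:
            e.sharers.add(cache_id)
        return msgs

    def write(self, cache_id: int, addr: int) -> List[str]:
        """A private cache requests a writable copy."""
        e = self._entry(addr)
        holders = self._holders(e)
        holders.discard(cache_id)
        msgs = [f"invalidate cache {c}" for c in sorted(holders)]
        e.sharers.clear()
        e.owner = cache_id
        return msgs

    def agent_write(self, addr: int) -> List[str]:
        """A non-caching agent writes coherently: every cached copy is invalidated."""
        e = self._entry(addr)
        msgs = [f"invalidate cache {c}" for c in sorted(self._holders(e))]
        e.sharers.clear()
        e.owner = None
        return msgs
```

The `agent_write` path shows why such an agent does not need to take part in the whole protocol: it never appears as an owner or sharer, so the directory only has to clean up copies held by the private caches.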
| 157 | + |
| 158 | +#### Component 5 Description |
| 159 | + |
| 160 | +As not every peripheral implements the P-Mesh coherence protocol, this component provides bridges between P-Mesh and other protocols, most relevantly AXI and AXI-Lite. These enable interaction with a variety of peripherals, accelerators, DMAs, etc. |
| 161 | + |
| 162 | +#### Component 6 Description |
| 163 | + |
| 164 | +OpenPiton uses three physical networks-on-chip to maintain deadlock-free communication between the caches and main memory. The platform supports replacement of the network routers with others provided they maintain point-to-point ordering for messages. As a result, in CV-Mesh, the network routers from P-Mesh are only provided as an example with the recognition that users may replace them with other network routers. |
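
One common way for a 2D-mesh router to meet the point-to-point ordering requirement is deterministic dimension-ordered routing, where every message between the same source and destination follows the same path. The sketch below assumes XY routing purely for illustration; the proposal does not state which routing algorithm is used, and the function name and `(x, y)` tile coordinates are hypothetical.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # hypothetical (x, y) tile coordinate in the mesh


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Return the tiles visited when routing fully along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # travel along the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then travel along the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the path depends only on the source and destination, two messages between the same pair of tiles cannot overtake each other by taking different routes, provided each router forwards messages in FIFO order. A replacement router would need to offer an equivalent guarantee.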
| 165 | + |
| 166 | +## Summary of market or input requirements |
| 167 | +### Known market/project requirements at PL gate |
| 168 | + |
| 169 | +* OpenPiton designs have been taped out by a number of teams, including the following chips: |
| 170 | + * Piton (25 tiles/cores, using the OpenSPARC T1 core) in 32nm technology |
| 171 | + * CIFER (8 tiles: 22 cores, including 4 CVA6 cores, plus an eFPGA) in 12nm technology |
| 172 | + * DECADES (108 tiles: 60 CVA6 cores, 24 accelerators, 23 intelligent storage tiles, plus an eFPGA) in 12nm technology |
| 173 | + * Intel (8 tiles/cores, using the CVA6 core) in Intel 4 technology |
| 174 | + |
| 175 | +### Potential future enhancements for future project phases |
| 176 | + |
| 177 | +## Who would make use of OpenHW output |
| 178 | + |
| 179 | +Those interested in building scalable clusters of OpenHW CORE-V and other cores. |
| 180 | + |
| 181 | +## Summary of Timeline |
| 182 | + |
| 183 | +* Separating coherence IP from OpenPiton repository (Q4 2023) |
| 184 | +* Connection to CV-HPDC (initial support and validation complete in Q1 2024) |
| 185 | +* Enhancing performance characteristics and parameterisation (Q2 2024) |
| 186 | +* Standalone verification environment (Q4 2024) |
| 187 | +* Improvement of documentation (Q4 2024) |
| 188 | +* User guide (Q4 2024) |
| 189 | + |
| 190 | +## Explanation of why OpenHW should do this project |
| 191 | + |
| 192 | +The P-Mesh coherence system as established in OpenPiton has been in development since late 2013 and open-source since mid 2015. The system already supports the CVA6 core and has seen significant adoption. The IP has been well validated for use with a number of cores and ISAs and for heterogeneous integration of accelerators, FPGAs, and more, including in large chips with hundreds of tiles and billions of transistors. The project will extend OpenHW's move into HPC and, combined with CV-HPDC, will enable connection of higher-performance cores in the near future. |
| 193 | + |
| 194 | +## Industry landscape: description of competing, alternative, or related efforts in the industry |
| 195 | + |
| 196 | +### BedRock |
| 197 | + |
| 198 | +The BedRock coherence protocol was established for creation of coherent clusters of BlackParrot cores. |
| 199 | + |
| 200 | +* Directory-based coherence model |
| 201 | +* Two-level cache hierarchy |
| 202 | +* Capable of coherence protocols like MOESIF (and many subsets) |
| 203 | +* Uses a microcoded coherence engine |
| 204 | +* Open source (BSD license) |
| 205 | +* SystemVerilog |
| 206 | + |
| 207 | +### ESP |
| 208 | + |
| 209 | +ESP has a long history and focuses on accelerator-rich SoCs. It was originally designed for LEON3 but today supports CVA6 and Ibex as host cores. |
| 210 | + |
| 211 | +* Directory-based coherence model |
| 212 | +* Three-level cache hierarchy |
| 213 | +* MESI protocol or Spandex heterogeneous coherence |
| 214 | +* 32-bit physical addresses |
| 215 | +* Open source (Apache license) |
| 216 | +* SystemC & SystemVerilog |
| 217 | + |
| 218 | +### TileLink 2 |
| 219 | + |
| 220 | +TileLink is the primary coherence protocol used among users of Rocket/BOOM, and was specified by SiFive. There are a number of configurable implementations, so we omit the details here. |
| 221 | + |
| 222 | +* Supports both snooping and directory-based coherence models |
| 223 | +* Open source (varies - BSD license for some IP) |
| 224 | +* Chisel (primarily) |
| 225 | + |
| 226 | +### AMBA ACE/CHI/etc |
| 227 | + |
| 228 | +Arm's AMBA protocols include ACE and CHI which enable coherent operation. |
| 229 | + |
| 230 | +* Primarily snoop-based coherence model |
| 231 | +* Commercial protocols with some open source implementations |
| 232 | +* OpenHW ACE implementation in SystemVerilog: "CORE-V tightly-coupled cache coherence mechanism for CVA6" |
| 233 | + |
| 234 | +## OpenHW Members/Participants committed to participate |
| 235 | + |
| 236 | +* Jonathan Balkind, Assistant Professor, UC Santa Barbara |
| 237 | +* Miquel Moretó, Associate Researcher, Barcelona Supercomputing Center & Associate Professor, Universitat Politècnica de Catalunya (UPC) |
| 238 | +* Lluc Alvarez, Established Researcher, Barcelona Supercomputing Center |
| 239 | +* César Fuguet, CEA List, Grenoble |
| 240 | + |
| 241 | +## Project Leader(s) |
| 242 | +### Technical Project Leader(s) |
| 243 | + |
| 244 | +Jonathan Balkind, Assistant Professor, UC Santa Barbara |
| 245 | + |
| 246 | +### Project Manager, if a PM is designated |
| 247 | + |
| 248 | +N/A |
| 249 | + |
| 250 | +## Project Documents |
| 251 | +### Project Planning Documents |
| 252 | + |
| 253 | +* PL document (this document) |
| 254 | + |
| 255 | +### Project Output Documents |
| 256 | + |
| 257 | +* CV-Mesh protocol specification |
| 258 | +* CV-Mesh user guide |
| 259 | + |
| 260 | +## List of project technical outputs |
| 261 | + |
| 262 | +* Enhanced versions of the five components which are already provided as inputs for the project |
| 263 | + |
| 264 | +### Feature Requirements |
| 265 | + |
| 266 | +#### Feature 1 |
| 267 | + |
| 268 | +* Connection to CV-HPDC |
| 269 | + |
| 270 | +#### Feature 2 |
| 271 | + |
| 272 | +* Support for wider networks-on-chip |
| 273 | + |
| 274 | +#### Feature 3 |
| 275 | + |
| 276 | +* Support for larger cache block sizes in the local private cache |
| 277 | + |
| 278 | +#### Feature 4 |
| 279 | + |
| 280 | +* Improved parameterisation of caches |
| 281 | + |
| 282 | +## External dependencies |
| 283 | + |
| 284 | +* OpenPiton |
| 285 | + |
| 286 | +## OpenHW TGs Involved |
| 287 | + |
| 288 | +* TWG: Interconnect |
| 289 | + |
| 290 | +## Resource Requirements |
| 291 | + |
| 292 | +### Engineering resource supplied by members - requirement and availability |
| 293 | + |
| 294 | +* Team from UC Santa Barbara |
| 295 | +* Team from Barcelona Supercomputing Center |
| 296 | + |
| 297 | +### OpenHW engineering staff resource plan: requirement and availability |
| 298 | + |
| 299 | +N/A |
| 300 | + |
| 301 | +### Marketing resource - requirement and availability |
| 302 | + |
| 303 | +N/A |
| 304 | + |
| 305 | +### Funding for project aspects - requirement and availability |
| 306 | + |
| 307 | +N/A |
| 308 | + |
| 309 | +## Architecture and/or context diagrams |
| 310 | + |
| 311 | +The following figure shows a top-level view of a 16-core CVA6 system previously demonstrated with P-Mesh. |
| 312 | + |
| 313 | + |
| 314 | + |
| 315 | +The following figure shows a system enabled by P-Mesh compared with a future system enabled by CV-Mesh. The CV-Mesh IP is made of the components shown within the orange box. |
| 316 | + |
| 317 | + |
| 318 | + |
| 319 | +## Project license model |
| 320 | + |
| 321 | +* 3-Clause BSD (following existing OpenPiton P-Mesh) |
| 322 | + |
| 323 | +## Description of initial code contribution, if required |
| 324 | + |
| 325 | +P-Mesh code is already hosted by OpenHW as part of the Polara APU GitHub repository. The P-Mesh component will be brought into its own repository as CV-Mesh. |
| 326 | + |
| 327 | +## Repository Requirements |
| 328 | + |
| 329 | +* Repository for CV-Mesh IP |
| 330 | +* Submodule linkage to branch(es) on Polara APU repository (which is an OpenPiton fork) |
| 331 | + |
| 332 | +## Project distribution model |
| 333 | + |
| 334 | +OpenHW GitHub repositories |
| 335 | + |
| 336 | +## Preliminary Project plan |
| 337 | +*A full project plan is not required at PL. A preliminary plan, which can be for instance the schedule for completion of component or feature list, together with responsible resource, should be provided. Full details should be provided at PA gate.* |
| 338 | + |
| 339 | +## Risk Register |
| 340 | +*A list of known risks, for example external dependencies, and any mitigation strategy* |
| 341 | + |