parallel free download

Showing 116 open source projects for "parallel"

Filters: Software Development, C++

1

CUDA Core Compute Libraries (CCCL)

CUDA Core Compute Libraries

...It brings together Thrust, CUB, and libcudacxx, which collectively provide high-level abstractions, low-level performance primitives, and a CUDA-compatible standard library for GPU programming. The goal of CCCL is to simplify CUDA development by offering reusable building blocks that enable developers to write efficient and scalable parallel code without starting from scratch. Thrust provides a high-level interface for parallel algorithms, while CUB delivers highly optimized primitives for device-level operations, and libcudacxx ensures compatibility with modern C++ standards. By unifying these components, CCCL reduces duplication and improves developer productivity while maintaining performance across different GPU architectures.

Downloads: 15 This Week

Last Update: 3 days ago
See Project
2

Soufflé

Datalog variant for tool designers crafting analyses in Horn clauses

Rapid prototyping for your analysis problems with logic, enabling deep design-space explorations. Designed for large-scale static analysis, e.g., points-to analysis for Java, taint analysis, and security checks. Futamura projections/partial evaluation for effective translation to parallel C++; optimized staged compilation; specialized data structures for logical relations. Efficient translation of Datalog programs to parallel C++ (CAV'16, CC'16); efficient interpretation using de-specialization techniques (PLDI'21); specialized data structures for relations (PACT'19, PPoPP'19, PMAM'19) with optimal index selection (VLDB'18); extended semantics of Datalog, e.g., permitting unbounded recursion with numbers and terms. ...

Downloads: 0 This Week

Last Update: 2025-03-24
See Project
3

Halide

A language for fast, portable data-parallel computation

Halide is a programming language for fast, portable data-parallel computation. It was designed to make writing high-performance image and array processing code much easier on modern machines. It works on all major operating systems and with several CPU architectures (X86, ARM, MIPS, Hexagon, PowerPC) and GPU compute APIs (CUDA, OpenCL, OpenGL, among others). It is not a standalone programming language, however; it is embedded in C++, which means that you write C++ code that builds an in-memory representation of a Halide pipeline using Halide's C++ API. ...
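
The idea of a language embedded in C++ can be illustrated with a toy sketch. The code below is not Halide's API: the names Node, Expr, var, constant, and evaluate are all hypothetical. It only shows the general pattern the description mentions, where ordinary C++ operators build an in-memory representation of a computation that is evaluated in a later, separate step.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical mini-DSL, NOT Halide's API: C++ operators build an expression
// tree in memory, and the tree is evaluated later as a separate step.
struct Node {
    enum class Kind { Var, Const, Add, Mul };
    Kind kind = Kind::Const;
    int value = 0;                   // used only when kind == Const
    std::shared_ptr<Node> lhs, rhs;  // used only by Add and Mul
};

struct Expr {
    std::shared_ptr<Node> node;
};

// Helper that allocates one tree node.
inline Expr make(Node::Kind k, int v, std::shared_ptr<Node> l, std::shared_ptr<Node> r) {
    auto n = std::make_shared<Node>();
    n->kind = k;
    n->value = v;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return Expr{n};
}

// The single free variable of the pipeline (think: a pixel coordinate).
inline Expr var() { return make(Node::Kind::Var, 0, nullptr, nullptr); }
inline Expr constant(int v) { return make(Node::Kind::Const, v, nullptr, nullptr); }

// Operators do no arithmetic; they only record what should be computed.
inline Expr operator+(const Expr& a, const Expr& b) {
    return make(Node::Kind::Add, 0, a.node, b.node);
}
inline Expr operator*(const Expr& a, const Expr& b) {
    return make(Node::Kind::Mul, 0, a.node, b.node);
}

// Walks the stored tree for one concrete value of the variable.
inline int evaluate(const Expr& e, int x) {
    assert(e.node && "expression must not be empty");
    const Node& n = *e.node;
    switch (n.kind) {
        case Node::Kind::Var:   return x;
        case Node::Kind::Const: return n.value;
        case Node::Kind::Add:   return evaluate(Expr{n.lhs}, x) + evaluate(Expr{n.rhs}, x);
        case Node::Kind::Mul:   return evaluate(Expr{n.lhs}, x) * evaluate(Expr{n.rhs}, x);
    }
    return 0;  // not reached; keeps compilers quiet
}
```

With these definitions, writing Expr f = var() * constant(2) + constant(1); only builds a tree; nothing is computed until evaluate(f, 5) is called. Separating the description of a computation from its execution is what gives an embedded language room to optimize or retarget the pipeline.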

Downloads: 6 This Week

Last Update: 2025-09-17
See Project
4

ispc

Intel SPMD Program Compiler

...Under the SPMD model, the programmer writes a program that generally appears to be a regular serial program, though the execution model is actually that a number of program instances execute in parallel on the hardware. ispc compiles a C-based SPMD programming language to run on the SIMD units of CPUs and GPUs; it frequently provides a 3x or more speedup on architectures with 4-wide vector SSE units and 5x-6x on architectures with 8-wide AVX vector units, without any of the difficulty of writing intrinsics code. Parallelization across multiple cores is also supported by ispc, making it possible to write programs whose performance scales with both the number of cores and the vector unit size. ...
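
The SPMD execution model described above can be sketched in plain C++. This is not ispc code, and it runs serially; the names kProgramCount, scale_instance, and scale are hypothetical. It only illustrates the shape of the model: one body that reads like serial code, executed by several program instances that differ only in their index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the SPMD model in plain C++ (this is not ispc code).
// Every "program instance" runs the same body; only its index differs.
constexpr int kProgramCount = 8;  // assumed gang size, e.g. one 8-wide vector unit

// Body written as if it were serial code for a single program instance.
inline void scale_instance(int programIndex, const std::vector<float>& in,
                           std::vector<float>& out, float factor) {
    assert(programIndex >= 0 && programIndex < kProgramCount);
    // Instance i handles elements i, i + kProgramCount, i + 2 * kProgramCount, ...
    for (std::size_t i = static_cast<std::size_t>(programIndex); i < in.size();
         i += static_cast<std::size_t>(kProgramCount)) {
        out[i] = in[i] * factor;
    }
}

// Emulates the gang one instance at a time. An SPMD compiler would instead
// map the instances onto the lanes of a SIMD unit so they run together.
inline std::vector<float> scale(const std::vector<float>& in, float factor) {
    std::vector<float> out(in.size());
    for (int lane = 0; lane < kProgramCount; ++lane) {
        scale_instance(lane, in, out, factor);
    }
    return out;
}
```

Calling scale(values, 2.0f) returns a vector with every element doubled. The interleaved assignment of elements to instances is one common choice, picked here because neighboring instances then touch neighboring memory.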

Downloads: 76 This Week

Last Update: 2026-02-04
See Project
5

ncnn

High-performance neural network inference framework for mobile

ncnn is a high-performance neural network inference framework designed specifically for mobile platforms. It puts artificial intelligence at your fingertips with no third-party dependencies and, according to the project, runs faster than all other known open source frameworks on mobile phone CPUs. ncnn allows developers to easily deploy deep learning models to mobile platforms and create intelligent apps. It is cross-platform and supports most commonly used CNN networks, including...

Downloads: 92 This Week

Last Update: 2026-01-13
See Project
6

TensorStore

Library for reading and writing large multi-dimensional arrays

...It separates the logical view (shape, dtype, chunking) from the physical layout so the same code can target Zarr, N5, TIFF pyramids, or custom backends. Rich indexing, slicing, and broadcasting operations make it feel like a familiar array API, while asynchronous I/O pipelines stream chunks efficiently in parallel. Transactional semantics allow atomic updates and consistent snapshots, which is essential for large, shared datasets used by ML and scientific workflows. The library is engineered for scalability—background caching, chunk sharding, and retryable operations keep throughput high even over unreliable networks. With language bindings, it fits into Python-heavy analysis pipelines while retaining a fast C++ core.

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
7

mold

A Modern Linker

Mold is a modern high-performance linker designed as a drop-in replacement for traditional Unix linkers, with a primary goal of dramatically reducing build times for large software projects. In compiled languages like C, C++, and Rust, the linking phase can become a significant bottleneck, especially in large codebases, and mold addresses this by leveraging highly optimized algorithms and extensive parallelism. It is capable of utilizing all available CPU cores efficiently, resulting in...

Downloads: 17 This Week

Last Update: 4 days ago
See Project
8

XGBoost

Scalable and Flexible Gradient Boosting

...It supports regression, classification, ranking and user defined objectives, and runs on all major operating systems and cloud platforms. XGBoost works by implementing machine learning algorithms under the Gradient Boosting framework. It also offers parallel tree boosting (GBDT, GBRT or GBM) that can quickly and accurately solve many data science problems. XGBoost can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.

Downloads: 10 This Week

Last Update: 2026-02-10
See Project
9

ChrysaLisp

Parallel OS, with GUI, Terminal, OO Assembler, Class libraries

ChrysaLisp is a 64-bit, MIMD, multi-CPU, multi-threaded, multi-core, multi-user parallel operating system with features such as a GUI, terminal, OO Assembler, class libraries, C-Script compiler, Lisp interpreter, debugger, profiler, vector font engine, and more. It supports macOS, Windows, and Linux for x64, Riscv64, and Arm64, and will eventually move to bare metal. It also allows the modeling of various network topologies and the use of ChrysaLib hub nodes to join heterogeneous host networks. ...

Downloads: 0 This Week

Last Update: 6 days ago
See Project
10

SIMD

C++ wrappers for SIMD intrinsics

xsimd is a C++ library that provides portable abstractions over SIMD (Single Instruction, Multiple Data) instructions, enabling developers to write high-performance vectorized code without dealing directly with architecture-specific intrinsics. SIMD instructions allow a single operation to be applied to multiple data elements simultaneously, significantly accelerating numerical and data-parallel computations. However, differences across CPU architectures and compilers make direct usage complex, which xsimd addresses by offering a unified API that maps efficiently to underlying hardware capabilities. The library supports a wide range of instruction sets, including SSE, AVX, NEON, and WebAssembly SIMD, ensuring portability across platforms. ...
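
What a wrapper over SIMD intrinsics looks like from the caller's side can be sketched without any hardware-specific code. The Batch type below is hypothetical and is not the library's API; it uses a plain array where a real wrapper would hold a vector register, and a loop where a real wrapper would issue one vector instruction. The batch width of 4 is an assumption chosen to match four floats in an SSE register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width batch type (not a real library's API). A real SIMD
// wrapper would hold a hardware register and map operator+ to one instruction.
template <typename T, std::size_t N>
struct Batch {
    std::array<T, N> lanes{};

    // Reads N consecutive elements starting at p.
    static Batch load(const T* p) {
        Batch b;
        for (std::size_t i = 0; i < N; ++i) b.lanes[i] = p[i];
        return b;
    }
    // Writes the N lanes to consecutive elements starting at p.
    void store(T* p) const {
        for (std::size_t i = 0; i < N; ++i) p[i] = lanes[i];
    }
    // One logical operation applied to all N lanes.
    friend Batch operator+(const Batch& a, const Batch& b) {
        Batch r;
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
        return r;
    }
};

// Adds two arrays batch by batch, then handles the leftover tail elementwise.
inline std::vector<float> add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    constexpr std::size_t W = 4;  // assumed batch width (e.g. 4 floats in SSE)
    std::vector<float> out(a.size());
    std::size_t i = 0;
    for (; i + W <= a.size(); i += W) {
        (Batch<float, W>::load(&a[i]) + Batch<float, W>::load(&b[i])).store(&out[i]);
    }
    for (; i < a.size(); ++i) out[i] = a[i] + b[i];  // scalar tail
    return out;
}
```

The calling code in add never mentions an instruction set, which is the portability benefit the description refers to: only the inside of the batch type would need to change per architecture.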

Downloads: 8 This Week

Last Update: 5 days ago
See Project
11

TensorRT

C++ library for high performance inference on NVIDIA GPUs

...With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers, embedded, or automotive product platforms. TensorRT is built on CUDA®, NVIDIA’s parallel programming model, and enables you to optimize inference leveraging libraries, development tools, and technologies in CUDA-X™ for artificial intelligence, autonomous machines, high-performance computing, and graphics. With new NVIDIA Ampere Architecture GPUs, TensorRT also leverages sparse tensor cores providing an additional performance boost.

Downloads: 22 This Week

Last Update: 2026-03-25
See Project
12

ArrayFire

ArrayFire, a general purpose GPU library

ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs, and other hardware acceleration devices. The library serves users in every technical computing market. Data structures in ArrayFire are smartly managed to avoid costly memory transfers and to take advantage of each performance feature provided by the underlying hardware. The community of ArrayFire developers invites you to build with us if you're interested and able to write top-performing tensor functions. ...

Downloads: 4 This Week

Last Update: 2025-09-05
See Project
13

frugally-deep

A lightweight header-only library for using Keras (TensorFlow) models

...Utterly ignores even the most powerful GPU in your system and uses only one CPU core per prediction. Quite fast on one CPU core, and you can run multiple predictions in parallel, thus utilizing as many CPUs as you like to improve the overall prediction throughput of your application/pipeline.

Downloads: 5 This Week

Last Update: 2025-05-16
See Project
14

Google Highway

Performance-portable, length-agnostic SIMD with runtime dispatch

Google Highway is a high-performance C++ library designed to provide portable SIMD (Single Instruction, Multiple Data) vectorization across multiple CPU architectures while maintaining predictable and efficient behavior. It abstracts low-level vector intrinsics into a consistent API that maps closely to hardware instructions, allowing developers to write high-performance code without relying heavily on compiler auto-vectorization. Highway enables the same source code to run across different...

Downloads: 1 This Week

Last Update: 2026-04-07
See Project
15

NeoPixelBus

An Arduino NeoPixel support library

...There are multiple competing libraries, FastLED being the biggest and Adafruit NeoPixel being the most common for beginners. On ESP32, both FastLED and NeoPixelBus can provide more than one channel/bus. FastLED primarily uses RMT to support 8 parallel channels. NeoPixelBus now supports the RMT's 8 channels and two more channels using I2S.

Downloads: 1 This Week

Last Update: 2025-04-29
See Project
16

Apache brpc

Industrial-grade RPC framework used throughout Baidu

Apache brpc is an industrial-grade RPC framework for building reliable and high-performance services. Apache brpc (incubating) is an effort undergoing Incubation at The Apache Software Foundation (ASF), sponsored by the Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not...

Downloads: 0 This Week

Last Update: 2026-01-18
See Project
17

OneFlow

OneFlow is a deep learning framework designed to be user-friendly

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. An extension lets OneFlow target third-party compilers such as XLA, TensorRT, and OpenVINO. The CUDA runtime is statically linked into OneFlow. OneFlow will work on a minimum supported driver, and any driver beyond. Distributed performance (efficiency) is the core technical difficulty of a deep learning framework. OneFlow focuses on performance improvement and heterogeneous...

Downloads: 0 This Week

Last Update: 2024-03-11
See Project
18

Octave Forge

A collection of packages providing extra functionality for GNU Octave

Octave Forge is a central location for collaborative development of packages for GNU Octave. The Octave Forge packages expand Octave's core functionality by providing field specific features via Octave's package system. See https://octave.sourceforge.io/packages.php for a list of all available packages. GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and...

56 Reviews

Downloads: 1,555 This Week

Last Update: 2026-04-07
See Project
19

PANDA

A comprehensive and flexible quantification tool for proteomics data

...On the levels of spectra, peptides and proteins, PANDA works out a few quantitative filters and new scores for quantification confidence. Third, PANDA is designed for processing proteomics big data in parallel.

Downloads: 20 This Week

Last Update: 2024-06-07
See Project
20

BMDFM

Binary Modular DataFlow Machine (BMDFM)

...The BMDFM dynamic scheduling subsystem performs a symmetric multiprocessing (SMP) emulation of a tagged-token dataflow machine to provide the transparent dataflow semantics for the applications. No directives for parallel execution are needed. More info: http://www.bmdfm.com

Downloads: 0 This Week

Last Update: 2025-06-13
See Project
21

Proximus for NUMA

Proximus is an Electronic System Level (ESL) design environment.

Proximus-FOSS stands as an innovative platform that fosters the convergence of hardware design and software programming, enabling concurrent development across both disciplines. Its collaborative environment empowers developers to concurrently address hardware and software aspects of a project. The Proximus Open Source version boasts robust support for multi-threaded programming with a C++ implementation. This capability allows developers to harness the full potential of C++ for crafting...

Downloads: 8 This Week

Last Update: 2025-12-12
See Project
22

Evolutionary Computation Framework

C++ framework for application of any type of evolutionary computation.

ECF is a framework intended for application of any type of evolutionary computation (GA/GP, DE, Clonalg, ES, PSO, ABC, GAn, local search...). It offers simplicity for the end user (parameterless usage, tutorial) and customization for experienced EC practitioners.

Downloads: 7 This Week

Last Update: 2026-04-10
See Project
23

classdesc

Classdesc is a system for adding reflection to C++, i.e., the ability to query an object's structure at runtime.

Downloads: 0 This Week

Last Update: 2025-03-31
See Project
24

eCxx

A C++ library for AVR and NodeMCU

NOTE: This project is marked with 'Status: Abandoned' on SourceForge because not enough time can be dedicated to it. However, it may still get sporadic commits to the repository. eCxx is a library for AVR and NodeMCU tailored for micro LED displays and lighting effects. eCxx uses a Makefile build system. Java- and Python-based applications/tools are also included to ease the development and debugging process using the host PC. On one side, eCxx supports the original...

Downloads: 3 This Week

Last Update: 2024-02-01
See Project
25

Thrust

The C++ parallel algorithms library

Thrust is the C++ parallel algorithms library which inspired the introduction of parallel algorithms to the C++ Standard Library. Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. It builds on top of established parallel programming frameworks (such as CUDA, TBB, and OpenMP).
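
For reference, the C++ Standard Library algorithms mentioned above look like this in C++17. The sketch uses only the standard library, not Thrust itself, and the function name sum_of_squares is hypothetical. The execution-policy overload is deliberately left out, on the assumption that some toolchains need an extra backend library to link it.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Sum of squares written with a C++17 standard algorithm. std::transform_reduce
// is allowed to regroup its operations, which is what makes a parallel version
// legal. It also has overloads taking an execution policy from <execution>;
// those are omitted here because they may need an extra backend at link time.
inline long long sum_of_squares(const std::vector<int>& v) {
    return std::transform_reduce(
        v.begin(), v.end(),
        0LL,            // initial value; also fixes the result type
        std::plus<>{},  // reduction step: should be associative and commutative
        [](int x) { return static_cast<long long>(x) * x; });  // per-element step
}
```

For the input {1, 2, 3, 4} the function returns 30. The per-element step and the reduction step are passed separately so that an implementation is free to apply the first to many elements at once and combine partial results in any grouping.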

Downloads: 2 This Week

Last Update: 2023-03-20
See Project