Concurrency & Parallelism Patterns by Language

Type: Software Reference | Confidence: 0.92 | Sources: 7 | Verified: 2026-02-24 | Freshness: 2026-02-24

TL;DR

Constraints

Quick Reference

| Language | Concurrency Model | Parallelism Model | GIL / Limits | Best For |
|---|---|---|---|---|
| Python | asyncio (coroutines), threading | multiprocessing, concurrent.futures | GIL blocks CPU parallelism in threads | I/O: asyncio; CPU: multiprocessing |
| JavaScript (Node.js) | Event loop, Promises, async/await | worker_threads, child_process, cluster | Single-threaded event loop | I/O: async/await; CPU: worker_threads |
| Go | Goroutines + channels (CSP model) | Goroutines across OS threads (GOMAXPROCS) | None -- true parallelism by default | Both I/O and CPU-bound work |
| Java | Virtual threads (Project Loom), CompletableFuture | ForkJoinPool, parallel streams, platform threads | None -- true parallelism | I/O: virtual threads; CPU: ForkJoinPool |
| Rust | async/await + tokio/smol runtimes | std::thread, Rayon (data parallelism) | None -- ownership prevents data races at compile time | I/O: tokio; CPU: Rayon or std::thread |
| C# | async/await, Task, ValueTask | Parallel.ForEach, Task.Run, PLINQ | None -- true parallelism | I/O: async/await; CPU: Parallel/TPL |

Concurrency Primitives Comparison

| Primitive | Python | Node.js | Go | Java | Rust | C# |
|---|---|---|---|---|---|---|
| Coroutine/task | async def | async function | go func() | Thread.startVirtualThread() | tokio::spawn() | Task.Run() |
| Channel/queue | asyncio.Queue | N/A (use streams) | chan | BlockingQueue | tokio::sync::mpsc | Channel<T> |
| Mutex/lock | threading.Lock | N/A (single-threaded) | sync.Mutex | synchronized / ReentrantLock | std::sync::Mutex<T> | lock / SemaphoreSlim |
| Atomic | N/A | Atomics (SharedArrayBuffer) | sync/atomic | AtomicInteger etc. | std::sync::atomic | Interlocked |
| Thread pool | ThreadPoolExecutor | Worker pool (manual) | Runtime-managed | newVirtualThreadPerTaskExecutor() | tokio runtime | ThreadPool / TPL |
| Parallel loop | multiprocessing.Pool.map() | Promise.all() (I/O) | errgroup.Group | parallelStream() | rayon::par_iter() | Parallel.ForEach() |
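The channel/queue row above maps to asyncio.Queue in Python. A minimal producer/consumer sketch (the produce/consume names and sentinel convention are illustrative, not a standard API):

```python
import asyncio

async def produce(queue: asyncio.Queue, n: int) -> None:
    # Put n items on the queue, then a sentinel (None) to signal completion.
    for i in range(n):
        await queue.put(i)
    await queue.put(None)

async def consume(queue: asyncio.Queue) -> list[int]:
    # Drain the queue until the sentinel arrives, doubling each item.
    results = []
    while (item := await queue.get()) is not None:
        results.append(item * 2)
    return results

async def main() -> list[int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)  # bounded, like a buffered channel
    producer = asyncio.create_task(produce(queue, 5))
    consumer = asyncio.create_task(consume(queue))
    await producer
    return await consumer

print(asyncio.run(main()))  # → [0, 2, 4, 6, 8]
```

The bounded queue gives backpressure for free: the producer blocks on put() once the buffer is full, the same role a buffered chan plays in Go.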

Decision Tree

START
|-- Is the workload I/O-bound (network, disk, database)?
|   |-- YES
|   |   |-- Python? --> Use asyncio with async/await
|   |   |-- Node.js? --> Use Promises / async/await (built-in event loop)
|   |   |-- Go? --> Use goroutines + channels
|   |   |-- Java? --> Use virtual threads (Java 21+)
|   |   |-- Rust? --> Use tokio async runtime
|   |   +-- C#? --> Use async/await with Task
|   +-- NO (CPU-bound) --> continue below
|-- Is the workload CPU-bound (computation, data processing)?
|   |-- YES
|   |   |-- Python? --> Use multiprocessing or ProcessPoolExecutor
|   |   |-- Node.js? --> Use worker_threads with a worker pool
|   |   |-- Go? --> Use goroutines (parallel by default with GOMAXPROCS > 1)
|   |   |-- Java? --> Use ForkJoinPool or parallel streams
|   |   |-- Rust? --> Use Rayon for data parallelism or std::thread
|   |   +-- C#? --> Use Parallel.ForEach or Task.Run
|   +-- NO --> continue below
+-- Mixed workload? --> Separate I/O and CPU layers; use async for I/O,
    offload CPU to worker pool/process
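The mixed-workload branch can be sketched in Python: async/await handles the I/O stage while the CPU stage is shipped to a process pool via run_in_executor. The helper names (io_then_cpu, cpu_heavy) and the sleep standing in for real I/O are illustrative:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n: int) -> int:
    # CPU-bound stage: runs in a separate process, outside the GIL.
    return sum(i * i for i in range(n))

async def io_then_cpu(pool: ProcessPoolExecutor, n: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for an I/O call (HTTP, DB, ...)
    loop = asyncio.get_running_loop()
    # Offload the CPU stage so the event loop stays responsive.
    return await loop.run_in_executor(pool, cpu_heavy, n)

async def main() -> list[int]:
    with ProcessPoolExecutor() as pool:
        return await asyncio.gather(*(io_then_cpu(pool, n) for n in (10, 100)))

if __name__ == "__main__":
    print(asyncio.run(main()))  # → [285, 328350]
```

The same layering applies in the other languages: keep the event loop (or goroutine scheduler) for waiting, and hand computation to whatever true-parallelism primitive the language offers.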

Step-by-Step Guide

1. Identify workload type

Determine whether your bottleneck is I/O-bound (waiting for network/disk) or CPU-bound (processing data). Profile first. [src1]

# Python: profile to see where time is spent
python -m cProfile -s cumtime your_script.py

# Node.js: use built-in profiler
node --prof your_script.js
node --prof-process isolate-0x*.log > profile.txt

Verify: Look at output -- if most time is in socket.recv, http.get, or file I/O calls, your workload is I/O-bound.

2. Choose the right concurrency primitive

Match your language and workload type to the Quick Reference table above. [src2]

Verify: Run a benchmark with 100 concurrent tasks -- you should see near-linear scaling for I/O-bound work with async.
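One way to run that check in Python, with a sleep standing in for real I/O (timings are approximate and machine-dependent):

```python
import asyncio
import time

async def fake_io() -> None:
    await asyncio.sleep(0.1)  # stand-in for a network/disk call

async def run_concurrent(n: int) -> float:
    # Launch n "I/O" calls concurrently and return the wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(fake_io() for _ in range(n)))
    return time.perf_counter() - start

# 100 concurrent 0.1 s calls should finish in roughly 0.1 s, not ~10 s;
# if the total approaches n * 0.1 s, the tasks are running sequentially.
elapsed = asyncio.run(run_concurrent(100))
print(f"{elapsed:.2f}s")
```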

3. Implement structured concurrency

Group related concurrent tasks and ensure all complete (or fail) together. [src4]

# Python: structured concurrency with asyncio.TaskGroup (3.11+)
# (runs inside an async def, with asyncio imported)
async with asyncio.TaskGroup() as tg:
    task1 = tg.create_task(fetch_url(url1))
    task2 = tg.create_task(fetch_url(url2))
# Both tasks are guaranteed complete or cancelled here

Verify: If any task raises an exception, the entire group is cancelled.

4. Add error handling and cancellation

Every concurrent task must handle errors and support cancellation. Never fire-and-forget. [src7]

// Go: use errgroup for structured error handling
// import "golang.org/x/sync/errgroup"
g, ctx := errgroup.WithContext(context.Background())
g.Go(func() error { return fetchURL(ctx, url1) })
g.Go(func() error { return fetchURL(ctx, url2) })
if err := g.Wait(); err != nil {
    log.Fatal(err) // first error cancels all goroutines via ctx
}

Verify: Introduce a deliberate error in one task -- confirm all others are cancelled.

Code Examples

Python: Async I/O with asyncio

# Input:  List of URLs to fetch concurrently
# Output: List of response bodies

import asyncio
import aiohttp  # aiohttp>=3.9

async def fetch_all(urls: list[str]) -> list[str]:
    async with aiohttp.ClientSession() as session:
        async def fetch(url: str) -> str:
            async with session.get(url) as resp:
                return await resp.text()
        return await asyncio.gather(*[fetch(u) for u in urls])

results = asyncio.run(fetch_all(["https://example.com"] * 10))

Python: CPU Parallelism with multiprocessing

# Input:  List of numbers to compute (CPU-bound)
# Output: List of results computed in parallel

from concurrent.futures import ProcessPoolExecutor
import math

def heavy_computation(n: int) -> float:
    return sum(math.sqrt(i) for i in range(n))

if __name__ == "__main__":  # required: worker processes re-import this module
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(heavy_computation, [10**7] * 8))

Node.js: Worker Threads for CPU-bound work

// Input:  CPU-intensive computation
// Output: Result from worker thread

const { Worker, isMainThread, parentPort } = require("worker_threads");

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on("message", (result) => console.log("Result:", result));
  worker.postMessage({ n: 1e8 });
} else {
  parentPort.on("message", ({ n }) => {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += Math.sqrt(i);
    parentPort.postMessage(sum);
  });
}

Go: Goroutines with Channels

// Input:  List of URLs to fetch concurrently
// Output: Collected results via channel

package main

import (
    "fmt"
    "net/http"
    "sync"
)

func fetchURL(url string, ch chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    resp, err := http.Get(url)
    if err != nil {
        ch <- fmt.Sprintf("error: %v", err)
        return
    }
    defer resp.Body.Close()
    ch <- fmt.Sprintf("%s: %d", url, resp.StatusCode)
}

func main() {
    urls := []string{"https://go.dev", "https://pkg.go.dev"}
    ch := make(chan string, len(urls))
    var wg sync.WaitGroup
    for _, url := range urls {
        wg.Add(1)
        go fetchURL(url, ch, &wg)
    }
    go func() { wg.Wait(); close(ch) }()
    for result := range ch { fmt.Println(result) }
}

Java: Virtual Threads (Java 21+)

// Input:  List of tasks to run concurrently
// Output: Collected results via structured concurrency

// Virtual threads -- do NOT pool them, create fresh per task
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<String>> futures = List.of(
        executor.submit(() -> HttpClient.newHttpClient()
            .send(HttpRequest.newBuilder(URI.create("https://example.com")).build(),
                  HttpResponse.BodyHandlers.ofString()).body())
    );
    for (var f : futures) {
        var body = f.get();
        System.out.println(body.substring(0, Math.min(100, body.length())));
    }
}

Rust: Tokio Async Runtime

// Input:  List of URLs to fetch concurrently
// Output: Collected response statuses

// Cargo.toml: tokio = { version = "1", features = ["full"] }
//             reqwest = { version = "0.12", features = ["json"] }

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let urls = vec!["https://httpbin.org/get"; 5];
    let mut handles = vec![];
    for url in urls {
        handles.push(tokio::spawn(async move {
            let resp = reqwest::get(url).await?;
            Ok::<_, reqwest::Error>(resp.status())
        }));
    }
    for handle in handles {
        println!("Status: {}", handle.await??);
    }
    Ok(())
}

Anti-Patterns

Wrong: Shared mutable state without synchronization

# BAD -- race condition with shared counter across threads
counter = 0
def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1  # not atomic -- lost updates

Correct: Use a lock or atomic operation

# GOOD -- use a Lock for thread-safe mutation
import threading
counter = 0
lock = threading.Lock()
def increment():
    global counter
    for _ in range(1_000_000):
        with lock:
            counter += 1

Wrong: Blocking the event loop

// BAD -- blocks the entire Node.js event loop
app.get("/compute", (req, res) => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) sum += Math.sqrt(i);
  res.json({ sum });
});

Correct: Offload CPU work to a worker thread

// GOOD -- offload to worker_threads
const { Worker } = require("worker_threads");
app.get("/compute", (req, res) => {
  const worker = new Worker("./compute-worker.js");
  worker.on("message", (sum) => res.json({ sum }));
  worker.on("error", (err) => res.status(500).json({ error: err.message }));
});

Wrong: Pooling Java virtual threads

// BAD -- defeats the purpose of virtual threads
ExecutorService pool = Executors.newFixedThreadPool(100);
pool.submit(() -> blockingIO()); // wastes platform threads

Correct: Use virtual-thread-per-task executor

// GOOD -- virtual threads are cheap; create one per task
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    executor.submit(() -> blockingIO()); // millions of these are fine
}

Wrong: Python threads for CPU-bound work

# BAD -- GIL prevents true parallel execution of CPU-bound threads
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(heavy_cpu_work, data))
    # often no faster than single-threaded, and can be slower, due to GIL contention

Correct: Use ProcessPoolExecutor for CPU-bound work

# GOOD -- separate processes bypass the GIL
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(heavy_cpu_work, data))
    # true parallelism across CPU cores

Common Pitfalls

Version History & Compatibility

| Language/Runtime | Version | Concurrency Milestone | Notes |
|---|---|---|---|
| Python | 3.4 (2014) | asyncio module added | Basic event loop |
| Python | 3.5 (2015) | async/await syntax | Native coroutine syntax |
| Python | 3.11 (2022) | asyncio.TaskGroup | Structured concurrency |
| Python | 3.13 (2024) | Free-threaded build (experimental) | Optional GIL removal |
| Node.js | 10.5 (2018) | worker_threads module | Experimental |
| Node.js | 12 (2019) | worker_threads stable | Production-ready |
| Go | 1.0 (2012) | Goroutines + channels | Core feature since inception |
| Java | 21 (2023) | Virtual threads GA (Project Loom) | Replaces thread pools for I/O |
| Rust | 1.39 (2019) | async/await stabilized | Requires external runtime |
| Rust | tokio 1.0 (2020) | Tokio 1.0 stable | De facto async runtime |
| C# | .NET 4.5 (2012) | async/await, Task | TPL-based |
| C# | .NET 6 (2021) | Parallel.ForEachAsync | Async parallel loops |

When to Use / When Not to Use

| Use When | Don't Use When | Use Instead |
|---|---|---|
| Many I/O operations need to run concurrently | Simple sequential script with one I/O call | Synchronous code |
| CPU-bound work can be split into independent chunks | Task requires shared mutable state across workers | Single-threaded with optimized algorithm |
| Handling thousands of concurrent connections | Low request volume (<100 concurrent) | Simple thread-per-request or synchronous handler |
| Background processing while main thread stays responsive | Computation is inherently sequential | Pipeline/streaming pattern |
| Need to saturate multi-core CPU for batch processing | Overhead of spawning exceeds computation time | Vectorized operations (NumPy, SIMD) |
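The table's last row is worth a sketch: when per-item work is tiny, vectorization usually beats parallelism, because worker spawn and serialization overhead dominates. This comparison assumes NumPy is installed; the function names are illustrative:

```python
import numpy as np

# Loop version: per-element Python overhead dominates, so splitting it
# across worker processes mostly parallelizes the overhead.
def loop_sqrt_sum(n: int) -> float:
    return sum(i ** 0.5 for i in range(n))

# Vectorized version: one C-level pass over a contiguous array,
# no workers, no pickling, and SIMD-friendly.
def vector_sqrt_sum(n: int) -> float:
    return float(np.sqrt(np.arange(n)).sum())
```

Benchmark the vectorized version first; reach for a process pool only when a single vectorized pass still isn't fast enough.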

Important Caveats

Related Units