This happens on both CPU and GPU. There's no reason this calculation should require 551 GB of RAM. The explicit Hessian is "only" (2483·3)² × 4 bytes ≈ 221 MB.

sdonn asked 6 years ago

During some random testing, I stumbled upon this error message:

[Quasar CUDA Engine] - OUT OF MEMORY detected (request size 536870912 bytes)!
Starting memory diagnostics subprogram...
Amount of pinned memory: 67897344 bytes
Freelist size: 2 memory blocks
Largest free block: 67108864 bytes
Process total: 201326592 bytes, Inuse: 67897344 bytes, Free: 133429248 bytes
Device total: 2147352576 bytes, Free: 1655570432 bytes
Chunk 0 size 67108864 bytes: Fragmentation: 0.0%, free: 67108864 bytes
Chunk 1 size 134217728 bytes: Fragmentation: 0.0%, free: 66320384 bytes
Info: CUDA memory failure arises when too many large memory blocks are used by the same kernel function. Please split the input data into blocks and let the program process these blocks individually, to avoid the CUDA memory failure.

Basically, I request 500 MB of video memory. Okay, the process can't serve this because it only gets 200 MB to start with. However, the GPU itself still has 1.6 GB of free memory! Why can't the Quasar process access this memory?

2 Answers

The Quasar process tries to allocate, using cudaMalloc, a single memory block large enough to hold the 536 MB, but this fails. There might be 1.6 GB available in total, but due to memory fragmentation (especially if other processes are also taking GPU memory; it could also be OpenGL) and other issues, a contiguous block of 536 MB might not be available, unfortunately.

sdonn answered 6 years ago

I used nvidia-smi to check for other GPU memory users. There was just 200 MB allocated to X11, and about 10 MB for kwin. So it is possible that there was no contiguous 550 MB block free, but that would have required some pretty bad memory allocation on the GPU's side. I
now set the GPU memory footprint to 'large' by default. When I am running Quasar I'm at work anyhow, and nothing GPU-intensive should be running aside from X11.

JAX will preallocate 90% of currently-available GPU memory when the first JAX operation is run. Preallocating minimizes allocation overhead and memory
fragmentation, but can sometimes cause out-of-memory (OOM) errors. If your JAX process fails with OOM, the following environment variables can be used to override the default behavior:

XLA_PYTHON_CLIENT_PREALLOCATE=false: disables the preallocation behavior. JAX will instead allocate GPU memory as needed, potentially decreasing the overall memory usage. However, this behavior is more prone to GPU memory fragmentation, meaning a JAX program that uses most of the available GPU memory may OOM with preallocation disabled.

XLA_PYTHON_CLIENT_MEM_FRACTION=.XX: if preallocation is enabled, this makes JAX preallocate XX% of currently-available GPU memory, instead of the default 90%. Lowering the amount preallocated can fix OOMs that occur when the JAX program starts.

XLA_PYTHON_CLIENT_ALLOCATOR=platform: makes JAX allocate exactly what is needed on demand, and deallocate memory that is no longer needed (note that this is the only configuration that will deallocate GPU memory, instead of reusing it). This is very slow, so it is not recommended for general use, but it may be useful for running with the minimal possible GPU memory footprint or for debugging OOM failures.

If you run multiple JAX processes concurrently, either use XLA_PYTHON_CLIENT_MEM_FRACTION to give each process an appropriate amount of memory, or set XLA_PYTHON_CLIENT_PREALLOCATE=false. TensorFlow also preallocates by default, so running GPU TensorFlow alongside JAX is similar to running multiple JAX processes concurrently. One solution is to use CPU-only TensorFlow (e.g. if you're only doing data loading with TF). You can prevent TensorFlow from using the GPU with tf.config.experimental.set_visible_devices([], "GPU"). Alternatively, use XLA_PYTHON_CLIENT_MEM_FRACTION or XLA_PYTHON_CLIENT_PREALLOCATE as described above.
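As a concrete sketch of how these variables are applied in practice: they must be set in the environment before JAX initializes its GPU backend, i.e. before the first `import jax` in the process (or exported in the shell beforehand). The variable names are the real XLA/JAX ones; the 50% fraction is just an example value.

```python
import os

# Choose ONE of the following before the first `import jax`;
# the GPU backend reads these variables when it is first initialized.

# 1) Allocate GPU memory on demand instead of preallocating 90%:
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# 2) Or keep preallocation but shrink it to, e.g., 50% of free memory:
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = ".50"

# 3) Or allocate/deallocate exactly on demand (slow, minimal footprint):
# os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

# import jax  # must come only after the variables are set
```

Setting the variables after JAX has already touched the GPU has no effect, which is why shell-level `export XLA_PYTHON_CLIENT_PREALLOCATE=false` is often the least error-prone option.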
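The failure mode in the Quasar answer above (plenty of free memory in total, but no single contiguous block large enough for cudaMalloc) can be illustrated with a toy free-list model. The block sizes here are illustrative, not taken from the log:

```python
def can_allocate(request: int, free_blocks: list[int]) -> bool:
    """A contiguous allocator can serve a request only from a single free block."""
    return max(free_blocks, default=0) >= request

# Hypothetical fragmented device: ~1.6 GB free in total, but in 400 MB pieces.
free_blocks = [400 * 2**20] * 4          # four 400 MiB free regions

request = 536_870_912                    # 512 MiB, the failing request size

print(sum(free_blocks) >= request)       # True: enough free memory in total
print(can_allocate(request, free_blocks))  # False: no single block fits
```

This is why the total shown by nvidia-smi can be misleading: it reports aggregate free memory, not the size of the largest contiguous region.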