Yesterday and today I ran into a challenge in a project of mine: I had to build a feature that lets the user upload a file to a cloud service.
- I wrote the API client.
- I wrote the file picker.
- I wrote the UI.
Then I simply sent the multipart request when the user selected a file.
This worked fine... for < 1MB files.
But when the file was larger than about 20MB, the app simply crashed right after some garbage collector messages:
Background concurrent copying GC freed 153040(3MB) AllocSpace objects, 12(4MB) LOS objects, 49% free, 4MB/8MB, paused 819us total 173.633ms
I've always had a love/hate relationship with this kind of stuff: it's amazing when your theory is right and you really understood the problem, but it's hell otherwise.
[Image: the JVM GC seeing someone loading a 534TB file to a ByteArray at once]
After a while I also noticed that this message just says the garbage collector is working to free memory, which indicates that either:
- Your app is doing a very intensive task and, since it manages memory very well, the message is just reporting that ( ).
- You suck at memory management and your app is trying to land on the moon with the user's RAM (x).
But okay, let's see what's wrong with this code:
File file = ...
// The 292 GB variable
final Uint8List fileBytes = await file.readAsBytes();
The cool thing is that there is no loop, no recursive hell, nothing fancy: it's just me trying to load a whole file into a variable.
With a big enough file, this variable can crash any app, so how can we fix that?
And no, I won't ask you to implement a buffered reader in C++. If you want to do so, go ahead, but don't bring anyone else with you.
Most high-level languages include Stream APIs, be it an InputStream, an OutputStream, or simply Streams. They are an abstraction for working with large amounts of bytes (which is what all files are).
The trick is to load a 50TB file as many chunks of 1MB each. Will it take some time? Yes, but what else did you expect?
I don't like Dart Streams, but they are what we have, so let's go:
final subscription = file.openRead().listen(
  (List<int> chunk) {
    // openRead() emits List<int> chunks. The chunk size depends on the
    // buffer size, but it's small enough to fit into a variable.
  },
);
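To make the idea a bit more concrete, here is a minimal sketch that walks through a file one chunk at a time and only keeps a running total. The `countBytes` helper and the demo file are made up for illustration; the only real API doing the work is `File.openRead()` from `dart:io`.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Hypothetical helper: streams [file] and returns its size in bytes.
/// Only the current chunk is referenced at any time.
Future<int> countBytes(File file) async {
  var total = 0;
  // `await for` handles the stream one chunk at a time.
  await for (final List<int> chunk in file.openRead()) {
    total += chunk.length; // Process the chunk here, then let it go.
  }
  return total;
}

Future<void> main() async {
  // Demo file (an assumption for this sketch): 5 MB of zeros in a temp dir.
  final dir = await Directory.systemTemp.createTemp('stream_demo');
  final file = File('${dir.path}/demo.bin');
  await file.writeAsBytes(Uint8List(5 * 1024 * 1024));

  print(await countBytes(file)); // prints 5242880

  await dir.delete(recursive: true);
}
```

Swap the `total += chunk.length` line for whatever you actually need to do with the bytes (hashing, uploading, writing somewhere else). As long as you don't keep the chunks around, memory use stays roughly at one chunk.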
Of course, this is just a starting point, but here are some tips to keep in mind:
- Your user's device can't load 50TB into a variable. But several chunks of 1MB are fine.
- Do not store the chunks together anywhere; it will simply nullify the effort of loading through a Stream.
- Always avoid loading something "all at once", use Streams.
- If you need to display something, use the lazy-load strategy.
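For the upload case that started all this, the tips above could look roughly like the sketch below. It is not the code from my project: `uploadStreaming` is a hypothetical helper, it uses only `dart:io`, and it assumes the server accepts the file as a raw binary body (a real multipart endpoint needs the multipart framing around the same stream). The local server in `main` exists only so the sketch runs on its own.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Hypothetical helper: streams [file] to [url] as the raw request body
/// and returns the HTTP status code. Assumes a raw-body endpoint.
Future<int> uploadStreaming(File file, Uri url) async {
  final client = HttpClient();
  try {
    final request = await client.postUrl(url);
    request.headers.contentType = ContentType.binary;
    request.contentLength = await file.length();
    // addStream forwards each chunk as it is read from disk;
    // nothing is collected into one big list.
    await request.addStream(file.openRead());
    final response = await request.close();
    await response.drain<void>();
    return response.statusCode;
  } finally {
    client.close();
  }
}

Future<void> main() async {
  // Throwaway local server that only counts the bytes it receives.
  final server = await HttpServer.bind(InternetAddress.loopbackIPv4, 0);
  var received = 0;
  server.listen((HttpRequest req) async {
    await for (final chunk in req) {
      received += chunk.length;
    }
    req.response.statusCode = HttpStatus.ok;
    await req.response.close();
  });

  // Demo file (an assumption for this sketch): 2 MB of zeros.
  final dir = await Directory.systemTemp.createTemp('upload_demo');
  final file = File('${dir.path}/demo.bin');
  await file.writeAsBytes(Uint8List(2 * 1024 * 1024));

  final status = await uploadStreaming(
      file, Uri.parse('http://127.0.0.1:${server.port}/upload'));
  print('status=$status received=$received');

  await server.close();
  await dir.delete(recursive: true);
}
```

The point is the `addStream(file.openRead())` line: the file goes from disk to the socket chunk by chunk, and the app never holds more than a small piece of it.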