Hello fellow developers and DApp enthusiasts,
I’m currently developing a decentralized application (DApp) that needs to manage very large files, often exceeding 2GB, on the client side within a web environment. I’ve encountered a significant challenge: most browsers limit the size of a single in-memory data structure (such as an array or buffer) to about 2GB.
This limitation poses a problem when generating Content Identifiers (CIDs) for these large files. Ideally, a CID should represent the entire file as a single entity, but the browser’s limitation necessitates processing the data in smaller chunks (each less than 2GB).
Here’s my concern: If I process the file in segments to overcome the browser’s limitation, I’m worried that the resulting CIDs for these segments won’t match the CID that would be generated if the file were processed as a whole. This discrepancy could potentially impact the file’s integration and recognition within the IPFS network.
Has anyone else encountered this issue? Are there strategies or workarounds for generating a consistent CID for very large files without splitting them into smaller chunks? I’m looking for solutions or insights that would allow the DApp to handle these large files efficiently while maintaining consistency in the CIDs generated.
Appreciate any advice or shared experiences!
Well, at the risk of being unhelpful :) I’d take a second to reevaluate whether I really need the huge files in the first place, or whether it would be better (or even possible) to have the content of the file unpacked natively inside IPFS.
IPFS is just not really optimized for big binary files, and you’re running into that. It has a ton of features for collecting and connecting atoms of raw content outside of files, though, and if your application involved content that could be handled natively like that you might find some of those features to be a helpful bonus.
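To make the "atoms of content" idea a bit more concrete, here is a minimal sketch of why chunked processing does not have to change the final identifier. It is not the real IPFS/UnixFS algorithm and it does not produce a real CID: it just hashes fixed-size chunks with SHA-256 and then hashes the list of chunk digests to get a root. The names `fixed_size_chunks`, `root_digest` and `CHUNK_SIZE` are made up for illustration. As far as I know, IPFS importers do something similar in spirit (by default splitting files into blocks of around 256 KiB and linking them under a root), so the root depends on the chunking settings rather than on how much of the file you hold in memory at once.

```python
import hashlib
from typing import Iterable, Iterator

# Hypothetical default for this sketch; real importers have their own settings.
CHUNK_SIZE = 256 * 1024


def fixed_size_chunks(stream: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Re-slice a stream of arbitrarily sized byte pieces into fixed-size chunks."""
    buf = bytearray()
    for piece in stream:
        buf.extend(piece)
        while len(buf) >= size:
            yield bytes(buf[:size])
            del buf[:size]
    if buf:
        # The final chunk may be shorter than `size`.
        yield bytes(buf)


def root_digest(stream: Iterable[bytes], size: int = CHUNK_SIZE) -> str:
    """Hash every chunk, then hash the concatenation of the chunk digests."""
    root = hashlib.sha256()
    for chunk in fixed_size_chunks(stream, size):
        root.update(hashlib.sha256(chunk).digest())
    return root.hexdigest()


# Stand-in for a large file; small here so the example runs instantly.
data = bytes(range(256)) * 10

# Whole buffer at once versus small pieces arriving one at a time.
whole = root_digest([data], size=64)
streamed = root_digest((data[i:i + 7] for i in range(0, len(data), 7)), size=64)

print(whole == streamed)  # same chunk size, same root
print(whole == root_digest([data], size=128))  # different chunk size, different root
```

The takeaway, under these assumptions, is that reading the file in pieces is fine as long as the pieces are re-sliced into the same fixed-size chunks before hashing. What does change the result is the chunk size (and, in real IPFS, other import settings), so those would need to be kept the same everywhere the identifier is computed.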
Think of IPFS as a database, not a filesystem. Using it for huge files is akin to storing the whole file in a single field of an SQL table. It’s kind of awkward.
Anyway, I also worry about performance when people start talking about big files. Big files come with A LOT of overhead. However, I have heard of some people getting acceptable real-world performance.