We need to do something for Open Access.
Sci-Hub is a shadow library website that provides free access to millions of research papers and books, without regard to copyright, by bypassing publishers’ paywalls in various ways. Sci-Hub was founded by Alexandra Elbakyan in 2011 in Kazakhstan in response to the high cost of research papers behind paywalls.
On May 7th, Sci-Hub’s Alexandra Elbakyan revealed that the FBI had been wiretapping her accounts for over two years. This news came after Twitter silenced the official Sci_Hub Twitter account because Indian academics were organizing on it against Elsevier.
Sci-Hub itself is currently frozen and has not downloaded any new articles since December 2020. This rescue mission is focused on seeding the article collection in order to prepare for a potential Sci-Hub shutdown.
For now, Sci-Hub has more than 85,483,812 papers, with a total size of up to 77 TB. The Rescue Mission from Reddit uses BitTorrent to distribute the papers, splitting them into 850 Sci-Hub torrents of about 100 GB each. This is a good start, but it is not enough:
- For storage providers: committing 100 GB or 1 TB is too much, and the node needs to stay online.
- For end users: they still depend on a centralized service to get a paper.
- For the global network: data that already exists elsewhere can’t be reused.
We can store the PDFs / papers on IPFS to avoid them being taken down.
IPFS is a P2P hypermedia protocol:
- IPFS addresses files/content by their content hash, so a corrupted file no longer matches its address and can be detected.
- IPFS transfers data peer-to-peer instead of through a centralized node.
- IPFS deduplicates data via content hashes.
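To make the content-hash idea concrete, here is a minimal sketch of a content-addressed store. It is a simplification, not how IPFS is implemented: real IPFS splits files into chunks and uses CIDs rather than a bare SHA-256 hex digest, and the `ContentStore` class and its method names are made up for this illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical, for illustration only).

    The key of a blob is the SHA-256 hex digest of its bytes.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical bytes always map to the same key, so duplicates are stored once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read: a mismatch means the stored bytes were altered.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its hash")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
first = store.put(b"%PDF-1.4 example paper bytes")
second = store.put(b"%PDF-1.4 example paper bytes")  # same content, same key
```

Because the address is derived from the bytes themselves, any node can serve the file and the receiver can still verify it, which is what makes P2P transfer and deduplication safe.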
So IPFS is a good fit for us.
We can set up an IPFS cluster holding the whole dataset and allow users to set up their own:
- Requires the user to run an IPFS cluster storing 77 TB of data.
- Allows the user to build an API on top of the data.
- Allows the user to fetch a single paper by its hash.
We only maintain the index of papers:
- DOI → Paper Hash
- Title → Paper Hash
- … → Paper Hash
And we can provide APIs including:
- Insert new papers
- Query paper via DOI / Titles / …
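The index and the two APIs above could look roughly like the following in-memory sketch. Everything here is an assumption for illustration: the `PaperIndex` class, its `insert` and `query` methods, and the placeholder hash value are hypothetical, and a real service would need persistent storage and fuzzier title matching.

```python
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so lookups tolerate minor differences."""
    return " ".join(text.lower().split())


class PaperIndex:
    """Hypothetical index that maps DOI and title to a paper's content hash."""

    def __init__(self) -> None:
        self._by_doi: Dict[str, str] = {}
        self._by_title: Dict[str, str] = {}

    def insert(self, doi: str, title: str, paper_hash: str) -> None:
        # Only the hash is stored here; the paper itself lives on IPFS.
        self._by_doi[normalize(doi)] = paper_hash
        self._by_title[normalize(title)] = paper_hash

    def query(self, doi: Optional[str] = None, title: Optional[str] = None) -> Optional[str]:
        # A DOI lookup takes precedence over a title lookup when both are given.
        if doi is not None:
            return self._by_doi.get(normalize(doi))
        if title is not None:
            return self._by_title.get(normalize(title))
        return None


index = PaperIndex()
# "example-paper-hash" is a placeholder, not a real IPFS hash.
index.insert("10.1000/XYZ123", "An  Example Paper", "example-paper-hash")
found = index.query(doi="10.1000/xyz123")
```

The index stays small because it holds only identifiers and hashes, so it is much cheaper to host and replicate than the 77 TB dataset.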
Unlike the IPFS cluster approach, here we only maintain the index/database of papers, not the papers themselves.
Going further, we could build a distributed DB on top of IPFS (maybe OrbitDB).
- After go-service-ipfs has been implemented, we can operate on data from IPFS.
- After the IPFS repo based on go-storage (Issue #631 in beyondstorage/go-storage on GitHub) has been implemented, we can store data in IPFS with go-storage as the backend.