Building a Decentralized BitTorrent Search Engine
For those of you who don't know us at Aurous Group: we built a really cool BitTorrent search engine a while back. The technology behind that engine turned out to be really useful, and it serves as the foundation for today's topic.
Strike Search works in the following steps. http://40.media.tumblr.com/52f1179951aabd221a5df6f6e5d115d5/tumblr_inline_nt91026tOK1s97th6_500.png
These steps are easy enough. To do this for Aurous, however, it has to work without central servers. We want Aurous to keep working even after we've reached EOF, because, let's face it, all software does eventually.
There are two master lists in Aurous. The first is a comprehensive list of hashes and titles sorted by match, so two titles that closely resemble each other are put in the same section. The second is again a list of torrent hashes, but this one contains peer data, file information and popularity metrics.
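To make the two-list layout concrete, here is a minimal sketch of how such lists could be modeled. The post does not say which similarity measure Aurous uses, so this assumes a simple normalization plus `difflib.SequenceMatcher` with a hypothetical 0.8 threshold; `normalize`, `group_titles`, `TorrentInfo` and `torrent_details` are illustrative names, not Aurous code.

```python
import difflib
import re
from dataclasses import dataclass, field


def normalize(title: str) -> str:
    """Lowercase a title and keep only alphanumeric words (assumed normalization)."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def group_titles(entries, threshold=0.8):
    """Greedily bucket (info_hash, title) pairs so near-identical titles share a section.

    Returns a list of (representative_title, members) tuples in sorted order.
    The threshold is a hypothetical value, not one taken from Aurous.
    """
    sections = []
    for info_hash, title in sorted(entries, key=lambda e: normalize(e[1])):
        norm = normalize(title)
        for representative, members in sections:
            # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
            if difflib.SequenceMatcher(None, representative, norm).ratio() >= threshold:
                members.append((info_hash, title))
                break
        else:
            # No existing section is similar enough, so start a new one.
            sections.append((norm, [(info_hash, title)]))
    return sections


@dataclass
class TorrentInfo:
    """One entry of the second master list (field names are hypothetical)."""
    peers: list = field(default_factory=list)
    files: list = field(default_factory=list)
    popularity: int = 0


# Second master list: info hash -> TorrentInfo
torrent_details = {}
```

Sorting first keeps the sections in a stable, browsable order, and comparing each new title only against section representatives keeps the grouping pass cheap.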
Breaking It Down
Each Aurous client seeds a copy of these lists. When a user searches for something that isn't present in either list, their client seeks out the relevant information by crawling the raw text of a plethora of websites. This crawl is fast, and master list results are instant. The DHT is also crawled in parallel with the main request, which is usually instant as well because DHT lookups work on hashes. When information has to be sought out this way, it is added to the master list within Aurous. These changes are marked “pending”, and a few things happen next.
http://36.media.tumblr.com/6e54cfcb258fc9ad608d894fbe2d1eea/tumblr_inline_nt9126iW3m1s97th6_500.png
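The lookup order described above (local master list first, then web and DHT crawling in parallel, with new finds stored as pending) can be sketched roughly as follows. This is an illustration under assumptions: `crawl_web` and `crawl_dht` are hypothetical callables standing in for whatever crawlers the client really uses, and matching is a naive case-insensitive substring test.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Results = Dict[str, str]  # title -> info hash


@dataclass
class LocalIndex:
    master: Results = field(default_factory=dict)   # seeded copy of the master list
    pending: Results = field(default_factory=dict)  # new finds awaiting push to a node

    def search(
        self,
        query: str,
        crawl_web: Callable[[str], Results],  # hypothetical web crawler
        crawl_dht: Callable[[str], Results],  # hypothetical hash-based DHT lookup
    ) -> Tuple[Results, str]:
        key = query.lower()
        # 1. Anything already in the master list is answered locally, with no network.
        hits = {t: h for t, h in self.master.items() if key in t.lower()}
        if hits:
            return hits, "master"
        # 2. Otherwise run the web crawl and the DHT crawl in parallel.
        with ThreadPoolExecutor(max_workers=2) as pool:
            web_future = pool.submit(crawl_web, query)
            dht_future = pool.submit(crawl_dht, query)
            found = {**web_future.result(), **dht_future.result()}
        # 3. New findings are recorded locally as "pending" changes.
        self.pending.update(found)
        return found, "crawled"
```

Keeping pending entries separate from the seeded list makes it easy to push only the client's new discoveries to a node later.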
Once changes have been pushed to a node, a routine runs automatically every 24 hours to compile the main list into its queryable format. All lists are updated automatically using only the differences between files, so neither nodes nor clients need to redownload an entire list.
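A minimal sketch of the diff-only update idea, assuming each list can be treated as a mapping from info hash to entry. The diff format and the helper names `make_diff`, `apply_diff` and `compile_due` are hypothetical; the post only states that updates ship differences and that compilation happens every 24 hours.

```python
from datetime import datetime, timedelta

COMPILE_INTERVAL = timedelta(hours=24)


def make_diff(old: dict, new: dict) -> dict:
    """Describe how to turn `old` into `new` without shipping all of `new`."""
    return {
        # Entries that are new or whose value changed.
        "upsert": {k: v for k, v in new.items() if k not in old or old[k] != v},
        # Keys that no longer exist in the new list.
        "delete": sorted(k for k in old if k not in new),
    }


def apply_diff(current: dict, diff: dict) -> dict:
    """Apply a diff produced by make_diff to a local copy of the list."""
    updated = dict(current)
    for key in diff["delete"]:
        updated.pop(key, None)
    updated.update(diff["upsert"])
    return updated


def compile_due(last_compiled: datetime, now: datetime) -> bool:
    """True once the 24-hour recompile interval has elapsed."""
    return now - last_compiled >= COMPILE_INTERVAL
```

A client that already holds `old` only needs the much smaller diff to end up with exactly `new`.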
Essentially, we've turned every Aurous user into their own search engine. We've sped up the process by running learning routines locally, so your client can easily spot fakes, and we've made clients strictly filter out non-audio files, which shaves seconds off every request.
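The non-audio filter could be as simple as an extension check on each torrent's file list. This is a sketch under an assumption: the post does not say how Aurous decides what counts as audio, so the extension set below is illustrative only, and the fake-detection routines are not modeled here.

```python
from pathlib import PurePosixPath

# Assumed set of audio extensions; not the list Aurous actually uses.
AUDIO_EXTENSIONS = {".mp3", ".flac", ".ogg", ".opus", ".m4a", ".aac", ".wav"}


def is_audio(path: str) -> bool:
    """Case-insensitive extension check on a path inside a torrent."""
    return PurePosixPath(path).suffix.lower() in AUDIO_EXTENSIONS


def audio_files(file_list):
    """Keep only the audio files from a torrent's file list."""
    return [f for f in file_list if is_audio(f)]


def has_audio(file_list) -> bool:
    """Torrents with no audio at all can be dropped before any further work."""
    return any(is_audio(f) for f in file_list)
```

Dropping whole torrents with `has_audio` early is what saves time: they never reach ranking or peer lookups.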
Of course, as a user you can disable the BitTorrent/Aurous Network functionality entirely and stick to plain HTTP searching. That is still fast, because we've indexed millions of websites by their relevant data, and it is also decentralized to a degree.
Is it really decentralized?
Yes. Anyone can set up a node for the network. Nodes cannot be tampered with, and if by the grace of God someone manages to, that node will be blacklisted by all clients on the network right away. Nodes are only there to tell clients which hashes to download and seed for other clients.
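The post doesn't describe how clients detect a tampered node. One simple technique that fits the "nodes only tell clients which hash to download" model is a majority vote: ask several nodes which master-list hash is current and blacklist any node that disagrees with the consensus. The sketch below shows that swapped-in technique; it is not necessarily what Aurous does, and `check_nodes` is a hypothetical name.

```python
from collections import Counter


def check_nodes(advertised: dict, blacklist: set) -> str:
    """Return the consensus master-list hash and blacklist nodes that disagree.

    `advertised` maps a node id to the master-list hash that node tells clients
    to fetch. Already-blacklisted nodes are ignored.
    """
    live = {node: h for node, h in advertised.items() if node not in blacklist}
    if not live:
        raise ValueError("no trusted nodes left")
    # Most common advertised hash wins; ties go to the first one encountered.
    consensus, _count = Counter(live.values()).most_common(1)[0]
    for node, h in live.items():
        if h != consensus:
            blacklist.add(node)
    return consensus
```

A plain majority vote is easy to reason about, but anyone running many nodes could outvote honest ones, so a real network would need some additional defense against that.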
Even if all nodes went down, you would still have your local list. The master list is just a way to get quick results for things you haven't searched for yet, and it is by no means required. You can share your local list, and when you search for something that isn't in it, the process will still be fast.
Closing thoughts
Creating a decentralized search engine was a fun project. With this feature implemented in Aurous, we hope that even if the whole clearnet went down, you could still find the music you want on BitTorrent in a matter of seconds.