Since Google likes to tackle complex problems that don’t have an obvious use right away, and since they have tons of cash and more smart people than most university faculties, I have an interesting project for them to tackle.  I call it the Crowd Sourced Cloud.  The idea popped into my head when I was listening to “This Week in Google.”  They mentioned Walmart’s desire to crowd-source delivery.  Then I started to think about all of the crowd-sourced distributed computing projects of the past years.  The two that came to mind were the SETI@home project for finding intelligent life and the Folding@home project for cancer research.

Both projects basically leveraged idle cycles on machines across the web to solve really complicated problems that could be distributed across many (millions of) nodes.  This got me to ask: why not come up with a crowd-sourced cloud that allows a broker such as Google to offer inexpensive IaaS nodes, or even PaaS infrastructures, via the same model?  There could be hypervisors installed on everything from desktops and laptops to phones, running cheap virtual machine images.  They’d basically use the idle cycles of these machines, and the VMs could move from one machine to another to be more or less resilient.
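
To make the model concrete, here’s a purely hypothetical sketch of the worker side, SETI@home-style: an idle machine polls a broker for a work unit, runs it in a sandbox or VM, and posts the result back. The broker URL, the endpoints, and the helper functions are all invented for illustration.

```typescript
// Hypothetical worker loop for a crowd-sourced cloud node.
// machineIsIdle() and runInSandbox() stand in for real idle detection
// and a real sandbox/VM runtime; the broker endpoints are made up.
declare function machineIsIdle(): Promise<boolean>;
declare function runInSandbox(unit: { id: string; payload: unknown }): Promise<unknown>;

const BROKER = "https://broker.example.com"; // invented URL

async function workLoop(): Promise<void> {
  while (await machineIsIdle()) {
    const res = await fetch(`${BROKER}/work-unit`); // claim a chunk of work
    if (res.status === 204) break;                  // nothing to do right now
    const unit: { id: string; payload: unknown } = await res.json();
    const result = await runInSandbox(unit);        // e.g., run a cheap VM image
    await fetch(`${BROKER}/results/${unit.id}`, {   // report back; the broker can
      method: "POST",                               // reassign the unit if this
      headers: { "content-type": "application/json" }, // node disappears mid-task
      body: JSON.stringify(result),
    });
  }
}
```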

Their expertise in doing really complicated stuff, plus super-high-speed residential broadband like Google Fiber, kind of makes this type of service feasible.  I can think of a ton of challenges, from networking to security, but from a pure academic perspective I’d love to see this type of solution.  I’m sure there are academic and commercial uses for such a virtual cloud but again, I’ll defer to the folks way smarter than me to figure out the use cases.

What are your thoughts?  Have I just taken too much cell phone radiation to the head, or is this a worthwhile research project for “The Google?”


13 thoughts on “Crowd Sourced Cloud – A potential project for Google”

  • April 6, 2013 at 11:24 am

    I like the idea. It might run into a couple of classic cloud problems, though. Firstly, how do you control where the data is, and which country’s or even state’s law governs it? Secondly, if data is on my phone, who owns it? Me, the phone company, or whoever it was that launched it into the ether? 🙂

    • April 6, 2013 at 11:27 am

      Good questions, and I’m wondering if the answer is encryption, for both the classic cloud and this solution as well.

      I also checked out your blog. Pretty cool.

      • April 6, 2013 at 11:31 am

        Thanks Keith, much appreciated 🙂

        I like the idea of stuffing the lawyers with encryption. Basically it doesn’t matter who owns it under the law; only the person who put it there can do anything with it anyway. I like your style 🙂
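
        A minimal sketch of that idea, assuming Node’s built-in crypto module: the data is encrypted client-side and the key never leaves its owner, so whoever “owns” the stored blob can’t read it. The function names are made up for illustration.

        ```typescript
        // Client-side encryption before upload: the provider only ever
        // stores ciphertext; the AES key stays with the data's owner.
        import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

        function encryptForUpload(plaintext: Buffer, key: Buffer) {
          const iv = randomBytes(12); // unique nonce per message
          const cipher = createCipheriv("aes-256-gcm", key, iv);
          const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
          return { iv, ciphertext, tag: cipher.getAuthTag() };
        }

        function decryptAfterDownload(
          blob: { iv: Buffer; ciphertext: Buffer; tag: Buffer },
          key: Buffer
        ): Buffer {
          const decipher = createDecipheriv("aes-256-gcm", key, blob.iv);
          decipher.setAuthTag(blob.tag); // GCM also detects tampering
          return Buffer.concat([decipher.update(blob.ciphertext), decipher.final()]);
        }

        const key = randomBytes(32);                                 // never uploaded
        const blob = encryptForUpload(Buffer.from("my data"), key);  // safe to store anywhere
        console.log(decryptAfterDownload(blob, key).toString());     // "my data"
        ```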

  • Pingback: VirtualizedGeek Tech Talk Episode 8 – Virtualized Geek

  • April 9, 2013 at 6:48 pm

    Just watched your Tech Talk Episode 8 video and had some time to think about it:

    Google and Yahoo already do this; it’s called Big Data/MapReduce. Hadoop came out of Yahoo, built after the model of what Google was already doing. The recent Strata conference has a lot of information on Big Data: http://www.youtube.com/playlist?list=PL055Epbe6d5ZEYjq8K7CA37-1fEST-yWe

    The classic problems Jon talks about are:
    - privacy: encryption won’t save you. You still need to send the key before you can work on the data, although there are very limited experiments where this works.
    - it needs to be tasks that can be easily divided, and very few are (see the toy sketch after this list)
    - on a desktop, your network connection is probably not well suited unless it’s sitting in an enterprise environment, and downloading code from the Internet to run inside the corporate network is probably not allowed. Unless, of course, running jobs on desktop machines is corporate policy and everything has been checked so it can’t escape.

    And:
    - most hypervisors don’t support scaling dynamically (very well), so the VM would occupy the same amount of memory or CPU all the time if it was running on your desktop. You can obviously suspend the VM, but then networking is going to time out. So a VM probably isn’t very well suited.
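
    To make the “easily divided” point concrete, here is a toy word count in the map/reduce style (my own illustration, not from the talks): each chunk can be mapped on a different machine, and only the small per-chunk counts travel back to be merged.

    ```typescript
    // Toy map/reduce word count: the kind of task that divides easily.
    const chunks = ["the cat sat", "the dog sat", "the cat ran"];

    // Map phase: each chunk is counted independently, so each could run
    // on a different node with no coordination.
    const partials = chunks.map((chunk) => {
      const counts = new Map<string, number>();
      for (const word of chunk.split(/\s+/)) {
        counts.set(word, (counts.get(word) ?? 0) + 1);
      }
      return counts;
    });

    // Reduce phase: merge the small partial results on one node.
    const total = new Map<string, number>();
    for (const counts of partials) {
      for (const [word, n] of counts) {
        total.set(word, (total.get(word) ?? 0) + n);
      }
    }
    console.log(total); // the => 3, cat => 2, sat => 2, dog => 1, ran => 1
    ```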

    The best fit would be private or public clouds with spare compute time: a marketplace where people can share compute time on their servers. Not desktops or laptops, and certainly not phones (batteries?). OnApp does this for CDN services: http://onapp.com/cdn/features/cdn-marketplace/

    • April 9, 2013 at 8:23 pm

      Those are some really well-thought-out issues. I think the Hadoop-type uses are valid. I’ll have to check out the video.

      • April 10, 2013 at 5:40 am

        The problem is really with the data: privacy of the data and sharing of the data. Most companies would think their data should be kept in-house.

        Anyway, as I mentioned above, I think most Hadoop workloads probably need a lot of data, so networking is key.

        I messed up that part: “although there are very limited experiments where this works.”
        I meant to say there are very limited experiments where they don’t send a key at all: they are able to do calculations directly on the encrypted data.
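
        As a toy illustration of that idea (not a practical scheme): textbook RSA happens to be multiplicatively homomorphic, so a host can multiply two ciphertexts and the owner decrypts the product without the host ever seeing the plaintexts. The tiny hard-coded key below is for demonstration only.

        ```typescript
        // Textbook-RSA demo of computing on encrypted data:
        // Enc(a) * Enc(b) mod n decrypts to a * b.
        function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
          let result = 1n;
          base %= mod;
          while (exp > 0n) {
            if (exp & 1n) result = (result * base) % mod;
            base = (base * base) % mod;
            exp >>= 1n;
          }
          return result;
        }

        const n = 61n * 53n; // toy modulus (3233)
        const e = 17n;       // public exponent
        const d = 2753n;     // private exponent (d * e ≡ 1 mod 3120)

        const encrypt = (m: bigint) => modPow(m, e, n);
        const decrypt = (c: bigint) => modPow(c, d, n);

        const a = 7n, b = 6n;
        const productOfCiphertexts = (encrypt(a) * encrypt(b)) % n; // done by the host
        console.log(decrypt(productOfCiphertexts)); // 42n, and the host never saw 7 or 6
        ```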

      • April 10, 2013 at 5:42 am

        It’s 100 talks in 3 days. They posted a lot of them online.

        I’ve been at bigger conferences; there is one in Brussels, FOSDEM (Free and Open Source Software Developers’ European Meeting), which is 500 talks in 2 days, and what is cool about it: the conference is free.

  • May 13, 2013 at 3:26 pm

    Here are some people trying to do something like what you mentioned, but at a completely different layer of the cake. In this case they are running other people’s code inside the browser in a secure sandbox:

    http://blog.grimwire.com/#2013-03-20-breaking_the_single_page_with_webworkers.md
    http://blog.grimwire.com/#2013-04-04-grimwire.md

    A Web Worker is a browser feature that allows code to run in a separate thread (so it can run on a different CPU core, for example). And because a Web Worker does not get access to the page (it can only send and receive messages), it is reasonably safe to run code from different sites on the same page.
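
    A minimal sketch of that (standard browser APIs; the worker code here just stands in for “someone else’s code”):

    ```typescript
    // Run code in a Web Worker built from a string: it gets its own
    // thread and has no access to this page's DOM or variables.
    const workerSource = `
      self.onmessage = (e) => {
        let sum = 0;
        for (let i = 0; i < e.data; i++) sum += i; // CPU-bound work, off the main thread
        self.postMessage(sum);
      };
    `;

    const blob = new Blob([workerSource], { type: "application/javascript" });
    const worker = new Worker(URL.createObjectURL(blob));

    worker.onmessage = (e: MessageEvent<number>) => {
      console.log("result from worker:", e.data); // messages are the only channel back
      worker.terminate();
    };
    worker.postMessage(100_000_000); // the page stays responsive while this runs
    ```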

  • July 29, 2013 at 2:01 am

    A small startup has just launched a beta service that does just this – check out http://www.slicify.com.

    There are a lot of good points raised above – the principal hurdle to overcome with distributed computing is security, and this is in both directions.
    For the hoster: they want to allow the guest to come in and run a VM on their machine, but they don’t want the guest to be able to access their resources (imagine if you let someone use your compute time only to find they scanned your network and installed a bunch of malware on your other machines).
    For the guest, they want to be able to run in a secure environment where the hoster can’t get access to their code/data.
    For the first one, it’s reasonably solvable if the guest is sufficiently isolated (via VM/firewall/etc.); the second one, however, is basically impossible. There is no technology in existence that lets you run a program on a computer that can’t be cracked by someone physically sitting at that computer. But this is no different if you’re going to run your code on EC2 – Amazon has free access to all your stuff too. If anything, splitting up and running your code across thousands of different hosters is more secure (in the sense that nobody has access to the full data set) than trying to run it with one provider.

  • September 13, 2013 at 8:47 am

    Maybe with TPM/Secure Boot something could be built that guarantees the host is running the intended software:
    https://wiki.openstack.org/wiki/TrustedComputingPools
    http://blog.scottlowe.org/2013/09/10/idf-2013-enhancing-openstack-with-intel-technologies/

    Let’s say Google made the hardware and software, and companies could buy Google hypervisor machines and run them in their own datacenter. And the TPM would contain a key that identifies that the machine is still the machine that Google built.

    Now, how do you get the data and/or VM/OS image onto the machine so only that machine can decrypt it? You would obviously have the host create a private/public key pair in the TPM, and you’d encrypt the data/VM image with the public key before you send it to the host. It would be a bit annoying that every time you start/deploy a VM, you’d need to encrypt it with a specific host key.
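
    A rough sketch of that flow, with a plain RSA key pair standing in for the TPM-resident key (envelope encryption: a fresh symmetric key protects the image, and only that small key is wrapped with the host’s public key; all names are illustrative):

    ```typescript
    import {
      generateKeyPairSync, publicEncrypt, privateDecrypt,
      createCipheriv, randomBytes, constants,
    } from "crypto";

    // Stand-in for the key pair the host's TPM would generate and attest to.
    const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

    const vmImage = Buffer.from("...VM image bytes...");

    // Encrypt the image with a fresh symmetric key (RSA can't encrypt large blobs).
    const dataKey = randomBytes(32);
    const iv = randomBytes(12);
    const cipher = createCipheriv("aes-256-gcm", dataKey, iv);
    const encryptedImage = Buffer.concat([cipher.update(vmImage), cipher.final()]);
    const tag = cipher.getAuthTag(); // ships alongside encryptedImage and iv

    // Wrap the data key with the host's public key before sending everything over.
    const wrappedKey = publicEncrypt(
      { key: publicKey, padding: constants.RSA_PKCS1_OAEP_PADDING },
      dataKey
    );

    // Only the host (in reality, only its TPM) can unwrap the data key.
    const unwrapped = privateDecrypt(
      { key: privateKey, padding: constants.RSA_PKCS1_OAEP_PADDING },
      wrappedKey
    );
    console.log(unwrapped.equals(dataKey)); // true: the host can now decrypt the image
    ```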

    If done right that might actually work, though.

    Obviously Google in this example could be any company that wanted to deliver these services. And the software and hardware wouldn’t have to be from the same company.

