Crypto Hipster Presents: Reporter on the Ground, Episode 10; Pioneering Advancements in Secure, Collaborative AI Model Training and Deployment, with Jiahao Sun @ FLock.io

Jiahao Sun, the founder and CEO of FLock.io, is an Oxford alumnus and is an expert in AI and blockchain. With previous roles as the Director of AI for the Royal Bank of Canada and an AI Research Fellow at Imperial College London, he founded FLock.io to focus on privacy-centered AI solutions. Through his leadership, FLock.io is pioneering advancements in secure, collaborative AI model training and deployment, showcasing his dedication to using technology for societal advancement.

[00:00:00] Hello everybody, and welcome to the Crypto Hipster podcast. This is your host, Jamil Hasan, the Crypto Hipster, where I interview founders, entrepreneurs, executives, thought leaders, artists, you name it, all around the world in crypto and blockchain. Today I have an amazing guest, and I know he's amazing because I spoke to him already at Consensus 2024. Unfortunately we did not record then, but we are recording now.

[00:00:27] I enjoyed our conversation the first time, and I'm certain that this is going to be an awesome conversation as well. I'd like to introduce my guest today, who's the CEO and founder of FLock. His name is Jiahao Sun. Jiahao, welcome to the show.

[00:00:49] Thank you, Jamil, for having me again, and hi to all your audience as well.

[00:00:57] Thank you very much for joining me. We'll start out by asking about your background. What is your background, and is it a logical background for what you are doing now? Yeah, I actually started off in the traditional AI industry, doing application research.

[00:01:17] I was head of AI at one of the big-name traditional financial institutions. At the time I figured that, especially in finance, data silos are actually a big thing: all the banks hold their own data, right.

[00:01:32] In order to make an AI model that works well, you need to have enough data to actually train and learn from.

[00:01:40] But if there are data silos between different banks or different containers, it's hard to actually make a unified, great model. So I was thinking about this and then came up with the idea of FLock. FLock actually stands for federated learning plus blockchain.

[00:01:56] So it's a way to use federated learning, which many companies are already using today. Google and Apple use it to learn your typing habits and predict your user behaviors on your own device, without needing to take your data out, right.

[00:02:11] But it's centralized control. I mean, you have to believe Google doesn't do evil, right. So that's why I wanted blockchain to join in as the consensus mechanism.

[00:02:21] So we don't have to believe in any centralized authority. We just have this as an open protocol: everyone can join training, everyone can commit their data, but they don't have to send their data anywhere out of their device. That's FLock, yeah.

[00:02:36] Okay, so that's what FLock is all about. And federated learning: what are some examples of federated learning? What is that all about?

[00:02:47] Yeah, as I just explained, federated learning is a machine learning mechanism, but it makes the training nodes federated, meaning that they aren't actually gathered together.

[00:03:03] It's not like the traditional way. In the traditional way, when we train a model, we actually need to get all the data from everywhere, put it into one server, and then train a huge model over that data.

[00:03:13] That's the traditional way. But in order to better preserve users' privacy, there are many different ways that people are trying to make that compute happen closer to the user.

[00:03:30] By closer to the user, maybe we can say on your own device, or maybe on your own server at home, but I doubt everyone will have that, right.

[00:03:40] With all these possible ways, you actually have compute somewhere decentralized, or federated everywhere in the world, but you can still have a gathered, aggregated model.

[00:03:55] So it's a bit different from local model training, because local model training means that you have a model, you train it on your own data, and that's it.

[00:04:02] Federated learning, yes, trains a local model, but it will share the changes to the model, the model gradients, and then aggregate them with others, so you can benefit from crowdsourced data training but still preserve your privacy on your own device.
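
Editor's sketch of the idea just described, not FLock's or any vendor's actual implementation: each node computes a gradient on its own private samples, only the gradients are shared, and an aggregator averages them. The one-parameter model `y = weight * x`, the helper names, the learning rate, and the synthetic data are all assumptions made for illustration.

```python
import random


def local_gradient(weight, samples):
    """Mean-squared-error gradient for y = weight * x, computed on one node's private samples."""
    return sum(2 * (weight * x - y) * x for x, y in samples) / len(samples)


def federated_round(weight, nodes, lr=0.1):
    """One round: every node computes its gradient locally; only gradients are averaged."""
    grads = [local_gradient(weight, samples) for samples in nodes]
    total = sum(len(samples) for samples in nodes)
    # Weight each node's gradient by how many samples it holds.
    averaged = sum(g * len(s) for g, s in zip(grads, nodes)) / total
    return weight - lr * averaged


def make_node(n, true_weight=3.0, seed=0):
    """Hypothetical private data set for one node; it never leaves this list."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_weight * x) for x in xs]


nodes = [make_node(20, seed=i) for i in range(5)]
weight = 0.0
for _ in range(200):
    weight = federated_round(weight, nodes)

print(f"shared model weight after training: {weight:.3f}")
```

The aggregator only ever sees five numbers per round (one gradient per node), yet the shared weight moves toward the value that fits every node's data. Real systems add protections this toy omits, since raw gradients can still leak information.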

[00:04:21] That's how it works. Got it. Yes, and you asked about some examples, sorry.

[00:04:27] So an example is that if you're using some phones, iPhones, right, your typing habits are predicted: whenever you type something, it will predict your next words. You wouldn't be comfortable if Google told you that actually all your typing

[00:04:47] is being sent over to the Google cloud, right. You'd be like, I don't want you to know the secret notes on my phone. So that's what those manufacturers promoted.

[00:05:03] That's why they promoted federated learning, to make sure that they can preserve user privacy. Got it. Okay, so you are leveraging the community, right, to help with the creation of on-chain,

[00:05:19] decentralized AI models. How do you leverage a community? Yeah, because in many cases communities don't really want to share their data, or they don't necessarily stay somewhere together.

[00:05:33] For example, if there's a sub-community of people who are very interested in one very specific comic-style character, they don't necessarily live together, right. It's hard to actually call them up to drop their data into one place to train a model.

[00:05:50] So the way FLock does it is, we'll just call it a task, and then whoever wants to join the training just joins. They don't have to join the training at the very beginning.

[00:06:00] They can join anytime during the training process. As long as they can provide their own insights to the model,

[00:06:07] or provide their own improvements to the model, it's all welcome. And the whole incentivization process is purely audited on-chain, so it's transparent; it's there for everyone.

[00:06:23] That's how we incentivize people to join whatever interest groups they are interested in and then co-create models with the community. So let's talk a little bit about Consensus. You were on a panel.

[00:06:41] I tried to find the title of the panel on the app, and I can't log into the app anymore. So what was the name of that panel, and what takeaways do you have from that experience?

[00:06:56] Oh wow, that's a great question. I also forgot the title of the panel, but I know I was on a main stage panel together with Tommy Eastman from Foundry Digital Services, Greg from Akash, and also Ben.

[00:07:14] So we're all doing different decentralized AI stuff at different stages and in different layers of this industry, and we had a great chat. I think the others, including Akash, are all focused on decentralized GPUs, or decentralized

[00:07:30] compute, whereas FLock is more focused on the training part of it. We actually already collaborated with Akash, which provides its services to public users, so now users on Akash can

[00:07:43] just run a FLock template and become FLock trainers. So yeah, it was a great conversation. We talked a lot about data, privacy, and how decentralized AI is changing this. Yeah.

[00:08:02] Very cool. So let's talk about that data, right. Decentralized AI model training seems to be necessary for industries that handle sensitive data and leverage AI. What are some of those industries, and what are some of the types of data that you focus on

[00:08:23] in your models? Yeah, we started off with healthcare, because, believe it or not, people who don't actually work in the healthcare industry will

[00:08:34] feel like healthcare is quite outdated in terms of technology, right. But no, actually they have been pitched a lot by federated learning companies in traditional industry, like everywhere.

[00:08:45] So when I first reached out to the healthcare industry, they were all like: you're doing federated learning, you're bringing it on-chain, so there's no central governance, it's decentralized governance, that's great. So it seems like everyone at least understands what federated learning is in this industry. That's a heavy-use industry, I think, because the data is so sensitive and so

[00:09:08] direct that the regulators put very heavy regulations on such industries, right. And there are other places, like finance as I mentioned: personal credit data or personal banking data. Those things are very sensitive, and you don't want to send them over to any third party, or to a random startup telling you, I'm going to make your life better, but you have to give me all your bank details.

[00:09:33] You will have to think about it, right. It's not something where you would say, oh yes, okay, take it. And there's also the personal assistant. For an AI assistant to know you, for an assistant app or assistant model to be very good,

[00:09:50] you actually have to put every detail of yourself into that model's training, because otherwise it just won't be as good.

[00:09:57] But in order to do so, you have to hand over all your data, and that's a dangerous signal. So it's a dilemma between wanting a powerful model and wanting to preserve some of my secrets but just getting a so-so model. I think FLock is here to solve this problem: you don't have to

[00:10:18] give up any of your privacy, but you can still get a very good model. So how do you do that? How do you end up with a powerful model versus a so-so model? Because if you're

[00:10:32] leveraging the power of the community and the community is not so powerful, then how do you get your great model? I think the best data, the high-quality data, is always in the hands of the community, right. That is the data that never actually

[00:10:52] gets out as open-domain data. If a model can be trained to run very well on open-domain data, it already runs well there. The reason why we don't have these industry-specific models now is that we don't have enough data to actually make the models better.

[00:11:10] For example, it's actually a very trending thing in AI research. We know Llama 3, right. Meta just released Llama 3, which is an open-source 8-billion-parameter large language model, and it gives us a very interesting

[00:11:30] signal: this model is actually very small compared to what GPT is building, but it was trained on a high-quality data set seven times larger than those of other models, and it can actually run as well as a GPT model.

[00:11:48] That's a great signal to tell people that in this large language model age we're already in, high-quality data really matters. And that high-quality data

[00:11:59] can be anything that people never want to put out in the open domain, right. Because if it's already in the open domain, there will already be an open-source model trained on it. It's only that user-centric, private user data that can actually cover the last mile of data

[00:12:21] and the last mile of AI training, let's say, to make the AI models really good. So you're saying, basically, that even though you have these large language models that are gathering information from everywhere,

[00:12:37] the more specific the data set and the more specific the population, the better the signal you get. Yeah, especially for industry-specific ones, yes. Interesting, interesting. So I want to talk about

[00:12:57] the current state of the AI industry. We had a really great conversation going back to the history of it, asking when this happened, and there seemed to be some kind of inflection point around the 2016 timeframe,

[00:13:17] where Nvidia's and Amazon's prices skyrocketed, so there was a correlation there. I wanted to find out what your assessment of the history and the current state is. Oh yeah, I think at Consensus I discussed with you that

[00:13:37] the end of real human-generated data is 2016, right. That was the date

[00:13:44] when the first generative AI approaches came out, and then one year later the transformer came out, and then in 2018 GPT-1 came out. So basically that's the line after which the internet maybe contains more synthetic data than we actually thought.

[00:14:07] If a model is trained on such

[00:14:13] open-domain, synthetic data, which maybe has some bias from version one of GPT, and then version two of GPT trains on that data, and version three of GPT trains on that data again, those biases will be amplified all the way up.
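
Editor's toy illustration of the amplification argument above, under a loudly stated assumption: each model generation slightly over-represents whatever view is already most common in its training data, and the next generation trains on that output. The `sharpening` factor and the starting share of 0.55 are invented for illustration and are not measurements of any real model.

```python
def next_generation_share(majority_share, sharpening=1.2):
    """Share of the majority view in a model's output, given its share in the training data.

    Assumption: the model exaggerates what is already common (sharpening > 1).
    """
    a = majority_share ** sharpening
    b = (1 - majority_share) ** sharpening
    return a / (a + b)


share = 0.55  # hypothetical slight bias in the original human-written data
history = [share]
for generation in range(5):
    # Each new model trains on text produced by the previous one.
    share = next_generation_share(share)
    history.append(share)

for generation, value in enumerate(history):
    print(f"generation {generation}: majority view share = {value:.3f}")
```

Under this assumption a small initial skew grows with every generation, which is the mechanism the speaker is pointing at; mixing fresh human-generated data back in at each step would dampen the drift.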

[00:14:30] That's why I think real human data will be so important and so valuable in the future. Real human-generated data, I mean real data, will be so important in the future, because

[00:14:47] we will see it soon. I've already seen some companies doing this: they basically use something like a Quora style, but all the answers and questions are generated.

[00:14:58] For any question you're asking or searching online, they'll just give you generated answers, but that website is indexed by Google.

[00:15:07] So basically, if a new large-scale model is being built today, when it looks at all the online data, it will read that data and think it is human-generated data. It's a trend that we're seeing in internet data. And talking about some interesting

[00:15:28] news, I guess you noticed the story about Apple Intelligence, right. They just launched it at WWDC, I think.

[00:15:37] A week ago, yes, a week ago. So I believe all industry players, in their way of treating privacy, are doing what I just described: trying to bring the compute as close to the user as possible, either on your phone or in Apple's way. Because

[00:16:00] at this stage the phone may not be strong enough yet to actually run a large-scale model, they use this private compute cloud, where it's actually your iCloud, but maybe only a few iClouds are assigned to you that can run the model for you.

[00:16:15] That's their answer to the question: if my phone is not strong enough, how can I run this LLM on my own

[00:16:25] devices? So they bring the private cloud to you. But there are still some loopholes, I'd say, because

[00:16:34] you wouldn't be assigned a dedicated server, since that doesn't make economic sense for Apple. So basically many of you are going to share the same cloud, just with maybe some cryptographic

[00:16:49] method to make sure your data is locked and other people's data is separated from yours. But whether this encryption works, or whether having a shared cloud is secure enough, is all

[00:17:08] going to be tested when the real iOS 18 comes out

[00:17:13] later this year, I guess. So the answer from all these big players in the market is that they're all trying to bring compute closer to the user, and that's a trend I'm seeing now.

[00:17:28] Nvidia is bringing compute to the users as well, right. They are constantly upgrading their graphics cards, so several years from now,

[00:17:39] maybe your phone can be as powerful as the 4090 graphics cards of today. So that's basically what's happening in the market right now. Wow. So those are some of the things that are happening. There are also limitations to AI in its current state,

[00:18:00] right. What are some of those limitations? Oh, for limitations of AI, I'm very much interested in the multimodal

[00:18:12] angle of AI nowadays, because it's quite a critical thing. By multimodal I mean, for example, you look at a picture and you ask the AI what's in the picture, or

[00:18:26] what we can imagine from the picture, or other things. It involves not just the ability to process the picture, or the ability to use language, or the ability to do reasoning. It includes all those abilities together, and that's what we call multimodal.

[00:18:45] I think that's the first thing that currently limits AI, but we are seeing some new progress, and somehow GPT-5 is

[00:18:57] kind of tackling this multimodal thing, but we haven't seen it yet, right; it's still not published. The other thing, I think, is the embodied

[00:19:09] perception thing. AI nowadays is still algorithms, right. But when people talked about AI 10 years ago, sorry, maybe 20 years ago,

[00:19:19] they were also thinking about robots, right: real robots that can actually work with you, be your nanny at home, or be your cook.

[00:19:31] That requires a bit more than just algorithms. It still requires engineering work, robotics, that can actually put an AI brain into a body that can work and behave like a human.

[00:19:46] It's quite a hard thing, harder than we thought. For example, Boston Dynamics has been at it for like 20 years, but their robots are only just running like a baby; they still haven't reached the adult stage. You know, I,

[00:20:10] over the past weekend, had my kids graduate from elementary school and middle school, and I finally put the pictures on Facebook, but I had been hesitant

[00:20:26] to do that because of the new age of AI, where people can replicate people's pictures, right.

[00:20:35] So how do you add security for people in this AI age, so that people aren't fearful of sharing pictures with others?

[00:20:50] There are some tricks, actually, and you can even take a technical approach. For example, you can always wear glasses that have red and green lenses, different lenses,

[00:21:05] red on one side and green on the other side. That will actually confuse the AI, so if someone wants to synthesize you, those photos are actually not useful. Or you can wear

[00:21:15] a T-shirt with a head on it, like with the face of someone else, because in that case when the AI sees the photo there are two heads, so basically this is not a real person.

[00:21:27] These are the tricks many engineers used in the early days, when generative AI came out, to trick those systems.

[00:21:40] For example, with face recognition systems, they would just wear those glasses or T-shirts like that to trick the system. Jokes aside, when it comes to the real content that you're putting online, you're hearing about

[00:21:58] people generating or synthesizing you, impersonating you, trying to scam others. This is happening a lot, and I do see a lot of those things happening.

[00:22:11] Recently, I remember there was a news story about a scammer who impersonated a senior manager of a company and actually got away with one million,

[00:22:25] just by putting up a fake, AI-generated face in a Zoom video call. People just believed it. I was like, wow, okay. So there are actually ways to detect those.

[00:22:39] I even know some friends who are running new protocols, opening up either a new AI startup or a crypto company to do on-chain

[00:22:51] data verification, to make sure it's real human-generated data, not synthesized, not a generative AI thing. So on the algorithm side, there are ways to actually detect them.

[00:23:05] There are always ways to detect them, but you always need to keep your algorithm up to date. It's just like a race between thieves and police, right: you always have to keep up. Maybe some algorithm you used before could detect the thieves, but

[00:23:24] maybe a week or two weeks later they have better algorithms, so the police have to keep up again. It's a race, but I think that in the good hands of those engineers, they should be able to

[00:23:41] always keep up with the best algorithm. Or maybe in the future we will have an industry standard saying, okay, all photos have to pass this standard or pass this algorithm check in order to be published online, so you have your reputation check.

[00:23:56] So there are solutions; there are many people trying to solve this and bring solutions to it, and I'm very much looking forward to that as well. I don't have to worry about someone using my likeness to steal a million, because I don't have a million.

[00:24:16] But there are risks, and the integration between decentralized systems and AI can mitigate those risks, right. How do you do it, with things like data harvesting and misuse, and how does that integration help?

[00:24:38] I think the very good thing about

[00:24:41] blockchain is that it's on-chain, right, and everything's transparent on-chain. So whether it's the point we were just talking about, data authentication, or malicious attacks on training, we can actually rule them out through the system.

[00:24:59] If you are playing evil, you'll be slashed; if you're playing good, you get rewards; and eventually the bad guys will be ruled out. So all this

[00:25:11] fundamental functionality of blockchain can actually help AI in a very good way, because in traditional AI all the training, everything, is controlled by a centralized entity: either a company called Google, or some engineers who just built an algorithm.

[00:25:30] It's good, but it's not the best way, not the best open way to govern. So I think blockchain is changing a lot for this industry, and that's what FLock is trying to promote as well.
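
Editor's sketch of the slash-and-reward idea mentioned in this answer. It is a minimal toy under stated assumptions, not FLock's actual protocol: the `Participant` type, the score threshold, the flat reward, the 50% slash, and the minimum stake are all hypothetical parameters chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    stake: float


def settle_round(participants, scores, threshold=0.5, reward=10.0,
                 slash_fraction=0.5, min_stake=1.0):
    """Reward good contributions, slash bad ones, and drop anyone whose stake runs out.

    `scores` maps participant name to a quality score in [0, 1] for this round.
    All numeric parameters are illustrative assumptions.
    """
    remaining = []
    for p in participants:
        if scores[p.name] >= threshold:
            p.stake += reward                    # playing good: earn a reward
        else:
            p.stake -= p.stake * slash_fraction  # playing evil: lose part of the stake
        if p.stake >= min_stake:
            remaining.append(p)
    return remaining


participants = [Participant("honest", 100.0), Participant("malicious", 100.0)]
for _ in range(8):
    scores = {"honest": 0.9, "malicious": 0.1}
    participants = settle_round(participants, scores)

for p in participants:
    print(p.name, p.stake)
```

Because every settlement is a deterministic function of public inputs, anyone can replay the rounds and check the outcome, which is the transparency property the speaker attributes to doing this on-chain.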

[00:25:42] So let's dive into that a little bit. There have been many projects in crypto recently that have used the word AI to help boost the price of their project.

[00:26:00] The only thing that they have to do with AI is the fact that they slapped AI on their name, right. Yeah, I know, there are a lot.

[00:26:12] Yes, so how do we root those out, and what is going to be the future of AI and crypto together? All right, so that's why FLock and other real builders in the market are actually hosting an event around EthCC called

[00:26:31] We call the renference it's like a walk in in short renference when they always like real non-blocking AI summit there's only just for those builders who actually have their products.

[00:26:43] Back to your question of how you root them out: nowadays everyone brings huge narratives to their projects, right, so they have to deliver. They have to have a real product that can be used.

[00:26:55] I think that's the gold standard. Maybe it's ironic: in the Web2 world, having a product that works is a minimum requirement for any company, but in Web3 it's already the highest standard, because many of the projects don't have anything; they just put out their narrative.

[00:27:11] So this is a very important thing. I call on users who are trying to either invest in or use those tools: make sure you log on to the website and try their products, and see whether they work the way you think they do. That's already a good test for yourself, right, so do your own research. That's what I want to say about it.

[00:27:34] What was the other question? The intersection, like what's possible. Yes, I think nowadays even training is becoming more possible. Two years ago, when we started FLock, I had a feeling that doing training from the ground up sounded like

[00:27:57] something only a few companies were doing, so I felt it was a bit too far away. But nowadays I even see people running 1.2-billion-parameter large language models on their phones, and they work well.

[00:28:12] I think the age is coming. If I break it down into layers, I think there are three layers where this intersection works. At the top is

[00:28:24] the agent layer, or application layer: people who use algorithms to build agents that work for them. So AI

[00:28:32] agents, or even just some NFTs with generated art. You might say that's not AI, but they actually did use AI to generate art and NFTs, or to generate an agent that can maybe wake you up at five in the morning. So yes, it's still somehow AI, but it's all on the application layer.

[00:28:53] I would say there will be a boom of AI agents soon in this new cycle, and then maybe there'll be some killer apps. Consumer apps are always the closest thing to the users.

[00:29:09] If one of those apps can actually spread widely to the general public, that'll be a big win in this layer. In the middle is what we call the AI layer, or the sophisticated layer. That's where all the algorithm stuff comes in, like FLock and many others, or even those doing zero-knowledge proofs or fully homomorphic encryption.

[00:29:35] What they do is connect with user data, connect with users' compute, and connect with the task they want to create, and then they use their own ways to make sure user data is preserved, or that user data is authenticated,

[00:29:50] that it's real and not fake, and then to make the AI model better, using training frameworks and learning frameworks to make this happen. And then the very bottom layer is the infrastructure layer. There are already some players, like Akash, as I mentioned at the beginning, doing decentralized compute.

[00:30:13] If you don't want to have all your compute handled by AWS, which is so centralized and can censor you, you will naturally think: what if we have decentralized GPUs everywhere in the world, maybe in everyone's

[00:30:26] home, and they connect to a network, and whenever I want to use one I can just access the network and use different graphics cards? It will be way cheaper.

[00:30:41] And then apart from compute there is data storage: how best to store my data if I can't put it all on my phone, because I only bought a 128-gig iPhone

[00:30:52] but my data is like 200 gigs. So I can put it somewhere in a decentralized way, instead of putting it all on maybe Google Drive or iCloud, and I want to make sure it's decentralized enough and well secured. So there are decentralized storage protocols, like Ocean Protocol in the old days, and then

[00:31:13] there are newer ones, Filecoin, IPFS, or Arweave; they're all doing this. So basically these three layers are the places where I see the intersection making sense, and many interesting companies and players are now active in the market.

[00:31:33] I'm trying to envision this: all three layers are working fine, and we have decentralized AI. You said we're going to have consumer products. Let's go back to your comment about robots.

[00:31:46] If we have decentralized AI infrastructure and everything's in place, do you think we will then be able to bring those robots into the picture, and everybody will have a robot as a pet?

[00:32:00] I think the industry is moving very fast. That's not my specialization, right, it's not where I work, but from the progress I saw recently, it's catching up.

[00:32:17] Basically the essence of the question is whether the algorithm side or the robotics side will happen quicker, and I think the algorithms are going to be quicker.

[00:32:29] Because nowadays there's all the sandbox training and everything, just like AlphaGo several years ago. How did AlphaGo learn in such a short period of time? Because

[00:32:43] a machine doesn't need to stop, a machine doesn't need to sleep. You just give it all the historical data about how people play the game of Go, it learns everything, and then two Go agents play Go against each other.

[00:32:59] That can amount to maybe 10,000 years in human time. So when human beings face the AlphaGo algorithm, they are facing someone who has played Go for 10,000 years already. That's why algorithm progress will always be a bit faster

[00:33:17] compared with robots. When it comes to real robots, we need to test them in real human time. It's not something that you can just simulate in an environment on your computer; you have to actually build them and then interact with them to see whether there are faults or flaws.

[00:33:37] So of course this robotics stuff takes a bit more time.

[00:33:45] Got it. That means we're not going to depend on robots yet. However, people are going to need to learn AI and have the skill set to be able to build. So how can they best start developing the skill sets to help build out the AI future?

[00:34:05] Oh, I think nowadays there are already so many different tools out there, right. So of course, register an account on an AI platform and try out all these AI tools, because it's cool and fun, and also to see the potential.

[00:34:21] But I don't think everyone has to be an engineer in the future to get a job. There might be some who study

[00:34:27] computer science and become engineers to fix AI algorithms in the future, but there will also be another bunch of jobs created by AI-assisted work. Now you may not need to create a Microsoft slide yourself,

[00:34:47] but you may still need to prompt or train the AI to make the slide in a distinct art style, right.

[00:35:02] Or you can use AI to help you do a lot of the routine work on the job, and then maybe customization can be the service that you provide to other users.

[00:35:15] People will always think that AI is going to take over human beings' jobs, but I think it's the other way around. Yes, many of the repetitive jobs will be replaced,

[00:35:31] but then human intelligence will be shifted to somewhere else, somewhere even more interesting. People can do a lot of other things instead of having to stay on repetitive work, which is not the answer anyway.

[00:35:46] It's all fascinating to me very fascinating so. Yeah I enjoyed speaking with you thank you very much for your time today. This was a demo. I have one last question. Yeah this one's easy. I'll see you drop a voting.

[00:36:07] How can people find more information about you and about FLock? Oh yeah, just our website, flock.io. We put everything online, and we have our training platform online as well.

[00:36:23] So now everyone can actually join our models as a training node. We have tons of models there that you can play with and join for training, to become a part of this movement. Yeah, I'm enjoying this.

[00:36:41] Awesome thank you very much for your time today.
