After reviewing and introducing you to the new and much-improved VSee, a high-quality, web-based videoconferencing solution (Windows only), today I am publishing the full text transcript and the streaming audio recording of my interview with Milton Chen of VSee.
In this online interview, Milton highlights some of the key characteristics of VSee's P2P videoconferencing approach, such as its ability to display high-quality, full-motion video with lip-synch while using only about 100 Kbps of bandwidth.
Photo credit: Robin Good and Milton Chen using VSee
Milton Chen also clarifies the differences between Skype, which now offers videoconferencing itself, and VSee's own approach to P2P video. He explains the issue of Skype supernodes and why VSee has chosen a different approach.
Particularly interesting is the discussion of the research VSee has drawn on to better understand when and how videoconferencing truly benefits an online meeting, and which circumstances and variables need to be evaluated when deciding whether to use videoconferencing tools. Equally interesting is the research data answering my own question about the validity and effectiveness of videoconferencing for online language learning, a question Milton answers promptly and thoroughly.
New uses of real-time collaboration features, such as simultaneous editing of documents by two or more people, are also explored as fascinating and possibly highly efficient ways of co-working in the near future.
There is indeed a lot of useful information packed into this conversation with Milton Chen, and unless you prefer reading the transcript below, I strongly invite you to listen to this passionate, videoconference-based exchange.
Below is the streaming audio recording of the full conversation. It lasts 34 minutes and, if you don't have a fast Internet connection, it can also be downloaded as an .mp3 file from Ourmedia.
Milton Chen - Photo credit: Robin Good
Robin Good: Hello everyone, this is Robin Good, live in Rome, Italy, and today I’m not just connecting, I am videoconferencing live with Milton Chen of VSee Labs, who is connecting from... where? Where are you, Milton?
Milton Chen: I’m currently in Silicon Valley, California.
RG: Fantastic, and I’m seeing Milton beautifully in front of me. With the pictures hanging on the wall right behind him, I can almost make out individual people and the color of their clothes. That’s really good viewing quality, and this is thanks to Milton’s own technology, correct?
Milton Chen: This technology came out of Stanford University, where a couple of other people and I did our graduate work in videoconferencing. Two professors, Professor Terry Winograd and Professor Pat Hanrahan, also oversaw this project.
RG: Oh, that’s right, I recall you telling me this. Are they still involved in the technology behind VSee?
Milton Chen: Yeah, they’re actually serving as technical advisors to VSee, the company, right now.
RG: Great. So VSee, or VSee Lab as you can find it on the web by typing W-W-W-dot-VSee, spelled V-S-double-E, lab, spelled L-A-B, dot-COM, is the place where you can find the fabric of collaboration. That is the tagline for this videoconferencing technology, which is so cutting-edge that you’re going to be hearing about it quite a bit more this year. But I don’t want to show too many cards right at the beginning of this game. I’m here to explore with you what Milton has been doing since I last saw him in San Francisco, a little less than a year ago. So, Milton, what has happened to VSee, and what are you up to?
Milton Chen: I think the biggest thing we’ve been busy with is improving the VSee product, and there are a couple of areas where we’ve made improvements. One is bandwidth management and control.
So right now, Robin, you can open the VSee network statistics by moving your mouse to the Windows taskbar, in the lower right corner, probably near your clock. If you click on it, go to "options" and select "show network statistics", it will show you how much bandwidth VSee is currently using.
You should see that we’re using about 100 kilobits per second for this level of video quality, so most people are a little bit shocked when they see this. They expect something like a typical H.323 system, which uses quite a bit more.
RG: Yes, guys, let me testify that what Milton said is indeed true. What I’m looking at says that I am sending no more than about 100 kilobits, and what I’m seeing is good-quality videoconferencing that is smooth and in lip-synch.
The windows I’m looking at, both mine and Milton’s, are sized about 320 by 240, if I am correct, or even a bit larger. So on my screen they look really sharp, and the image quality is good. Milton, please shake your head left and right a bit, let’s see how good it looks. It’s good, it’s definitely good. If you trust Robin Good’s evaluation of collaboration tools, this videoconferencing quality looks good to me.
Milton Chen: Thanks, Robin. One thing we spent a lot of time on is minimizing bandwidth usage, because VSee is designed to run over the public Internet, where the available bandwidth varies a lot.
Congestion often varies, so we want to make sure that VSee uses the absolute minimum of bandwidth, so that bandwidth variation has the least possible impact. The other area where we’ve made quite a bit of improvement is VSee’s peer-to-peer traversal technique.
VSee is very much like Skype in that it is a peer-to-peer technology. My audio and video go directly from my computer to your computer, crossing firewalls and network address translators (NATs), and this is designed to make the technology extremely scalable.
But where VSee differs from Skype, in the peer-to-peer sense, is that VSee uses a centralized directory server to manage the connections. In this case, unless you’re talking with someone, VSee would never use your computer or your network to carry other people’s traffic. We use our own servers for that. So that’s the key difference between the Skype and VSee models.
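To make the difference concrete, here is a minimal Python sketch of the general model Milton describes: a central server that only keeps a directory of who is reachable where, while the media itself flows directly between the two peers. The names `DirectoryServer` and `place_call` are hypothetical, NAT traversal is left out entirely, and this is an illustration of the idea, not VSee's actual protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DirectoryServer:
    """Hypothetical central directory: it maps user names to endpoints.

    It never carries audio or video itself; media flows peer to peer.
    """
    endpoints: dict[str, tuple[str, int]] = field(default_factory=dict)

    def register(self, user: str, host: str, port: int) -> None:
        # A client announces where it can be reached.
        self.endpoints[user] = (host, port)

    def lookup(self, user: str) -> tuple[str, int] | None:
        # The caller asks where the callee is, then connects directly.
        return self.endpoints.get(user)


def place_call(directory: DirectoryServer, callee: str) -> str:
    """Resolve the callee through the directory and describe the media path."""
    endpoint = directory.lookup(callee)
    if endpoint is None:
        return f"{callee} is offline"
    host, port = endpoint
    # Only the lookup touched the server; no other user's machine relays media.
    return f"media -> {host}:{port} (direct peer-to-peer)"


ds = DirectoryServer()
ds.register("milton", "203.0.113.5", 5000)  # documentation-range IP address
print(place_call(ds, "milton"))
```

In a supernode design, by contrast, some of the lookup and relaying work is delegated to other users' machines. Keeping it on the vendor's servers is what moves that cost from users to the host, as Milton explains next.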
RG: So if I understand correctly, you’re using a peer-to-peer model and we are connecting directly to each other, but unlike Skype you’re not using individual users’ computers, so-called super-nodes, as resources to route connections for other users without those users being consciously aware of it. Does this bring some cons, or only pros? I mean, if Skype is doing this, there must also be some advantages to it beyond the complaints we are making. So what are we giving up when we use VSee in this respect?
Milton Chen: From a customer perspective, as an end user, you’re actually not giving up anything. The audio and video stream directly from my computer to your computer, so you have the advantage of lower latency, less routing, and so on. What is given up is on the hosting end: it’s more expensive for us to maintain the VSee infrastructure. In the Skype model, yes, anybody could be selected to be a super-node. You don’t have a choice, because by accepting the user agreement you basically agree that you could become a super-node if Skype decides to choose you.
Basically, in the VSee model you would never be chosen as a super-node. Essentially all the super-node functions are handled by VSee, the company. Typically, in a deployment situation, this is less taxing on your resources.
RG: Okay, that makes it more than clear. Good. And what are the changes, or the capabilities I may have forgotten, in terms of how many users can connect together in a videoconference using VSee?
Milton Chen: There are typically two bottlenecks for how many users you can connect in the same session with VSee. VSee is designed to do multi-party conferences. For example, we have done work over the Internet linking up to 50 different sites together with VSee, using multiple computers. But typically... I’m here using a Comcast cable modem connection, with which I could have three or four different locations connecting at the same time. The exact number can be calculated by taking your Internet upload speed and dividing it by roughly 100 kilobits per second. That gives you how many other links, or people, you can connect, because VSee is peer-to-peer: if you talk to, let’s say, five people, VSee will send your video five times, once to each of them. So that becomes your maximum number of connections.
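Milton's rule of thumb is simple arithmetic, sketched below in Python. The 100 kbit/s per-stream default is the figure quoted in the interview; the 110 kbit/s value in the usage note is an illustrative assumption standing in for audio and signaling overhead, not a measured VSee number. Because every peer sends its own copy to every other peer, a full mesh of n participants carries n × (n − 1) one-way streams in total.

```python
def max_outgoing_streams(upload_kbps: float, per_stream_kbps: float = 100.0) -> int:
    """Rough number of peers you can send video to at the same time.

    In a peer-to-peer call each remote participant receives its own copy
    of your stream, so the upload link is divided among them.
    """
    if per_stream_kbps <= 0:
        raise ValueError("per_stream_kbps must be positive")
    return int(upload_kbps // per_stream_kbps)


def mesh_stream_count(participants: int) -> int:
    """Total one-way streams in a full mesh: each of n peers sends to n - 1 others."""
    return participants * (participants - 1)


print(max_outgoing_streams(1000))         # 1 Mbit/s uplink at 100 kbit/s per stream
print(max_outgoing_streams(1000, 110.0))  # same uplink, assumed ~10% overhead
print(mesh_stream_count(5))               # five-way call
```

With a 1 Mbit/s uplink this gives 10 streams at exactly 100 kbit/s, or 9 once some overhead is assumed. That matches the "nine, ten people at the most" estimate in the exchange that follows.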
RG: So what Milton is saying is that the number of simultaneous video connections I can handle depends on the amount of bandwidth I have available, in particular my upstream bandwidth. So if I have one megabit available, I would probably be able to send my video stream to about nine or ten people at most.
That’s interesting to know, and it is also easy for people to manage once they know it. I guess it would generally work very well with small groups of two, three or four people, while enterprise users can certainly enjoy larger meetings as well. I don’t know exactly who your preferred target customers are; nonetheless, you sent me a very prestigious list of names. Tell me again, who are you favoring? Is it really these big, fancy names and prestigious industries, or are you also working for the small guys?
Milton Chen: So far, we’ve been working very closely with a handful of customers over the past few years. In particular, we’ve been working with Shell, the oil company, for quite a long time. Right now they’re doing an internal pilot of VSee. The reason we were very interested in working with Shell is that, for them, this test of videoconferencing and collaboration is really about an interesting change in their fundamental business processes and in how the corporation works. Imagine a company like that: they have people working in many countries, across different time zones, very spread out, so traditional communication and management issues come up quite often.
Other large companies are essentially using technology like VSee to make the business run better. So there’s a lot of high-level use. For example, let’s say you hire me to do a programming job, and halfway through the job you ask me, "How’s it going? Are you on schedule?"
The interesting thing is the difference between me saying over e-mail, "Yes, I’m on schedule," and saying it over video, where you can look me in the face while I say either "Yes, I’m on schedule," or "Eh, maybe... okay, yeah, I’m on schedule."
There’s a lot of hidden, non-verbal information coming to you, the manager, that you would not be able to get otherwise. So there are issues like that we’re exploring, to figure out just how streaming video impacts the workplace, and which specific cases are the ones where you really need video.
Because one of the things we’re finding, going back to our Stanford research days, is that there are a lot of times when you don’t need video; in fact, having video is a bad thing. So understanding those cases, versus the times when you must have video, is extremely interesting.
RG: And yes, I totally agree with Milton on this point. I just had a funny episode a few minutes ago. I don’t know whether I will cut this from the final edit of this podcast or not... but a few minutes ago you may have noticed in the audio Milton stopping a few times and laughing gently, unsure whether to go on or not.
That was happening because Milton can also see me. I was trying to order an orange soda from a friend here who was going out to a snack bar. So every time I turned around to order the orange soda, Milton stopped, because he could see that I was turning around and not paying attention to him.
But this never happens in a normal interview. I can order anything I want while somebody else is talking, because they don’t see what I am doing. I could actually, you know, leave the room while they answer a question.
But that wasn’t the case today; video really made things a lot more transparent. And while the orange soda just got here this second (thank you, Ale, and here it is, cheers to you), I’m curious to know from your research, your marketing, and your experience with these customers: which situations seem to be improved by the presence of video?
Milton Chen: Okay, that’s a great question. What we’re finding is that it matters most with people you don’t know as well. For example, let’s say you have a team, say a company of five or six people, and a new person joins the team whom you haven’t worked with before. And this person happens not to be physically located with you. In this case, being able to see this person is just critical. It lets you build trust and absorb this person into the team so you can work together effectively.
What we find is that when you know the person very well, the value of video goes down substantially.
The other thing to think about is the type of task you’re doing. For example, let’s say you hire me to develop a web page or do some Java programming: you don’t really need to see me. You need to see my code and look at my final result.
But let’s say we’re trying to brainstorm, or to make a very high-risk decision where a lot of intuition comes into play. With things like negotiating, or trying to hire someone, you want to get a read on the other person. Honestly, in these cases video becomes critical.
So the way we think about this is almost as a two-dimensional matrix: on one axis you rank how well you know the person, and on the other the type of task you are doing.
How risky, or how analytical, is it?
From there you can basically map when you need video and when you don’t.
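The two-axis matrix can be written down as a tiny lookup, sketched here in Python. The two axes (familiarity and task type) and the examples on each come from Milton's answer. Collapsing each axis to two levels and adding the two factors into a low/medium/high score is a hypothetical simplification for illustration, not VSee's or Stanford's published model.

```python
from enum import Enum


class Familiarity(Enum):
    NEW_COLLEAGUE = "new"       # e.g. a remote person who just joined the team
    WELL_KNOWN = "well_known"   # someone you have worked with for a long time


class Task(Enum):
    ANALYTICAL = "analytical"   # e.g. writing code: the result is what matters
    HIGH_RISK = "high_risk"     # e.g. brainstorming, negotiating, hiring


def video_value(familiarity: Familiarity, task: Task) -> str:
    """Place a meeting in a hypothetical 2x2 version of the matrix."""
    score = 0
    if familiarity is Familiarity.NEW_COLLEAGUE:
        score += 1  # seeing the person helps build trust
    if task is Task.HIGH_RISK:
        score += 1  # intuition and "getting a read" on someone matter
    return ("low", "medium", "high")[score]


print(video_value(Familiarity.WELL_KNOWN, Task.ANALYTICAL))
print(video_value(Familiarity.NEW_COLLEAGUE, Task.HIGH_RISK))
```

The interview does not say how the two mixed corners compare (for example, a new colleague on a purely analytical task), so the sketch simply treats them as "medium".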
RG: Interesting stuff, definitely. If you have anything you’ve written, or any research on this, it would be very useful to refer to it in this exchange. Thank you for sharing that. So again, it comes down to how well people know each other and whether the task is critical or not.
Now, I have a scenario: I’m acting as an advisor in a research study that explores the use of real-time videoconferencing tools for online language learning. What do you see there?
Milton Chen: Regarding online language learning, there’s actually quite a bit of academic research in that area. What we find is that being able to see the speaker’s lip movements does help quite a bit with people’s ability to pick up verbal cues.
In a lot of the classic studies of how people look at a face, people scan in a triangle pattern: the person’s eyes and mouth. I’ll try to send you some pictures of those later on. So even when people just look at someone, as they look at you right now, they scan the triangle pattern; it’s very normal. And being able to see that person’s lips move, especially if they are in sync with the audio, may help quite a bit.
Now, there’s also a downside. If the video quality is below a certain threshold, researchers have found that the video actually interferes with the audio. What they’re saying is that if the video quality is below a certain threshold, having video is actually worse than audio alone, because now you’re confusing the person with conflicting cues. That’s why you have to be very careful.
Lip-synch, again, is very critical. If I’m watching a person and their lips are not in sync with the audio, that again is worse than having no video at all.
So I guess overall video is helpful, but be careful: there are a couple of danger spots. If you avoid those danger spots, video is very helpful for people learning a language.
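The two "danger spots" amount to a simple gate: show video only when it is good enough and in sync, otherwise fall back to audio alone. The Python sketch below illustrates that logic. The 0 to 1 quality score and both thresholds (`min_quality=0.5`, `max_offset_ms=100.0`) are placeholder assumptions, not values from the research Milton cites; a real system would tune them empirically.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    video_quality: float  # hypothetical 0.0-1.0 quality score
    av_offset_ms: float   # audio time minus video time, in milliseconds


def choose_mode(state: StreamState,
                min_quality: float = 0.5,
                max_offset_ms: float = 100.0) -> str:
    """Fall back to audio only when video would hurt more than it helps."""
    if state.video_quality < min_quality:
        return "audio only"  # poor video distracts from the audio
    if abs(state.av_offset_ms) > max_offset_ms:
        return "audio only"  # out-of-sync lips are worse than no video
    return "audio + video"


print(choose_mode(StreamState(video_quality=0.9, av_offset_ms=20.0)))
print(choose_mode(StreamState(video_quality=0.9, av_offset_ms=250.0)))
```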
RG: Thank you. Very useful advice again, very interesting. Let me switch and jump back into your technology: VSee Lab, or VSee, I should just say VSee, that’s correct, and you can nod, I can see you. Good. I wanted to check what the few extra features are, for people who may not be able to download it or try it out right now. And did I have to download something? Remind me! What are the installation and configuration requirements?
Milton Chen: Well, VSee is designed as a browser plug-in, so you do download something, but it’s fairly transparent: you just use a web browser, Firefox or Internet Explorer. When you go to our web page, you will be asked, "Do you want to install the extension to your browser?" Once you’ve installed it, VSee becomes part of your web browser experience.
In fact, VSee also lets you configure it as a plug-in for, for example, your e-mail programs. In that case it uses what is called a "URL handler", meaning you can launch VSee from any other application. The way we designed it, VSee is not really an application but a very small, lightweight tool. Depending on the conferencing or other task you are doing, you can just open the video from within that application.
RG: So it can be described as an ideal contextual tool, enabling existing applications to extend their abilities into video-enabled collaboration. That is the future, and you are there, in the right place; that’s good. Now, let me understand a little better. You said it’s a browser plug-in, and you mentioned Internet Explorer and Mozilla Firefox. Is this just for Windows, or also for other platforms?
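A URL handler works by registering a custom link scheme with the operating system, so that clicking such a link in any application launches the registered program with the link as its argument. The Python sketch below shows only the parsing side of that idea. The `vsee://` scheme, the `call` action and the `user` parameter are entirely hypothetical, since the interview does not describe VSee's actual link format, and OS-level registration is out of scope.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse


def handle_link(url: str) -> tuple[str, str]:
    """Parse a hypothetical "vsee://call?user=NAME" link into (action, user)."""
    parsed = urlparse(url)
    if parsed.scheme != "vsee":
        raise ValueError(f"not a vsee link: {url}")
    action = parsed.netloc  # e.g. "call"
    users = parse_qs(parsed.query).get("user", [])
    if not users:
        raise ValueError("link has no user parameter")
    return action, users[0]


print(handle_link("vsee://call?user=robin"))
```

An e-mail signature or web page could then embed such a link, and the handler would start a video call with the named person without the user opening a separate application first.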
RG: How difficult, expensive and time-consuming is that work compared to the amount of work you have done until now?
Milton Chen: I don’t think I’m quite qualified to say that. The core of the code base is regular C code, which we can compile for different platforms; that’s pretty easy. But in developing for Windows we found there are little hidden cases to take care of, for example how to get the most optimized audio and video capture and display, all these things. So we actually spent a lot of time on the Windows platform optimizing for these cases. Now, because we don’t have anyone on our team experienced in developing for other platforms, it’s hard to say how much time we would need to spend on those optimizations to get a really compelling experience. Hopefully it wouldn’t be that difficult, so we could get this done even with a very small team.
RG: Thank you. And I must say that VSee is a very compact tool. It’s one of the very few tools built entirely around its core function. It doesn’t pull up a whole cathedral of controls, menus, application windows and dialog boxes. There are just the video windows, because what are we doing? Videoconferencing! So that’s all we need! And the few extras we need are embedded in those windows.
Nothing could be simpler, more essential, easier or more intuitive to use. There is a little chat button that pulls down a simple, non-intrusive chat window right below your video window.
There is a mute button that lets you mute your microphone, and also a nice visual indicator that signals whether your audio is going through or whether you’re silent, so you can also see whether other people are talking, and mute them. I haven’t seen anything to freeze my own video, but maybe there is that too?
... There is! Indeed. And you can also fade yourself out! So there is basically everything I want it to have. The quality is good, the frame rate is high, the bandwidth is low, so it comes down to this: besides when we’re going to see Mac and Linux versions, how much does it cost?
Milton Chen: Right now our pricing is aimed at enterprises, I guess high-end business users, although we would like to offer, and are thinking of offering, some kind of consumer version of VSee sometime toward the end of this year. That would essentially be extremely inexpensive or even free. We haven’t figured out what that market or business model would be. Right now we’re a fairly small team, so for now we’re just focusing on making sure that people on Windows have a lightweight and compelling experience.
RG: Okay, but we do have some very large organizations listening to us: international organizations and some multinational corporations. There are some smart people out there listening to us, Milton. Since I would recommend that they check out this tool, what can they do, first, to try it out? Second, if they’re a big enterprise with a big budget, what kind of figures should they be thinking about? Are you comparable to anything else out there? Can you give us some hints?
Milton Chen: That’s a great question. We give everybody a free 30-day evaluation, so anybody can go to our website and just play, you know, do whatever they want, really test-drive it. We really like this, because people can experience the VSee technology first-hand.
Then, in terms of actually purchasing VSee, there are two models to choose from.
The first is the monthly subscription model. In that case, I would say the pricing is about the same as WebEx and Live Meeting. We’re a small player, so whatever others charge per month, we charge the same, just to match them.
Second, instead of a monthly subscription, you can buy our back-end server and host it yourself. This tends to make sense for a large organization. Let’s say you have ten thousand or 100,000 people in your company and you want everybody to have VSee on their laptop or desktop. The most economical way is for us to give you our server; you put it in there, and everybody can have it. The pricing for that depends on the exact number of people you have. Basically, we give you a quote.
RG: Okay, I was going to ask what the cost would be for 1,000 people, but I don’t want to make your life too difficult. Unless you signal to me that you want to give me that price, I’ll leave people to find out by getting in direct touch with you. I understand it’s a delicate thing to deal with these large entities, so I leave that to the skills of your business and marketing team.
And I am very excited; I like what I see. It’s not a rare event, but certainly a pleasant one, especially in this collaboration and conferencing area. And knowing that there is a team of people with a strong research background, who understand the situations in which collaboration tools are truly useful beyond the simple addition of technology, is encouraging, because we are discussing more and more the importance of the other elements that make collaboration effective, which go well beyond just putting a tool out there.
Milton Chen: Thanks, Robin. Another area related to our collaboration research, aside from audio and video, is application or document sharing and co-editing; we’ve done quite a bit of work there too. The reason we’ve done work in that mode is... maybe the easiest thing is to just show you very quickly what we mean. Give me about five seconds to set it up, okay.
Okay, Robin, what I’ve done is share an application with you. You should see on your screen something like an Outlook window I’ve just shared with you. I’m actually replying to an e-mail you sent me. There are some standard things you can do: for example, you can highlight and annotate the document, and I can erase it. But the really neat thing we’ve done here, and this is very different from other systems, is the idea of co-editing. For example, in your window, if you click on the window caption of the application I’ve shared with you, you’ll probably see something like a pen and a control button. Right above there, do you see something like "CTRL"? You will discover a pen...
Okay, now you are moving my mouse. Okay.
Let me take control for a second...
Okay, now we’re back...
What happened just now is that both you and I had mouse and keyboard control at the same time. The reason we did it this way is that if you look at programs like WebEx, NetMeeting and so on, they’re really designed for presentations. So when you have a floor-control event, it’s very explicit: I say, okay, I want control of your thing.
You say: yes or no.
The mode we have explored instead is this: we use audio and video to monitor, and to handle floor control for, the application-sharing window. For example, let’s say we have a document we have to submit in thirty minutes, and we have to write it as fast as possible. What people found is that the most efficient way to do this is to sit right next to each other and look at the document: I type something, you say "oh, no, no, that’s bad," grab the keyboard and type something, and we go back and forth. VSee is designed specifically for that; that is where this idea came from. For example, right now you can type in my window or take control, and I can do the same at the same time, and we basically use audio and video to coordinate. When we practice this, it turns out to be extremely effective. (laughs)
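The contrast Milton draws is between explicit floor control, where input is accepted only from whoever currently holds the floor, and shared control, where everyone's input is applied and people coordinate socially over audio and video. The Python sketch below models the two policies side by side. The class names are hypothetical, and a real system would also have to merge concurrent input events arriving over the network, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExplicitFloorControl:
    """Presentation style: one controller at a time, by request and grant."""
    holder: str
    log: List[str] = field(default_factory=list)

    def request(self, user: str, granted: bool) -> None:
        # The current holder must say yes before control moves.
        if granted:
            self.holder = user

    def input(self, user: str, event: str) -> bool:
        if user != self.holder:
            return False  # ignored: this user does not hold the floor
        self.log.append(f"{user}: {event}")
        return True


@dataclass
class SharedControl:
    """Co-editing style: all input is applied; people coordinate by voice and video."""
    log: List[str] = field(default_factory=list)

    def input(self, user: str, event: str) -> bool:
        self.log.append(f"{user}: {event}")
        return True


floor = ExplicitFloorControl(holder="milton")
print(floor.input("robin", "type 'hello'"))   # rejected until control is granted
shared = SharedControl()
shared.input("milton", "type draft")
shared.input("robin", "fix typo")
print(shared.log)
```

The shared model removes the request/grant round trip entirely, which is why it only works well when participants can see and hear each other well enough to avoid typing over one another.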
RG: Yes, guys. What Milton has done is basically pull up a screen-sharing session next to our video session. He’s showing me his Outlook, and I have to say, it shows exceptionally well. It is enlarged from the normal dimensions, but the fonts all look bloody great. I don’t know how he does it, but it looks very readable and legible, unlike some other screen-sharing technologies.
So it’s just like having... hey, I use GoToMeeting, and it’s no secret that it’s my favorite screen-sharing tool right now, and here with VSee I basically have the same features I have there, plus very high-quality video next to it. That’s what he’s showing me. Am I correct, Milton?
Milton Chen: Yeah, exactly. The way we designed VSee... you may wonder what the driving application for VSee is. The driving, or killer, application for VSee is for people who work remotely, whether they work from home or are otherwise separated from their co-workers. So our goal in designing this was to provide pretty much everything you need to do that effectively, and nothing more. Okay?
RG: Yes, it makes sense. That looks good. So is that part of a future VSee version, or is it already part of what you’re offering VSee users right now?
Milton Chen: That’s already part of the standard VSee package. You can do pretty much everything you see right now. It’s in our standard VSee, and it’s part of our free evaluation, so anyone can go there and play with it.
RG: Unbelievable. Yes, everything is fully integrated. I just noticed that among the only three simple buttons on my video window there’s one called "menu", which includes a "share" option: application, desktop, movie, or PowerPoint, whatever you want, you can share it live, in a direct P2P fashion, with a high-quality display. Not only that, you also have a full recording facility there... man... there’s a slew of heavenly things here, I have to stop. Help me!
Milton Chen: Yes, thanks a lot for the compliment. One thing you probably know is that we actually spend a lot of time figuring out which features not to have, or not to put on the menu. This goes back to our research days at Stanford. With a lot of tools that come out, it’s very easy for engineers to put on the menu everything they can imagine someone using. It becomes almost like a Microsoft Word or Excel type of experience: there’s so much stuff that you just get very frustrated.
So we deliberately made this a very small tool, not a full-blown application. You can hardly imagine all the brain power and all the heated internal discussions we’ve had about which features not to include in VSee, even ones that are simple to do: we could do them, but we would just confuse people.
So ideally, what we want is to create something as easy to use as a pen, with no FAQ, no menu, no training involved. It’s as simple as going to a web browser, typing in a person’s name, and that’s it... you just have a conference.
RG: Another super-duper feature they have, which I forgot because I saw it at the very beginning today, is their cursor. When you move the cursor over your video window, it leaves a blazing trail behind it in a fluorescent color, which lets you signal and point to items in the window. You used it while explaining the tool the first time around, and I found it cool, a good way to use your pointer. What was the idea behind it?
Milton Chen: Yeah, this came out of our work with the medical community. Imagine, let’s say, a patient diagnosis where you’re teaching students about particular issues. They want to be able to say, okay, there’s something wrong here, take a closer look. That’s where this came from. Also, in the lab we did research where we found that in a face-to-face meeting, people use their hands quite often to point at objects. That is typically lacking in videoconferencing: I could point at my own video, but I couldn’t easily point at yours. So the idea is that I can just move my mouse as an extension of my hand to point at certain things.
RG: Very nice, very interesting. Well, that’s a great deal of good information. Thank you very much, Milton, for your time today. That was a fantastic session. I very much appreciate all the knowledge you shared, and I invite everyone to go to www.vseelab.com: V as in Virginia, "see" as in "to watch", S-E-E, then L-A-B, L as in Louisiana, A as in America, B as in bloom, dot com. vseelab.com. That’s the place to go, and that’s where you can get everything we were just using.
That’s truly exciting.
Thank you, Milton. From Robin Good, here live in Rome, Italy, this is all for today. I leave the closing remarks to you, and bye-bye!
Milton Chen: Bye, thanks!