Have you watched the videos? It's actually some seriously fuggin' amazing hi-tech stuff.
There are two cameras on it, so it can actually process depth, which means it receives a three-dimensional image of what you're doing; it can actually do 3D interaction with an exact picture of you.
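(For anyone curious how two cameras give you depth: it's basic stereo triangulation. Here's a minimal Python sketch of the standard formula; the focal length and baseline numbers are made up for illustration and are not Natal's real specs.)

```python
import numpy as np

# Standard pinhole stereo triangulation: depth = focal_length * baseline / disparity.
# The camera parameters below are placeholders, not the actual hardware's specs.
FOCAL_LENGTH_PX = 600.0   # focal length in pixels (assumed)
BASELINE_M = 0.075        # distance between the two cameras in meters (assumed)

def disparity_to_depth(disparity: np.ndarray) -> np.ndarray:
    """Convert a per-pixel disparity map (in pixels) to a depth map (in meters)."""
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0                     # zero disparity = no match / point at infinity
    depth[valid] = FOCAL_LENGTH_PX * BASELINE_M / disparity[valid]
    return depth

# Example: a pixel that shifts 30 px between the two views is ~1.5 m away.
print(disparity_to_depth(np.array([[30.0]])))  # -> [[1.5]]
```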
The tech demo that Peter Molyneux did was fuggin' amazing. He insisted it was all real-time and not scripted: a several-minute-long conversation between an AI kid and this lady. The kid apparently read her emotions/face/tone/etc. and reacted accordingly, and it looked like two people having a conversation. Pretty fuckin' impressive.
Also, the 3D sensing allowed for some fantastic animation in that 3D Breakout-style game.
Nope, I haven't seen the videos, but when I have the time I will take a look. Like Infini said, I wasn't talking about actually capturing the picture, but about the analysis of it. I don't claim it can't be done, because there are some really good algorithms and models. The article says that several things are done at the same time:
1) Facial recognition for emotion.
2) Speech and language processing/filtering of the received audio.
3) Motion analysis of the body.
4) Response formulation.
5) Maintaining a 3D virtual world.
Each of the five parts is a very taxing task. Each on its own is possible to do in real time on a strong computer, but doing them all together in real time requires a supercomputer or several strong computers running in parallel. You could cut some corners with approximations, time scheduling, and leveled-detail analysis (see the sketch below).
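To make that corner-cutting concrete, here's a toy Python sketch of a frame loop that runs the five stages under a fixed time budget and drops each stage to a cheaper level of detail when time runs short. Every function name and number is invented for illustration; this is just the scheduling idea, not how Natal actually works.

```python
import time

# Toy frame loop: run the five analysis stages under a fixed time budget,
# downgrading each stage to a cheaper "level of detail" when time runs short.
FRAME_BUDGET_S = 1.0 / 30.0   # ~33 ms per frame for real time at 30 fps (assumed)

def analyze_face(detail): ...        # 1) facial recognition for emotion
def process_speech(detail): ...      # 2) speech/language processing
def analyze_motion(detail): ...      # 3) body motion analysis
def formulate_response(detail): ...  # 4) response formulation
def render_world(detail): ...        # 5) maintain the 3D virtual world

STAGES = [analyze_face, process_speech, analyze_motion,
          formulate_response, render_world]

def run_frame():
    deadline = time.monotonic() + FRAME_BUDGET_S
    for stage in STAGES:
        remaining = deadline - time.monotonic()
        # The less time is left, the rougher the approximation gets.
        detail = "full" if remaining > 0.02 else "coarse" if remaining > 0.005 else "skip"
        stage(detail)

run_frame()  # one frame's worth of work
```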
For the response formulation, they probably cut corners by using a specific knowledge base about a few popular subjects, without too much detail. Additionally, some preprocessed responses were probably already in the system, and the woman was probably also limited in the questions she could ask and the answers she could give.
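A crude sketch of what "preprocessed responses about a few popular subjects" could look like: match the input against a small topic table and fall back to vague filler. The topics and lines here are invented examples, not anything from the actual demo.

```python
import random
import re

# Tiny canned-response table: instead of open-ended language generation,
# match the input against a few known topics and pick a prepared line.
RESPONSES = {
    "fish": ["I caught one earlier!", "Do you like fishing?"],
    "school": ["School was boring today.", "I have homework to do."],
    "weather": ["It's so sunny out!", "I hope it doesn't rain."],
}
FALLBACK = ["Hmm, tell me more.", "Really?"]

def respond(user_utterance: str) -> str:
    words = re.findall(r"[a-z']+", user_utterance.lower())  # strip punctuation
    for topic, lines in RESPONSES.items():
        if topic in words:
            return random.choice(lines)
    return random.choice(FALLBACK)   # vague filler keeps the illusion alive

print(respond("Did you see that fish?"))  # e.g. "I caught one earlier!"
```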
The last thing that makes me think a lot of this is exaggerated is that a computer is somewhat the opposite of the human brain. A computer is a master at heavy, difficult calculations; it does them in a snap of the fingers. But it is lousy at pattern recognition, and that's a problem here. Each person is different in appearance and voice, and even the smallest variation in a pattern is a big obstacle for a computer. So the machine learning algorithms need to be trained on the user before they can make reasonably accurate analyses.
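To illustrate why per-user training matters, here's a toy nearest-centroid classifier in Python: it can only label new samples after being calibrated with examples from that specific user. It's a generic textbook technique, not the algorithm Natal uses, and all the feature values are made up.

```python
import numpy as np

# Toy nearest-centroid classifier: it can only label a new sample well after
# it has been calibrated with examples from the specific user, because every
# person's feature patterns sit in a different region of the feature space.
class CalibratedRecognizer:
    def __init__(self):
        self.centroids: dict[str, np.ndarray] = {}

    def train(self, label: str, samples: np.ndarray) -> None:
        """Store the mean feature vector of this user's examples of `label`."""
        self.centroids[label] = samples.mean(axis=0)

    def predict(self, sample: np.ndarray) -> str:
        # Pick the label whose centroid is closest to the new sample.
        return min(self.centroids,
                   key=lambda lbl: np.linalg.norm(sample - self.centroids[lbl]))

rec = CalibratedRecognizer()
rec.train("smile", np.array([[0.9, 0.1], [0.8, 0.2]]))  # made-up face features
rec.train("frown", np.array([[0.1, 0.9], [0.2, 0.8]]))
print(rec.predict(np.array([0.85, 0.15])))  # -> "smile"
```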
But like I said, it isn't entirely impossible to do in real time. What I seriously doubt is that the Xbox 360 has the processing power to do it all together in real time.