Visual Google.
Jan. 13th, 2005 05:41 pm

One of the things I've always wanted is a system for abstracting the contents of images in such a way that you could:
- Notice that two images of the same event, taken at different angles or at different times, are the same,
- Identify the larger image from which a smaller one was cut (from which famous painting did Monty Python steal its 'foot'?),
- Identify that two images took place at the same location, although the events and people may be completely different.
- Identify common visual elements among different images, such as people and objects.
- Figure out what is going on in an image, and give it attributes accordingly (is this a sports picture, or porn, or both?).
The problem has always been finding a good image signature algorithm and having a database from which to draw for comparison. While I was working at Softguard, we were trying to solve the second half of the problem, and were considering the first as a possible place to extend the technology.
Anyway, according to this article, someone thinks they have a handle on doing all this.
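To make the "image signature" idea concrete: one simple family of signatures is a perceptual average hash, where you threshold pixels against the image mean and compare the resulting bit strings. This is only a toy sketch of the general idea, not the technique from the article; the 4x4 grayscale "image" is made-up demo data.

```python
# A minimal "image signature" sketch: an average hash (aHash).
# The 4x4 grayscale pixel grids below are made-up demo data.

def average_hash(pixels):
    """Return a bit string: 1 where a pixel is >= the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p >= mean else "0" for p in flat)

def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))

img = [[10, 12, 200, 210],
       [11, 13, 205, 215],
       [ 9, 14, 198, 220],
       [12, 10, 202, 208]]

# A slightly brightened copy of the same scene: the mean shifts by the
# same amount as every pixel, so the hash is unchanged.
img2 = [[p + 5 for p in row] for row in img]

h1, h2 = average_hash(img), average_hash(img2)
print(hamming(h1, h2))  # 0: a small distance means "probably the same image"
```

A real system would first downscale the image to a fixed grid so that images of different sizes get comparable signatures, then index the hashes so near-duplicates can be found quickly.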
no subject
Date: 2005-01-14 04:42 am (UTC)

Even assuming that a commonly accepted "vocabulary" is defined, you have the issue of some individuals failing to follow it. You also have the issue of people using synonymous terminology. For example: one might describe a picture as "trees" and another as a "forest". Both would be correct, yet both are more terms to index, cross-reference, and present as options after a search.
Then you also have the issue of a continually growing vocabulary... not to mention language translations. Someone who posts the picture and speaks Spanish may put an attribute on the image of "el rio". However, an English speaker would usually do a search using the attribute "river".
The variables would be endless. Though, there are people who are WAY MORE mathematically inclined than I am... and those people are usually good at defining the "algorithms" to figure such things out.
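The synonym and translation problem described above can be partly tamed by normalizing tags against a shared lexicon before indexing, so "forest", "el rio", and "river" all land on canonical terms. A toy sketch, with a tiny made-up lexicon (a real system would need a far larger, curated one):

```python
# Toy sketch of tag normalization for the synonym/translation problem.
# The lexicon below is made-up demo data mapping variant tags to a
# canonical indexing term.
LEXICON = {
    "forest": "trees",   # synonym: index under the same term
    "el rio": "river",   # Spanish phrase -> English canonical term
    "rio": "river",
}

def normalize(tag):
    """Map a user-supplied tag to its canonical indexing term."""
    tag = tag.lower().strip()
    return LEXICON.get(tag, tag)  # unknown tags pass through unchanged

print(normalize("El Rio"))   # river
print(normalize("forest"))   # trees
print(normalize("mountain")) # mountain (not in the lexicon)
```

This doesn't solve the endless-variables problem, but it shows how a search for "river" could still hit an image tagged "el rio" without the searcher knowing any Spanish.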
no subject
Date: 2005-01-14 05:01 am (UTC)

no subject
Date: 2005-01-14 01:56 pm (UTC)

But note: I am thinking in terms of a system for all images all over the internet. If I were to narrow that down to, say, just medical scans/images... or just botanical specimens (e.g. single items represented in an image)... then I can totally see how this would work. It's when you think of "any joe blow out in the world" contributing that I think this kind of starts to fall apart.
no subject
Date: 2005-01-14 09:11 pm (UTC)

That kind of thing is very difficult, but increasingly being studied - in fact there's some work in that area done here at McGill's CIM.
But only the largest languages break a million words, you know; and there are reputation-correlating technologies to deal with the contributor noise problem.
no subject
Date: 2005-01-14 09:26 pm (UTC)

Maybe I am just not properly thinking of the use/context of how this would be used... and making this harder than it really is. As an example, I imagine a search of the images that are trademarked with the US Trademark Office. They have a searchable database. You put in your criteria, but the search criteria only cover the text that goes along with the trademarked image. How can you make a computer program recognize that a formation of pixels in a jpg (for example) represents a triangle, in order to link it to other images of a triangle? Ok, a triangle is maybe too simple of an image, because I can see how a program could do that... but complex images would be a problem, right?
Do you see what I am getting at? Where am I going wrong?
Sometimes, I really wish that I were more intellectually inclined to such things. I am technical, but not overly brilliant, and certainly not brilliant in terms of advanced mathematics and equally complex concepts like this idea.
no subject
Date: 2005-01-14 09:49 pm (UTC)

Of course, unless you had human interpretation somewhere - or you used the entire web as a training database - you'd still need a picture of a pine tree to make a search on images containing views of pine trees, just as with Google now you need the phrase "pine tree" to search for documents containing references to pine trees.
That reduces the problem to that of looking at a photograph and deciding what 'interesting' properties it has - facts about the image that probably imply properties of its subject, and not just things about how it was made. And that field's called 'computer vision'....
It's still in its infancy, but if we can find enough of the right things to look for, it might not be all that much harder than the 'recognise triangles' task. After all, humans interpret what they see, and they do it with relatively little computing power - and we now know, too, that they do it with something very vaguely like the method I suggest (only with more recursive hierarchy, to support greater abstraction and fault tolerance).
no subject
Date: 2005-01-14 09:53 pm (UTC)

The triangle example is right, and yes, more complex images are a problem, but problems are made to be solved, aren't they?
I mean, your brain does it all the time, and it's just a machine too.
no subject
Date: 2005-01-14 10:39 pm (UTC)

I now grasp what you are saying... it just brings me to the old idea that the recognition system is only as good as its teacher... and I can envision that in some cases, it's going to take a human to go "no, that isn't a correct correlation" and tell the system to exclude something from its learning in that context.
Anyway, all very cool... and I am sure it will be something that I am quite amazed with once brainiacs like you guys design and make it work.
no subject
Date: 2005-01-14 10:46 pm (UTC)

The trick is, between here (the current computers that have very complex rules, but rules still) and there (our minds, which have underlying rules, but are sort of free-floating in their actual top-level operations), there's a bit of work to be done. :-)
Ok, I have to go and resume reading "Gödel, Escher, Bach" now!
no subject
Date: 2005-01-14 10:46 pm (UTC)

It just amazes me how you can take human deductive reasoning and translate that into a rule set for a program to emulate. But hey, guess that is why I am not writing the rule sets, huh. ;)
Anyway, you guys rock for taking the time to explain this to me... it must seem like you're talking to a child on this one. lol
no subject
Date: 2005-01-14 10:57 pm (UTC)

It's very strange the way our brains function, and not fully understood, by a long stretch, but we can say this much: however it works, it is based on simple rules, from which emerges an incredible complexity.
Maybe
no subject
Date: 2005-01-16 10:07 pm (UTC)

A lot of this kind of technological development hinges on defining the terms of the problem so that you win.
Of course, the same is true of humans: there are things that humans notice and things that they do not. It's not that humans have such amazing insight or 'intuition'; it's that the things humans do notice have words for them, and the things they do not notice, well, those stay unnoticed! Most researchers aren't actually interested in Artificial Intelligence, they're studying Artificial Stupidity - because the core problem they need to solve is not getting all the answers right so much as making the same mistakes as humans - so no human will criticise the results.
no subject
Date: 2005-01-16 10:22 pm (UTC)

As to recognising things from different angles - that's actually a mathematical trick. It's even been demonstrated that there are (at least) two different kinds of humans: some people tell if two pictures are of the same object from different angles by mentally rotating them until they do (or don't) match; others do it by looking only at properties of the pictures that are rotationally invariant and seeing if those properties match exactly.

(The experiments to figure out how you do it work by measuring how your response time in answering the question changes as the puzzle changes. 'Rotators' take a short time that grows steadily longer as the angle between the pictures increases, and can be confused by certain kinds of details in the images. The 'thinkers' take longer for the 'easy' cases but take almost the same amount of time even for the hard cases where there's a big difference between the pictures - and the extra time they need for detailed images is just the time it takes for them to understand the picture; it doesn't make the matching task itself harder. Well, all this assuming I remember all the details correctly - this all comes from a discussion I had with someone studying this stuff many years ago.)

Anyway, the point is that there are tricks to do with what you pay attention to and what you don't: 'shapes' change as things rotate, but there are facts about shapes that stay the same, and by paying more attention to these second-order properties you can get a big win.
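The rotationally-invariant-properties idea above can be sketched in a few lines: represent a shape by the sorted distances of its points from their centroid. Rotating the shape leaves that signature unchanged, so two views match without any mental rotation. This is just an illustrative toy (the triangle coordinates are made-up demo data), not the actual method from the experiments described.

```python
# Sketch of matching shapes by a rotationally invariant property:
# the sorted distances from each point to the shape's centroid.
# The triangle coordinates are made-up demo data.
import math

def signature(points):
    """Sorted centroid distances - unchanged by rotation of the shape."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Round to absorb floating-point noise from the rotation.
    return sorted(round(math.hypot(x - cx, y - cy), 6) for x, y in points)

def rotate(points, theta):
    """Rotate points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
rotated = rotate(triangle, math.radians(37))

print(signature(triangle) == signature(rotated))  # True
```

The design choice mirrors the 'thinkers' strategy: instead of searching over rotation angles, you compute a property that rotation cannot change and compare those properties directly.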
no subject
Date: 2005-01-17 02:08 am (UTC)

Ah yes, a very important concept that I had lost in this discussion. Adding this fact back into my mind makes things VERY much more logical.
Most researchers aren't actually interested in Artificial Intelligence, they're studying Artificial Stupidity - because the core problem they need to solve is not getting all the answers right so much as making the same mistakes as humans - so no human will criticise the results
A very interesting way to look at AI that I had not considered before.
no subject
Date: 2005-01-17 02:09 am (UTC)

Ahhhh, "I see," said the blind man. This is the example that I needed to really pull it all together. I TOTALLY get it now.