[personal profile] swestrup
One of the things I've always wanted is a system for abstracting the contents of images in such a way that you could:

  • Notice that two images of the same event, taken from different angles or at different times, show the same event,
  • Identify the larger image from which a smaller one was cut (from which famous painting did Monty Python steal its 'foot'?),
  • Identify that two images were taken at the same location, although the events and people in them may be completely different,
  • Identify common visual elements among different images, such as people and objects,
  • Figure out what is going on in an image and assign it attributes accordingly (is this a sports picture, or porn, or both?).

The problem has always been finding a good image signature algorithm and having a database from which to draw for comparison. While I was working at Softguard, we were trying to solve the second half of the problem, and were considering the first as a possible place to extend the technology.

Anyway, according to this article, someone thinks they have a handle on doing all this.

Date: 2005-01-14 04:42 am (UTC)
From: [identity profile] lasher.livejournal.com
You know, I have pondered this concept in passing as well. However, the key to all of this is the development of a commonly accepted definition scheme... or, as you called it, an algorithm. The database part is the easy part. Once you have the data defined, the search and correlation of that data is simply a matter of technology.

Even assuming that a commonly accepted "vocabulary" is defined, you have the issue of some individuals failing to follow it. You also have the issue of people using synonymous terminology. For example: one might describe a picture as "trees" and another as a "forest". Both would be correct, yet each adds more to index, cross-reference, and present as options after a search.

Then you also have the issue of a continually growing vocabulary... not to mention language translations. Someone who posts a picture and speaks Spanish may put an attribute on the image of "el rio". However, an English speaker would usually do a search using the attribute "river".

The variables would be endless. Though, there are people who are WAY MORE mathematically inclined than I am... and those people are usually good at defining the "algorithms" to figure such things out.
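
To make the synonym and translation problem concrete, here is a minimal sketch of the usual workaround: a normalization table that maps synonymous and translated tags to a single canonical index term. All of the table entries and names below are invented for illustration.

```python
# A minimal sketch of tag normalization, assuming a hand-built
# synonym/translation table (every entry here is hypothetical).
CANONICAL = {
    "trees": "forest",
    "woods": "forest",
    "el rio": "river",
    "rio": "river",
}

def normalize_tag(tag: str) -> str:
    """Map a user-supplied tag to its canonical index term."""
    tag = tag.strip().lower()
    return CANONICAL.get(tag, tag)

# Both searches now hit the same index entry:
assert normalize_tag("El Rio") == normalize_tag("river") == "river"
```

The hard part, as the comment above suggests, is agreeing on and maintaining that table as the vocabulary grows - the code itself is trivial.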

Date: 2005-01-14 01:56 pm (UTC)
From: [identity profile] lasher.livejournal.com
See, I think that the data representation *is* a big thing... especially because your data is only as good as the people entering it. Because a description is so subjective, I think that there would be problems.

But note: I am thinking in terms of a system for all images all over the internet. If I were to narrow that down to, say, just medical scans / images... or just botanical specimens (e.g., single items represented in an image)... then I can totally see how this would work. It's when you think of "any joe blow out in the world" contributing that I think this all starts to fall apart.

Date: 2005-01-14 09:11 pm (UTC)
From: [identity profile] sps.livejournal.com
I think Sti is talking about a system in which the image interpretation is fully automated, so that image-to-image links don't require human intervention or rely on human knowledge or opinion. That's tough, but it doesn't run afoul of the same kinds of problems as you describe. When you come to indexing human commentary on the images - that's where the social process of vocabulary control comes into play.

That kind of thing is very difficult, but it is increasingly being studied - in fact there's some work in that area done here at McGill's CIM.

But only the largest languages break a million words, you know; and there are reputation-correlating technologies to deal with the contributor noise problem.

Date: 2005-01-14 09:26 pm (UTC)
From: [identity profile] lasher.livejournal.com
Hmm, I guess that I'm going to have to study up on this more because I can't seem to grasp the concept of automating the image interpretation. Some human has to define the rules for interpreting the content of an image originally, and write that into programs, right?

Maybe I am just not properly thinking of the use / context of how this would be used... and making this harder than it really is. As an example, I imagine a search of the images that are trademarked with the US Trademark Office. They have a searchable database. You put in your criteria, but the search criteria only cover the text that goes along with the trademarked image. How can you make a computer program recognize that a formation of pixels in a JPG (for example) represents a triangle, in order to link it to other images of a triangle? OK, a triangle is maybe too simple an image, because I can see how a program could do that... but complex images would be a problem, right?

Do you see what I am getting at? Where am I going wrong?

Sometimes, I really wish that I was more intellectually inclined to such things. I am technical, but not overly brilliant, and certainly not brilliant in terms of advanced mathematics and equally complex concepts like this idea.

Date: 2005-01-14 09:49 pm (UTC)
From: [identity profile] sps.livejournal.com
Well, think about photographs (which are in some sense empirical objects) rather than drawings (which are clearly human interpretations). You can imagine a piece of software that can determine that two pictures of pine trees are similar, and two pictures of the same pine tree are very similar, without it needing to know what a pine tree is - or even being able to segment it from the surrounding environment (something that actually relies more on interpretation and experience than is generally imagined). What you need is a mechanism for extracting some kind of 'signature' from an image part, so that each image produces a list of, say, a few hundred signatures out of a possible million or so. Then you could find things - going from picture to picture - using the same kind of technology that Google uses to go from text to text.
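
As a rough illustration of that signature idea (a sketch only - the patch size, the gradient-orientation histograms, the quantization, and the hashing step are all assumptions of mine, not a description of any particular system), something like the following extracts a set of quantized local signatures per image and scores two images by how many signatures they share:

```python
import numpy as np

def patch_signatures(img, patch=16, bins=8):
    """Hash each patch's coarse gradient-orientation histogram into an
    integer 'signature', giving a set of a few hundred ids per image."""
    gy, gx = np.gradient(img.astype(float))
    angle = np.arctan2(gy, gx)   # edge orientation per pixel
    mag = np.hypot(gx, gy)       # edge strength per pixel
    sigs = set()
    h, w = img.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            a = angle[y:y + patch, x:x + patch]
            m = mag[y:y + patch, x:x + patch]
            hist, _ = np.histogram(a, bins=bins, range=(-np.pi, np.pi), weights=m)
            # Quantize coarsely so that similar-looking patches hash alike.
            coarse = tuple((hist / (hist.sum() + 1e-9) * 8).astype(int))
            sigs.add(hash(coarse) % 1_000_000)  # ~a million possible signatures
    return sigs

def similarity(a, b):
    """Fraction of signatures two images share (Jaccard overlap)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Two noisy views of the same synthetic 'scene' should share far more
# signatures than two unrelated images.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:128, 0:128]
scene = np.sin(xx / 7.0) * np.cos(yy / 5.0)
view1 = scene + rng.normal(0, 0.005, scene.shape)
view2 = scene + rng.normal(0, 0.005, scene.shape)
other = rng.random((128, 128))
print(similarity(patch_signatures(view1), patch_signatures(view2)))  # high
print(similarity(patch_signatures(view1), patch_signatures(other)))  # low
```

With signature sets in hand, an inverted index from signature id to image list gives the picture-to-picture lookup, just as a word index gives text-to-text lookup.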

Of course, unless you had human interpretation somewhere - or used the entire web as a training database - you'd still need a picture of a pine tree to run a search for images containing views of pine trees, just as with Google now you need the phrase "pine tree" to search for documents containing references to pine trees.

That reduces the problem to that of looking at a photograph and deciding what 'interesting' properties it has - facts about the image that probably imply properties of its subject, and not just things about how it was made. And that field's called 'computer vision'....

It's still in its infancy, but if we can find enough of the right things to look for, it might not be all that much harder than the 'recognise triangles' task. After all, humans interpret what they see, and they do it with relatively little computing power - and we now know, too, that they do it with something very vaguely like the method I suggest (only with more recursive hierarchy, to support greater abstraction and fault tolerance).

Date: 2005-01-14 09:53 pm (UTC)
From: [identity profile] pphaneuf.livejournal.com
It's outright image recognition. It's like seeing an elephant when you're a kid, then seeing another, from a slightly different angle, and recognizing that it's of the same kind as this other thing, but you still don't know the word "elephant".

The triangle example is right, and yes, more complex images are a problem, but problems are made to be solved, aren't they?

I mean, your brain does it all the time, and it's just a machine too.

Date: 2005-01-14 10:39 pm (UTC)
From: [identity profile] lasher.livejournal.com
I see what you're saying. I think what was/is hanging me up is that maybe I am thinking more literally than I should... and that I am not so great at communicating the abstract ideas floating in my head. In the pine tree example, I can grasp the concept of recognizing similar objects and assigning them the reference "pine tree" (since that is an easy term for this discussion)... but how does it "learn" the different examples of pine trees (like in pphaneuf's reply below)? Say it originally learns pine trees as a tall, triangular, green type of thing that has particular properties (since all of my adjectives are human descriptors)... then it encounters a pine tree broken in half, no longer triangular, no longer green, roots sticking up, etc. I mean, isn't that process of distinguishing somewhat subjective? It seems it's the human mind and its subjectivity that makes these things work... no matter how sophisticated an AI system you develop, it will still be based on rules. It's the mind's imagination - the breaking of rules - that gives us perception.

I now grasp what you are saying... it just brings me back to the old idea that the recognition system is only as good as its teacher... and I can envision that in some cases, it's going to take a human to say "no, that isn't a correct correlation" and tell the system to exclude something from its learning in that context.

Anyway, all very cool... and I am sure it will be something that I am quite amazed by once brainiacs like you guys design it and make it work.

Date: 2005-01-14 10:46 pm (UTC)
From: [identity profile] pphaneuf.livejournal.com
That's the magic of it all. The mind's imagination that you refer to, the breaking of rules, is still, itself, based on rules! Just like the computers.

The trick is, between here (the current computers that have very complex rules, but rules still) and there (our minds, which have underlying rules, but are sort of free-floating in their actual top-level operations), there's a bit of work to be done. :-)

Ok, I have to go and resume reading "Gödel, Escher, Bach" now!

Date: 2005-01-14 10:46 pm (UTC)
From: [identity profile] lasher.livejournal.com
Yes, I see what you are saying. But don't you think that part of what makes me recognize that elephant from different angles is that my mind inherently extrapolates things that a program can't, because of the program's rule base? Sure, the rule base will grow over time as it learns (like I was saying in the reply to _sps_'s reply above)... For example: even if I am a kid and don't know the term elephant, I recognize it from different angles... but what about the day I see one that only has 3 legs? Now compare this to the program... it has learned "elephant", or the representation thereof, in certain contexts... one of which, it seems, would associate 4 legs with one... now it encounters one with 3... it just seems that humans learn it's a 3-legged elephant more easily than a program can... though talking this out with you has just made the light bulb in my mind go BLING... duh, the program just learns the new representation of the elephant the way the kid does...
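
That "just learn the new representation" idea can be made concrete with a toy exemplar memory - the feature names below are made up for illustration, and a real system would extract far richer features from the image itself:

```python
import numpy as np

class ExemplarMemory:
    """Classify by nearest stored example; learning is just storing."""
    def __init__(self):
        self.examples, self.labels = [], []

    def learn(self, features, label):
        # A surprising new view (say, a 3-legged elephant) is simply
        # stored as one more example carrying the same label.
        self.examples.append(np.asarray(features, dtype=float))
        self.labels.append(label)

    def classify(self, features):
        if not self.examples:
            return None
        dists = [np.linalg.norm(np.asarray(features) - e) for e in self.examples]
        return self.labels[int(np.argmin(dists))]

memory = ExemplarMemory()
memory.learn([4, 1.0, 0.9], "elephant")  # (legs, trunk-ness, grey-ness): made up
memory.learn([3, 1.0, 0.9], "elephant")  # the 3-legged one, learned the same way
print(memory.classify([3, 0.9, 0.8]))    # elephant
```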

It just amazes me how you can take human deductive reasoning and translate that into a rule set for a program to emulate. But hey, I guess that is why I am not writing the rule sets, huh. ;)

Anyway, you guys rock for taking the time to explain this to me... it must seem like you're talking to a child on this one. lol

Date: 2005-01-14 10:57 pm (UTC)
From: [identity profile] pphaneuf.livejournal.com
The great leap of faith, I think, is to get away from the idea that you need a set of rules to recognize something.

It's very strange, the way our brains function, and not fully understood, by a long stretch, but we can say this much: however it works, it is based on simple rules, from which emerges an incredible complexity.

Maybe [livejournal.com profile] swestrup or [livejournal.com profile] _sps_ have better references on this, but my favorite book on the subject is Gödel, Escher, Bach (linking to an informative Wikipedia entry). I actually remember [livejournal.com profile] _sps_ being specifically disappointed by it, but I liked it.

Date: 2005-01-16 10:07 pm (UTC)
From: [identity profile] sps.livejournal.com
Of course, a Google-like system has the huge advantage that most people are only interested in the ten best matches, and are grateful if they get anything at all - which means that you don't need to be especially good at those difficult cases, unlike when you are classifying things in a library and have to choose a single, 'right' answer. Then if you want to get more sophisticated, you can refine your own results based on how people use them: if there are some suggested links that people follow and others that they don't (or ones that people don't return to your page after following, and ones that they do), then you can feed that back and refine your measures.
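
A toy sketch of that feedback loop, with invented names and an arbitrary blending weight: mix each match's base similarity with its observed click-through rate, so that links people actually follow drift upward over time.

```python
def rerank(matches, clicks, impressions, weight=0.3):
    """matches: list of (item, base_score) pairs. Returns the list
    re-sorted by a blend of base similarity and click-through rate."""
    def score(item, base):
        ctr = clicks.get(item, 0) / max(impressions.get(item, 1), 1)
        return (1 - weight) * base + weight * ctr
    return sorted(matches, key=lambda m: score(*m), reverse=True)

matches = [("img_a", 0.90), ("img_b", 0.85), ("img_c", 0.80)]
clicks = {"img_b": 40, "img_a": 2}
impressions = {"img_a": 50, "img_b": 50, "img_c": 50}
print(rerank(matches, clicks, impressions))  # img_b overtakes img_a
```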

A lot of this kind of technological development hinges on defining the terms of the problem so that you win.

Of course, the same is true of humans: there are things that humans notice and things that they do not. It's not that humans have such amazing insight or 'intuition', it's that the things that humans do notice have words for them, and the things that humans do not notice, well, they stay unnoticed! Most researchers aren't actually interested in Artificial Intelligence, they're studying Artificial Stupidity - because the core problem they need to solve is not getting all the answers right so much as making the same mistakes as humans - so no human will criticise the results.

Date: 2005-01-16 10:22 pm (UTC)
From: [identity profile] sps.livejournal.com
I think perhaps you are overly concerned by binary reasoning - in some sense it's how computers work inside (just as humans ultimately do what they do with chemistry), but the building blocks can be assembled to make something that isn't itself block-shaped. When the computer goes to compare a picture of a tree with an idealised model of a tree, say, it doesn't have to give a yes/no result, and the systems that do this kind of associative retrieval generally don't (unless perhaps they use an incredibly large number of very picky attributes). So maybe you get results like 'shape 83%, colour 91%, texture 76%, size [not enough information]' and, well, that would outrank a lot of other things that are kind of tree-like without ever having to worry about how many legs it's got.
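
That graded, non-binary comparison is easy to sketch. In the toy version below the per-attribute percentages are assumed to come from upstream measures, and a missing attribute is simply left out of the average rather than counted against the match:

```python
def overall(scores):
    """Average the available attribute matches; None means 'not enough
    information' and is left out rather than counted against the match."""
    known = [v for v in scores.values() if v is not None]
    return sum(known) / len(known) if known else 0.0

tree = {"shape": 0.83, "colour": 0.91, "texture": 0.76, "size": None}
vaguely_tree_like = {"shape": 0.55, "colour": 0.60, "texture": 0.40, "size": 0.90}
print(overall(tree), overall(vaguely_tree_like))  # ~0.83 beats ~0.61
```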

As to recognising things from different angles - that's actually a mathematical trick. It's even been demonstrated that there are (at least) two different kinds of humans: some people tell if two pictures are of the same object from different angles by mentally rotating them until they do (or don't) match; others do it by looking only at properties of the pictures that are rotationally invariant and seeing if those properties match exactly. (The experiments to figure out how you do it work by measuring how your response time in answering the question changes as the puzzle changes - 'rotators' take a short time that grows steadily longer as the angle between the pictures increases, and can be confused by certain kinds of details in the images. The 'thinkers' take longer for the 'easy' cases but take almost the same amount of time even for the hard cases where there's a big difference between the pictures - and the extra time they need for detailed images is just the time it takes for them to understand the picture; it doesn't make the matching task itself harder. Well, all this assuming I remember the details correctly - it comes from a discussion I had with someone studying this stuff many years ago.) Anyway, the point is that there are tricks to do with what you pay attention to and what you don't - 'shapes' change as things rotate, but there are facts about shapes that stay the same, and by paying more attention to these second-order properties you can get a big win.
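
One classical example of such rotation-invariant "second order properties" (my choice of illustration - not necessarily what the studies used) is the family of Hu moment invariants: numbers computed from an image that stay the same when the image is rotated.

```python
import numpy as np

def hu_first_two(img):
    """First two Hu moment invariants: quantities that do not change
    when the image is translated or rotated."""
    img = img.astype(float)
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    cx, cy = (x * img).sum() / m00, (y * img).sum() / m00

    def eta(p, q):  # normalized central moment
        mu = (((x - cx) ** p) * ((y - cy) ** q) * img).sum()
        return mu / m00 ** ((p + q) / 2 + 1)

    hu1 = eta(2, 0) + eta(0, 2)
    hu2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return hu1, hu2

# A lopsided blob and the same blob rotated 90 degrees yield (essentially)
# identical invariants - no mental rotation required.
blob = np.zeros((64, 64))
blob[10:40, 20:30] = 1.0
blob[35:45, 25:50] = 1.0
print(np.allclose(hu_first_two(blob), hu_first_two(np.rot90(blob))))  # True
```

Matching on such invariants is exactly the 'thinker' strategy: a fixed cost to compute the properties, then a comparison whose difficulty doesn't grow with the angle between the views.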

Date: 2005-01-17 02:08 am (UTC)
From: [identity profile] lasher.livejournal.com
"Of course, a Google-like system has the huge advantage that most people are only interested in the ten best matches..."

Ah yes, a very important concept that I had lost in this discussion. Adding this fact back into my mind makes things VERY much more logical.

"Most researchers aren't actually interested in Artificial Intelligence, they're studying Artificial Stupidity - because the core problem they need to solve is not getting all the answers right so much as making the same mistakes as humans - so no human will criticise the results."

A very interesting way to look at AI, one that I had not considered before.

Date: 2005-01-17 02:09 am (UTC)
From: [identity profile] lasher.livejournal.com
"So maybe you get results like 'shape 83%, colour 91%, texture 76%, size [not enough information]' and, well, that would outrank a lot of other things that are kind of tree-like without ever having to worry about how many legs it's got."

Ahhhhhhhhhhhhhhh - "I see," said the blind man. This is the example that I needed to really pull it all together. I TOTALLY get it now.
