An idea I had for 'gesture recognition', for simple mouse movement.
An idea which came to me while walking to the office this morning, on how
to implement the 'mouse movement' recognition, assuming that OpenKinect is
not providing such by itself.
Mailing this to save the braindump before I forget.
CC'ing you for redundant storage in the 2nd brain.
The idea assumes that a person interacting with Kinect/Display will
stand in front of it and have their hand stretched forward, towards
the display.
Use the 'Depth' image.
Find the pixel whose value indicates that it is nearest to the
display. Depending on the depth encoding this should be either the
maximum pixel value, or the minimum.
This should at least be the user's hand. Depending on
resolution and posture it may actually be the tip of a finger.
It could also be the whole body if the user is not yet
pointing to the display. That is actually a good thing, as
this would track persons as they pass by, giving us the effect
we want, automatically.
(Possibly find all pixels with that value and compute their centroid).
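A minimal sketch of that step, assuming the depth frame arrives as a 2D
numpy array where smaller values mean nearer (flip min for max in the
other encoding, and mask out whatever 'invalid' sentinel value the device
may use before doing this):

    import numpy as np

    def nearest_point(depth):
        # Nearest depth value in the frame; assumes smaller = nearer
        # and that invalid pixels have already been masked out.
        z = depth.min()
        # All pixels at that depth, and their centroid.
        ys, xs = np.nonzero(depth == z)
        return xs.mean(), ys.mean(), z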
We now have an x/y location, and we can track it as new depth images
are sent to us by the device.
And the actual depth (i.e. max/min) gives us a z-channel too.
(Possibly run a windowed average over the stream of coordinates to
smooth things out).
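For the smoothing, something as simple as this should do (a window of 5
samples is an arbitrary starting point, to be tuned):

    from collections import deque

    class Smoother:
        # Moving average over the last n (x, y, z) samples.
        def __init__(self, n=5):
            self.window = deque(maxlen=n)

        def push(self, point):
            self.window.append(point)
            return tuple(sum(c) / len(self.window)
                         for c in zip(*self.window))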
Depending on the depth resolution Kinect may be able to separate
finger tips from the hand. They might be all in the same plane, or
in different but nearby planes.
Get the bounding box inside of which all the nearest pixels are.
Tracking the size of this box over time gives a sense of 'scale', i.e.
how open (flat, wide) vs closed (small clump) the hand is. This could
be bound to 'zoom'. => Same idea as the tablet 'pinch-to-zoom' gesture,
where two fingers pinch or move apart, just using the whole hand.
(We might have to compensate for size changes due to depth changes,
i.e. normalize to a fixed depth)
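Sketch of the bounding box / scale step, same array assumptions as above;
the 'tolerance' (how far behind the nearest pixel still counts as hand)
and the linear depth normalization are guesses that would need tuning
against the actual depth encoding:

    def hand_scale(depth, tolerance=30, ref_depth=1000.0):
        # Bounding box of everything within 'tolerance' of the nearest pixel.
        z = depth.min()
        ys, xs = np.nonzero(depth <= z + tolerance)
        w = xs.max() - xs.min()
        h = ys.max() - ys.min()
        # Apparent size shrinks with distance, so scale the box back to a
        # fixed reference depth before comparing sizes across frames.
        norm = z / ref_depth
        return w * norm, h * norm

Tracking the (normalized) box area over successive frames then gives the
open/closed signal to bind to zoom.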
Having the (normalized) bounding box with the hand/fingers we can use
known techniques on the region to determine rotation of the content
over time (*) and get another channel of movement.
This might even work if the finger tips are not separated from the
hand, if the shape of the whole hand itself is distinctive enough to
track. This would actually also be another way of getting scale
changes, albeit more computationally expensive.
(*) See Crimp GSoC, Affine Registration: Transform the images to compare
to a log-polar representation, run an FFT on it and perform a phase
correlation. The resulting shift values in the log-polar domain translate
into scale and rotation values in the cartesian domain.
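An illustration of that registration step in numpy/scipy terms; treat it
as a sketch of the idea, not the actual Crimp code:

    import numpy as np
    from scipy.ndimage import map_coordinates

    def log_polar(img, angles=180, radii=100):
        # Resample img onto a log-polar grid around its center:
        # rows = log-spaced radius, columns = angle.
        cy, cx = (np.asarray(img.shape, float) - 1) / 2.0
        theta = np.linspace(0, 2 * np.pi, angles, endpoint=False)
        r = np.exp(np.linspace(0, np.log(np.hypot(cx, cy)), radii))
        t, r = np.meshgrid(theta, r)
        rows = cy + r * np.sin(t)
        cols = cx + r * np.cos(t)
        return map_coordinates(img, [rows, cols], order=1)

    def phase_correlate(a, b):
        # Peak of the cross-power spectrum = shift between a and b.
        F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
        corr = np.fft.ifft2(F / (np.abs(F) + 1e-9)).real
        return np.unravel_index(corr.argmax(), corr.shape)

Phase-correlating log_polar(frame1) against log_polar(frame2): the row
shift maps to scale (through exp()), the column shift to rotation angle.
Shifts past half the axis length wrap around and are really negative.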