KineTcl
Artifact Content

Artifact 62a7d54382abb41874501dad6b34d827cb3570b0:

Wiki page [Mouse Gestures] by andreas_kupries 2011-12-04 00:48:50.
D 2011-12-04T00:48:50.872
L Mouse\sGestures
P 699d7974a78468f35993de68fc944538ee93e439
U andreas_kupries
W 2831
An idea I had for 'gesture recognition', for simple mouse movement.

<hr>

An idea which came to me while walking to the office this morning, on how to 
implement the 'mouse movement' recognition, assuming that OpenKinect does not 
provide this by itself.

Mail to save the braindump before I forget.
CC you for redundant storage in 2nd brain.

<brain-dump>

	The idea assumes that a person interacting with Kinect/Display will
	stand in front of it and have their hand stretched forward, towards
	the display.

	Use the 'Depth' image.

	Find the pixel whose value indicates that it is nearest to the
	display. Depending on the depth encoding this should be either the
	maximum pixel value, or the minimum.

	<<
		This should at least be the user's hand. Depending on
		resolution and posture it may actually be the tip of a finger.
		It could also be the whole body if the user is not yet
		pointing to the display. That is actually a good thing, as
		this would track persons as they pass by, giving us the effect
		we want, automatically.
	>>

	(Possibly find all pixels with that value and compute their centroid).
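
A minimal sketch of the nearest-pixel search with the optional centroid step, in plain Python. The function name `nearest_point`, the row-list image layout, and the `invalid` marker for pixels without a reading are assumptions made for illustration; they are not part of OpenKinect or KineTcl.

```python
def nearest_point(depth, nearest_is_min=True, invalid=0):
    """Return (x, y, z) for the centroid of all pixels at the nearest depth.

    depth is a list of rows of integer depth values. Whether "nearest" means
    the minimum or the maximum depends on the depth encoding, hence the flag.
    The `invalid` value (pixels with no reading) is an assumed convention.
    """
    valid = [v for row in depth for v in row if v != invalid]
    if not valid:
        return None  # nothing usable in this frame
    z = min(valid) if nearest_is_min else max(valid)
    # Collect every pixel sharing the nearest value, then average positions.
    hits = [(x, y) for y, row in enumerate(depth)
            for x, v in enumerate(row) if v == z]
    cx = sum(x for x, _ in hits) / len(hits)
    cy = sum(y for _, y in hits) / len(hits)
    return cx, cy, z


frame = [
    [900, 900, 900, 900],
    [900, 500, 500, 900],
    [900, 900,   0, 900],
]
print(nearest_point(frame))  # (1.5, 1.0, 500)
```

The centroid is what keeps the tracked point stable when several neighbouring pixels share the nearest value.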

	We now have an x/y location, and we can track it as new depth images
	are sent to us by the device.

	And the actual depth (i.e. max/min) gives us a z-channel too.

	(Possibly run a windowed average over the stream of coordinates to
	smooth things out).
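
The windowed average could look roughly like this. `PointSmoother` and its `window` parameter are hypothetical names, and the window length would have to be tuned against the device frame rate.

```python
from collections import deque


class PointSmoother:
    """Moving average over the last `window` (x, y, z) samples."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def update(self, point):
        """Add one (x, y, z) sample and return the current average."""
        self.samples.append(point)
        n = len(self.samples)
        return tuple(sum(p[i] for p in self.samples) / n for i in range(3))


smoother = PointSmoother(window=3)
for raw in [(10, 10, 500), (12, 10, 500), (40, 10, 500), (14, 10, 500)]:
    smoothed = smoother.update(raw)
```

A larger window gives a steadier pointer at the cost of more lag behind the hand.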

Advanced idea:

	Depending on the depth resolution Kinect may be able to separate
	finger tips from the hand. They might be all in the same plane, or
	in different but nearby planes.

	Get the bounding box inside of which all the nearest pixels lie.
	Tracking the size of this box over time gives a sense of 'scale', i.e.
	how open (flat, wide) vs closed (small clump) the hand is. This could
	be bound to 'zoom'. This is the same idea as the tablet 'pinch' gesture,
	where two fingers pinch together or move apart, but using the whole hand.

	(We might have to compensate for size changes due to depth changes,
	i.e. normalize to a fixed depth)
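
A sketch of the bounding box and the depth compensation, under several assumptions: smaller values mean nearer, a `tolerance` band (hypothetical parameter) catches fingers lying in nearby planes, and depth values are linear in distance so that apparent size falls off as 1/distance. Raw device values may not be linear and would need converting first.

```python
def nearest_bbox(depth, tolerance=0, invalid=0):
    """Bounding box (x0, y0, x1, y1, z) of pixels within `tolerance` of the
    nearest depth z. Assumes the minimum value is the nearest one."""
    valid = [v for row in depth for v in row if v != invalid]
    if not valid:
        return None
    z = min(valid)
    xs, ys = [], []
    for y, row in enumerate(depth):
        for x, v in enumerate(row):
            if v != invalid and v - z <= tolerance:
                xs.append(x)
                ys.append(y)
    return min(xs), min(ys), max(xs), max(ys), z


def normalized_size(bbox, reference_depth):
    """Scale the box width/height to what it would be at `reference_depth`.

    Assumes a pinhole-style model: pixel size is proportional to 1/depth.
    """
    x0, y0, x1, y1, z = bbox
    factor = z / reference_depth
    return (x1 - x0 + 1) * factor, (y1 - y0 + 1) * factor


frame = [
    [900, 900, 900, 900],
    [900, 500, 510, 900],
    [900, 505, 900, 900],
]
box = nearest_bbox(frame, tolerance=10)
size = normalized_size(box, reference_depth=1000)
```

Tracking `size` from frame to frame then gives the open/closed 'scale' signal independent of how far the hand is from the sensor.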

More advanced:

	Having the (normalized) bounding box with the hand/fingers we can use
	known techniques on the region to determine rotation of the content
	over time (*) and get another channel of movement.

	This might even work if the finger tips are not separated from the
	hand, if the shape of the whole hand itself is distinctive enough to
	latch on.

		This would actually be also another way of getting scale
		changes, albeit more computationally expensive.

(*)	See Crimp GSoC, Affine Registration: Transform the images to compare
	into a log-polar representation, run an FFT on each, and perform a
	phase correlation.

	The resulting shift values in the log-polar domain translate into
	scale and rotation values in the Cartesian domain.
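
To illustrate the principle only: the sketch below runs phase correlation on a 1D signal with a naive DFT, then converts log-polar shifts into scale and rotation. It is not Crimp's API. A real implementation would use a 2D FFT over the log-polar images, and the sampling convention (radius `exp(j * log(max_radius) / rho_bins)` for bin `j`) is an assumption chosen for this example.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_correlation_shift(a, b):
    """Return the circular shift s with b[i] == a[(i - s) % n]."""
    cross = []
    for fa, fb in zip(dft(a), dft(b)):
        p = fb * fa.conjugate()
        mag = abs(p)
        # Keep only the phase; skip components with no energy.
        cross.append(p / mag if mag > 1e-12 else 0)
    corr = dft(cross, inverse=True)
    # The correlation peak sits at the shift between the two signals.
    return max(range(len(corr)), key=lambda i: corr[i].real)


def signed_shift(shift, n):
    """Map a circular shift in [0, n) to the range [-n/2, n/2)."""
    return shift - n if shift >= n / 2 else shift


def shift_to_rotation(theta_shift, theta_bins):
    """Shift along the angle axis -> rotation in degrees."""
    return 360.0 * theta_shift / theta_bins


def shift_to_scale(rho_shift, rho_bins, max_radius):
    """Shift along the log-radius axis -> scale factor."""
    return math.exp(rho_shift * math.log(max_radius) / rho_bins)


a = [5, 1, 0, 2, 7, 3, 1, 4]
b = a[-3:] + a[:-3]          # a shifted circularly by 3 samples
shift = phase_correlation_shift(a, b)
```

In the 2D case the peak has two coordinates: the one along the angle axis goes through `shift_to_rotation`, the one along the log-radius axis through `shift_to_scale`.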

</brain-dump>
Z c1bee2403fc3a7c81ca0554cddb73f4e