Saturday, June 25, 2011

Lytro - The Start of a Picture Revolution

The Journey
Today, I am proud to announce the launch of Lytro and share our plans to bring an amazing new kind of camera to the consumer market.

This journey started for me eight years ago when I was in the PhD program at Stanford University. I loved photography then as I do now, but I was frustrated and puzzled by the apparent limitations of cameras. For example, I remember trying to take photos of Mei-Ahn, the five-year-old daughter of a close friend, but because she was so full of life, it was nearly impossible to capture the fleeting moments of her smile or perfectly focus the light in her eyes.

That experience inspired me to start the research that became my dissertation on light field photography, which had capabilities beyond what I could have ever hoped for. The journey soon accelerated with a full-body plunge into the world of entrepreneurship, with a dream to share this new technology with the world.

I am thrilled to finally draw back the curtain and introduce our new light field camera company, one that will forever change how everyone takes and experiences pictures. Lytro’s company launch is truly the start of a picture revolution.

What began in a lab at Stanford University has transformed into a world-class company, forty-four people strong, sparkling with talent, energy and inspiration. It has taken a lot of hard work, late nights and tireless dedication to get Lytro to this point. I want to thank the entire team for their remarkable contributions, spirit, and camaraderie. I want to especially thank the very first believers: Colvin, Tim and Alex, the original magic engine of the company, and Manu, Charles and Allen for personally doing so much to help build this company. Besides the Lytro team, I want to thank my family, and my fiancé Yi (pictured above) for their continued support, confidence, and love.

We have something special here. Our mission is to change photography forever, making conventional cameras a thing of the past. Humans have always had a fundamental need to share our stories visually, and from cave paintings to digital cameras we have been on a long search for ways to make a better picture. Light field cameras are the next big step in that picture revolution.

The Future
Today is a big day for Lytro, but I believe it is just the beginning of a bright and exciting future. Photographers and casual shooters alike will be able to create and share new living pictures. I believe that as people begin to use light field cameras, we will see an explosion in new kinds of photographic art. It will be another wonderful journey to see how people use light field cameras, see where these new living pictures travel, and discover how each person chooses to take this revolution.

Welcome to Lytro! I hope you’ll follow us on the Lytro Blog, so we can keep you updated about the introduction of our first Lytro camera.

Ren Ng
Founder and CEO of Lytro

Thursday, June 23, 2011

Learning Invariance through Imitation

Graham W. Taylor, Ian Spiro, Chris Bregler and Rob Fergus

To appear at CVPR 2011, Colorado Springs, USA June 21-23 (download pdf) (supplementary material)

Sample retrieval results. Each row is a query. We select a test image (column 1) and find its 10 nearest neighbors using our method which we call Pose Sensitive Embedding (PSE). The blue text in each image indicates seed id (left) and distance (in embedded space) from the query (right).

Non-scientific abstract (for geeks and non-geeks)

Computer vision has hit the mainstream with applications such as cars that detect pedestrians, motion capture for animation, and applications that let you cash a cheque by snapping a picture with your mobile phone. A great example of computer vision in the consumer market is Microsoft's Kinect gaming system, which can accurately detect the pose of one or more individuals, allowing gameplay to be controlled using just the body. Such a system must be able to detect pose reliably under a wide variety of conditions - different players, unusual clothing, poor lighting, cluttered backgrounds, and other sources of variation.

One way to perform pose estimation is to keep around a large database of examples of people in a variety of poses, along with labels indicating the configuration of the body in 2D or 3D. When presented with a new example (without labels), we can compare it against the database to find the best match, then assign the labels of the best match to the new example. However, the matching (or similarity) problem is a very tough one - especially because of the large amount of input variability due to the factors described above. If we had many examples of people in similar pose but under differing conditions, we could use machine learning to construct an algorithm that matches based on the important information (e.g. pose) and ignores the distracting information (e.g. lighting, clothing, background, etc.). But how do we collect such data?

In a somewhat unusual move for computer scientists, we turned to the Dutch progressive-electro band C-Mon and Kypski. Their music video/crowdsourcing project "One Frame of Fame" asks people on the web to replace one frame of the band's music video for the song "More or Less" with a capture from a webcam. A visitor to the band's website is shown a single frame of the video and asked to perform an imitation in front of the camera. The new contribution is spliced into the video, which updates once an hour. This turns out to be the perfect data source for learning an algorithm to compute similarity based on pose. Armed with the band's data and a few machine learning tricks up our sleeves, we built a system that is highly effective at matching people in similar pose but under widely different settings.
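The database-matching idea above can be sketched in a few lines. The embedding function, labels, and vectors here are hypothetical stand-ins for the learned Pose Sensitive Embedding, which would map each image to a low-dimensional vector:

```python
# Sketch of pose-based retrieval: the "vec" entries stand in for the output
# of a learned embedding; labels and values are made up for illustration.

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def nearest_neighbors(query_vec, database, k=10):
    """Return the k database entries closest to the query in embedded space."""
    ranked = sorted(database, key=lambda item: euclidean(query_vec, item["vec"]))
    return ranked[:k]

# Toy database: each entry holds a pose annotation and its embedding.
db = [
    {"label": "arms_up",   "vec": [0.9, 0.1]},
    {"label": "arms_down", "vec": [0.1, 0.9]},
    {"label": "arms_up",   "vec": [0.8, 0.2]},
]

# The query inherits the label of its best match.
best = nearest_neighbors([0.85, 0.15], db, k=1)[0]
print(best["label"])  # arms_up
```

If the embedding has learned to ignore clothing, lighting and background, this plain nearest-neighbor lookup is all the matching step needs.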

Scientific abstract (for geeks)

Supervised methods for learning an embedding aim to map high-dimensional images to a space in which perceptually similar observations have high measurable similarity. Most approaches rely on binary similarity, typically defined by class membership where labels are expensive to obtain and/or difficult to define. In this paper we propose crowd-sourcing similar images by soliciting human imitations. We exploit temporal coherence in video to generate additional pairwise graded similarities between the user-contributed imitations. We introduce two methods for learning nonlinear, invariant mappings that exploit graded similarities. We learn a model that is highly effective at matching people in similar pose. It exhibits remarkable invariance to identity, clothing, background, lighting, shift and scale.
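One way to exploit graded similarities when learning such a mapping is a contrastive-style objective whose pull/push terms are weighted by the similarity grade. The sketch below is a hypothetical DrLIM-style generalization, not necessarily the paper's exact formulation:

```python
# Illustrative graded contrastive objective (assumed form, not the paper's
# exact loss): sim in [0, 1], where 1 means "same pose" and 0 "unrelated".

def graded_contrastive_loss(dist, sim, margin=1.0):
    """dist: distance between two embedded images; sim: graded similarity."""
    pull = sim * dist ** 2                             # draw similar pairs together
    push = (1.0 - sim) * max(0.0, margin - dist) ** 2  # separate dissimilar pairs
    return pull + push

# Temporally adjacent video frames would get a high graded similarity,
# unrelated frames a low one.
print(graded_contrastive_loss(0.2, 0.9))  # mostly the pull term
print(graded_contrastive_loss(0.2, 0.1))  # mostly the push term
```

Temporal coherence supplies the grades: frames close in time within one imitation are treated as more similar than frames far apart.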


Schematic of our approach. We assume that for each frame of video, there exists an unobserved low-dimensional representation of pose, Z. A seed image is generated by mapping from pose space to pixels, X, through an unobserved interpretation function. Our method learns a nonlinear embedding, f(X|θ), which approximates Z with a low-dimensional vector. In the example above, users are asked to imitate seed images taken from a music video.

Tuesday, June 21, 2011

Intel announces Haswell's processor instructions

Intel has announced new processor instructions for the Haswell chip architecture that would be useful for image and video processing. According to an Intel blog post, the target applications include face detection, hash generation, and database manipulation. For example, the new floating-point multiply-accumulate instructions "operate on scalar, 128-bit packed single and double precision data types, and 256-bit packed single and double-precision data types."
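Conceptually, a packed fused-multiply-accumulate computes a[i]*b[i] + c[i] across every lane of a vector register in one instruction. A plain-Python model of what the hardware does in parallel (the lane count below assumes 8 single-precision lanes in a 256-bit register):

```python
# Conceptual model of a packed fused-multiply-accumulate: each lane i
# computes a[i] * b[i] + c[i]. On Haswell a single FMA instruction does
# this for all lanes at once; here we just loop over the lanes.

def packed_fmadd(a, b, c):
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

# 8 single-precision lanes of a 256-bit register, conceptually.
acc = packed_fmadd([1.0] * 8, [2.0] * 8, [0.5] * 8)
print(acc[0])  # 2.5
```

The hardware version also performs the multiply and add with a single rounding step, which this sketch does not model.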

Intel's processor improvements come in a two-year "tick-tock" cycle that alternates process shrinks ("tick") with new microarchitectures ("tock"). The current-generation processor is Sandy Bridge, which will be followed by Ivy Bridge. Haswell is the newest architecture definition after Ivy Bridge.

Monday, June 20, 2011

3-D movie shows what happens in the brain as it loses consciousness

University of Manchester researchers have for the first time been able to watch what happens to the brain as it loses consciousness.

The anaesthetised brain as revealed by the fEITER scan

Using sophisticated imaging equipment they have constructed a 3-D movie of the brain as it changes while an anaesthetic drug takes effect.

Brian Pollard, Professor of Anaesthesia at Manchester Medical School, will tell the European Anaesthesiology Congress in Amsterdam today (Saturday) that the real-time 3-D images seemed to show that losing consciousness involves a change in electrical activity deep within the brain, changing the activity of certain groups of nerve cells (neurons) and hindering communication between different parts of the brain.

He said the findings appear to support a hypothesis put forward by Professor Susan Greenfield, of the University of Oxford, about the nature of consciousness itself. Prof Greenfield suggests consciousness is formed by different groups of brain cells (neural assemblies), which work efficiently together, or not, depending on the available sensory stimulations, and that consciousness is not an all-or-none state but more like a dimmer switch, changing according to growth, mood or drugs. When someone is anaesthetised it appears that small neural assemblies either work less well together or inhibit communication with other neural assemblies.

Professor Pollard, whose team is based at Manchester Royal Infirmary, said: “Our findings suggest that unconsciousness may be the increase of inhibitory assemblies across the brain’s cortex. These findings lend support to Greenfield’s hypothesis of neural assemblies forming consciousness.”

The team use an entirely new imaging method called “functional electrical impedance tomography by evoked response” (fEITER), which enables high-speed imaging and monitoring of electrical activity deep within the brain and is designed to enable researchers to measure brain function.

The new device was developed by a multidisciplinary team drawn from the Schools of Medicine and Electrical and Electronic Engineering at The University of Manchester, led by Professor Hugh McCann and with support from a Wellcome Trust Translation Award.

The machine itself is a portable, light-weight monitor, which can fit on a small trolley. It has 32 electrodes that are fitted around the patient’s head. A small, high-frequency electric current (too small to be felt or have any effect) is passed between two of the electrodes, and the voltages between other pairs of electrodes are measured in a process that takes less than one-thousandth of a second.

An ‘electronic scan’ is therefore carried out and the machine does this whole procedure 100 times a second. By measuring the resistance to current flow (electrical impedance), a cross-sectional image of the changing electrical conductivity within the brain is constructed. This is thought to reflect the amount of electrical activity in different parts of the brain. The speed of the response of fEITER is such that the evoked response of the brain to external stimuli, such as an anaesthetic drug, can be captured in rapid succession as different parts of the brain respond, so tracking the brain’s processing activity.
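As a toy sketch of one scan frame (the drive/measure pattern here is a hypothetical one: current injected across one adjacent electrode pair, voltage read across every other adjacent pair; the real fEITER protocol may differ):

```python
# Toy sketch of one electrical impedance tomography frame with 32 electrodes.
# Assumption (hypothetical): adjacent-pair injection and adjacent-pair
# voltage measurement; the actual fEITER drive/measure protocol may differ.

N_ELECTRODES = 32
FRAMES_PER_SECOND = 100

def measurement_pairs(inject):
    """All adjacent electrode pairs except the pair carrying the drive current."""
    pairs = [(i, (i + 1) % N_ELECTRODES) for i in range(N_ELECTRODES)]
    return [p for p in pairs if p != inject]

readings = measurement_pairs((0, 1))
print(len(readings))            # 31 voltage readings for this injection pattern
print(1.0 / FRAMES_PER_SECOND)  # 0.01 -> each full scan fits in 10 ms
```

Cycling the injection pair around the ring and repeating the voltage sweep yields the data from which each cross-sectional conductivity image is reconstructed.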

“We have looked at 20 healthy volunteers and are now looking at 20 anaesthetised patients scheduled for surgery,” said Professor Pollard. “We are able to see 3-D images of the brain’s conductivity change, and those where the patient is becoming anaesthetised are most interesting.

“We have been able to see a real time loss of consciousness in anatomically distinct regions of the brain for the first time. We are currently working on trying to interpret the changes that we have observed, as we still do not know exactly what happens within the brain as unconsciousness occurs, but this is another step in the direction of understanding the brain and its functions.”

The team at Manchester is one of many worldwide investigating electrical impedance tomography (EIT), but this is its first application to anaesthesia. Professor Pollard said that a huge amount of research still needed to be done to fully understand the role EIT could play in medicine.

“If its power can be harnessed, then it has the potential to make a huge impact on many areas of imaging in medicine,” he said. “It should help us to better understand anaesthesia, sedation and unconsciousness, although its place in medicine is more likely to be in diagnosing changes to the brain that occur as a result of, for example, head injury, stroke and dementia.

“The biggest hurdle is working out what we are seeing and exactly what it means, and this will be an ongoing challenge.”

Sunday, June 19, 2011

Automatic Illustration with Cross-media Retrieval in Large-scale Collections

Filipe Coelho, Cristina Ribeiro - CBMI 2011


In this paper, we approach the task of finding suitable images to illustrate text, from specific news stories to more generic blog entries. We have developed an automatic illustration system, supported by multimedia information retrieval, that analyzes text and presents a list of candidate images to illustrate it. The system was tested on the SAPO-Labs media collection, containing almost two million images with short descriptions, and the MIRFlickr-25000 collection, with photos and user tags from Flickr. Visual content is described by the Joint Composite Descriptor and indexed by a Permutation-Prefix Index. Illustration is a three-stage process using textual search, score filtering and visual clustering. A preliminary evaluation using exhaustive and approximate visual searches demonstrates the capabilities of the visual descriptor and approximate indexing scheme used.




The Cell: An Image Library

The Cell: An Image Library™ is a freely accessible, easy-to-search, public repository of reviewed and annotated images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes. The purpose of this database is to advance research, education, and training, with the ultimate goal of improving human health.

Cells, the building blocks of tissues, undergo dramatic dynamic rearrangements and changes in shape and motility during our lifetimes; abnormalities in these processes underlie numerous human diseases. It is important that scientists and clinicians fully appreciate the structure and dynamic behavior of cells, including cells from diverse organisms, to make advancements to human health.

The Cell includes images acquired from historical and modern collections, publications, and by recruitment.

This Image Library is a repository for images and movies of cells from a variety of organisms. It demonstrates cellular architecture and functions with high quality images, videos, and animations. This comprehensive and easily accessible Library is designed as a public resource first and foremost for research, and secondarily as a tool for education. The long-term goal is the construction of a library of images that will serve as primary data for research.

The Library effort represents not only the creation of the electronic infrastructure, but also a systematic protocol for acquisition, evaluation, annotation, and uploading of images, videos, and animations.

Content Based Image Retrieval Using Visual-Words Distribution Entropy

Savvas A. Chatzichristofis, Chryssanthi Iakovidou, and Yiannis S. Boutalis


Bag-of-visual-words (BOVW) is a representation of images built from a large set of local features. To date, the experimental results presented in the literature have shown that this approach achieves high retrieval scores on several benchmark image databases because of its ability to recognize objects and retrieve near-duplicate (to the query) images. In this paper, we propose a novel method that fuses the spatial relationship of the visual words in an image with the conventional visual-words representation. Incorporating the visual-word distribution entropy leads to a robust, scale-invariant descriptor. The experimental results show that the proposed method performs better than the classic visual-words approach, while also outperforming several other descriptors from the literature.
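To make the entropy idea concrete, here is an illustrative sketch (not the paper's exact formulation): bin the image positions of one visual word into a coarse grid, then take the Shannon entropy of the resulting distribution. A word spread evenly across the image has high entropy; a word concentrated in one region has low entropy:

```python
import math

# Illustrative spatial-distribution entropy for one visual word.
# Grid size, image dimensions and positions are hypothetical.

def distribution_entropy(positions, grid=4, width=256, height=256):
    """Shannon entropy (bits) of a word's occurrences over a grid x grid layout."""
    counts = {}
    for x, y in positions:
        cell = (min(x * grid // width, grid - 1), min(y * grid // height, grid - 1))
        counts[cell] = counts.get(cell, 0) + 1
    total = len(positions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

concentrated = [(10, 10), (12, 11), (9, 14)]            # all in one grid cell
spread = [(10, 10), (200, 20), (30, 220), (220, 230)]   # four different cells
print(distribution_entropy(concentrated))  # 0 bits
print(distribution_entropy(spread))        # 2.0 bits
```

Because the entropy depends on the relative spread of occurrences rather than their absolute coordinates, a statistic like this is insensitive to uniform rescaling of the image, which is the intuition behind the scale invariance claimed above.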


This paper will be presented at the

5th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications

Terrier 3.5 released

Terrier, IR Platform v3.5 - 16/06/2011

Terrier 3.5, the next version of the open source IR platform from the University of Glasgow (Scotland) has been released.

Significant update:
  • Added Document-at-a-time (DAAT) retrieval for large indices
  • Refactored tokenisation for enhanced multi-language support
  • Upgraded Hadoop support to version 0.20 (NB: Terrier now requires Java 1.6)
  • Added synonym support in query language and retrieval
  • Added out-of-the-box support for query-biased summaries and improved the example web-based interface
  • Added new, 2nd-generation DFR models as well as other recent effective information-theoretic models
  • Included many more JUnit tests (now 300+)

Terrier 3.0 indices are compatible with Terrier 3.5.

  • TR-117: Improve fields support by SimpleXMLCollection
  • TR-120: Error loading an additional MetaIndex structure (contributed by Javier Ortega, Universidad de Sevilla)
  • TR-106: Pipeline Query/Doc Policy Lifecycle (contributed by Giovanni Stilo, University degli Studi dell'Aquila and Nestor Laboratory - University of Rome "Tor Vergata")
  • TR-116: Lexicon not properly renamed on Windows
  • TR-118: SimpleXMLCollection - the term near the closing tag is ignored (contributed by Damien Dudognon, Institut de Recherche en Informatique de Toulouse)
  • TR-123: Null pointer exception while trying to index simple document (contributed by Ilya Bogunov)
  • TR-126: Logging improvements
  • TR-124: When processing docid tag in MEDLINE format XML file, xml context path is needed
  • TR-127: Easier refactoring of SinglePass indexers (contributed by Jonathon Hare, University of Southampton)
  • TR-108: Some indexers do not set the IterablePosting class for the DirectIndex (contributed by Richard Eckart de Castilho, Darmstadt University of Technology)
  • TR-136: Hadoop indexing misbehaves when terrier.index.prefix is not "data"
  • TR-137: TRECCollection cannot add properties from the document tags to the meta index at indexing time
  • TR-150: TRECCollection parse DOCHDR tags, including URLs should they exist (see TRECWebCollection)
  • TR-138: IndexUtil.copyStructure fails when source and destination indices are same
  • TR-140: Indexing support for query-biased summarisation
  • TR-144: should not be recursive
  • TR-146, TR-148: Tokenisation should be done separately from Document parsing (the tokeniser can be set using the property tokeniser - see Non-English language support in Terrier for more information on changing the tokenisation used by Terrier); Refactor Document implementations (e.g. TRECDocument and HTMLDocument are now deprecated in favour of the new TaggedDocument)
  • TR-147: Allow various Collection implementations to use different Document implementations
  • TR-158: Single pass indexing with default configuration doesn't ever flush memory
  • TR-16,TR-166: Extending query language and Matching to support synonyms
  • TR-157: Remove TRECQuerying scripting files: trec.models, qemodels, trec.topics.list and trec.qrels - use properties in TRECQuerying instead.
  • TR-156: Deploy a DAAT matching strategy - see org.terrier.matching.daat (partially contributed by Nicola Tonellotto, CNR)
  • TR-113: The LGD Loglogistic weighting model (contributed by Gianni Amati, FUB)

Fuller change log at

Semantic hierarchies for image annotation: a survey

Anne-Marie Tousch, Stéphane Herbin and Jean-Yves Audibert

In this survey, we argue that using structured vocabularies is crucial to the success of image annotation. We analyze literature on image annotation uses and user needs, and we stress the need for automatic annotation. We briefly expose the difficulties this task poses to machines and how it relates to controlled vocabularies. We survey contributions in the field, showing how structures are introduced. First we present studies that use unstructured vocabulary, focusing on those introducing links between categories or between features. Then we review work using structured vocabularies as an input and analyze how the structure is exploited.


► Literature on image annotation uses and user needs is reviewed. ► Approaches using unstructured or hierarchical vocabularies are compared. ► We argue that structured vocabularies are crucial to automatic image annotation.

Keywords: Image annotation; Semantic description; Structured vocabulary; Image retrieval

Saturday, June 18, 2011

Christos Faloutsos: How to find patterns in large graphs

A video of CMU professor Christos Faloutsos's recent tech talk at LinkedIn on "Mining Billion-Node Graphs".

World's First Coins With QR Codes Will Start Circulating in the Netherlands Next Week

Determined to keep physical currency relevant in an age of bitcoins and NFC-equipped phones, the Dutch are turning their coins into little high-tech toys. These are purportedly the world’s first coins with QR codes, which currently link to the national mint, but after June 22 will link to a “surprise.”

The coins will be available starting June 22, in both silver (€5) and gold (€10). Right now the code takes you to a Dutch Ministry of Finance page (in Dutch).

There’s no explanation of what the surprise might be — maybe a contest, maybe a historical link, perhaps a video tour of the place where Dutch money gets made.

The U.S. produces some pretty high-tech cash, but the bills are designed to thwart counterfeiters, not necessarily to edify collectors or engage smartphone users. As far as we know this is the first currency to be embedded with a link to the Internet.

They likely won’t be replacing Dutch euros anytime soon, however — only a limited number of coins were produced, to celebrate the 100th anniversary of the national mint in Utrecht.

3D Holograms using Xbox Kinect

Leia Holograph: Holographs are just about impossible to capture in still images, resulting in that red blob you see to the right--but in person, it looks surprisingly good. (MIT)

Michael Bove, the director of MIT’s Object-Based Media Group, got his grad students a Kinect for Christmas. The range-finding, motion-sensing camera add-on to Microsoft’s Xbox 360 game system turns the human body into a controller, but Bove’s students did something far more amazing with it. “A week later,” he says, “they were presenting holograms with it.” The students had hacked the Kinect, and found that it was a perfect tool for capturing images to project in three dimensions--in other words, for holograms.

(Oh, and a quick note about "holographs:" The word "holography" refers to the technique, and "holograms" are the results of it. "Holograph" is often used as a synonym for "hologram," but as Bove used the word "hologram" during our conversation, that's what I'll use here.)

Home holography video chat may sound like the stuff of Star Wars, but it’s closer than we think. Holography, like traditional 3-D filmmaking, has the end goal of a more immersive video experience, but the tech is completely different. 3-D cameras are traditional, fixed cameras, which simply capture two very slightly different streams to be directed to each eye individually--the difference between the two images creates the illusion of depth. If you change your position in front of a 3-D movie, the image you see will remain the same--it has depth, but only one perspective. (Curious about glasses-free 3-D? Check out our interactive primer.) A hologram, on the other hand, is made by capturing the scatter of light bouncing off a scene as data, and then reconstructing that data as a 3-D environment. That allows for much greater immersion--if you change your viewing angle, you'll actually see a different image, just as you can see the front, sides, and back of a real-life object by rotating around it. "If holography is done right," says Bove, "it's really quite stunning."

Capturing that scatter of light is no easy feat. A standard 3-D movie camera captures light bouncing off of an object at two different angles, one for each eye. But in the real world, light bounces off of objects at an infinite number of angles. Holographic video systems use devices that produce so-called diffraction fringes, basically fine patterns of light and dark that can bend the light passing through them in predictable ways. A dense enough array of fringe patterns, each bending light in a different direction, can simulate the effect of light bouncing off of a three-dimensional object.
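As a rough illustration of where fringes come from (a toy 1-D model, not the OBMG's actual display pipeline), the interference between a tilted plane reference wave and light from a single point scatterer already produces a pattern of bright and dark bands:

```python
import cmath
import math

# Toy 1-D fringe model: superpose a tilted plane reference wave with the
# spherical wave from one point scatterer and record the intensity.
# Wavelength, tilt angle and geometry are arbitrary illustrative values.

WAVELENGTH = 0.5               # arbitrary units
K = 2 * math.pi / WAVELENGTH   # wavenumber

def fringe_intensity(x, point_x=0.0, point_z=50.0, tilt_deg=5.0):
    reference = cmath.exp(1j * K * x * math.sin(math.radians(tilt_deg)))
    r = math.hypot(x - point_x, point_z)   # distance from x to the scatterer
    scattered = cmath.exp(1j * K * r)
    return abs(reference + scattered) ** 2  # bright/dark bands vs. x

# Sample the pattern along the display plane.
pattern = [fringe_intensity(x * 0.05) for x in range(200)]
print(min(pattern), max(pattern))  # intensity swings between dark and bright bands
```

A real holographic display superimposes contributions like this for every point in the scene, which is why computing dense fringe patterns fast enough for video is the hard part.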

The trick is making it live, fast and cheap. It is one of the OBMG’s greatest challenges: the equipment is currently extremely expensive, the amount of data massive. "[We’re] trying to turn holographic video from a lab curiosity into a consumer product," Bove says. They’re getting close. Using the Kinect, which costs just $150, and a laptop with off-the-shelf graphics cards, the OBMG crew was able to project holograms at seven frames per second. Previous breakthroughs, both at MIT and at other institutions like Cornell, could only achieve frame rates of one frame every two seconds--far slower than the 24 frames per second required for movies or the 30 frames per second required for television. A week later, the MIT students had gotten the rig up to 15 frames per second. Bove says that's far from the limits of the Kinect hardware. The next step is to bring down the cost of the holographic display.

Hologram From Star Wars:  LucasFilm

The current holographic display is a sophisticated acousto-optic modulator, a device that diffracts and shifts the frequency of light by using sound waves. But the OBMG is hoping to replace the modulator, a one-of-a-kind, highly expensive piece of equipment which was pioneered by Bove's predecessor Stephen Benton, with a consumer model they hope will be able to be manufactured in the near future for a mere few hundred dollars. In a matter of years, truly live holographic video chat could be wholly possible. Princess Leia's holographic plea for help? Child's play. After all, it was pre-recorded.

The challenge with real-time holographic video is taking video data—in the case of the Kinect, the light intensity of image pixels and, for each of them, a measure of distance from the camera—and, on the fly, converting that data into a set of fringe patterns. Bove and his grad students—James Barabas, David Cranor, Sundeep Jolly and Dan Smalley—have made that challenge even tougher by limiting themselves to off-the-shelf hardware.

In the group’s lab setup, the Kinect feeds data to an ordinary laptop, which relays it over the Internet. At the receiving end, a PC with three ordinary, off-the-shelf graphics processing units — GPUs — computes the diffraction patterns.

GPUs differ from ordinary computer chips — CPUs — in that their circuitry has been tailored to a cluster of computationally intensive tasks that arise frequently during the processing of large graphics files. Much of the work that went into the new system involved re-describing the problem of computing diffraction patterns in a way that takes advantage of GPUs’ strengths.

Home holography is a pretty incredible technology; it has the potential to totally change the basic way we use displays, from media to chat. Bove's path, using cheap, easily found equipment, could be the fastest way to a holographic future, and it's especially thrilling that the Kinect, a $150 video game accessory most often used to teach the Soulja Boy dance, is a major component in making that future possible.

Processing 1.5+

Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing also has evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.


  • Free to download and open source
  • Interactive programs using 2D, 3D or PDF output
  • OpenGL integration for accelerated 3D
  • For GNU/Linux, Mac OS X, and Windows
  • Projects run online or as double-clickable applications
  • Over 100 libraries extend the software into sound, video, computer vision, and more...
  • Well documented, with many books available

To see more of what people are doing with Processing, check out these sites:

» Processing Wiki
» Processing Discussion Forum
» OpenProcessing
» CreativeApplications.Net
» O'Reilly Answers
» Vimeo
» Flickr
» YouTube

To contribute to the development, please visit Processing on Google Code to read instructions for downloading the code, building from the source, reporting and tracking bugs, and creating libraries and tools.

Thursday, June 16, 2011

Download Kinect for Windows SDK beta


Skeleton tracking image

The Kinect for Windows SDK beta is a starter kit for applications developers that includes APIs, sample code, and drivers. This SDK enables the academic research and enthusiast communities to create rich experiences by using Microsoft Xbox 360 Kinect sensor technology on computers running Windows 7.

The Kinect for Windows SDK beta includes the following:
  • Drivers, for using Kinect sensor devices on a computer running Windows 7.
  • Application programming interfaces (APIs) and device interfaces, together with technical documentation.
  • Source code samples.

System requirements

  • Kinect for Xbox 360 sensor
  • Computer with a dual-core, 2.66-GHz or faster processor
  • Windows 7-compatible graphics card that supports DirectX 9.0c capabilities
  • 2-GB RAM (4-GB RAM recommended)

Installation instructions

To install this SDK beta:

  1. On this page, click Download to start the Kinect for Windows SDK beta download.
  2. Click Run to start Setup, and follow the instructions in the Setup Wizard.
    Or, to save the download on your computer so that you can install it later, click Save.

Wednesday, June 15, 2011

Practical Image and Video Processing Using MATLAB

Up-to-date, technically accurate coverage of essential topics in image and video processing

This is the first book to combine image and video processing with a practical MATLAB®-oriented approach in order to demonstrate the most important image and video techniques and algorithms. Utilizing minimal math, the contents are presented in a clear, objective manner, emphasizing and encouraging experimentation.

The book has been organized into two parts. Part I: Image Processing begins with an overview of the field, then introduces the fundamental concepts, notation, and terminology associated with image representation and basic image processing operations. Next, it discusses MATLAB® and its Image Processing Toolbox with the start of a series of chapters with hands-on activities and step-by-step tutorials. These chapters cover image acquisition and digitization; arithmetic, logic, and geometric operations; point-based, histogram-based, and neighborhood-based image enhancement techniques; the Fourier Transform and relevant frequency-domain image filtering techniques; image restoration; mathematical morphology; edge detection techniques; image segmentation; image compression and coding; and feature extraction and representation.

Part II: Video Processing presents the main concepts and terminology associated with analog video signals and systems, as well as digital video formats and standards. It then describes the technically involved problem of standards conversion, discusses motion estimation and compensation techniques, shows how video sequences can be filtered, and concludes with an example of a solution to object detection and tracking in video sequences using MATLAB®.
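The motion estimation that Part II discusses is commonly done by block matching: for a block in the current frame, search a window in the reference frame for the displacement that minimizes the sum of absolute differences (SAD). A minimal pure-Python sketch (illustrative only, not the book's code):

```python
# Exhaustive block-matching motion estimation: find the displacement that
# minimizes the sum of absolute differences (SAD) between a block in the
# current frame and candidate blocks in the reference frame.

def sad(ref, cur, rx, ry, cx, cy, b):
    """SAD between the b-by-b blocks at (rx, ry) in ref and (cx, cy) in cur."""
    return sum(abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
               for i in range(b) for j in range(b))

def best_motion_vector(ref, cur, cx, cy, b, search):
    """Search a +/-search window in ref for the block at (cx, cy) in cur."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - b and 0 <= ry <= h - b:
                cost = sad(ref, cur, rx, ry, cx, cy, b)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]   # displacement (dx, dy) into the reference frame

# A bright 2x2 patch moves one pixel down and right between frames.
ref = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
cur = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
print(best_motion_vector(ref, cur, 1, 1, 2, 1))  # (-1, -1)
```

The vector (-1, -1) says the matching block lies one pixel up and left in the reference frame, i.e. the content moved down and right; real codecs refine this exhaustive search with faster strategies.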

Extra features of this book include:

  • More than 30 MATLAB® tutorials, which consist of step-by-step guides to exploring image and video processing techniques using MATLAB®

  • Chapters supported by figures, examples, illustrative problems, and exercises

  • Useful websites and an extensive list of bibliographical references

This accessible text is ideal for upper-level undergraduate and graduate students in digital image and video processing courses, as well as for engineers, researchers, software developers, practitioners, and anyone who wishes to learn about these increasingly popular topics on their own.

About the Author

Oge Marques, PhD, is Associate Professor in the Department of Computer & Electrical Engineering and Computer Science at Florida Atlantic University. He has been teaching and doing research on image and video processing for more than twenty years, in seven different countries. Dr. Marques is the coauthor of Processamento Digital de Imagens and Content-Based Image and Video Retrieval and was editor-in-chief of the Handbook of Video Databases, a comprehensive work with contributions from more than 100 world experts in the field. He is a Senior Member of both the IEEE and the ACM.

Preorder NOW

Google Launches Search By Image – It’s Like Goggles For The Desktop

At the Inside Search event held in San Francisco, Google has announced a new addition to its search features - Search by Image. The Search by Image feature is something like Google's image search application for mobile devices - Google Goggles.

Those who have used services like TinEye will be familiar with what Search by Image does. Although technically similar to TinEye, Google has taken it much further.

Google Search by Image

Suppose you have an image you want to search with. There are four ways for you to use Search by Image to search for it:

  • You can upload the image.

  • You can also drag and drop the image into the search box.

  • If the image is online, you can paste the URL of the image into the search box.

  • You can also use extensions for Chrome and Firefox that Google will release.

After the image has been analyzed by Google's servers, the service will attempt to identify it and bring up related search results. The technology used for Search by Image is similar to that used by Google Goggles for smartphones.
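Google has not detailed how the matching works, but reverse image search engines typically reduce an image to a compact fingerprint and compare fingerprints instead of pixels. A classic stand-in (illustrative only, not Google's actual algorithm) is the "average hash":

```python
# A toy perceptual "average hash": a compact fingerprint that stays stable
# under small changes, so near-duplicate images hash to nearby bit strings.
# Illustrative stand-in only, not Google's actual matching algorithm.

def average_hash(gray):
    """gray: 2D list of intensities (already downscaled, e.g. to 8x8)."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    # One bit per pixel: 1 if brighter than the mean, 0 otherwise.
    return ''.join('1' if p > mean else '0' for p in pixels)

def hamming(a, b):
    """Count differing bits; a small distance suggests the same image."""
    return sum(x != y for x, y in zip(a, b))

img = [[10, 200], [220, 30]]
tweaked = [[12, 198], [221, 28]]   # slightly re-encoded copy of img
print(hamming(average_hash(img), average_hash(tweaked)))  # 0
```

Because the hash only records which pixels are brighter than the average, small brightness or compression changes leave it untouched, which is exactly what a near-duplicate search needs.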

The service is not live yet; Google will roll out the feature over the coming days. You will know that Search by Image has been activated for you when a camera icon appears at the right of the search box.

Here is a video that Google has released introducing Search by Image:

Tuesday, June 7, 2011

Minority Report-like Interface

Evoluce presented a "Minority Report"-like software for Kinect at RTT Excite. The WIN&I 2.0 beta multi-gesture software allows users to rotate images and zoom in and out. Objects on the screen can be dragged and dropped with simple hand gestures in the air. By closing the hand, the user grabs an object and can move it as in the real world.

Multi-touch applications can be controlled with both hands and with finger gestures, without touching the screen surface. Precise finger tracking makes it possible to manipulate not only objects but also icons, images and videos on the screen. The natural user interface software also supports finger and hand tracking of multiple users at the same time. A huge variety of applications in gaming, consumer electronics, office, education, point of sale and medical systems can be controlled with touchless gestures in an intuitive way.

Evoluce will soon offer a Multi-Gesture SDK for Kinect™/Asus Xtion Pro™/PrimeSensor™.

The SDK will help developers create natural user interface applications with a controller-free experience, and offers:

• gesture module supporting finger gestures, push, zoom, rotate, drag & drop
• tracking and gesture recognition of both hands
• visual feedback: picture/animation/video
• “multi-touch” application control, e.g. Microsoft Touch Pack for Windows 7
• APIs: Windows 7 multitouch, TUIO
• mouse control
• skeleton tracking based on OpenNI™
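The grab-and-drag interaction described above amounts to a small state machine over per-frame hand tracking data. A minimal Python sketch (the data fields and class are hypothetical, not Evoluce's actual SDK):

```python
# A minimal sketch of grab-and-drag by hand gesture: closing the hand grabs
# the object; while closed, the object follows the hand; opening releases it.
# The per-frame inputs (hand position, open/closed state) stand in for what
# a hand-tracking SDK would deliver; this is not Evoluce's actual API.

class GrabDragger:
    def __init__(self):
        self.holding = False

    def update(self, hand_x, hand_y, hand_closed, object_pos):
        """Process one frame of tracking data; return the object's position."""
        if hand_closed and not self.holding:
            self.holding = True            # hand just closed: grab
        elif not hand_closed:
            self.holding = False           # hand open: release
        if self.holding:
            object_pos = (hand_x, hand_y)  # dragged object follows the hand
        return object_pos

d = GrabDragger()
pos = (0, 0)
pos = d.update(5, 5, False, pos)   # open hand moves around: object stays put
pos = d.update(5, 5, True, pos)    # hand closes: object grabbed at (5, 5)
pos = d.update(9, 3, True, pos)    # closed hand moves: object dragged along
pos = d.update(9, 3, False, pos)   # hand opens: object released
print(pos)  # (9, 3)
```

A production system would add smoothing and hit-testing (only grab when the hand is over the object), but the open/closed state machine is the core of the interaction.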

Thursday, June 2, 2011

ESSIR 2011: Top-K reasoning for NOT attending the European Summer School on IR

ESSIR 2011 - European Summer School in Information Retrieval

29 Aug - 02 Sep 2011, Koblenz, Germany

Practical top-k reasoning in Information Retrieval:

Top-10 reasons NOT to attend ESSIR 2011

If you don't want to meet the authors of the best recent IR books as your lecturers, stay home.

If you prefer questions to answers, don't come to the plenary discussion with lecturers about doing a successful PhD in IR.

If you are not interested in recent trends and directions in IR, avoid the PhD symposium. Do not submit any papers there!

If you don't need any feedback on your research/PhD topic, do not present anything in the poster/demo session. Grab some yummy finger food and disappear.

If you are not interested in highly reasonable fees, try to register as late as possible. Do not apply for school grants!

If you're not satisfied with the best-priced accommodation, stay home for free.

If your physician told you to avoid excitement, don't join the Rhine ship-cruise to the famous Loreley valley.

If you are indifferent to rivers, rocks, vineyards, volcanoes, geysers, medieval castles and the world's best white wines, the Rhineland will not attract your attention.

If you don't want to meet new people and exchange ideas, do not join our informal come-together activities.

If you don't trust Web 2.0, do not join our Facebook group. Do not follow us on Twitter, and stay off our Flickr and YouTube content.


OTHERWISE.. what are you actually waiting for?!
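For readers outside the field, the "top-k reasoning" pun refers to a staple of IR systems: returning only the k highest-scoring results for a query. In Python that is one call to the standard library (scores below are made up for illustration):

```python
# Top-k retrieval in one line: return the k documents with the highest
# relevance scores. The heap-based nlargest avoids sorting everything.
import heapq

scores = {"doc1": 0.91, "doc2": 0.15, "doc3": 0.78, "doc4": 0.42}
top2 = heapq.nlargest(2, scores, key=scores.get)
print(top2)  # ['doc1', 'doc3']
```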


Grant applications: until 01 June 2011 (apply NOW !!!)

Early registration: until 30 June 2011