The GreenAI system consists of several machine vision stations, each controlling an associated robotic unit. For this purpose, each station is equipped with illumination sources and at least one camera. Multiple cameras sharing the same field of view make stereo vision possible. Since the need for depth information differs from station to station, the number of cameras is decided individually for each station.
The processing at each station involves, to some degree, artificial intelligence for image recognition. Consequently, several distinct datasets of color images are built and maintained. The minimum number of images required to achieve sufficient results depends on the complexity of the recognition task and therefore varies significantly between stations. The time needed to capture new images also differs between stations. The actual number of acquired real images is therefore chosen as a compromise between requirements, usefulness, and time expenditure.
For supervised learning, which is the most direct way of training deep neural networks for a given target, all dataset images need to be annotated. The need for multiple large image datasets in which each image must be annotated reasonably accurately motivates a thorough investigation of auxiliary annotation tools.
To ease the annotation work, a state-of-the-art framework for interactive segmentation from the literature is utilized. The user can set positive points (indicating that the spot belongs to the object mask to be found) and negative points (indicating that the spot does not belong to it). Based on these input points, a neural network predicts the object mask. Once the user is satisfied with the suggested mask, it is saved and the next object mask is started. The predictions of the network can be tailored to the objects of a dataset of interest by training it with a small pool of already annotated images from the same dataset. A side-by-side comparison of the annotation processes of interactive segmentation and manual polygon drawing is visualized. In addition, a few self-written Python scripts are used to polish and shorten the related annotation workflows.
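The click-based workflow described above can be sketched as follows. This is a minimal illustration of the bookkeeping around positive and negative points; the `InteractiveSession` class and its nearest-click placeholder prediction are hypothetical and stand in for the actual neural network of the framework.

```python
import numpy as np

class InteractiveSession:
    """Bookkeeping for click-based interactive segmentation.

    In the real framework a neural network predicts the mask from
    the clicks; here a nearest-click rule serves as a placeholder
    so the control flow can be demonstrated.
    """

    def __init__(self, height, width):
        self.shape = (height, width)
        self.clicks = []  # (row, col, is_positive)

    def add_click(self, row, col, positive):
        self.clicks.append((row, col, positive))

    def predict_mask(self):
        # Placeholder: each pixel takes the label of its nearest click.
        rows, cols = np.indices(self.shape)
        best_dist = np.full(self.shape, np.inf)
        mask = np.zeros(self.shape, dtype=bool)
        for r, c, positive in self.clicks:
            dist = (rows - r) ** 2 + (cols - c) ** 2
            closer = dist < best_dist
            best_dist[closer] = dist[closer]
            mask[closer] = positive
        return mask

session = InteractiveSession(8, 8)
session.add_click(2, 2, positive=True)   # spot belongs to the object
session.add_click(6, 6, positive=False)  # spot is background
mask = session.predict_mask()
```

The user iterates over `add_click` and `predict_mask` until the suggested mask is acceptable, then saves it and starts the next object.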
Another very effective way to quickly enlarge annotated datasets is data augmentation and, in particular, synthetic data generation. For this reason, an image processing algorithm that creates synthetic images from annotated real images is developed. Depending on a few input configurations, synthetic images with certain properties are generated for the distinct stations, which helps to scale the dataset size. Synthetic images can be produced in much larger numbers than real images, and their introduction has a clearly positive effect on all stations; for the stations with special challenges, the impact of synthetic images is even greater.
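One core step of such a generator is compositing an annotated object crop onto a new background so that the annotation travels along with the synthetic image. The sketch below, with an assumed `composite` helper, illustrates this idea only; the project's actual algorithm applies further configurable transforms.

```python
import numpy as np

def composite(background, obj_patch, obj_mask, top, left):
    """Paste a masked object crop onto a background image and
    return the synthetic image together with its full-size mask,
    so the generated image comes with its annotation for free.
    Illustrative sketch, not the project's actual generator.
    """
    synth = background.copy()
    h, w = obj_mask.shape
    region = synth[top:top + h, left:left + w]
    region[obj_mask] = obj_patch[obj_mask]  # copy object pixels only
    full_mask = np.zeros(background.shape[:2], dtype=bool)
    full_mask[top:top + h, left:left + w] = obj_mask
    return synth, full_mask

bg = np.zeros((10, 10, 3), dtype=np.uint8)          # plain background
patch = np.full((3, 3, 3), 255, dtype=np.uint8)      # object crop
m = np.ones((3, 3), dtype=bool)
m[0, 0] = False                                      # corner is background
img, fm = composite(bg, patch, m, 2, 4)
```

Repeating this with varied positions, rotations, and scalings yields arbitrarily many annotated synthetic images from a small pool of annotated real ones.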
There are various detection challenges in the GreenAI system.
On the one hand, there are cases where the approach is clear. For example, instance segmentation is required at some stations. Instance segmentation, however, is a very active research field, and many different architectures are suggested in the literature for the underlying neural network. While a few structures seem to generally outperform certain others, there are groups of networks with similar overall results that excel in different subdisciplines. For this reason, different neural networks are tried out in parallel for the GreenAI stations and compared with respect to their accuracy and speed for the particular task. The ones with the best tradeoff for a specific task are kept and integrated into the system.
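The accuracy-versus-speed comparison can be organized with a small harness like the one below. The interface (`models` as a mapping of names to prediction callables, a `score_fn` comparing prediction and ground truth) is an assumption for illustration, not a specific benchmarking library.

```python
import time

def benchmark(models, images, score_fn):
    """Run each candidate network on the same evaluation set and
    report mean accuracy and per-image runtime, so the tradeoff
    can be judged per station. Hypothetical interface: `models`
    maps a name to a prediction callable, `images` is a list of
    (image, ground_truth) pairs.
    """
    results = {}
    n = max(len(images), 1)
    for name, predict in models.items():
        start = time.perf_counter()
        preds = [predict(img) for img, _ in images]
        seconds_per_image = (time.perf_counter() - start) / n
        accuracy = sum(score_fn(p, gt) for p, (_, gt) in zip(preds, images)) / n
        results[name] = {"accuracy": accuracy,
                         "seconds_per_image": seconds_per_image}
    return results

# Toy usage with dummy "networks" on dummy data:
models = {"identity": lambda x: x, "always_zero": lambda x: 0}
images = [(0, 0), (1, 1)]
report = benchmark(models, images, score_fn=lambda p, gt: p == gt)
```

The network with the best combination of `accuracy` and `seconds_per_image` for a given station would then be integrated into the system.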
On the other hand, there are cases where different recognition approaches are conceivable for accomplishing the same final goal. For example, lines are searched for in some of the stations. While the lines could be slightly curved, restricting the search to straight lines is fully sufficient for the application. Several approaches for finding these lines are considered. Firstly, by segmenting regions that meet along the lines of interest, the actual lines could be derived indirectly from the intersections of the predicted segments; this approach is not limited to straight lines. Secondly, a straight line segment is fully represented by its two endpoints, so detecting pairs of keypoints would already accomplish the complete task. Similarly, a straight line segment is fully captured by a point together with an angle and a length, so searching for such (point, angle, length) tuples is possible too. Thirdly, there are many more strategies, some of which involve traditional image processing. In the end, all competing approaches have individual advantages and disadvantages. Since it is hardly possible to predict the quality of the results in advance, the different approaches are tested in parallel and the final decision for an approach is postponed.
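That the two-endpoint and the (point, angle, length) parameterizations carry the same information can be made concrete with a pair of conversion functions (function names are illustrative):

```python
import math

def endpoints_to_point_angle_length(p1, p2):
    """Convert the two-endpoint representation of a straight line
    segment into the equivalent (midpoint, angle, length) tuple."""
    (x1, y1), (x2, y2) = p1, p2
    mid = ((x1 + x2) / 2, (y1 + y2) / 2)
    angle = math.atan2(y2 - y1, x2 - x1)   # orientation of the segment
    length = math.hypot(x2 - x1, y2 - y1)  # Euclidean length
    return mid, angle, length

def point_angle_length_to_endpoints(mid, angle, length):
    """Inverse conversion back to the two endpoints."""
    dx = math.cos(angle) * length / 2
    dy = math.sin(angle) * length / 2
    return (mid[0] - dx, mid[1] - dy), (mid[0] + dx, mid[1] + dy)

mid, angle, length = endpoints_to_point_angle_length((0, 0), (3, 4))
p1, p2 = point_angle_length_to_endpoints(mid, angle, length)
```

A detector can thus regress either representation; which one trains better is exactly the kind of question the parallel tests are meant to answer.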
The final system is supposed to cope with a wide range of plant genera. This means that many more genera are to follow in addition to the two types of plants for which the proof of concept is actually carried out. In order to upgrade the system software for more genera of plants, methods of transferring what has been learned on known issues with the old plants to similar issues with the new plants are to be tested. In general, substantial effort is planned for the problem of generating suitable datasets at reasonable cost. Among other things, there are the following particular ideas and plans for future research work:
- A 3D scanner suitable for precisely scanning the project’s plant parts can deliver complete 3D models of plants. It is to be investigated how 2D images can be generated from these 3D plant models. The objective is to make the images look as realistic as possible and to create as many unique-looking images as possible from the available 3D models. This would make it possible to complement the datasets with many more targeted, annotated images and could also contribute 3D ground truth for sets of images if stereo vision were used.
- Active learning is to be introduced in an effort to reduce the number of images that need to be annotated. Here, the user gets feedback from a neural network on which images from the unannotated pool would add the most novel information to the training if annotated. If the user follows these suggestions (as opposed to a random selection of the next image for annotation) and labels the recommended images first, the network is expected to gain more dataset-targeted knowledge from the same or a smaller number of annotated images.
- There is a considerable discrepancy between the time required to capture new raw images and the time required for specific annotation tasks. In those cases, acquiring raw images is much less of a concern than annotating them. This is especially true once the GreenAI system is up and running, because new images are taken during normal operation without permanent assistance by the user and can be saved for later use. Hence, it would be ideal to exploit such an abundance of unannotated images to improve the results. Semi-supervised learning is the corresponding concept of interest. It is to be checked whether (and to what extent) semi-supervised learning can surpass the results of supervised learning given the same number of annotated images plus many more unannotated images.
- It is to be investigated if and under which circumstances generative adversarial networks can create even more beneficial synthetic images than the currently used process based on traditional image processing. A focus is put on the combined generation of raw images together with their alleged ground truth, since the already implemented method also delivers the annotations. However, since semi-supervised learning is envisaged as well, there may be further trials on generating bare raw images, which would be utilized as additional unannotated images for semi-supervised training of the network.
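Regarding the generation of 2D images from 3D plant models, the geometric core is projecting model points through a virtual camera; varying the camera pose yields many distinct views of one scan. The pinhole projection below is a sketch under simplified assumptions (no lens distortion, no occlusion handling or shading, which a full renderer would add):

```python
import numpy as np

def project_points(points_3d, focal, cx, cy, rotation, translation):
    """Pinhole projection of 3D model points into 2D pixel
    coordinates for one virtual camera pose. Sketch only: a real
    renderer would additionally handle shading and occlusion.
    """
    # world frame -> camera frame
    cam = points_3d @ rotation.T + translation
    # perspective division and shift to the principal point (cx, cy)
    u = focal * cam[:, 0] / cam[:, 2] + cx
    v = focal * cam[:, 1] / cam[:, 2] + cy
    return np.stack([u, v], axis=1)

R = np.eye(3)                      # camera looking down the z-axis
t = np.array([0.0, 0.0, 5.0])      # model placed 5 units in front
pts = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
uv = project_points(pts, focal=100.0, cx=50.0, cy=50.0,
                    rotation=R, translation=t)
```

Sampling many `rotation`/`translation` pairs would produce the desired variety of unique-looking 2D images from each 3D model.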
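The active-learning selection step described above can be sketched with uncertainty sampling, one standard criterion: images whose predicted class distribution has the highest entropy are suggested for annotation first. This is an illustrative sketch, not the project's implementation.

```python
import math

def entropy(dist):
    """Shannon entropy of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def rank_for_annotation(pool_predictions):
    """Order the unannotated pool so that images with the most
    uncertain (highest-entropy) predictions come first -- these
    are expected to add the most novel information if annotated.
    `pool_predictions` maps an image id to its predicted class
    probabilities (hypothetical interface).
    """
    return sorted(pool_predictions,
                  key=lambda k: entropy(pool_predictions[k]),
                  reverse=True)

pool = {"img_a": [0.99, 0.01],   # confident -> annotate later
        "img_b": [0.50, 0.50]}   # uncertain -> annotate first
order = rank_for_annotation(pool)
```

The user would then annotate images in `order` instead of picking the next image at random.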
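For the semi-supervised direction, one simple and common scheme is self-training with pseudo-labels: the model's own confident predictions on unannotated images are added to the training set. The sketch below assumes hypothetical `predict` and `confidence` callables; it shows the selection step only, not a full training loop.

```python
def pseudo_label(unlabeled, predict, confidence, threshold=0.9):
    """One round of self-training, a simple semi-supervised
    scheme: unannotated images whose prediction confidence meets
    the threshold are paired with the model's own prediction and
    returned as additional training examples. `predict` and
    `confidence` are assumed callables (sketch only).
    """
    accepted = []
    for img in unlabeled:
        if confidence(img) >= threshold:
            accepted.append((img, predict(img)))
    return accepted

# Toy usage with dummy stand-ins for the model:
accepted = pseudo_label(
    unlabeled=[1, 6, 7],
    predict=lambda x: x % 2,                       # dummy "label"
    confidence=lambda x: 0.95 if x > 5 else 0.5,   # dummy confidence
)
```

Comparing a network trained with and without such pseudo-labeled examples, at a fixed number of truly annotated images, is exactly the planned check of whether semi-supervised learning pays off.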