How to do image comparison

I want to detect whether an image is contained in a screenshot. There can be small pixel color variations that are not visible to the eye, so I cannot rely on an exact match.

The Appium documentation says that a function for this is available in all drivers, but I cannot find it.

I have installed opencv4nodejs globally.
There is no driver.matchImagesFeatures function like in the Ruby example.

My driver is webdriverio for JavaScript.

Does anyone know?

I am looking for an api in my driver.

Otherwise, I will use opencv4nodejs directly.

And I do not understand the examples.

Here I have to pass a detector (det), but I don't know what to use.
Why is it not possible to just have img1 and img2 as inputs?

I just found in the Java example what the detector can be, something like 'ORB':

.matchImagesFeatures(screenshot, originalImg, new FeaturesMatchingOptions()
                .withDetectorName(FeatureDetector.ORB)
                .withGoodMatchesFactor(40)
                .withMatchFunc(MatchingFunction.BRUTE_FORCE_HAMMING)
                .withEnabledVisualization());

Would a Java example work for you?

A Java example would suit me, because it would let me understand the algorithm.

I can now run an algorithm in JavaScript.
But the detection gives a very strange result: it draws many lines and dots on top of the image, and there is no clear way to say yes or no as to whether it found the image. Moreover, I have to use a smaller screenshot image, otherwise cv.imshowWait shows only a part of the image that is not of interest, and I cannot zoom out.
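One way to turn the drawn matches into a yes/no answer is to count how many matches are close enough, instead of only looking at the picture. The sketch below is hypothetical: `isTemplateFound`, `maxDistance` and `minGoodMatches` are made-up names and values that would need tuning, and it only assumes that each match object has a numeric `distance` property (smaller means more similar), as the matches in the opencv4nodejs example do.

```javascript
// Hypothetical helper: decide "found / not found" from a list of descriptor matches.
// Assumption: every match exposes a numeric `distance`, smaller = more similar.
// The default thresholds are illustrative guesses, not recommended values.
function isTemplateFound(matches, { maxDistance = 40, minGoodMatches = 10 } = {}) {
  // Keep only the matches whose descriptors are close to each other.
  const good = matches.filter((m) => m.distance <= maxDistance);
  return { found: good.length >= minGoodMatches, goodMatches: good.length };
}

// Usage with plain objects standing in for real matches:
const sampleMatches = [
  ...Array.from({ length: 12 }, () => ({ distance: 10 })),
  ...Array.from({ length: 5 }, () => ({ distance: 80 })),
];
console.log(isTemplateFound(sampleMatches));
```

Sorting and keeping the best 40 matches, as the example does, always yields 40 lines even when the image is absent, which is why a distance cutoff is used here instead.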

Here is the code that is "runnable". But it does not find img1 inside img2.
I copy-pasted it from the opencv4nodejs example linked below. There are more detectors to try and options to change.

https://github.com/justadudewhohacks/opencv4nodejs/blob/master/examples/matchFeatures.js

const cv = require('opencv4nodejs');

const matchFeatures = ({ img1, img2, detector, matchFunc }) => {
  // detect keypoints
  const keyPoints1 = detector.detect(img1);
  const keyPoints2 = detector.detect(img2);

  // compute feature descriptors
  const descriptors1 = detector.compute(img1, keyPoints1);
  const descriptors2 = detector.compute(img2, keyPoints2);

  // match the feature descriptors
  const matches = matchFunc(descriptors1, descriptors2);

  // only keep good matches (the bestN with the smallest distance)
  const bestN = 40;
  const bestMatches = matches.sort(
    (match1, match2) => match1.distance - match2.distance
  ).slice(0, bestN);

  // returns an image with both inputs side by side and the matches drawn on top
  return cv.drawMatches(
    img1,
    img2,
    keyPoints1,
    keyPoints2,
    bestMatches
  );
};

const isImageLikedByMyself = async () => {
  //const screenImagePath = './appium_thumbs_up.png';
  const screenImagePath = './appium_thumbs_up2.png';
  //driver.saveScreenshot(screenImagePath)
  const screenImage = cv.imread(screenImagePath);
  const likedImagePath = './referenceImage.png';
  const likedImage = cv.imread(likedImagePath);
  // source https://github.com/justadudewhohacks/opencv4nodejs/blob/master/examples/matchFeatures.js
  // https://docs.opencv.org/master/dc/dc3/tutorial_py_matcher.html
  const orbMatchesImg = matchFeatures({
    img1: likedImage,
    img2: screenImage,
    detector: new cv.ORBDetector(),
    matchFunc: cv.matchBruteForceHamming
  });
  cv.imshowWait('ORB matches', orbMatchesImg);
};

img1 is a 125 px x 125 px image in PNG format with a transparent background.
img2 is a screenshot containing a scaled-down copy of img1 plus some background color.

Many thanks for the help.

I have attached the img1 and img2.

First of all, you don't need to perform a feature-based comparison. You can use the findElementByImage() method implemented by the AndroidDriver class.

Second, you will need to specify that the template image (the image to be found) needs to be scaled, and specify the scaling factor; otherwise, the method will not find it, as it's bigger than the one in the screenshot. And since the big image is a screenshot, you don't need to take it yourself.

So, in your case, you will need to perform something like this:

File template = new File("./appium_thumbs_up.png");

driver.setSetting(Setting.IMAGE_MATCH_THRESHOLD, 0.9f);
driver.setSetting(Setting.FIX_IMAGE_TEMPLATE_SCALE, true);
driver.setSetting(Setting.DEFAULT_IMAGE_TEMPLATE_SCALE, 0.75f);

// Files is java.nio.file.Files; the template is sent as a base64 string
WebElement e = driver.findElementByImage(
    Base64.getEncoder().encodeToString(
        Files.readAllBytes(template.toPath())
    )
);

Set the scale value that works for you to match the template to the size of the image to be found within the screenshot. Then, play with the IMAGE_MATCH_THRESHOLD setting to find what value works best for you (in my code, I am able to make comparisons with a match threshold of up to 0.98).
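A starting value for the scale can be estimated by dividing the size at which the image appears in the screenshot by the size of the template file. The helper below is only a sketch: `templateScale` is a hypothetical name, not an Appium or webdriverio function, and the pixel values are made up for illustration.

```javascript
// Hypothetical helper: estimate the factor by which the template must be
// resized so that it has the same size as its copy in the screenshot.
function templateScale(templateWidthPx, onScreenWidthPx) {
  if (!(templateWidthPx > 0) || !(onScreenWidthPx > 0)) {
    throw new RangeError('widths must be positive numbers');
  }
  return onScreenWidthPx / templateWidthPx;
}

// A 125 px wide template that is displayed 100 px wide needs a scale of 0.8.
console.log(templateScale(125, 100));
```

The estimate is only as good as the measured on-screen width, so the threshold usually still needs some experimentation afterwards.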

If you cannot find the method for your case, then do an occurrence comparison instead of a feature-based one. The former is to be used when the image to be found is a subset of the target/screenshot; the latter is to be used when the image to be found is basically the same as the target but rotated and/or scaled.

You can find an example on the page you mentioned at the beginning. Notice that you need to get the score from the result of the comparison and determine whether it meets the threshold that you think best.
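To make "get the score and compare it with a threshold" concrete, here is a small pure-JavaScript sketch of an occurrence comparison on grayscale pixel arrays. It computes a zero-mean normalized cross-correlation at every position and reports the best one. The function names, the 0.9 threshold and the tiny test image are all assumptions for illustration; this is not the code Appium runs, and a real project would use OpenCV for speed.

```javascript
// Zero-mean normalized cross-correlation between `template` and the window of
// `image` whose top-left corner is (x, y). Both are 2-D arrays of gray values.
// The result lies in [-1, 1]; 1 means a perfect match up to brightness/contrast.
function nccAt(image, template, x, y) {
  const h = template.length;
  const w = template[0].length;
  const n = w * h;
  let sumI = 0;
  let sumT = 0;
  for (let r = 0; r < h; r++) {
    for (let c = 0; c < w; c++) {
      sumI += image[y + r][x + c];
      sumT += template[r][c];
    }
  }
  const meanI = sumI / n;
  const meanT = sumT / n;
  let num = 0;
  let varI = 0;
  let varT = 0;
  for (let r = 0; r < h; r++) {
    for (let c = 0; c < w; c++) {
      const dI = image[y + r][x + c] - meanI;
      const dT = template[r][c] - meanT;
      num += dI * dT;
      varI += dI * dI;
      varT += dT * dT;
    }
  }
  const den = Math.sqrt(varI * varT);
  return den === 0 ? 0 : num / den; // flat windows carry no information
}

// Slide the template over the image and keep the best-scoring position.
function findTemplate(image, template, threshold = 0.9) {
  let best = { score: -Infinity, x: 0, y: 0 };
  const maxY = image.length - template.length;
  const maxX = image[0].length - template[0].length;
  for (let y = 0; y <= maxY; y++) {
    for (let x = 0; x <= maxX; x++) {
      const score = nccAt(image, template, x, y);
      if (score > best.score) best = { score, x, y };
    }
  }
  return { ...best, found: best.score >= threshold };
}

// The template sits at x=2, y=1 with small pixel variations.
const demoImage = [
  [50, 50, 50, 50, 50],
  [50, 50, 12, 198, 50],
  [50, 50, 203, 9, 50],
  [50, 50, 50, 50, 50],
];
const demoTemplate = [[10, 200], [200, 10]];
console.log(findTemplate(demoImage, demoTemplate));
```

Because the score is normalized, small color variations lower it only slightly, which is what makes a threshold such as 0.9 usable in place of an exact match. The sketch does not handle a template whose size differs from its copy in the image.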

Hope this helps.

Ave Caesar :slight_smile:

Many thanks for the good info.

I do not have such a function in my driver. I followed the Appium getting started guide, which shows a JavaScript example with webdriverio.
This gives me my driver (called client in the getting started guide).

And I do not understand why the Appium doc about image comparison says all drivers have image comparison functions.

I just realized that my Appium is installed locally without -g and my opencv4nodejs was installed with -g. I will try. I do not like -g since I like to control package versions locally. And opencv4nodejs could only build the C++ with -g.

Anyways, the api doc for my driver does not show such an image comparison function.
https://webdriver.io/docs/api/webdriver.html#findelement

And I cannot find anything like it in the source code of any of the folders installed by my driver.

So I guess I will continue with opencv4nodejs. But now I know I need to use occurrence.

I have run an example, but it creates a green rectangle in the middle of the image, which is not a match.

  // template matching
  // https://github.com/appium/appium/blob/master/docs/en/writing-running-appium/image-comparison.md
  // https://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
  // https://github.com/justadudewhohacks/opencv4nodejs/blob/master/examples/templateMatching.js
  const isImageLikedByMyself = async () => {
    //const screenImagePath = './appium_thumbs_up.png';
    const screenImagePath = './appium_thumbs_up2.png';
    //driver.saveScreenshot(screenImagePath)
    const screenImage = cv.imread(screenImagePath);
    const likedImagePath = './image2.png'
      + 'drawable-hdpi/fa_thumbs_up_liked.png';

    // Load images
    const originalMat = await cv.imreadAsync(screenImagePath);
    const waldoMat = await cv.imreadAsync(likedImagePath);

    // Match template (the brightest locations indicate the highest match)
    const matched = originalMat.matchTemplate(waldoMat, 5);

    // Use minMaxLoc to locate the highest value (or lower, depending of the type of matching method)
    const minMax = matched.minMaxLoc();
    const { maxLoc: { x, y } } = minMax;

    // Draw bounding rectangle
    originalMat.drawRectangle(
      new cv.Rect(x, y, waldoMat.cols, waldoMat.rows),
      new cv.Vec(0, 255, 0),
      2,
      cv.LINE_8
    );

    // Open result in new window
    cv.imshow('We\'ve found Waldo!', originalMat);
    await cv.waitKey();
  };

I found an explanation of the algorithm here, but I can't fix my code.
https://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html

In particular, at the end, it says that it worked better with a normalized version.
How do I normalize my code?

Should I use parameter 6 instead of 5 to take method f and not method e?

I tried 6 and I get the following error, but the code runs fine with 5:
Mat::MatchTemplate - OpenCV Error: (CV_TM_SQDIFF <= method && method <= CV_TM_CCOEFF_NORMED) in node_modules/opencv-build/opencv/opencv/modules/imgproc/src/templmatch.cpp

here lies the source:
https://github.com/opencv/opencv/blob/master/modules/imgproc/src/templmatch.cpp

So it seems the indexing starts at 0 and the value 5 already picks the CCOEFF_NORMED
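The zero-based numbering of the OpenCV template matching methods can be written down as a small lookup table, which also shows why 6 is rejected and which extreme of minMaxLoc to read. The names are the OpenCV constant names; `describeMethod` itself is a hypothetical helper added for illustration.

```javascript
// OpenCV template matching methods in enum order (index 0 to 5).
const TEMPLATE_MATCH_METHODS = [
  'TM_SQDIFF',
  'TM_SQDIFF_NORMED',
  'TM_CCORR',
  'TM_CCORR_NORMED',
  'TM_CCOEFF',
  'TM_CCOEFF_NORMED',
];

// Hypothetical helper: explain what a numeric method index means.
function describeMethod(index) {
  const name = TEMPLATE_MATCH_METHODS[index];
  if (name === undefined) {
    throw new RangeError(`method must be between 0 and ${TEMPLATE_MATCH_METHODS.length - 1}`);
  }
  return {
    name,
    normalized: name.endsWith('_NORMED'),
    // For the squared-difference methods the best match is the minimum
    // of the result matrix; for the others it is the maximum.
    bestIsMinimum: name.startsWith('TM_SQDIFF'),
  };
}

console.log(describeMethod(5));
```

With method 5 the result is already normalized, so reading maxVal from minMaxLoc gives a score that can be compared against a fixed threshold.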

Cesar, where can I find the source code of your Java function findElementByImage?

I want to see how it uses OpenCV.

Thanks.

It’s in the java-client package, specifically in the io/appium/java_client/FindsByImage.java interface which is implemented by RemoteWebDriver.java. The code of the corresponding implementation is this:

  protected WebElement findElement(String by, String using) {
    if (using == null) {
      throw new IllegalArgumentException("Cannot find elements when the selector is null.");
    }

    Response response = execute(DriverCommand.FIND_ELEMENT,
        ImmutableMap.of("using", by, "value", using));
    Object value = response.getValue();
    if (value == null) { // see https://github.com/SeleniumHQ/selenium/issues/5809
      throw new NoSuchElementException(String.format("Cannot locate an element using %s=%s", by, using));
    }
    WebElement element;
    try {
      element = (WebElement) value;
    } catch (ClassCastException ex) {
      throw new WebDriverException("Returned value cannot be converted to WebElement: " + value, ex);
    }
    setFoundBy(this, element, by, using);
    return element;
  }

The value of by parameter sent is "-image".

It seems like webdriverio client simply does not implement any wrappers around this Appium functionality while the other clients do. You could either create a PR for that yourself or submit an improvement request. It is also possible to trigger the corresponding API endpoints directly from Appium server without having any wrappers. You could check the actual implementation in Java/Python/Ruby clients.

You could also check https://github.com/appium/appium-support/blob/master/lib/image-util.js regarding the actual lower-level OpenCV algorithm implementation details.

OK, many thanks for the good info!

I will try to use the appium server for find by image.

If I cannot, then I will control the details of the algorithm. Maybe customizing OpenCV is more powerful. If the image ratios are not known and if there are differences, maybe I need finer algorithms.

It should not be too hard since all the functionality is in place in appium

Please help me some more.

I could call the API endpoint for "-image" from JavaScript, using a base64-encoded image produced by OpenCV. I will make a PR later when everything works.

I have to run the Appium server manually so that it finds opencv4nodejs installed with -g; appium-desktop does not find it. Let me clarify here that I am talking about the JavaScript Appium server, not the client. It seems to me that there is only one Appium server, and it runs in JavaScript with Node.js. But even if I am wrong, the server has the -image functionality and it seems to work. Please tell me if you know a way to make appium-desktop find opencv4nodejs. Since it is a binary file that is run, it is hard for me to modify where it finds modules.
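For readers who want to try the same thing, the call can be sketched as follows. This is an assumption-laden sketch, not the author's code: `buildImageLocator` and `findByImage` are hypothetical helpers, `driver` is assumed to be a webdriverio session connected to an Appium server that has OpenCV available, and the setting names mirror the Java constants mentioned earlier in the thread.

```javascript
// Hypothetical helper: build the locator that is sent to the find element
// endpoint. The strategy is "-image" and the value is the base64-encoded PNG.
function buildImageLocator(templatePng) {
  return { using: '-image', value: Buffer.from(templatePng).toString('base64') };
}

// Hypothetical helper: apply optional image settings, then look up the element.
// Assumes `driver` offers the webdriverio commands updateSettings and findElement.
async function findByImage(driver, templatePng, settings = {}) {
  if (Object.keys(settings).length > 0) {
    await driver.updateSettings(settings);
  }
  const { using, value } = buildImageLocator(templatePng);
  return driver.findElement(using, value);
}

// Possible usage inside an async test (file name and values are placeholders):
//   const fs = require('fs');
//   const png = fs.readFileSync('./referenceImage.png');
//   const el = await findByImage(driver, png, {
//     imageMatchThreshold: 0.9,
//     fixImageTemplateScale: true,
//     defaultImageTemplateScale: 0.75,
//   });
```

Keeping the locator construction separate from the network call makes it easy to check the payload without a running device.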

And I have a new and more important error. I also do not understand why it uses the resolution of my screen and not the emulator device resolution.

[debug] [BaseDriver] Waited for 9478 ms so far

[debug] [WD Proxy] Matched ‘/screenshot’ to command name ‘getScreenshot’
[debug] [WD Proxy] Proxying [GET /screenshot] to [GET http://127.0.0.1:8200/wd/hub/session/7e58569c-b334-4fb6-b743-0a7ac0d9a42d/screenshot] with no body
[debug] [WD Proxy] Got response with status 200: {“sessionId”:“7e58569c-b334-4fb6-b743-0a7ac0d9a42d”,“value”:"iVBORw0KGgoAAAANSUhEUgAABDgAAAeACAYAAAArYecKAAAABHNCSVQICAgIfAhkiAAAIABJREFU\neJzsvXeYnUd5Nn6/Z3uVtFqt1SVbsmxL7oAr7oAJJrSAgYRg4IspoRlCIIGPACGQL4SQ0ILpoYQS\negvY4G5jbMBFtootybaK1aXVSrvaft7fH7tnNTs785SZ96ys3zW3Ll17zvvOPM8zM08/Lbv5rQtz\n5DkAAFmGiccVuK5pEUPDnOujI6FfFB3NOAliaYXOt+dJ9qdombhxEhm1OqvhydEBitMprayx5x4y\nP2aulLb0OncvhC+lb1I6sfSma4+19KvtG7W2Xc19kshYjfFFIuS8quHDiphv0gHi/SmnXzE8QmkV\nGX9dNKs93+XrgOmPE6H7KNH9WFvh5ofGDp9eT4eP5HgUlZ+GzpXKJdFfyZloz5WTXwvzPGz5q8HP\nRyc2b9Pk9dWwpVAU5fd85+izt6dCzJjOulqoQyXyoEMSuiLGmON8xmrS8SUpJswxprLYvCQyS/hJ\n4VMIkyZF314LBWrPXPvD0eBk0oyz6bp0z+WwXLxMneHgOltb93z65NszU6805+PiEXKP4ufaN9fe\nS+f6eJv/KZo2bY2the6Bb6wtg4835V80ZxZKxwfpHodAO5eThfK9VMJs748mIMaCspkixmsh0UeO\nt+njfHup8ae++Sbs2KbxEaHxzpTDLv4oWWNgxwhurCvnqabeFEWfio8aXj69sOm64PIHGr3l6FKP\n7fGus7Pti8uxqPzM5ztd+s3xsvlKrrtiMJeb+87TB2lB5JtL5fQ+vTJzFpftugpKn//w5SmSGqMI\nH6T1PbYcWr+g0W+JXFTsr/CjrlOycGdQdD3noinN1Vx1BucfYhBT+8U0N1zzJXUvg5J3kstQi0ig\nQ5oInNNw8aBk9QUG…
[BaseDriver] Verifying screenshot size and aspect ratio
[BaseDriver] When trying to find an element, determined that the screen aspect ratio and screenshot aspect ratio are different. Screen is 1080x1776 whereas screenshot is 1080x1920.
[BaseDriver] Resizing screenshot to 1080x1920 to match screen aspect ratio so that image element coordinates have a greater chance of being correct.
[debug] [W3C (bef9de16)] Encountered internal error running command: NoSuchElementError: An element could not be located on the page using the given search parameters.

I tried several times, and I get each time the error about the difference between screen and screenshot aspect ratios.

The input is a small image that is however not of the same scale as the one shown on the device. I will try to adjust the scale.

I found some info about the mismatch. It is because the native menu bar is not part of the screenshot. But I wonder if this message is really what prevents find-by-image from working.
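The two sizes from the log can be compared by hand to see how large the mismatch is. The sketch below only does that arithmetic; `compareDimensions` is a hypothetical helper, and whether this mismatch is the cause of the NoSuchElementError is not established by the log.

```javascript
// Hypothetical helper: compare the reported screen size with the screenshot size.
function compareDimensions(screen, screenshot) {
  const ratio = (d) => d.width / d.height;
  return {
    sameAspectRatio: Math.abs(ratio(screen) - ratio(screenshot)) < 1e-9,
    widthDifference: screenshot.width - screen.width,
    heightDifference: screenshot.height - screen.height,
  };
}

// Values taken from the log lines above.
const logDims = compareDimensions(
  { width: 1080, height: 1776 },
  { width: 1080, height: 1920 }
);
console.log(logDims);
```

The widths agree and the heights differ by 144 px, which would be consistent with a system bar being counted in one measurement and not in the other. If that is the case, coordinates may be offset, but a template that is at the wrong scale would still fail to match regardless of this message.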

Can you share the images you’re comparing?