import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

print('hello world')
There are multiple ways of representing color digitally; the most common one is the RGB color model.
In the RGB color model, any color is represented as a combination of red, green and blue.
color_1 = [255, 0, 0]      # red
color_2 = [0, 255, 127]    # green
color_3 = [0, 0, 255]      # blue
color_4 = [127, 127, 127]  # grey

plt.imshow(np.array([
    [color_1, color_2],
    [color_3, color_4],
]))
By tuning the levels of R, G and B, you can generate more colors:
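As a small sketch of that idea, the snippet below builds a few extra colors by mixing channel levels. The names `mixed_colors` and `swatch` are illustrative and not part of the original notebook.

```python
import numpy as np

# Each entry mixes different levels of R, G and B (values in 0..255).
mixed_colors = {
    "yellow": [255, 255, 0],   # full red + full green
    "magenta": [255, 0, 255],  # full red + full blue
    "cyan": [0, 255, 255],     # full green + full blue
    "orange": [255, 127, 0],   # full red + half green
}

# Arrange the colors as a 1 x 4 image: one row, four pixels, three channels.
swatch = np.array([list(mixed_colors.values())], dtype=np.uint8)
print(swatch.shape)
```

Passing `swatch` to `plt.imshow` should display the four colors side by side.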
What is an image?
An image is just a "matrix" of "pixels", each one of a different color:
Images in Python
With the help of numpy, we can represent images as simple arrays of pixels:
Each value in the 3-tuple (RGB here, BGR in OpenCV) has a range of [0, 255]. How many color possibilities are there for each pixel? That’s easy: 256 * 256 * 256 = 16,777,216.
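The count above can be checked in a couple of lines. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```python
# Each of the three channels can take 256 distinct levels (0..255).
levels_per_channel = 256
channels = 3

total_colors = levels_per_channel ** channels
print(f"{total_colors:,}")  # 16,777,216
```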
colors = [
    [
        [0, 0, 255],    # blue
        [0, 255, 0]     # green
    ],
    [
        [255, 0, 0],    # red
        [255, 255, 0]   # yellow
    ]
]

print(np.array(colors).shape)
plt.imshow(colors)
(2, 2, 3)
NumPy is great at managing matrices and multi-dimensional arrays (cubes, tensors, etc.). A picture, then, is just a 3D structure: each "pixel" is represented as a vector of R, G and B values (for example, [255, 0, 0]).
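To make the "3D structure" concrete, here is a minimal sketch of indexing into such an array. The name `tiny_image` is hypothetical; it just mirrors the 2 x 2 example used in this section.

```python
import numpy as np

tiny_image = np.array([
    [[0, 0, 255], [0, 255, 0]],    # row 0: blue, green
    [[255, 0, 0], [255, 255, 0]],  # row 1: red, yellow
], dtype=np.uint8)

# Index as [row, column] to get one pixel: a vector of R, G, B.
pixel = tiny_image[1, 0]

# Slice the last axis to get a single channel for every pixel.
red_channel = tiny_image[:, :, 0]

print(pixel)              # [255   0   0]
print(red_channel.shape)  # (2, 2)
```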
colors = [
    [
        [0, 0, 255],    # blue
        [0, 255, 0]     # green
    ],
    [
        [255, 0, 0],    # red
        [255, 255, 0]   # yellow
    ]
]

print(np.array(colors).shape)
plt.imshow(colors)

# label each pixel with its RGB values
start_row = 0
for row in colors:
    start_col = -0.25
    for color in row:
        plt.text(start_col, start_row, str(color))
        start_col += 1
    start_row += 1
(2, 2, 3)
We can also give opacity to our image by adding a fourth value to each pixel, known as the alpha channel:
colors = [
    [
        [0, 0, 255, 255],   # blue
        [0, 255, 0, 50]     # green
    ],
    [
        [255, 0, 0, 127],   # red
        [255, 255, 0, 0]    # yellow
    ]
]

print(np.array(colors).shape)
plt.imshow(colors)

# label each pixel with its RGBA values
start_row = 0
for row in colors:
    start_col = -0.25
    for color in row:
        plt.text(start_col, start_row, str(color))
        start_col += 1
    start_row += 1
(2, 2, 4)
Real images are just much larger structures of pixels. For example, a Full HD picture is 1920x1080 pixels. But each pixel is a vector of 3 elements, so its final shape would be:
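As a rough sketch of that shape, the snippet below allocates an all-black placeholder with Full HD dimensions. The `uint8` dtype and the name `full_hd` are assumptions made for illustration.

```python
import numpy as np

# Rows come first in NumPy, so the shape is (height, width, channels).
full_hd = np.zeros((1080, 1920, 3), dtype=np.uint8)

print(full_hd.shape)  # (1080, 1920, 3)
print(full_hd.size)   # 6220800 values in total
```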
SIZE = 100

# random RGB image; the upper bound of randint is exclusive, so use 256 to include 255
colors = np.random.randint(0, 256, size=(SIZE, SIZE, 3))

print(colors.shape)
plt.imshow(colors)
(100, 100, 3)
array([147, 94, 187])
Playing with OpenCV
OpenCV (Open Source Computer Vision Library) is a library with many features, tools, algorithms and utilities for managing images and image-related resources (like cameras).
Let's see it in action:
import cv2

# read image
image = cv2.imread("./data/pydata.png")
As you can see, images are read as regular numpy arrays:
(277, 498, 3)
We can visualize the image at any moment with matplotlib:
Using RGB pixel format
By default, OpenCV uses the BGR pixel format (or color mode). The most common standard for computers and libraries (like matplotlib) is RGB. It's simple to convert from BGR to RGB:
# convert BGR to RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# show image
plt.imshow(image)
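For a 3-channel image, this conversion amounts to reversing the order of the channels. The sketch below shows the same idea with plain NumPy slicing on a made-up one-pixel image; it is an illustration of what the conversion does, not a replacement for `cv2.cvtColor`.

```python
import numpy as np

# A single "pure blue" pixel as OpenCV would store it (B, G, R order).
bgr_pixel_image = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reverse the last axis to turn B, G, R into R, G, B.
rgb_pixel_image = bgr_pixel_image[:, :, ::-1]

print(rgb_pixel_image[0, 0])  # [  0   0 255]
```

Note that the slice is a view with a reversed stride, so code that needs a contiguous array may require `np.ascontiguousarray` on the result.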
# convert image to grayscale
# (image was converted to RGB in the previous cell, so use RGB2GRAY here)
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

# show image
plt.imshow(gray, cmap='gray')
Resizing images is important for a number of reasons. First, you might want to resize a large image to fit on your screen. Image processing is also faster on smaller images because there are fewer pixels to process. In the case of deep learning, we often resize images, ignoring aspect ratio, so that the volume fits into a network which requires images of certain dimensions.
Note that resizing this way will deform the image a little. It is important to analyze this effect during the exploratory phase, as it can have a negative effect on the results of your model. Flowers and animals might be OK with a little stretching or squeezing, but facial features may not.
This happens when the aspect ratio of the original image does not match that of your desired size. Let's try another strategy: rescaling the image while maintaining the aspect ratio.
WIDTH = 200
HEIGHT = 200

# resize, ignoring aspect ratio
resized = cv2.resize(image, (WIDTH, HEIGHT))

# show image
plt.imshow(resized)
If you imagine portrait images versus landscape images, you’ll know that there are a lot of things that can get messed up by doing a sloppy resize. Rescaling assumes that you’re locking down the aspect ratio to prevent distortion in the image. In this case, we’ll scale the image down so that its shortest side matches the model’s input size.
- Landscape: limit resize by the height
- Portrait: limit resize by the width
At this point only one dimension is set to what the model’s input requires. We still need to crop one side to make a square.
# aspect ratio = width / height (shape is (rows, cols, channels))
aspect = image.shape[1] / float(image.shape[0])
print(aspect)

if aspect > 1:
    # landscape orientation - wide image
    res = int(aspect * HEIGHT)
    scaled = cv2.resize(image, (res, HEIGHT))
elif aspect < 1:
    # portrait orientation - tall image
    res = int(WIDTH / aspect)
    scaled = cv2.resize(image, (WIDTH, res))
else:
    # already square
    scaled = cv2.resize(image, (WIDTH, HEIGHT))

# show image
plt.imshow(scaled)
There are a variety of strategies we could use. In fact, we could backpedal and decide to do a center crop. So instead of scaling down to the smallest we could get on at least one side, we take a chunk out of the middle. If we had done that without scaling we would have ended up with just part of a flower petal, so we still needed some resizing of the image.
Below we’ll try a few strategies for cropping:
- Just grab the exact dimensions you need from the middle!
- Resize to a square that’s pretty close then grab from the middle.
- Use the rescaled image and grab the middle.
As you can see that didn’t work out so well, except for maybe the last one. The middle one may be just fine too, but you won’t know until you try on the model and test a lot of candidate images. At this point we can look at the difference we have, split it in half and remove some pixels from each side. This does have a drawback, however, as an off-center subject of interest would get clipped.
If you’ve run this tutorial a few times now and are on Round 3, you’ll notice a pretty big problem. You’re missing astronauts! You can still see the issue with the flower from Round 2 as well. Things are missing after the cropping, and that could cause you problems. Think of it this way: if you don’t know how the model you’re using was prepared, then you don’t know how to conform your images, so take care to test results! If the model used a lot of images with different aspect ratios and just squeezed them to conform to a square, then there’s a good chance that over time and lots of samples it “learned” what things look like squeezed and can make a match. However, if you’re looking for details like facial features and landmarks, or really nuanced elements in any image, this could be dangerous and error-prone.
Another strategy would be to rescale to the best size you can, with real data, but then pad the rest of the image with information that you can safely ignore in your model. We’ll save that for another tutorial though since you’ve been through enough here!
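Without going into the full treatment, a minimal sketch of the padding idea might look like the following. It assumes the image has already been rescaled to fit inside the target square. The helper `pad_to_square`, its `fill` parameter and the stand-in array `rescaled` are all hypothetical names introduced here for illustration.

```python
import numpy as np

def pad_to_square(img, side, fill=0):
    """Center img on a side x side canvas filled with `fill`."""
    height, width = img.shape[:2]
    if height > side or width > side:
        raise ValueError("image must already fit inside the target square")
    top = (side - height) // 2
    left = (side - width) // 2
    # Keep any trailing channel axis and the original dtype.
    canvas = np.full((side, side) + img.shape[2:], fill, dtype=img.dtype)
    canvas[top:top + height, left:left + width] = img
    return canvas

# Stand-in for a landscape image already rescaled so its longest side is 200 px.
rescaled = np.full((111, 200, 3), 255, dtype=np.uint8)
padded = pad_to_square(rescaled, 200)
print(padded.shape)  # (200, 200, 3)
```

The padded bands carry a constant value, so nothing from the real image is cropped away. Whether a model can safely ignore that padding depends on how it was trained.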
Scaled seems to be the best option.
def crop_center(img, cropx, cropy):
    """Return the cropx-by-cropy region at the center of img."""
    y, x, c = img.shape
    startx = x // 2 - (cropx // 2)
    starty = y // 2 - (cropy // 2)
    return img[starty:starty + cropy, startx:startx + cropx]

# yes, the function above should match resize and take a tuple...

# Scaled image
cropped = crop_center(scaled, WIDTH, HEIGHT)

# show image
plt.imshow(cropped)
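image = cropped.copy()

# add text; the third argument is the (x, y) position of the text's bottom-left corner
# (coordinates must fall inside the image for the drawing to be visible)
cv2.putText(image, "Cordoba 2018", (135, 280), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

# add line from (140, 290) to (290, 290)
cv2.line(image, (140, 290), (290, 290), (89, 95, 114), 3)

# show image
plt.imshow(image)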
# your code goes here...