In this article, I will summarize my journey into webcam image and audio processing.
For image processing, we use the cv2 module (OpenCV).
import cv2

video = cv2.VideoCapture(1)  # Change device number if needed (0 is usually the built-in camera)
fps = int(video.get(cv2.CAP_PROP_FPS))
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(fps, width, height)

while True:
    grabbed, frame = video.read()
    if not grabbed:  # stop if no frame could be read
        break
    print("====New frame====")
    frame = cv2.resize(frame, (width//2, height//2))  # show at half size
    cv2.imshow("Video", frame)
    if cv2.waitKey(2) & 0xFF == ord("q"):  # press "q" to quit
        break

video.release()
cv2.destroyAllWindows()
Source: https://github.com/aruno14/webcamProcessing/blob/main/test_webcam.py
The mean image is created by averaging the pixel values of all faces detected over time. In the next section, we will use this image as the input to our prediction model.
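The full script is in the repository above; below is only a minimal sketch of the mean-image idea, assuming OpenCV's bundled Haar cascade for face detection (the cascade file, the 128x128 crop size, and the variable names are my assumptions, not the original code):
import cv2
import numpy as np

# Assumption: OpenCV's bundled Haar cascade is used for face detection
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
video = cv2.VideoCapture(0)
mean_face, face_count = None, 0  # running average over every detected face

while True:
    grabbed, frame = video.read()
    if not grabbed:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = cv2.resize(frame[y:y+h, x:x+w], (128, 128)).astype(np.float32)
        face_count += 1
        if mean_face is None:
            mean_face = face
        else:
            mean_face += (face - mean_face) / face_count  # incremental mean
    if mean_face is not None:
        cv2.imshow("Mean face", mean_face.astype(np.uint8))
    if cv2.waitKey(2) & 0xFF == ord("q"):
        break

video.release()
cv2.destroyAllWindows()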
We often have to deal with very long arrays of elements, such as audio data, and sometimes we want to reduce their size to speed up processing. There are many resizing functions for images, but few for 1-dimensional arrays. Here is my solution using only NumPy functions.
import numpy as np

def resize(inputArray, newLength):
    # Number of consecutive samples merged into one output element
    block_size = len(inputArray)//newLength
    # Drop the tail so the array splits evenly into newLength blocks
    outputArray = inputArray[0:block_size*newLength].reshape((newLength, block_size))
    # Represent each block by its mean (np.max, np.min, ... also work)
    outputArray = np.mean(outputArray, axis=1)
    return outputArray
We reshape the array into a 2-dimensional array of blocks, then represent each block by a single element using mean, max, min, or any other reduction function you want.
Note: this method only works for reduction; it cannot be used to increase the size of the array.
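For example, with a fake one-second signal (the 16 kHz rate is only for illustration):
import numpy as np

audio = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))  # fake 440 Hz tone
small = resize(audio, 4000)  # one mean value per block of 4 samples
print(audio.shape, small.shape)  # (16000,) (4000,)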
When you hear about gender recognition, you probably think of image processing; however, we can also identify gender from voice.
First, let's check how we can do gender recognition from images.
Then, we can look at some audio processing:
After that, we download the audio dataset: https://commonvoice.mozilla.org/en/datasets
The dataset's metadata contains the following fields, including gender (which is not set for all files):
client_id path sentence up_votes down_votes age gender accent locale segment
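For instance, the metadata can be loaded with pandas and filtered down to the clips whose gender is set (the validated.tsv path is an assumption about where you extracted the archive):
import pandas as pd

data = pd.read_csv("cv-corpus/en/validated.tsv", sep="\t")  # tab-separated metadata
data = data[data["gender"].notna()][["path", "gender"]]  # keep only labeled clips
print(data["gender"].value_counts())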
We want to create a model that predicts the gain to apply to the noisy frequencies so that they get close to the clean ones.
We use the STFT to obtain the spectrogram of the input, then we predict and apply the gain to it, and finally we use the inverse STFT to reconstruct the waveform.
import tensorflow as tf

frame_length = 1024  # assumed here; the original snippet does not show this value

def audioToTensor(filepath: str):
    audio_binary = tf.io.read_file(filepath)
    audio, audioSR = tf.audio.decode_wav(audio_binary)
    audioSR = tf.get_static_value(audioSR)
    audio = tf.squeeze(audio, axis=-1)
    frame_step = int(audioSR * 0.008)  # 16000 * 0.008 = 128 samples
    spectrogram = tf.signal.stft(audio, frame_length=frame_length, frame_step=frame_step)  # -> 31 Hz; if 512 -> 64 Hz
    # Split the complex STFT into the imaginary part, the absolute real part,
    # and the sign of the real part, so the sign can be restored later
    spect_image = tf.math.imag(spectrogram)
    spect_real = tf.math.real(spectrogram)
    spect_sign = tf.sign(spect_real)
    spect_real = tf.abs(spect_real)
    return spect_real, spect_image, spect_sign, audioSR
…
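The linked article covers the training itself; as a hedged sketch of the remaining pipeline steps (the trained model `model`, the file name "noisy.wav", and the exact gain application are my assumptions, not the original code):
import tensorflow as tf

spect_real, spect_image, spect_sign, audioSR = audioToTensor("noisy.wav")
gain = model.predict(tf.expand_dims(spect_real, 0))[0]  # hypothetical trained model
clean_real = spect_sign * (spect_real * gain)  # apply the gain, then restore the sign
spectrogram = tf.complex(clean_real, spect_image)
waveform = tf.signal.inverse_stft(spectrogram, frame_length=frame_length, frame_step=int(audioSR * 0.008))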
This article compares the different audio representations implemented in TensorFlow and TensorFlow I/O.
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_io as tfio

filepath = "test.wav"  # assumed: any short mono WAV file
audio = tfio.audio.AudioIOTensor(filepath)
audioSR = int(audio.rate.numpy())
audio = audio[:]  # materialize the lazy IOTensor into a regular tensor
audio = tf.squeeze(audio, axis=-1)  # drop the channel dimension
audio = tf.cast(audio, tf.float32)

plt.figure("Oscillo: " + filepath)
plt.plot(audio.numpy())
plt.show()
The resulting waveform has shape (16000,): one second of audio at 16 kHz.
frame_step = int(audioSR * 0.008)  # 8 ms hop: 16000 * 0.008 = 128 samples
spectrogram = tf.abs(tf.signal.stft(audio, frame_length=1024, frame_step=frame_step))

plt.figure("Spect: " + filepath)
plt.imshow(tf.math.log(spectrogram).numpy())  # log scale makes quiet bins visible
plt.show()
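Beyond the raw magnitude spectrogram, a mel-scaled view is another common representation. Here is a minimal sketch continuing from the variables above, using tf.signal (the 80 mel bins and the frequency edges are my assumptions):
# Project the linear-frequency bins onto a mel scale
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=80,
    num_spectrogram_bins=spectrogram.shape[-1],
    sample_rate=audioSR,
    lower_edge_hertz=80.0,
    upper_edge_hertz=7600.0)
mel_spectrogram = tf.matmul(spectrogram, mel_matrix)

plt.figure("Mel: " + filepath)
plt.imshow(tf.math.log(mel_spectrogram + 1e-6).numpy())  # epsilon avoids log(0)
plt.show()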
In this article, I will try to provide you with the simplest gender recognition implementation using TensorFlow.
GitHub: https://github.com/aruno14/genderRecognition
We will use the UTKFace dataset.
We use the MobileNetV2 model in order to keep it small and usable on small devices.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import pandas
import glob

image_size = (48, 48)
batch_size = 32
epochs = 15

folders = ["UTKFace/"]
countCat = {0: 0, 1: 0}  # number of images per gender class
class_weight = {0: 1, 1: 1}  # class weights, to balance the training
data, labels = [], []
for folder in folders:
    for file in glob.glob(folder + "*.jpg"):
        file = file.replace(folder, "")
        # UTKFace filenames look like "age_gender_race_date.jpg"
        age, gender = file.split("_")[0:2]
        age, gender = int(age), int(gender)
        …
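The embed above is truncated; the full loop is in the GitHub repository. As a rough, hypothetical sketch of how the collected images could then train a two-class MobileNetV2 (the normalization, optimizer, and fit call are my assumptions, not the repository's exact code):
import numpy as np

# Assumes the loop above filled `data` with face images resized to image_size
# and `labels` with the matching gender ids (0 or 1)
X = np.array(data, dtype=np.float32) / 255.0
y = tf.keras.utils.to_categorical(labels, num_classes=2)

model = tf.keras.applications.mobilenet_v2.MobileNetV2(include_top=True, weights=None, input_shape=image_size + (3,), classes=2)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, batch_size=batch_size, epochs=epochs, class_weight=class_weight, validation_split=0.2)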
In this article, I will try to provide the simplest emotion recognition implementation using TensorFlow.
GitHub: https://github.com/aruno14/emotionRecognition
We will use the FER2013 dataset.
We use the MobileNetV2 model in order to keep it light and usable on small devices.
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_size = (48, 48)
batch_size = 32
epochs = 15

train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    "emotions/train",
    target_size=image_size,
    color_mode="grayscale",
    shuffle=True,
    batch_size=batch_size,
    class_mode='categorical')
validation_generator = test_datagen.flow_from_directory(
    "emotions/test",
    target_size=image_size,
    shuffle=True,
    color_mode="grayscale",
    batch_size=batch_size,
    class_mode='categorical')
print(train_generator.class_indices)

classifier = tf.keras.applications.mobilenet_v2.MobileNetV2(include_top=True, weights=None, input_tensor=None, input_shape=image_size + (1,), pooling=None, classes=7)
classifier.compile(loss='categorical_crossentropy', metrics=['accuracy'])
classifier.fit(train_generator, steps_per_epoch=train_generator.samples//batch_size, epochs=epochs, validation_data=validation_generator, validation_steps=validation_generator.samples//batch_size)
classifier.save("emotion_model")
I obtained an accuracy of 0.4873 after 15 epochs; however, the accuracy was still improving.
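To try the saved model on a single image, here is a minimal inference sketch (the file name "face.jpg" is an assumption; the preprocessing mirrors the generators above: 48x48 grayscale scaled to [0, 1]):
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("emotion_model")
img = tf.keras.preprocessing.image.load_img("face.jpg", color_mode="grayscale", target_size=(48, 48))
x = np.expand_dims(np.array(img) / 255.0, axis=(0, -1))  # shape (1, 48, 48, 1)
probabilities = model.predict(x)[0]
print("Predicted class:", np.argmax(probabilities))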
Nowadays, we can use high-precision voice recognition on our smartphones and other smart devices. However, those systems are provided by big companies like Google, Amazon, or Apple, and are not free.
Many people, including myself, thought this was because of a lack of free data. However, nowadays we can easily find free data on the Internet.
Voice datasets
Dataset sizes for some languages (2020/10/12)
Some other datasets are available here:
Tools
Maybe it was because no tools were available…