We’ve seen AI generate text, then images, and most recently even short videos, though the videos still need some improvement.

The results are incredible when you consider that no human is directly involved in creating these pieces, and that a model only has to be trained once to then be used by thousands of people, as Stable Diffusion is.

Still, do these models really understand what they are doing? Do they know what the picture or video they just produced really represents?

What does such a model understand when it sees a picture or, even more complex, a video? Learn more in the video... (there is also RTX GPU giveaway information in the video!)

References

►Read the full article:
https://www.louisbouchard.ai/general-video-recognition/
►Ni, B., Peng, H., Chen, M., Zhang, S., Meng, G., Fu, J., Xiang, S. and Ling, H., 2022. Expanding Language-Image Pretrained Models for General Video Recognition. arXiv preprint arXiv:2208.02816.
►Code: https://github.com/microsoft/VideoX/tree/master/X-CLIP
►My Newsletter (A new AI application explained weekly to your emails!):
https://www.louisbouchard.ai/newsletter/
