
Ferrante, M., Boccato, T., Ozcelik, F., Vanrullen, R., Toschi, N. (2023). Multimodal decoding of human brain activity into images and text. In PROCEEDINGS OF UNIREPS: THE FIRST WORKSHOP ON UNIFYING REPRESENTATIONS IN NEURAL MODELS (pp.11-26). ML Research Press.

Multimodal decoding of human brain activity into images and text

Ferrante, M.; Boccato, T.; Ozcelik, F.; Vanrullen, R.; Toschi, N.
2023-01-01

Abstract

Every day, the human brain processes an immense volume of visual information, relying on intricate neural mechanisms to perceive and interpret these stimuli. Recent breakthroughs in functional magnetic resonance imaging (fMRI) have enabled scientists to extract visual information from human brain activity patterns. In this study, we present an innovative method for decoding brain activity into meaningful images and captions, with a specific focus on brain captioning due to its greater flexibility compared to decoding brain activity into images. Our approach takes advantage of cutting-edge image captioning models and incorporates a novel image reconstruction pipeline that utilizes latent diffusion models and depth estimation. We use the Natural Scenes Dataset, a comprehensive fMRI dataset from eight subjects who viewed images from the COCO dataset. We employ the Generative Image-to-Text Transformer (GIT) as our captioning backbone and propose a new image reconstruction pipeline based on latent diffusion models. The method involves training regularized linear regression models between brain activity and extracted features. Additionally, we incorporate depth maps from the ControlNet model to further guide the reconstruction process. We propose a multimodal approach that leverages similarities between neural and deep-learning representations; by learning an alignment between these spaces, we produce textual descriptions and image reconstructions from brain activity. We evaluate our methods using quantitative metrics for both generated captions and images. Our brain captioning approach outperforms existing methods, while our image reconstruction pipeline generates plausible images with improved spatial relationships. In conclusion, we demonstrate significant progress in brain decoding, showcasing the enormous potential of integrating vision and language to better understand human cognition.
Our approach provides a flexible platform for future research, with potential applications combining high-level semantic information from text with low-level shape information from depth maps and initial-guess images.
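The core mapping described in the abstract (regularized linear regression from voxel activity to a model's feature space) can be sketched as follows. This is a minimal illustrative example on synthetic data: the dimensions, ridge penalty, and random "feature embeddings" are assumptions for the sketch, not the paper's actual configuration, which maps real fMRI voxels to features extracted by models such as GIT.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_trials, n_voxels, n_features = 300, 100, 64  # toy sizes; real fMRI data has far more voxels

# Synthetic stand-ins: brain activity X and target feature embeddings Y
# (in the paper, Y would be features extracted from the viewed images)
W_true = rng.normal(size=(n_voxels, n_features))
X = rng.normal(size=(n_trials, n_voxels))                        # voxel activity per trial
Y = X @ W_true + 0.1 * rng.normal(size=(n_trials, n_features))   # noisy feature targets

# Regularized linear map from voxel space to feature space
model = Ridge(alpha=10.0)
model.fit(X[:250], Y[:250])          # train on 250 trials
Y_pred = model.predict(X[250:])      # predict features for held-out trials
```

The predicted feature vectors would then be passed downstream, e.g. as conditioning for a captioning model or a latent diffusion pipeline.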
1st Workshop on Unifying Representations in Neural Models (UniReps)
New Orleans, LA (USA)
2023
International relevance
Contribution
2023
Sector PHYS-06/A - Physics for life sciences, the environment, and cultural heritage
English
Conference paper
Files in this item:
File: 6_Multimodal_decoding_of_human.pdf
Type: Publisher's version (PDF)
License: Publisher copyright
Size: 12.51 MB
Format: Adobe PDF
Access: authorized users only (copy available on request)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/406124
Citations
  • PubMed Central: n/a
  • Scopus: 0
  • Web of Science: 4