A 1.6B-parameter model built by @vikhyatk using SigLIP, Phi-1.5 and the LLaVA training dataset. The model is released for research purposes only; commercial use is not allowed.
Try it out on Hugging Face Spaces!
Usage
pip install transformers timm einops
from transformers import AutoModelForCausalLM, CodeGenTokenizerFast as Tokenizer
from PIL import Image

# trust_remote_code=True lets transformers load the custom model code from the model repository
model_id = "vikhyatk/moondream1"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = Tokenizer.from_pretrained(model_id)

# Encode the image once, then ask a question about the encoded image
image = Image.open('<IMAGE_PATH>')
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "<QUESTION>", tokenizer))
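Since encode_image and answer_question are separate calls in the snippet above, the encoded image can presumably be reused across several questions instead of being re-encoded each time. The following is a minimal sketch under that assumption; `ask_questions` is a hypothetical helper name, not part of the model's API, and it only relies on the `answer_question(enc_image, question, tokenizer)` call shown above.

```python
from typing import Any, Dict, Iterable


def ask_questions(model: Any, tokenizer: Any, enc_image: Any,
                  questions: Iterable[str]) -> Dict[str, str]:
    """Ask several questions about one already-encoded image.

    Hypothetical helper: assumes `model` exposes
    answer_question(enc_image, question, tokenizer) as in the usage snippet.
    """
    answers = {}
    for question in questions:
        # Reuse the same encoded image for every question
        answers[question] = model.answer_question(enc_image, question, tokenizer)
    return answers
```

With the model, tokenizer and enc_image objects from the usage snippet, a call might look like ask_questions(model, tokenizer, enc_image, ["<QUESTION_1>", "<QUESTION_2>"]), which returns a dict mapping each question to its answer.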
reupload #4. I assume this service doesn't like emojis.
https://huggingface.co/vikhyatk/moondream1