Oliver Pejic
Oliver Pejic
IAESTE Intern Summer 2024, 12 weeks in August, September and October.
Goal:
- Help with the Hate Speech Project
- Help with evaluation of sentence transformer models using toolkit MTEB
Final Tasks:
- Prepare an MTEB evaluation task for Slovak HATE speech.
- Prepare an MTEB evaluation task for Slovak question answering.
- Machine translate an SBERT evaluation set for multiple slavic languages.
- Write a short scientific paper with results.
Meeting 3.10.:
State:
- Prepared a pull request for Retrieval SK Quad.
- Prepared a pull request for Hate Speech Slovak.
Tasks:
- Make the pull request compatible with the MTEB Contribution guidelines. Discuss it when it is done.
- Submit pull requests to MTEB project.
- Machine Translate a database (HotpotQA, DB Pedia, FEVER) . Pick a database that is short, because translation might be slow.
Non priority tasks:
- Prepare databse and subnit it to HuggingFace Hub.
- Prepare a MTEB PR for the databse.
Meeting 3.9:
State: Studied MTEB framework and transformers.
Tasks:
- Prepare and try MTEB evaluation tasks for the database. For evaluation you can try me5-base model.
- Make a fork of MTEB and do necessary modification, including the documentation references for the task.
- Prepare 2 GITHUB pull requests for the databases, preliminary BEIR script given.
Future tasks:
- Prepare a machine translation system to create another slovak/multilingual evaluation task from English task.
Preparation (7.8.2024):
- Get familiar with SentenceTransformer framework, study fundamental papers and write down notes.
- Get familiar with MTEB evaluation framework.
- Prepare a working environment on Google Colab or on school server or Anaconda.
- Get familiar with existing finetuning scripts.