Youssef Ressaissi

June 30, 2025 Daniel Hladek iaeste

IAESTE Intern Summer 2025, 1.7. - 31.8.2025

Goal: Evaluate and improve language models for summarization in the Slovak medical or legal domain.

Tasks:

  1. Get familiar with the basic tools
    • Prepare a working environment: HF transformers, datasets, lm-evaluation-harness, HF trl.
    • Read several recent papers about summarization using LLMs and write a report.
    • Learn how to perform and evaluate document summarization in Slovak using language models.
  2. Make a comparison experiment
    • Pick summarization datasets and models. Evaluate several models using the ROUGE and BLEU metrics.
    • https://github.com/slovak-nlp/resources
    • Describe the experiments. Summarize results in a table. Describe the results.
  3. Improve the performance of a language model
    • Use more data. Prepare a domain-oriented dataset and fine-tune a model. Possibly generate artificial data to improve summarization.
    • Run new experiments and write down the results.
  4. Report and disseminate
    • Prepare a final report with analysis, experiments and conclusions.
    • Publish the fine-tuned models on the HF Hub. Publish a paper from the project.
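
As an illustration of the comparison step in task 2, the following is a minimal sketch of how ROUGE-style scores could be computed and collected into a table. It is a simplified pure-Python version (lowercased whitespace tokens, F1 only, no stemming) and not the implementation used in the project; real experiments would more likely rely on an established ROUGE/BLEU package. The function names, the model names and the example texts are hypothetical placeholders, not project data or results.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercased whitespace tokens. Slovak text would likely
    # benefit from proper tokenization and possibly lemmatization.
    return text.lower().split()


def f1(overlap, n_candidate, n_reference):
    # Harmonic mean of precision and recall; 0.0 when nothing overlaps.
    if overlap == 0 or n_candidate == 0 or n_reference == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    # Simplified ROUGE-N: F1 over clipped n-gram overlap counts.
    cand, ref = tokenize(candidate), tokenize(reference)
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum((cand_ngrams & ref_ngrams).values())
    return f1(overlap, sum(cand_ngrams.values()), sum(ref_ngrams.values()))


def lcs_length(a, b):
    # Length of the longest common subsequence, computed row by row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    # Simplified ROUGE-L: F1 based on the longest common subsequence.
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


def results_table(rows):
    # rows: (model_name, generated_summary, reference_summary) triples.
    lines = [f"{'model':<12} {'ROUGE-1':>8} {'ROUGE-2':>8} {'ROUGE-L':>8}"]
    for name, candidate, reference in rows:
        lines.append(
            f"{name:<12} "
            f"{rouge_n(candidate, reference, 1):>8.3f} "
            f"{rouge_n(candidate, reference, 2):>8.3f} "
            f"{rouge_l(candidate, reference):>8.3f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical placeholder texts, only to show the call pattern.
    reference = "the court rejected the appeal"
    rows = [
        ("model-a", "the court rejected the appeal", reference),
        ("model-b", "the appeal was rejected", reference),
    ]
    print(results_table(rows))
```

In a real run, each row would hold scores averaged over a whole test set rather than a single summary pair, and BLEU would be added as a further column.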