Technische Universität Berlin - Faculty IV - The Berlin Institute for the Foundations of Learning and Data (BIFOLD) / Management of Data Science Processes

Technische Universität Berlin offers an open position:

Research Assistant - salary grade E13 TV-L Berliner Hochschulen - 1st qualification period (PhD candidate)

part-time employment may be possible

The DEEM Lab at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) at TU Berlin is looking for a research assistant to conduct research at the intersection of Responsible Data Engineering and Machine Learning (ML) under the direction of Prof. Dr. Sebastian Schelter.

Working field:

The research will focus on complex ML applications, which include data integration and data pre-processing pipelines. Such applications are difficult to build as they utilise data from heterogeneous sources that needs to be integrated and transformed into features before the data can be consumed by ML models. This requires combining different relational and linear algebra operations, which often leads to performance problems and the loss of important information about the origin of the processed data.

The goal of the research is twofold. First, we want to make the creation of complex ML applications easier for non-expert users, for example when they want to integrate domain-specific knowledge into their applications or evaluate the robustness of those applications. Second, we are developing the foundations for ML applications that guarantee users control over their personal data (e.g. with regard to the "right to be forgotten" from the GDPR) and that comply with legal regulations such as the upcoming European AI Act. This will be achieved through novel declarative methods for the automatic design, testing and debugging of ML applications, potentially utilising the code generation capabilities of Large Language Models. The research will lead to efficient and scalable implementations that will be made publicly available as open source libraries.

The position serves the candidate's own academic qualification (doctorate) and includes collaboration in teaching. Teaching duties include the organization of tutorials and internships as well as the supervision of student work.


Requirements:

  • Successfully completed university degree (Master, Diplom or equivalent) in Computer Science or Artificial Intelligence
  • Strong programming skills in Python and at least one other language (Java/Scala/Rust/C++)
  • Knowledge of data processing with dataflow systems, relational databases and/or dataframe libraries (e.g. Apache Spark, DuckDB, Pandas, etc.)
  • Experience with increasing the efficiency, scalability and correctness of data-centric programs
  • Basic knowledge of machine learning and familiarity with common libraries (e.g. Pandas, scikit-learn, PyTorch, SparkML, etc.)
  • Ability to teach in English and/or German; willingness to acquire the missing language skills where necessary


  • Creativity and independent thinking, self-motivated working style

In addition, applications with proof of the following knowledge are preferred:

  • Experience with real-world data processing systems and/or ML deployments (e.g. from internships, jobs or entrepreneurial experience)
  • Experience with regulations such as the GDPR and the EU AI Act
  • Contributions to open source projects

How to apply:

Please send your written application, quoting the reference number, together with the usual application documents (at least a cover letter, CV, degree certificates and transcripts of records) to Technische Universität Berlin - Die Präsidentin - Fakultät IV, Institut für Softwaretechnik und Theoretische Informatik, FG Management von Data Science Prozessen, Prof. Dr.-Ing. Schelter, TEL 9-2, Ernst-Reuter-Platz 7, 10587 Berlin, or by e-mail (one PDF file, max. 5 MB) to:

