Looking for a python developer to help me finish a search engine with tf-idf and cosine similarity + query WITHOUT libraries such as sklearn

€8-30 EUR

Cancelled
Posted over 3 years ago

Paid on delivery
I am looking for a Python developer, preferably an expert in NLP, to help me finish a search engine for one of my college courses. The first part of the code, an inverted index, is already done. Please DO NOT change any part of the pre-existing code except where instructed. It is important to keep the posting lists as they are: DO NOT shorten them.

As I only have a limited number of characters here, I have attached a file containing a more detailed job description with examples, as well as a screenshot of what the result should look like. Please read the instructions carefully and look at the screenshot before bidding. It is very important to follow the instructions (e.g. NOT using libraries for certain parts). This task should not be too much trouble for a skilled developer.

Here is a rough outline of what needs to be done:

- Stemming: the tokens need to be stemmed using snowballstemmer for German. This MUST be done in a separate function; do not stem in the same function in which the tokens are counted. I have noted in the code where to add this part. Stemming also has to be applied to the queries. For example, if you type "eating" into a query (both inverted index AND cosine similarity), anything starting with "eat" should be printed out.
- tf-idf: tf-idf needs to be calculated. MOST IMPORTANTLY: you CANNOT use any libraries for this. DO NOT use sklearn, TfidfVectorizer or anything like that. Each part (tf, idf, tf-idf) needs to be calculated in a separate function. I have noted in the code where to add these as well. If you use a library like TfidfVectorizer, or anything else that does the same job, I cannot accept the code.
- Cosine similarity: this also MUST be calculated in a function, with NO libraries (no sklearn, etc.). It has to be calculated from whatever is typed into a query, compared against the texts in the corpus. This query has to be accessed from the main function by typing "2" in the menu (the menu is already implemented; please find the corresponding part in the main function to add the query).

The user should be able to search for words and then see the cosine similarity, tf, idf, and the final tf-idf for the top N (e.g. top 10) ranked documents, with document name AND document ID for each result (please view the screenshot for this).

After the tf-idf option is chosen in the menu (already implemented; tf-idf is chosen by entering "2"), the overall top 10 results (or any other number) for tf-idf should be printed first, without a query (no cosine similarity here, as it is used for queries only). It should look something like this:

    Documents: [id: name (|d|)]
    0: text1, 1: text2, 2: text3, ...
    dictionary: [term: idf | (doc: tf), (doc: tf), (doc: tf), ...]

It should then ask the user to type in a query. The result should look something like this (using cosine similarity):

    Query: food
    Top 3 containing the queried word(s):
    filename1 (file ID, tf | idf)
    filename2 (file ID, tf | idf)
    filename3 (file ID, tf | idf)

(Please view the screenshot for details; you will understand what I mean.) The user should be able to type in more than one word, but the texts do not have to contain every one of the words typed in to appear in the results.

The attached screenshot, a commented screenshot, and the more detailed project description will give you more details. Please consult these if you need more information. I have also provided some of the texts I am working with. Please note that the code has to be as simple as possible, nothing too hard or fancy. It should also be quite fast, as I have to go through almost 4000 texts. To test the query with the texts I provided, I recommend searching for "vater sohn" to see whether cosine similarity works.
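To illustrate the tf, idf, tf-idf and cosine similarity steps outlined above, here is a minimal sketch in plain Python using only the standard library (no sklearn). It is not the existing project code: every function name (`compute_tf`, `compute_idf`, `compute_tfidf`, `cosine_similarity`, `rank_documents`), the choice of raw counts for tf and log10(N / df) for idf, and the toy corpus are assumptions made for illustration. Tokens are assumed to be already stemmed by a separate function.

```python
import math
from collections import Counter


def compute_tf(tokens):
    """Raw term frequency: how often each term occurs in one document."""
    return dict(Counter(tokens))


def compute_idf(docs_tokens):
    """Inverse document frequency, here assumed to be log10(N / df)."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log10(n_docs / count) for term, count in df.items()}


def compute_tfidf(tf, idf):
    """Weight each tf by its idf; terms unknown to the corpus get weight 0."""
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_tokens, docs_tokens, top_n=10):
    """Return up to top_n (doc_id, score) pairs, best match first."""
    idf = compute_idf(docs_tokens)
    query_vec = compute_tfidf(compute_tf(query_tokens), idf)
    scores = []
    for doc_id, tokens in enumerate(docs_tokens):
        doc_vec = compute_tfidf(compute_tf(tokens), idf)
        score = cosine_similarity(query_vec, doc_vec)
        if score > 0:  # a document needs only one query term to appear
            scores.append((doc_id, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    # Hypothetical, already-stemmed toy corpus (not the provided texts).
    corpus = [["vater", "sohn", "haus"], ["vater", "garten"], ["katze", "hund"]]
    for doc_id, score in rank_documents(["vater", "sohn"], corpus, top_n=3):
        print(doc_id, round(score, 3))
```

For a corpus of almost 4000 texts, the idf table and the document vectors would probably be computed once at start-up rather than on every query as this sketch does.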
Project ID: 26972532

About the project

1 proposal
Remote project
Active 4 yrs ago

1 freelancer is bidding on average €490 EUR for this job
I have 3+ years of experience as a Python programmer and have worked on several Machine Learning projects mainly targeting the domain of Computer Vision and Digital Image Processing. Get effective Python programming / Machine Learning / Computer Vision / Deep Learning / Digital Image Processing / Algorithms & Design solutions
€490 EUR in 7 days
0.0 (1 review)

About the client

Birkenfeld, Germany
5.0
2
Member since May 30, 2020
