I have sets of word vectors in the following format:
wordA|wordB|wordC; 0.8|0.5|0.3
wordA|wordC|wordD; 0.7|0.3|0.1
wordA|wordE|wordF; 0.9|0.2|0.2
wordC|wordE|wordF; 0.3|0.3|0.1
(and 8 more)
The numbers are relevance scores for each word.
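A minimal sketch of how one line of this format could be read into a word-to-score mapping. The function name `parse_line` and the use of a plain dict are assumptions for illustration; the real file is described as slightly more complex, so this covers only the simplified format shown above.

```python
def parse_line(line):
    """Parse 'wordA|wordB; 0.8|0.5' into {'wordA': 0.8, 'wordB': 0.5}.

    Assumes exactly one ';' separating the words from the scores and
    '|' separating the items within each half (the simplified format only).
    """
    words_part, scores_part = line.split(";")
    words = [w.strip() for w in words_part.split("|")]
    scores = [float(s) for s in scores_part.split("|")]
    if len(words) != len(scores):
        raise ValueError(f"word/score count mismatch: {line!r}")
    return dict(zip(words, scores))


# Two of the sample rows from the posting.
sample = """wordA|wordB|wordC; 0.8|0.5|0.3
wordA|wordC|wordD; 0.7|0.3|0.1"""

# Skip blank lines; each remaining line becomes one sparse vector.
vectors = [parse_line(row) for row in sample.splitlines() if row.strip()]
```

Storing each set as a dict keeps the vectors sparse, so words that do not appear in a set are treated as having score 0.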
The goal is to reduce (cluster) these into fewer sets, e.g., down to 2 sets.
There are existing methods for clustering, e.g., cosine similarity or
[login to view URL]
Your input is a text file similar to the above (my actual file format is slightly more complex and the file is larger). The output should include a measure of similarity, i.e., a way to know when to stop clustering. In some cases we will reduce to 8 clusters; in others we will cluster down to 1-2 clusters.
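One possible way to get both the clustering and a stopping measure is greedy agglomerative merging on cosine similarity, sketched below in plain Python. This is an illustration under stated assumptions, not a prescribed method: the names `cosine`, `merge`, and `cluster`, the choice to merge two clusters by summing their scores, and the `min_similarity` threshold of 0.5 are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge(a, b):
    """Combine two sparse vectors by summing scores (an assumed merge rule)."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0.0) + v
    return out


def cluster(vectors, min_similarity=0.5, min_clusters=1):
    """Repeatedly merge the most similar pair of clusters.

    Stops when the best remaining similarity drops below min_similarity
    or when min_clusters is reached. Returns the clusters as
    (member_indices, summed_vector) pairs plus the similarity of each merge.
    """
    clusters = [([i], dict(v)) for i, v in enumerate(vectors)]
    history = []
    while len(clusters) > max(min_clusters, 1):
        # Find the most similar pair among the current clusters.
        sim, i, j = max(
            (cosine(clusters[i][1], clusters[j][1]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if sim < min_similarity:
            break  # remaining clusters are too dissimilar to merge
        merged = (clusters[i][0] + clusters[j][0],
                  merge(clusters[i][1], clusters[j][1]))
        history.append(sim)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, history


# The four sample rows from the posting, already parsed into dicts.
vectors = [
    {"wordA": 0.8, "wordB": 0.5, "wordC": 0.3},
    {"wordA": 0.7, "wordC": 0.3, "wordD": 0.1},
    {"wordA": 0.9, "wordE": 0.2, "wordF": 0.2},
    {"wordC": 0.3, "wordE": 0.3, "wordF": 0.1},
]
clusters, history = cluster(vectors, min_similarity=0.5)
for members, _ in clusters:
    print(sorted(members))
```

With these four rows and a threshold of 0.5, the first three rows end up in one cluster and the fourth stays on its own. The `history` list records the similarity at each merge, so a sharp drop in those values is one candidate signal for when to stop; passing `min_clusters` instead forces a fixed target such as 8 or 2.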
If you are familiar with nltk or similar, this should be a simple task.
Thanks.
Hello Sir. I have a strong understanding of NLP and have worked with nltk for a long time, so I can handle this task.
$55 USD in 3 days
2 freelancers are bidding on average $78 USD for this job