Conversation
|
we are using INTEGER as type for the db but saving floating numbers at least for rooms and size.
|
|
@strech345 why did you close the pr? |
i really dont know why. I only changed to draft. But dont know why i close it. |
|
So should I review it now or is this still a draft? |
Would be nice if you can read my comments here and give me some Feedback. Also you can Start reviewing it. I would only extend it like explained. |
Tbh, I don't see an issue with this...?
|
|
|
||
| function normalize(o) { | ||
| const size = o.size || '--- m²'; | ||
| const parts = (o.tags || '').split('·').map((p) => p.trim()); |
orangecoding
left a comment
There was a problem hiding this comment.
@strech345 I have not (yet) checked out your fork and ran it on my machine, bur so far I have 4 comments. Let me know when I should check out the whole thing
lib/FredyPipelineExecutioner.js
Outdated
| return listings.map(this._providerConfig.normalize); | ||
| return listings.map((listing) => { | ||
| const normalized = this._providerConfig.normalize(listing); | ||
| // TODO: every provider should return price, size and rooms in type number. This makes it more strong and strict of the provider output. String formats like "m², Zi,..." should not be part and can be added on fe or massages. Move this logic into the provider-specific normalize function. |
There was a problem hiding this comment.
Can you make sure this is done in the pr?
lib/FredyPipelineExecutioner.js
Outdated
| // like for kleinanzeigen we have tags (includes multiple fields) but will be than extract at normalize, and deleted because its only internal used. | ||
| // I would suggest that we define a standard list like (id, price, rooms, size, title, link, description, address, image, url) | ||
| // it might be that some of this props value is null, wich is ok without id, link, title | ||
| // Also this might be not needed when using typings with typescript. I would suggest to move the whole project to typescript to have save typings. |
There was a problem hiding this comment.
I don't like typescript for various reasons which would take way to long to explain here, but Fredy won't move to Typescript ;)
There was a problem hiding this comment.
thats ok, i now created new type files.
Would be nice to move more to type files so its on one place.
lib/FredyPipelineExecutioner.js
Outdated
| const keys = Object.keys(this._providerConfig.crawlFields); | ||
| // i removed it because crawlFields might be different than fields which are required. | ||
| // like for kleinanzeigen we have tags (includes multiple fields) but will be than extract at normalize, and deleted because its only internal used. | ||
| // I would suggest that we define a standard list like (id, price, rooms, size, title, link, description, address, image, url) |
There was a problem hiding this comment.
I get this, but I think what we then should do is letting each provider define their list of required params. A standard function that is available in every provider which returns an array of required keys..
There was a problem hiding this comment.
Can you please explain me why this is needed? only to understand this.
it still only checks the keys (not the value), so its only about what is added in code.
i changed now the returned obj for every normalize fnc with a new obj like
return {
id,
link,
title: o.title || '',
price: extractNumber(o.price),
size: extractNumber(o.size),
rooms: extractNumber(rooms),
address: `${ciity}, ${road}`,
image,
description: o.description,
};
so its more clear what the provider really returns. I also add for all of this functions the type and add a jsonconfig to activate type checks.
lib/FredyPipelineExecutioner.js
Outdated
| // it might be that some of this props value is null, wich is ok without id, link, title | ||
| // Also this might be not needed when using typings with typescript. I would suggest to move the whole project to typescript to have save typings. | ||
| //const keys = Object.keys(this._providerConfig.crawlFields); | ||
| const keys = ['id', 'link', 'title']; |
There was a problem hiding this comment.
We should not have a fixed list in here.
|
@strech345 You probably want to merge Master into your branch as I have just pushed a pretty big change. (There's a new migration as well, so you must increase the number of yours) |
|
Ok, will take some time this week is full 😉 |
whats about migration |
Which is why I asked you to increase your migra to 12 |
ok i can rename it. Only to make the point clear: Boath migrations are right now on master. |
Fuck you're right. I missed it. Please pull from master https://github.com/orangecoding/fredy/releases/tag/20.0.5 |
|
Hey @strech345 this pr is already massive. I strongly suggest IF you want to get this merged not to make this any bigger as I also have to review it. Also make sure you keep the scope ;/) |
|
from my
from my side its done and ready for review ;-) |
|
@claude review |
orangecoding
left a comment
There was a problem hiding this comment.
@strech345 I've added a few comments. What concerns me is you removed the delete function from the filter
| } | ||
|
|
||
| // filterByJobSettings: when true, filter listings by spec_filter in job settings | ||
| if (filterByJobSettings === true) { |
There was a problem hiding this comment.
Mhm.. instead of having to filter out this (I assume this has to be done, because you store the data to not scrape it many times), how about scraping it and if it hits something from the filter, simply set it to deleted. This way, it won't show up, you do not have to filter it here and you would save a lot of code...
If you pull master from my branch, you'll see how I did it in e.g. in _filterByArea:
if (filteredIds.length > 0) {
deleteListingsById(filteredIds);
}
There was a problem hiding this comment.
i thought you still want to show all in the fe list. Also it would be possible to change the filter values and the listings would be dynamic filtered on the new values.
But i can change it and mark them as deleted. with this i can remove the fe changes.
|
|
||
| if (filteredIds.length > 0) { | ||
| deleteListingsById(filteredIds); | ||
| return filteredListings; |
There was a problem hiding this comment.
Why did you remove the deleteListings here? That was added on purpose to not scrape things infinite?
There was a problem hiding this comment.
might be happend with merge. i dont realize that you did changes here.
i will fix it.
| throw new NoNewListingsWarning(); | ||
| } | ||
| const sendNotifications = notify.send(this._providerId, newListings, this._notificationConfig, this._jobKey); | ||
| // TODO: move this to the notification adapter, so it will handle for all providers in same way. |
There was a problem hiding this comment.
Can you resolve this todo before we proceed?
There was a problem hiding this comment.
i dont did it because it means changes on all notification files. Also this feels not correct. i'm not shure where to put it. Maybei can put it into the notify fnc?
There was a problem hiding this comment.
ok I get it. leave it as it is for now :)
|
|
||
| it('should call deleteListingsById when listings are filtered by area', async () => { | ||
| // TODO: fix this test | ||
| it.skip('should call deleteListingsById when listings are filtered by area', async () => { |
There was a problem hiding this comment.
i disabled the test. dont unsterand how area filter should effect deleteListingsById because delete is called before. Would be nice if you can check this.
| expect(notify.link).toContain('https://www.immobilienscout24.de/'); | ||
|
|
||
| // check if there is at least one valid notification | ||
| const hasValidNotification = notificationObj.payload.some((notify) => { |
There was a problem hiding this comment.
i changed it because some values might missing depending on online listings. Would say other should also work the same to reduce test failures.
|
|
||
| const filteredListings = listings | ||
| // this should never filter some listings out, because the normalize function should always extract all fields. | ||
| .filter((item) => requiredKeys.every((key) => key in item)) |
There was a problem hiding this comment.
like you told me i add it, without knowledge for what.
lib/FredyPipelineExecutioner.js
Outdated
| // like for kleinanzeigen we have tags (includes multiple fields) but will be than extract at normalize, and deleted because its only internal used. | ||
| // I would suggest that we define a standard list like (id, price, rooms, size, title, link, description, address, image, url) | ||
| // it might be that some of this props value is null, wich is ok without id, link, title | ||
| // Also this might be not needed when using typings with typescript. I would suggest to move the whole project to typescript to have save typings. |
There was a problem hiding this comment.
thats ok, i now created new type files.
Would be nice to move more to type files so its on one place.
| } | ||
|
|
||
| // filterByJobSettings: when true, filter listings by spec_filter in job settings | ||
| if (filterByJobSettings === true) { |
There was a problem hiding this comment.
i thought you still want to show all in the fe list. Also it would be possible to change the filter values and the listings would be dynamic filtered on the new values.
But i can change it and mark them as deleted. with this i can remove the fe changes.
|
|
||
| if (filteredIds.length > 0) { | ||
| deleteListingsById(filteredIds); | ||
| return filteredListings; |
There was a problem hiding this comment.
might be happend with merge. i dont realize that you did changes here.
i will fix it.
| * @returns {ParsedListing[]} Listings considered unique enough to keep. | ||
| */ | ||
| _filterBySimilarListings(listings) { | ||
| _deleteSimilarListings(listings) { |
There was a problem hiding this comment.
will than also rename this back. before your changes all filters looks like filters and this not, so i renamed it. now every filter is working with delete so i will rename it back
| throw new NoNewListingsWarning(); | ||
| } | ||
| const sendNotifications = notify.send(this._providerId, newListings, this._notificationConfig, this._jobKey); | ||
| // TODO: move this to the notification adapter, so it will handle for all providers in same way. |
There was a problem hiding this comment.
i dont did it because it means changes on all notification files. Also this feels not correct. i'm not shure where to put it. Maybei can put it into the notify fnc?
|
@strech345 I loose track of the comments in here. Once you adressed all pr comments can you ping me please to check it one more time? Thanks :) |
listings can be filtered by specs