
Talk:Spam blacklist/Archives/2020-07

From Meta, a Wikimedia project coordination wiki

Proposed additions

This section is for completed requests that a website be blacklisted

toiletsandbaths.com



Cross-wiki spam, see COIBot report, same spam farm as w:en:MediaWiki_talk:Spam-blacklist#Bunch_of_sneaky_spam-via-refs (Vermont already globally blacklisted several sites from that farm). GeneralNotability (talk) 01:29, 10 July 2020 (UTC)

@GeneralNotability: Added to Spam blacklist. --DannyS712 (talk) 01:30, 10 July 2020 (UTC)

youtube.com/redirect



urls such as

https://www.youtube.com/redirect?q=de.wikipedia.org

or

 https://www.youtube.com/redirect?q=%64%65%2e%77%69%6b%69%70%65%64%69%61%2e%6f%72%67

can be used to circumvent the SBL. so i propose to blacklist

Regex requested to be blacklisted: youtube\.[a-z]+/redirect

-- seth (talk) 21:32, 13 July 2020 (UTC)

@Lustiger seth: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 07:52, 14 July 2020 (UTC)
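The circumvention described in this request can be illustrated with a minimal Python sketch. It assumes only the two example URLs and the requested regex from this thread; the helper name `redirect_target` is made up for illustration, and Python's `re` stands in for the PCRE matching the blacklist actually uses.

```python
import re
from urllib.parse import urlsplit, parse_qs

# The regex requested in this thread.
RULE = re.compile(r"youtube\.[a-z]+/redirect")

# The two example URLs quoted in the request.
URLS = [
    "https://www.youtube.com/redirect?q=de.wikipedia.org",
    "https://www.youtube.com/redirect?q=%64%65%2e%77%69%6b%69%70%65%64%69%61%2e%6f%72%67",
]


def redirect_target(url: str) -> str:
    """Return the percent-decoded value of the `q` query parameter."""
    query = urlsplit(url).query
    return parse_qs(query)["q"][0]


for url in URLS:
    # A rule written for the target domain does not see the encoded form,
    # while the requested rule matches the redirect path in both cases.
    hides_target = re.search(r"wikipedia\.org", url) is None
    print(redirect_target(url), hides_target, bool(RULE.search(url)))
```

Both URLs decode to the same target, but only the first contains the target domain literally, which is why the rule is anchored on the redirect path rather than on the destination.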

rentalcarsuae.com



blacklisted on en.wikipedia by User:Ohnoitsjamie, but this is everywhere. --Dirk Beetstra T C (en: U, T) 18:17, 16 July 2020 (UTC)

@Ohnoitsjamie: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 18:18, 16 July 2020 (UTC)

globalgistng.com



See COIBot report, it's a small "news website" (probably more like a blog) that looks like it's getting pushed by COI editors on multiple wikis. GeneralNotability (talk) 14:41, 17 July 2020 (UTC)

@GeneralNotability: Added to Spam blacklist. --Martin Urbanec (talk) 14:43, 17 July 2020 (UTC)

grimmstories.com and andersenstories.com





The owner placed over 100 Links on wikipedia.org (see also here and here).

- The website is full of advertising (please deactivate your adblocker before visiting the site)

- The website asks users for donations, although it has more than 100,000 visitors per month

- The website deceives users: the operator pretends to be poor

- The website translates Grimm texts automatically with Google Translate and markets the texts via Google Adsense

- There were many Wikipedia links where the website was cited as the source, although it was not a source of information from the Wikipedia article.

- The website has no imprint and no contact page and therefore violates laws

- The website does not have a privacy statement as required in the EU.

— The preceding unsigned comment was added by Tuniae (talk) 16 Jul 2020 16:31 (UTC)

  • @Tuniae and Wutsje: Looking at the links, some have been in place for many years. Looking at alsWP, it would seem that they were part of article creation, so I am guessing that they are from articles elsewhere at the WPs. Maybe the site has changed through the years and become less wholesome from a Wikipedia sense. As the site has been used for a long period of time, and I cannot see it on community blacklists (from a quick check), I would like to leave the request open until we can have more community input about what we should do with these sites. — billinghurst sDrewth 00:59, 17 July 2020 (UTC)
@Billinghurst:: Fine with me. I was already wondering why two (?) single issue editors would suddenly pop up and cross wiki try to get links to this site removed. Wutsje (talk) 01:05, 17 July 2020 (UTC)
@Wutsje: 🤷  — billinghurst sDrewth 01:17, 17 July 2020 (UTC)
@Tuniae, Wutsje, and Billinghurst: Tuniae is mass-removing the links from it.wiki too, ignoring the discussion in related projects. My opinion is that he is acting like a vandal, for some reason. The sites are useful, they have the full fairy tales the articles are about. --M&A (talk) 06:17, 17 July 2020 (UTC)
@M&A: This is not the place to discuss an editor's practices. You are best to talk to your local administrators. If there is demonstrated poor practices xwiki, then talk to the stewards at either SN or SRG.  — billinghurst sDrewth 06:36, 17 July 2020 (UTC)
@All: Example where the site is used as a source: https://en.wikipedia.org/w/index.php?title=The_Devil%27s_Sooty_Brother&type=revision&diff=967963767&oldid=967851627 The quality of the references seems to be truly outstanding. Tuniae Tuniae 09:16, 17 July 2020 (UTC)
@All: And another example of how the spam of this site works: https://en.wikipedia.org/w/index.php?title=Rundetaarn&type=revision&diff=967966097&oldid=967777191 andersenstories.com as a source for an architectural building? Tuniae Tuniae 09:23, 17 July 2020 (UTC)
@Billinghurst: of course it is not; what I am saying is that the full stories in these sites are useful for the articles. --M&A (talk) 09:46, 17 July 2020 (UTC)
@Tuniae, Wutsje, Billinghurst, and M&A: OK, so there are places where it is an inappropriate source, then that should be improved (better discussed on the talkpage of that article). The rest of the links seem appropriate. Moreover, there is no evidence presented that this was spammed, rather mainly used in good faith.  Declined. --Dirk Beetstra T C (en: U, T) 10:30, 17 July 2020 (UTC)

Group of spammed domains









See w:en:Wikipedia:Sockpuppet_investigations/Philip_Adrian. This appears to be an ongoing refspam campaign across multiple wikis (see the contribs of the users identified in the SPI I linked). I recognize that the COIBot report shows good-faith addition of these sites, but I've looked at them - these websites appear to be piles of ads surrounding blog posts. Anything found on them can almost certainly be found somewhere more reliable. The last entry hasn't been spammed cross-wiki yet, but since the sockpuppet investigation linked the user spamming that website to others in this group I expect that we'll see cross-wiki spam sooner or later. GeneralNotability (talk) 21:31, 25 July 2020 (UTC)

@GeneralNotability: Added to Spam blacklist. --Dirk Beetstra T C (en: U, T) 21:43, 25 July 2020 (UTC)

besthouseion.com



Cross-wiki spam by multiple accounts, see COIBot report. GeneralNotability (talk) 18:41, 28 July 2020 (UTC)

@GeneralNotability: Added to Spam blacklist. --DannyS712 (talk) 19:25, 28 July 2020 (UTC)

solarpanelcostprice.com.au



Spammed at multiple wikis, by multiple sockpuppets. See Special:CentralAuth/Spcpsb, Special:CentralAuth/Spcp001, Special:CentralAuth/Shubh.fswa and others. --Martin Urbanec (talk) 15:45, 31 July 2020 (UTC)

@Martin Urbanec: Added to Spam blacklist. --Martin Urbanec (talk) 15:45, 31 July 2020 (UTC)

Proposed removals

This section is for archiving proposals that a website be unlisted.

mywikibiz

Pretty sure \bmywikibiz\.com/(?:0000|2356|Rocky_Marciano|User:(?:Books|Boxstuf))\b can be removed since \bmywikibiz\.com\b is already blacklisted in its entirety. Leaving for someone else for a second pair of eyes in case I misunderstood --DannyS712 (talk) 05:56, 18 July 2020 (UTC)

  1. [global] \bmywikibiz\.com\b (mywikibiz.com )
  2. [global] \bmywikibiz\.com/(?:0000|2356|Rocky_Marciano|User:(?:Books|Boxstuf))\b (mywikibiz.com/(?:0000|2356|Rocky_Marciano|User:(?:Books|Boxstuf)) )
  3. [w:en (bl)] \bmywikibiz\.com\/Directory:Logic_Museum\b (mywikibiz.com/Directory:Logic_Museum )
  4. [w:fr (bl)] \bmywikibiz\.com (mywikibiz.com )
  5. [w:ast (bl)] \bmywikibiz\.com\/Directory:Logic_Museum\b (mywikibiz.com/Directory:Logic_Museum )
  6. [w:ast (bl)] \bmywikibiz\.com\b (mywikibiz.com )
  7. [w:bs (bl)] \bmywikibiz\.com\/Directory:Logic_Museum\b (mywikibiz.com/Directory:Logic_Museum )
  8. [w:sq (bl)] \bmywikibiz\.com\/Directory:Logic_Museum\b (mywikibiz.com/Directory:Logic_Museum )
  9. [w:ta (bl)] \bmywikibiz\.com\/Directory:Logic_Museum\b (mywikibiz.com/Directory:Logic_Museum )
  10. [w:ur (bl)] \bmywikibiz\.com\/Directory:Logic_Museum\b (mywikibiz.com/Directory:Logic_Museum )
  11. [b:en (bl)] \bmywikibiz\.com\b (mywikibiz.com )
  12. [s:en (bl)] mywikibiz\.com\/\d{4} (mywikibiz.com/0{4} ) now deleted at enWS
  13. [wikiversity:en (bl)] mywikibiz (mywikibiz )
  14. [commons (bl)] \bmywikibiz\.com/(?:0000|2356|Rocky_Marciano|User:(?:Books|Boxstuf))\b (mywikibiz.com/(?:0000|2356|Rocky_Marciano|User:(?:Books|Boxstuf)) ) now deleted at Commons

The term 'wikibiz' found in 14 rules.

Removed regex covering subpages — billinghurst sDrewth 07:32, 18 July 2020 (UTC)
@Billinghurst: where did you get that report from? --DannyS712 (talk) 09:12, 18 July 2020 (UTC)
@DannyS712: that is a feature of COIBot on IRC (findrules <regex> matches the regex to the rules on the blacklists). --Dirk Beetstra T C (en: U, T) 07:45, 19 July 2020 (UTC)
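The redundancy argued in this thread can be spot-checked with a short Python sketch. It assumes the two global rules copied from the report above; the sample URLs are derived from the alternatives in the narrower regex, and `find_rules` is a hypothetical stand-in for a rule search, not COIBot's actual implementation. A sample-based check like this is an illustration, not a proof.

```python
import re

# Both global rules as listed in the report above.
BROAD = r"\bmywikibiz\.com\b"
NARROW = r"\bmywikibiz\.com/(?:0000|2356|Rocky_Marciano|User:(?:Books|Boxstuf))\b"

# One sample URL per alternative of the narrower rule (derived from the regex).
SAMPLES = [
    "mywikibiz.com/0000",
    "mywikibiz.com/2356",
    "mywikibiz.com/Rocky_Marciano",
    "mywikibiz.com/User:Books",
    "mywikibiz.com/User:Boxstuf",
]


def is_shadowed(narrow, broad, samples):
    """True if every sample the narrow rule matches is also matched by the broad rule."""
    hits = [s for s in samples if re.search(narrow, s)]
    return bool(hits) and all(re.search(broad, s) for s in hits)


def find_rules(term, rules):
    """Hypothetical rule search: keep the rules whose text contains the term."""
    return [rule for rule in rules if term in rule]


print(is_shadowed(NARROW, BROAD, SAMPLES))
print(find_rules("wikibiz", [BROAD, NARROW, r"\bexample\.org\b"]))
```

Because the broader rule has no path component, anything the subpage rule catches is already caught by it, which matches the reasoning given for the removal.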

kurumaerabi.com



  • Regex requested to be blacklisted: \.kurumaerabi\.com

I want to use this site as a source. I would like to know the reason why it is certified as spam even though it is a general site. --HaroHaroRilakkuma (talk) 07:02, 21 July 2020 (UTC)

@HaroHaroRilakkuma: this seems to have been blacklisted a very long time ago and not to have been properly logged (I can only assume that this was a problem a long time ago). It does not appear that there has been any abuse over the last 8+ years that I can see in the LiWa3 database (see User:COIBot/LinkReports/kurumaerabi.com). Removed from Spam blacklist. Dirk Beetstra T C (en: U, T) 08:16, 21 July 2020 (UTC)
@Beetstra:Thank you very much. I will continue to post Wikipedia articles in the future. --HaroHaroRilakkuma (talk) 09:27, 21 July 2020 (UTC)

Cinebible.com



Hi, A section called Popular Manhwas was added to the wiki/manhwa. The section was kept; however, cinebible.com was put on the blocklist. If I have violated some rules, I was completely unaware of it. I am still new here and learning things. I kindly request that you remove it from the list. I will be more careful from now on. — The preceding unsigned comment was added by Elliot wk (talk)

@Elliot wk: I am guessing that it has been added as the repeated addition of an unreliable source though you would need to hear from @DannyS712:. Can I point you to w:WP:Paid editing and w:WP:Conflict of interest as we would expect compliance to those policies if editing at the Wikipedias.

I don't understand why my fellow admin didn't give you a warning about your activity, I would have thought that would have been a more appropriate response.  — billinghurst sDrewth 15:14, 19 July 2020 (UTC)

@Billinghurst: I apologize if I have done something wrong and will be more careful henceforth. Please kindly help me remove the site from the list. I am still new to all this. I tried contacting @dannyS712 but I am not sure how to do that. I will read all the policies in w:WP:Paid editing properly before contributing further.—The preceding unsigned comment was added by Elliot wk (talk)

Removed per discussion on my talk page --DannyS712 (talk) 20:00, 20 July 2020 (UTC)
@DannyS712: Please do not have removal conversations on your talk page. This is the community page for the discussion and the recording of actions of administrators.  — billinghurst sDrewth 21:57, 20 July 2020 (UTC)
I wasn't planning on it being a removal conversation, but once I removed it I then saw this discussion. Should I copy the discussion to here? Apologies --DannyS712 (talk) 22:01, 20 July 2020 (UTC)
For this occasion, let us just add a permalink.  — billinghurst sDrewth 23:44, 20 July 2020 (UTC)
permalink to talkpage discussion. --Dirk Beetstra T C (en: U, T) 07:43, 21 July 2020 (UTC)

linseis.com



Hello, recently, I edited the article for differential scanning calorimetry. As I noticed there are some links to the manufacturers and I also wanted to add links for the missing companies (linseis and wsk). But because of a blacklist entry back from 2011 I wasn't able to add the link for linseis. Considering that the entry goes back such a long time now and linseis seems to be a reasonable company on that topic, I would ask if the deletion of the entry may be possible. Thank you. 2A02:8108:54BF:DAE6:79C7:23DD:4D05:1AA6 08:55, 27 July 2020 (UTC)

Deferred to English Wikipedia whitelist (see w:MediaWiki talk:Spam-whitelist) to request local whitelisting --DannyS712 (talk) 10:01, 27 July 2020 (UTC)
Declined, seriously, the problem is the other links that are there, they should not be there in the first place, let alone that we have to add other ones. Going there now. Dirk Beetstra T C (en: U, T) 10:25, 27 July 2020 (UTC)
For more info, see en:WP:SPAMHOLE. I have removed all for which I could not find an en.wikipedia article, according to en:WP:LISTCOMPANY. --Dirk Beetstra T C (en: U, T) 10:30, 27 July 2020 (UTC)

Troubleshooting and problems

This section is for archiving Troubleshooting and problems.

Discussion

This section is for archiving Discussions.

IP-address spammer

links


Already done





Already done



Already done



Already done

spammers








Noticed the IP spammer adding the IP-link, which leads to the spammers of calenderdayo.com. I have therefore also included the IP of calenderdayo.com in this request. --Dirk Beetstra T C (en: U, T) 09:02, 2 July 2020 (UTC)

rheingoenheim-info.de



i don't understand the reason for blacklisting. at User:COIBot/XWiki/rheingoenheim-info.de there are only 2 selected additions. -- seth (talk) 10:22, 11 July 2020 (UTC)

@Lustiger seth: hijacked, now lands at dolabuy.ru. It will have been spambot hits. — billinghurst sDrewth 12:26, 11 July 2020 (UTC)
the problem is that now archived links to the old content are blacklisted, too.
the question is: has there been spamming with that url? the old link additions should not count as spam. -- seth (talk) 15:06, 11 July 2020 (UTC)
@Lustiger seth: why do you not use the whitelist for that: whitelist the whole link with an appropriate intermediate .*? in between. --Dirk Beetstra T C (en: U, T) 16:07, 11 July 2020 (UTC)
hi!
i use the whitelist, if it is necessary. but in this case i still don't see the necessity for the blacklisting of the domain, because i don't see evidence for spamming.
imho blacklisting this domain is counterproductive, because it prevents people from fixing links that don't work any longer. -- seth (talk) 16:31, 11 July 2020 (UTC)
@seth This link is only in articles from deWiki; this Spam blacklist is for all projects. I have looked at it, the entry is correct. Please use the whitelist in deWiki.--𝐖𝐢𝐤𝐢𝐁𝐚𝐲𝐞𝐫 👤💬 17:05, 11 July 2020 (UTC)
i know where i am. :-)
links should be added to the global blacklist, if there is spamming across several wikis. would somebody please show me the evidence of spamming with this link in any wiki? -- seth (talk) 17:21, 11 July 2020 (UTC)
@Lustiger seth: redirect sites are also added. One case of abuse is enough, and for regular redirect sites they are even added preemptive. It may hamper archived links, but that is easily solved with a proper whitelisting using a lookahead.
In this case, one edit introduced several of these redirect sites which are all going to the same one target. The page where that was added was deleted as spam, the user who added it was globally locked. It is more than reasonable to blacklist the links used in that blatant abuse. So yes, there was spam. --Dirk Beetstra T C (en: U, T) 17:49, 11 July 2020 (UTC)
i'm not sure, whether i understood it correctly. so please confirm (or correct).
there was exactly one occurrence of spamming on a single page (that has been deleted already) and as the new content of that website is obviously worthless, the domain was blacklisted globally? -- seth (talk) 18:03, 11 July 2020 (UTC)
i just made now a short search across the largest wikipedias for all blocked link additions since 2013.
zhwiki: 0
svwiki: 0
dewiki: 13
 20200520055842, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Dr Lol; Fritz_Seitz
 20200520055859, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Dr Lol; Fritz_Seitz
 20200520055955, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Hamburgum; Fritz_Seitz
 20200520060035, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Hamburgum; Fritz_Seitz
 20200520060100, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Dr Lol; Fritz_Seitz
 20200520063237, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Fritz_Seitz
 20200520063242, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Fritz_Seitz
 20200520063252, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Fritz_Seitz
 20200630133613, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Dr Lol; Wilhelm_Caroli
 20200630133731, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, Hamburgum; Wilhelm_Caroli
 20200630140834, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Wilhelm_Caroli
 20200630140839, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Wilhelm_Caroli
 20200630140850, www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli, CamelBot; Wilhelm_Caroli
cebwiki: 0
viwiki: 0
warwiki: 0
enwiki: 0
nlwiki: 0
ukwiki: 0
ruwiki: 0
ptwiki: 0
plwiki: 0
frwiki: 0
jawiki: 0
itwiki: 0
eswiki: 0
this verifies what i thought and said already: there is no prevention of spam by this blacklist entry. but several users were blocked from updating old links. i think blacklisting in such cases is counterproductive because it annoys the good guys.
i whitelisted the complete domain rheingoenheim-info.de at dewiki now, because there is no evidence of spamming anywhere. -- seth (talk) 18:45, 11 July 2020 (UTC)
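The dewiki figures listed above can be re-tallied with a small Python sketch. It assumes the thirteen log entries exactly as quoted in this thread (timestamp, user, page); every entry carries the same URL, so it is stored once. The variable names are chosen for illustration only.

```python
from collections import Counter

# The single URL that appears in every blocked dewiki entry above.
URL = "www.rheingoenheim-info.de/index.php/geschichten/91-das-schicksal-von-pfarrer-wilhelm-caroli"

# (timestamp, user, page) copied from the dewiki list in this thread.
HITS = [
    ("20200520055842", "Dr Lol", "Fritz_Seitz"),
    ("20200520055859", "Dr Lol", "Fritz_Seitz"),
    ("20200520055955", "Hamburgum", "Fritz_Seitz"),
    ("20200520060035", "Hamburgum", "Fritz_Seitz"),
    ("20200520060100", "Dr Lol", "Fritz_Seitz"),
    ("20200520063237", "CamelBot", "Fritz_Seitz"),
    ("20200520063242", "CamelBot", "Fritz_Seitz"),
    ("20200520063252", "CamelBot", "Fritz_Seitz"),
    ("20200630133613", "Dr Lol", "Wilhelm_Caroli"),
    ("20200630133731", "Hamburgum", "Wilhelm_Caroli"),
    ("20200630140834", "CamelBot", "Wilhelm_Caroli"),
    ("20200630140839", "CamelBot", "Wilhelm_Caroli"),
    ("20200630140850", "CamelBot", "Wilhelm_Caroli"),
]

# Count blocked attempts per user and per page.
per_user = Counter(user for _, user, _ in HITS)
per_page = Counter(page for _, _, page in HITS)

print(len(HITS), "blocked attempts for", URL)
print(dict(per_user))
print(dict(per_page))
```

The tally groups the blocked attempts by account and by article, which makes it easier to see that they cluster on two pages and three accounts.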
@Lustiger seth: the spam was an edit that was performed. Seeing what was done there, and seeing that both the page was deleted as spam, and that the editor was blocked for spam IS clear evidence of spam, and that was why it was rightfully globally blacklisted. If you want to take the effort of cleaning up any future spam on de then that is your right, but I would, again, suggest to only whitelist the, likely, very few archive links. I really do not see why you have to allow a link that was already spammed and is not what you are ever going to link to, but whatever. —Dirk Beetstra T C (en: U, T) 19:07, 11 July 2020 (UTC)
I see I missed an answer from you (my apologies): yes, but spammers generally do not stop at one attempt. That that happened here is not a reason to suggest that blacklisting such links should not be done at first observation. We are not here to play whack-a-mole. —Dirk Beetstra T C (en: U, T) 19:10, 11 July 2020 (UTC)
Or, what that basically suggests: you think that we should first remove a link that is totally and obviously crap 10 times, then wait until it reappears and clean up again before we should blacklist the crap? And then, because local admins override the blacklist completely, we have to monitor and clean up that crap again? Please, just whitelist the exact links you need, even if you are right on this occasion. —Dirk Beetstra T C (en: U, T) 19:18, 11 July 2020 (UTC)
  • if one wants to use {{Internetquelle |url=https://example.org |titel=foo |abruf=2020-07-12|archiv-url=https://example.org/archive}} the result would be foo. Archiviert vom Original; abgerufen am 12. Juli 2020.
    this means: if just the archived url is whitelisted, the edit would still be blocked. -> users get annoyed -> solution: don't block the domain.
  • in my opinion the SBL-procedure (in cases such as this) is inconsistent. sometimes we delete tons of old entries from the SBL for several reasons (less false positives, better performance, better overview, no visible benefit in blocking, ...) if nobody tried to add those links since they were blacklisted. in our case it's even worse: no spammer tried to add links to rheingoenheim-info.de since it was blacklisted, but 3 different normal users tried to fix links and their edits were blocked several times. -> users get annoyed -> solution: don't block the domain.
  • of course we shall prevent spamming in wikipedia, but we should also prevent false positives. in the case of rheingoenheim-info.de we have: 13 false positive blocks, and 0 correct positive blocks. -> users get annoyed -> solution: don't block the domain.
there was spamming in a single case (before the domain was blacklisted). the user got blocked, the article got deleted. that should be enough. maybe a local blacklisting could make sense, if there are no other links to the domain in the same wiki. but it is not reasonable to globally blacklist the domain in such cases. -- seth (talk) 23:18, 11 July 2020 (UTC)
  • Comment @Lustiger seth: The only reason that it will have come to my attention will have been due to spambots trying to add it. (Did intimate that in my previous answer.) I saw it, tested the link, and got redirected. As it is not on the report it would seem that they have been caught by other aspects of a filter, though that usually means it is a matter of time. — billinghurst sDrewth 00:55, 12 July 2020 (UTC)
    The expectation that I can future guess how often a hijacked domain name is going to be abused into the future by spambots is not reasonable. I have better things to do than whack-a-mole. If it was a general user abusing, sure; but spambots? I don't find it reasonable that a hijacked and abused domain should be protected forever to maintain a dead link. There needs to be a middle road, not leaving ourselves open to abuse.

    I agree that it is the use of a big hammer, but the issue is the better spam management and stopping the bots coming in, so we don't have to use the blacklist so readily and regularly.

    Action: What I will look to do where I identify a hijacked domain, and it is still in use, then I will try to remember to put a note on a local mediawiki:spam-whitelist page for that community to work out how to handle — billinghurst sDrewth 01:11, 12 July 2020 (UTC)

(edit conflict)
... in this case. Generally you see other editors trying it on other pages. Spam continues. People have to clean it up -> users get annoyed -> blacklist it on first sight.
... in this case. Generally there are no genuine additions afterwards. Whitelist only the archived link, the original does not need to be linked -> readers of Wikipedia will click it (I do, I always first go to the original!) the spammers will still profit and the user will be annoyed (or worse if there is now something malicious) -> readers get annoyed -> blacklist it on first sight.
So, now you have enabled it for this domain. That Russian website gets the IP of hundreds of annoyed readers of the German Wikipedia. It can now happily install popups that these readers accidentally install (yeah, you can swap ‘ok’ and ‘cancel’ to get your success). You can install phishing scripts. You now may already have those phishing scripts running on your computer since you tried today to follow the original to check the website. And even if this Russian website does not do something malicious, you are annoying hundreds of innocent readers every week/month/year who click first ‘the original’ and do not get what they expected. Instead of just two (unless a bot can be ‘annoyed’)
So no, blocking an editor and/or reverting/deleting the page is hardly ever enough, and local blacklisting is not a solution either. Global blacklisting is needed and all malicious links should be removed (or better, replaced with an archive link), making sure that no innocent readers get ‘harmed’ by following the original. Believe me, it is worth annoying 2 or 3 genuine editors over (and generally a handful of spam fighters). Please solve that situation properly. —Dirk Beetstra T C (en: U, T) 01:16, 12 July 2020 (UTC)
hi!
i whitelisted the domain at dewiki, so that i could easily fix all remaining occurrences in dewiki. if the domain hadn't been blacklisted, then probably other wikipedians would have fixed the links already months ago (and would have reduced the probability that somebody clicked on the misleading links).
now i keep it whitelisted, because the empirical probability that a spammer will use the domain is about 0 (see above). it is more likely that somebody will add a completely new spam domain that is and was not blacklisted. as i said already, it's similar as with the zero-hit entries that we delete sometimes. it's inconsistent to keep entries such as this.
nevertheless, i can't fix the removal at ptwiki with the text https://web.archive.org/web/20130409232839/http://www.rheingoenheim-info.de/index.php/weg-durch-die-zeiten2/die-roemer Rufiniana], Die Römer, Antigo sítio de Rheingönheim (alemão), because I would have to ask for local whitelisting there. no, i won't. that's just too much and stupid work that's just not necessary. and i won't do that for other wikis, too.
"Please solve that situation properly": a proper solution would be the removal of the global blacklist entry for there is no single spamming hit (but 13 blocked useful hits) since it was added almost one year ago.
well, ok, i guess we won't get a step forward this way (just repeating our positions).
let's try to be constructive.
you said, you don't want to play whack-a-mole. so in my opinion it would be good, if the bot generated reports would contain the information on deleted spam.
at the start of this thread we had:
fortunately now User:COIBot/XWiki/rheingoenheim-info.de and User:COIBot/LinkReports/rheingoenheim-info.de are better. but still it is not easy to see, how much of the listed edits should be considered spamming. the only possible spamming edit could be "2019-08-28 11:07:12: wikt:chr:User:CesarMcinnis". but it takes several minutes to see/guess this.
so could the output be further improved such that it is easier for everyone to see, what was spam and what was useful?
is it possible to manually add users to the bot's whitelist? user:Boshomi fixes a lot of links and is the opposite of a spammer. same with User:ⵓ and user:Dr Lol.
in cases where there are more useful edits than spamming edits, there should be more than just one spamming edit, before globally blacklisting the whole website. it would be nice, if there would be something such as an alert if two or more different spamming users added links to the same (spamming) website. this could be a help for everyone (less work for meta admins, less annoying blocked edits for users, less annoying discussions with me). what do you think? -- seth (talk) 22:03, 12 July 2020 (UTC)
@Lustiger seth: There is a phabricator ticket that proposes a right to edit around the spam blacklist; you may wish to comment there, as there is no existing right or ability. I would still argue for a better ability to stop the spammers getting through the front door, rather than having all the defences for once they are inside. — billinghurst sDrewth 23:31, 12 July 2020 (UTC)
@Lustiger seth: my point in repairing was that the links to the original material should be removed, and ONLY the archive links should be in the documents. People will follow original links (as I said, I want to see original documents, only if they are not available I will use the archive). I will agree that empirically the spamming has now stopped, and that we could remove it globally, but as it is currently redirecting, I would still prohibit all new additions, bad faith and good faith, to protect the readers. You do not know where you are sending people, people do not know where they are going when clicking the link in good faith. The whitelisting should hence not be \brheingoenheim-info\.de\b, it should be \barchive\.org\/.*?rheingoenheim-info\.de\b
Regarding making the reports more clear on what is spam and what not: that is a purely human evaluation. The reports of the bot are just reports for analysis. The bot cannot distinguish that.
I should work on making whitelisted users a list on-wiki (e.g. user:LiWa3/UserWhitelist), the only way at the moment is that they have 'given rights' on a wiki, or manually on IRC (and the latter functionality is not optimal in itself).
I have long been fighting (and will again at the next possibility) to have a complete, total overhaul of the spam blacklist. The functionality is completely wrong, it is too black-and-white, it is a complete sledgehammer approach. That was already recognised 13 years ago, but WMF is utterly oblivious and prefers to enforce other things. A proper implementation would allow for a global whitelist, where we could override the blacklisting for specific cases. Then you can also whitelist specific official domains which are globally blacklisted. Even better, you could whitelist for use on specific pages, specific wikiflavours (wikiversity and wikitravel have other requirements), you could set levels of blocking (only new editors, or only allow admins to circumvent), etc. etc. But WMF ... ignores proper requests. --Dirk Beetstra T C (en: U, T) 06:38, 13 July 2020 (UTC)
hi!
thanks for your answers and your patience (and thanks for continuously pinging me; if i shall do the same, please give me a note; i thought, it's not necessary, because you are on this page anyway several times a day.). sorry for beginning with some beating about the bush now, but i hope this will make things more clear:
here, i distinguish between several types of editors: A) users, some of whom might not even know what the SBL is (e.g. newbies or technically less interested users), B) users with enough technical skills to understand the sbl sufficiently.
if somebody of group A is confronted with a SBL block, they might understand the message and adapt their edit. but many members of group A might be confused so much that they immediately leave the page (and the wikipedia) or they try different variations of their edit before they give up. we can see this via the sbl log and even better at the edit filter log. it's alarming that sometimes very good large edits were blocked just because the user could not cope with a filter or the sbl.
if a user of group A tries to fix a link, using the above mentioned template:Internetquelle and gets (again) a warning, this user might get overtaxed. the link change would be an improvement (although the original link to the new content is still reachable with this template), but could not be saved.
users who fix/unlink disallowed links are normally members of group B. for them it does not make a difference, whether a disallowed url is a normal link or part of template:Internetquelle (or template:webarchive or template:webcite ...). (almost) all disallowed links can be found via linksearch.
@billinghurst: the right to work around the blacklist would probably be given to members of group B. but that would not solve the problems of group A.
@Dirk Beetstra: yes, template:webarchive is better than template:Internetquelle in this case, because the original url is not shown in template:webarchive. but for a user of group A this might be too complicated. however, a group B guy who fixes disallowed links could replace the template afterwards.
\barchive\.org\/.*?rheingoenheim-info\.de\b: this would prevent other archive websites such as archive.today. so it might better be archive\.[a-z]+\/.*rheingoenheim-info\.de. this would prevent other archive websites such as webcitation.org. so it might better be (?:webcitation\.org|archive\.[a-z]+)\/.*rheingoenheim-info\.de. ...
apart from being a bit complicated, this solution still does not cope with problems of group A.
what is spam and what not: of course, humans have to decide. but it should be comfortable for them. if a link is added mostly by whitelisted guys, this is a strong indicator for not being all-time spam. thus such cases (as we have one here) need a special treatment.
user:LiWa3/UserWhitelist: yes, that would be great!
too black-and-white: i agree. in dewiki, we switch to the edit filter with warnings in some cases (although that tool has an overview problem). and in some cases we don't even block an url at all, but let a bot ask the link adder to remove the link. and if they don't remove the link, the article and link are written to a maintenance list, where experienced users cope with the entries. reason: wikipedia has a problem that many newbies leave the wikipedia, because it's too technical/complicated for them. we want to avoid that. -- seth (talk) 09:01, 13 July 2020 (UTC)
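The whitelist patterns compared in the comments above can be tried out with a minimal Python sketch. It assumes the three regexes quoted in this thread and the archived URL quoted earlier; the archive.today URL shape is hypothetical and used only for illustration, and Python's `re` stands in for the PCRE matching used on-wiki, so edge cases may differ.

```python
import re

# Patterns quoted in this thread.
WHOLE_DOMAIN = r"\brheingoenheim-info\.de\b"
ARCHIVE_ORG_ONLY = r"\barchive\.org\/.*?rheingoenheim-info\.de\b"
SEVERAL_ARCHIVES = r"(?:webcitation\.org|archive\.[a-z]+)\/.*rheingoenheim-info\.de"

# Original and archived URL as quoted earlier in this discussion.
ORIGINAL = "http://www.rheingoenheim-info.de/index.php/weg-durch-die-zeiten2/die-roemer"
ARCHIVED = "https://web.archive.org/web/20130409232839/" + ORIGINAL
# Hypothetical URL shape for a second archive service, for illustration only.
OTHER_ARCHIVE = "https://archive.today/" + ORIGINAL


def matches(pattern, url):
    """True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


for name, pattern in [
    ("whole domain", WHOLE_DOMAIN),
    ("archive.org only", ARCHIVE_ORG_ONLY),
    ("several archives", SEVERAL_ARCHIVES),
]:
    print(name, [matches(pattern, u) for u in (ORIGINAL, ARCHIVED, OTHER_ARCHIVE)])
```

The narrower patterns leave the bare original URL unmatched, which is the point of whitelisting only archive links; the trade-off raised above is that each additional archive service has to be named in the pattern.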
@Lustiger seth: Thanks. No need to ping me, unless you want a quick answer.
I understand that for group A the problem is often too complex. That is often the case; you see it especially with youtu.be vs. youtube.com and the google.com/cse, -amp and -url, where people don't understand why we blacklist the link and what to do to avoid it. It unavoidably blocks a lot of good-faith material.
Edit filters for spam fighting are often too intensive on the server, though they give much more flexibility. It can be done, but in the end you run into either a massive number of regexes in one rule, or a massive number of rules (with the latter likely preferred, so you can also tailor the warning to the material). In en.wikipedia we have XLinkBot as a soft approach to blacklisting.
I have suggested a form of edit filter that uses a plain regex (similar to the blacklist) instead of filter code, tested against the added external links. See User:Beetstra/Overhaul_spam-blacklist. That would give flexibility in what to do when a link gets added (hard block, warning and block, just warning), which namespaces (allowing petitions on talkpages is not an issue, we just keep them out of mainspace), or even (for wiki-specific implementation) which pages to allow the link on (no pornhub.com anywhere except on en:Pornhub), allowing admins to just add links (except for e.g. copyright violation stuff), etc. But well .. we wait and wait.
Porting the whitelists and similar to on-wiki is on my list: User:COIBot/Wishlist. Unfortunately no time. --Dirk Beetstra T C (en: U, T) 12:44, 13 July 2020 (UTC)
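The rule-based filter described in the comment above could look roughly like the following sketch. Everything here is hypothetical: the `Rule` fields, the `check_link` function, the action names and the `.example` domains are invented for illustration and are not taken from User:Beetstra/Overhaul_spam-blacklist or from any existing extension. Namespace 0 is the main namespace and 1 is the talk namespace, as in MediaWiki.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One hypothetical filter rule: a plain regex plus handling options."""
    pattern: str                            # regex tested against the added URL
    action: str                             # e.g. "block" or "warn"
    namespaces: Optional[frozenset] = None  # None means all namespaces
    allowed_pages: frozenset = frozenset()  # pages where the link is fine
    admins_exempt: bool = True              # admins may add the link anyway


def check_link(url, namespace, page, is_admin, rules):
    """Return the action of the first rule that applies, else "allow"."""
    for rule in rules:
        if not re.search(rule.pattern, url):
            continue
        if rule.namespaces is not None and namespace not in rule.namespaces:
            continue
        if page in rule.allowed_pages:
            continue
        if is_admin and rule.admins_exempt:
            continue
        return rule.action
    return "allow"


# Illustrative rules only; the .example domains are placeholders.
RULES = [
    # Blocked everywhere except on the article about the site itself.
    Rule(r"\bpornhub\.com\b", "block", allowed_pages=frozenset({"Pornhub"})),
    # Petition links are kept out of mainspace but tolerated elsewhere.
    Rule(r"\bpetitions\.example\b", "warn", namespaces=frozenset({0})),
    # Copyright-violation material: not even admins may add it.
    Rule(r"\bpirated-books\.example\b", "block", admins_exempt=False),
]
```

With these sample rules, a pornhub.com link is allowed on the page "Pornhub" but blocked on any other page, a petition link triggers a warning in mainspace only, and the copyright rule applies to admins as well.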
btw: is COIBot open sourced? -- seth (talk) 22:11, 14 July 2020 (UTC)
@Lustiger seth: I have never shared it any further. It is now a massive mess of code that would need a massive cleanup (as would LiWa3), as different modules all have their own different procedures to basically do the same thing, and some parts are even done in different ways in the same code. The same is true for the database (again, if only WMF had done that 10 years ago). If only I had time to set this all up and clean up the code. --Dirk Beetstra T C (en: U, T) 06:39, 15 July 2020 (UTC)
imho it's ok to open-source a large heap of rubbish. :-) other people can help to make it clean and tidy. i could try to help. -- seth (talk) 08:35, 15 July 2020 (UTC)

thenewyorkcityminute.com now redirects



Can I get comments from others about this site, which is now somewhat hijacked to what will be a problematic set of content outside of the original site? It has some pre-existing valid links, and I would like direction on whether we watch / leave / block. Thanks.  — billinghurst sDrewth 01:08, 22 July 2020 (UTC)

@Billinghurst: as earlier discussed for a similar case below, my recommendation would be to replace all existing links with an internet archive link to the original material, and blacklist to prevent new incoming links or people trying to link to the original (probably it is best to whitelist the archive links). I believe that it is a risk for the reader as you have no clue what the current domain owner does - it is not impossible to install malware by following a malicious link, or appearing genuine and trying to take the details of an innocent reader. --Dirk Beetstra T C (en: U, T) 06:49, 22 July 2020 (UTC)
Yep, we need access to a semi-automated tool for converting links to archive links. This link is at multiple places, and fixing and requesting whitelisting at these places is going to be a PITA and suck time. While AWB is the easiest tool to set up, it sucks xwiki. I will try to find some time to see what pywikibot may be able to do for us. <ugh> I hate coding.  — billinghurst sDrewth 06:53, 22 July 2020 (UTC)
@Billinghurst: that exercise need not stop us from blacklisting it already. The number of links in mainspace seems rather limited. --Dirk Beetstra T C (en: U, T) 07:56, 22 July 2020 (UTC)
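The link conversion discussed in this thread could start from a plain text substitution like the sketch below, which a bot framework such as pywikibot could then apply page by page. It is only an illustration under stated assumptions: the function name `rewrite_links` is invented, the partial timestamp `2019` is a placeholder (the right snapshot would have to be checked per link), and no wiki API is called here.

```python
import re

# Assumed Wayback Machine prefix; "2019" is a placeholder timestamp.
WAYBACK_PREFIX = "https://web.archive.org/web/2019/"

# Direct links to the domain. The lookbehind skips URLs that are already
# embedded in an archive URL (those are preceded by a "/").
DIRECT_LINK = re.compile(
    r"(?<!/)https?://(?:www\.)?thenewyorkcityminute\.com[^\s\]|<>]*"
)


def rewrite_links(wikitext: str) -> str:
    """Prefix every direct link to the domain with the archive prefix."""
    return DIRECT_LINK.sub(lambda m: WAYBACK_PREFIX + m.group(0), wikitext)


if __name__ == "__main__":
    sample = "See [https://www.thenewyorkcityminute.com/story Title] for details."
    print(rewrite_links(sample))
```

Because already-archived links are skipped, running the substitution twice on the same text leaves it unchanged after the first pass.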