Wikimania East Africa – 2025 in Nairobi, Kenya

Wednesday, 24 July 2024 09:00 UTC

The Core Organizing Team (COT) for 2025 is delighted to announce that the 20th Wikimania will be hosted in Nairobi, Kenya.

A giraffe with the Nairobi city skyline in the background. Image by Alexmbogo, CC BY-SA 4.0

Nairobi offers a vibrant fusion of modernity and tradition. There are opportunities for thrilling safaris, with Nairobi National Park, located within the city, providing a unique urban safari experience. The city’s rich cultural tapestry unfolds in its museums and historical landmarks, such as the Nairobi National Museum and the Karen Blixen Museum. Culinary delights abound, from bustling markets to gourmet eateries to savoury street food, not forgetting the outstanding Kenyan hospitality and the lively Nairobi nightlife.

In September 2023, the Wikimania Steering Committee announced a new regional model of collaboration and shared that Wikimania 2025 would be held in East Africa, followed by Paris in 2026.

The 2025 Core Organizing Team comprises members from the East African region, anchored in the East African Regional and Thematic Hub (EARTHub), who will work together to make the first Wikimania in East Africa an unforgettable experience. An official handover from the 2024 COT will take place during Wikimania in Katowice, Poland. 

Travel to Nairobi made easy

Nairobi promises an unforgettable journey where adventure, culture, and nature converge.

Nairobi is a major transport hub for both regional and international routes. Kenya’s visa policies were recently updated, making it a visa-free country for those entering as tourists. Attendees will only need to fill out an Electronic Travel Authorization form at least 72 hours before starting their journey to Nairobi. Wikimania 2025 will be a hybrid conference, offering both virtual and in-person participation. The COT is also hard at work connecting with government bodies, partners, and national institutions to ensure that all Wikimedians, including the LGBTQI+ community, have a safe and unforgettable experience in Nairobi.

Wikimania is the premier event for our movement to collaborate, discuss, and build ideas. It is a place where everyone belongs, where everyone, either virtually or in person, can celebrate free knowledge and the volunteers who help make Wikimedia projects happen. This edition marks a significant milestone as we celebrate 20 years of Wikimania.

As we move forward to August 2025, we say “karibuni to Magical Kenya”.

In 2021, a trade association called NetChoice sued the US states of Florida and Texas, asking courts to block laws aimed at social media from taking effect. Those laws could impact Wikipedia as well, by creating liability for the removal of false or inaccurate material expressing a political viewpoint. The cases rose through the US court system and were heard by the US Supreme Court in February 2024. In July 2024, the Court issued its decision: it made no immediate change to the law, and its opinion contains some good language about content moderation that may be useful for the Foundation in future legal arguments. However, because the cases have been sent back to the lower courts, the long-term implications of the decision remain to be seen.

Stone plaque engraved with the First Amendment to the United States Constitution. Image by Ed Uthman, CC BY-SA 2.0, via Flickr.

In 2021, the states of Texas and Florida in the United States (US) enacted laws designed to restrict social media platforms’ ability to enforce their own content policies. This was a response to high-profile content moderation decisions, which the states alleged constituted “censorship” by large social media platforms of some users’ viewpoints. NetChoice, a trade association representing large social media platforms and other tech companies, immediately sued to block these laws from taking effect by means of two lawsuits: NetChoice, LLC v. Paxton (.pdf file) in Texas, and Moody v. NetChoice, LLC (.pdf file) in Florida.

NetChoice claimed these state laws would violate the First Amendment rights of its member companies by forcing them to host speech with which they disagree. Two years and several appeals later, the US Supreme Court agreed to hear these challenges. Rather than answer the constitutional questions, the Court decided this month, July 2024, that the lower courts had not done their jobs correctly. It sent the cases back with instructions to try again—essentially pressing the “reset” button on an expensive multiyear legal process.

Background: What the cases are about and why we filed a “friend-of-the-court” or amicus brief

The apparent question at the core of these cases was whether laws that force private companies to host speech violate the constitutional rights of those companies. Both the Texas and Florida laws were written in a way that made it unclear whether their enforcement would apply to platforms like Wikipedia—for instance, by demanding the removal of inaccurate or unverifiable information about a political candidate. The enforcement of either law against a Wikipedia volunteer or the Wikimedia Foundation could disrupt Wikimedia communities’ decision-making processes and damage the quality and reliability of the content on Wikipedia and other Wikimedia projects.

The Foundation filed an amicus brief (.pdf file) to help the Supreme Court understand our concerns: laws that restrict community-led content moderation would infringe the First Amendment rights of Wikipedia volunteers and could damage the quality and reliability of Wikipedia by forcing the projects to include non-encyclopedic content.

In its ruling, the Supreme Court reframed the central question of the cases to be about the appropriate judicial analysis when the constitutionality of a law is challenged “on its face” rather than “as applied” to a specific actor; that is to say, it is about arguing that a law is always unconstitutional rather than arguing that a specific application of it is unconstitutional. Below we explain what that means.

What the Court’s opinion says and its meaning

The Court’s opinion was lengthy, including a majority decision plus four concurrences (opinions mostly agreeing with the majority decision). A few major themes emerged throughout the opinion: the difference between a facial (“on its face”) challenge and an “as-applied” challenge; the difficulties of challenging broadly written laws; and a recognition that internet regulation will affect different websites and applications to different extents.

The way you challenge laws matters

As explained above, in these two cases NetChoice brought forward what the Supreme Court describes as “facial” challenges, asking courts to block the laws before they took effect and claiming that the laws would be unconstitutional if enforced against any of NetChoice’s members. State legislatures and the district courts understood that these laws were designed to punish the three largest social media platforms in 2020 (that is to say, Facebook, Twitter, and YouTube) for actions taken to limit the former US president’s use of his accounts on those platforms. The District Courts in Texas and Florida and, on appeal, the federal Courts of Appeals for the Fifth and Eleventh Circuits, all treated these challenges accordingly. In legal terms, they approached the constitutional questions “as-applied” to Facebook et al.

The Supreme Court rejected this approach. In the Court’s opinion, the lower courts failed to correctly treat these as facial challenges, which require the lower courts to first determine the scope of all possible applications of the laws: could they apply to online platforms like Etsy, Uber, Venmo, and/or Gmail? The lower courts should have then determined whether a substantial portion of those possible applications would have been unconstitutional. Or, as US Supreme Court Justice Kagan instructs: “[A] court must determine a law’s full set of applications, evaluate which are constitutional and which are not, and compare the one to the other.”

Only time will tell how the Fifth and Eleventh Circuit Courts will respond to the ruling, but these cases could have long-term implications for free expression, future legislative proposals, and future constitutional challenges to enacted laws.

Challenging broadly written laws may be more difficult

One of the potential impacts of the Court’s ruling in the NetChoice cases is that it could be more difficult to successfully challenge laws that are broadly written, such as laws that could apply to many different kinds of services and/or actions. The Texas and Florida state laws at issue here both used very broad definitions of “social media,” and restricted a wide range of content moderation methods used in different ways by different platforms. We argued in our amicus brief that the laws are written so broadly that they could potentially be applied to volunteer-run projects like Wikipedia. Even the lawyers responsible for defending the laws before the Court were unable to say to whom the laws apply or what the laws would require a platform to do. This became a stumbling block for the case. When a law is written broadly, determining that “full set of applications”—and evaluating them as Justice Kagan described—becomes more difficult and may become practically impossible.

The Court’s Justices expressed concerns about striking down laws that could have some constitutional applications, but the discussion in both cases had only addressed how the laws might impact a few social media platforms. Moving forward, US state or national legislatures could take advantage of this ruling by drafting laws broadly enough so that no individual parties could successfully challenge the entire law. Additionally, courts may be less willing to hear “facial” challenges, limiting their consideration to laws “as-applied” to individual parties. Litigation costs to challenge poorly written laws will increase, and more plaintiffs will be needed to take down unconstitutional laws. Over time, this could mean that more constitutionally questionable laws remain in effect longer. This would not be an ideal outcome because legal uncertainty can have chilling effects on freedom of expression.

The Court understands that there’s more to the internet than Facebook and YouTube

One positive element of the Court’s decision is an acknowledgment that “the internet” is made of more than just a handful of large social media platforms, and also that attempts to regulate technology giants may sweep in many other kinds of apps and/or websites as well. To quote Justice Kagan: “The online world is variegated and complex, encompassing an ever-growing number of apps, services, functionalities, and methods for communication and connection.” One of the concurring opinions even included citations of our amicus brief (.pdf file; see pages 68 and 85), a specific acknowledgement by the Court that Wikipedia is among the websites that could suffer unexpected consequences from laws like the ones at issue in this case.

What it means for the Wikimedia Foundation and projects

The Court’s decision in the NetChoice cases is good enough for now: there is no immediate change to the law, and it contains some good quotes about content moderation that may be useful for the Foundation in future legal arguments. However, the long-term implications of the decision remain to be seen.

No change for now

The Court’s ruling sends the cases back to the Fifth and Eleventh Circuit Courts, and the Texas and Florida laws are still on hold, for now. However, the preliminary injunctions blocking these laws from taking effect may not last forever. We will be sure to monitor the status of these state laws and provide updates as the lower courts take action, especially if either state’s laws are allowed to go into effect.

Good language on content moderation could help the Wikimedia projects in the future

The majority opinion for the NetChoice cases had one clear mandate: The lower courts must redo their approach to analyzing the legal challenges brought by NetChoice. However, five Justices agreed to offer additional guidance on how lower courts should address the First Amendment questions at the heart of these cases. In general, these Justices (i.e., Kagan, Roberts, Sotomayor, Kavanaugh, and Barrett) supported the idea that the First Amendment protects acts involving “editorial discretion,” including all of the ways Wikipedia volunteer editors contribute to and maintain Wikipedia’s encyclopedic content.

The majority opinion clearly opposed the notion that US state or federal governments can force private companies to host speech from any political viewpoint, acknowledging that online platforms can and do curate the content on their websites as they see fit. Overall, the majority opinion indicated that the Texas and Florida laws would likely run afoul of the US Constitution in those contexts.

Some uncertainty about long-term implications for state laws

While it is reassuring that a majority of the Court views content moderation as a kind of protected speech, it is not clear how the lower courts will implement the Court’s instructions to try again. There is at least a chance that the lower courts will let the Texas and/or Florida laws take effect so that new legal challenges can be brought against an actual application of the law. As the Court noted, no one really knows who might be subject to either of these laws, so how they’ll be enforced and which platforms could see legal action remain open questions.

There is also some chance that the lower courts will reconsider these cases as “as-applied” constitutional challenges based on the facts they already have. This approach could produce legal confusion if, for example, the Texas law is found to be unconstitutional as applied to Facebook and X (formerly Twitter) newsfeeds, but courts do not determine whether the statutes might be lawfully applied to any other website or application and leave the statutes in place.

There are certainly other possible routes for these cases to take as well, many of which could leave an unknown number of websites and applications on uncertain legal footing.

What comes next

The lower courts must either follow the Supreme Court’s instructions themselves or send the cases back to the district courts. The reason is that in the US legal system, appellate courts like the Fifth and Eleventh Circuit Courts, as well as the Supreme Court, can typically only consider the facts established in the record by the trial courts where the cases were first heard. This means that when judges hear cases on appeal, they are limited to the scope of information presented to the courts where the cases were initially heard. Generally, appellate courts may not gather new facts or evidence: they can only say whether or not the previous court applied the law correctly based on the record.

For the NetChoice cases, this means that the circuit courts likely will not add to the record of facts and arguments that came from the district courts. This also means the circuit courts are unlikely to consider whether the Texas law could be applied to Wikipedia and, if that were so, whether that application would be constitutional. In practice, the circuit courts may have little choice but to send the cases back to the district courts with the Supreme Court’s instructions on how to address facial challenges.

Conclusion

This is the second year in a row that the US Supreme Court has ruled on a case of major significance to the Wikimedia projects, and it seems unlikely that it will be the last. Lawmakers are rightfully concerned about a variety of potential harms online, although many legislative proposals to address those harms come at the expense of freedom of expression online.

The Wikimedia Foundation remains committed to defending the rights of Wikimedia volunteer editors and readers to share and receive knowledge online, and we will continue to challenge laws that threaten those rights. As the NetChoice cases make their way back through lower courts, or possibly return to the Supreme Court someday, we will continue to track them.

We hope to create a legal environment in which no judge can limit freedom of expression online without first considering the impact of their actions on the Wikimedia projects.

Arabic and the Web

Tuesday, 23 July 2024 10:50 UTC


A Call to Action for Arabic Content on Wikipedia: Bridging the Digital Divide

I recall attending a Wikipedia workshop organized by the Institute of Computer Science at the University of Oxford. The pressing question raised was: why, despite having nearly half a billion Arabic speakers, is Arabic content on Wikipedia less than 5%? Moreover, of this small percentage, perhaps only a third is truly useful. This query, seeking answers and potential solutions, highlighted the long road ahead to enrich Arabic content online.

Arabic speakers, unlike many Americans or Europeans, often master multiple languages. For instance, many Algerians speak French, and many Egyptians speak English. Despite this multilingualism, the time spent learning additional languages instead of focusing on scientific knowledge can be detrimental. Those who do not master a secondary language often lag in their fields, struggling to grasp the cultural nuances embedded in the language.

Focusing specifically on Algeria, how many of its citizens actively contribute to Wikipedia? The number of contributors and the amount of encyclopedic content they add remains uncertain. However, a 140-page report from Oxford, which I reviewed and highly recommend for its excellent analyses, sheds light on these issues.

In summary, to address the dearth of Arabic content, we must define clear objectives, organize efforts, and direct work towards these goals with persistence, adaptability, and repetition. I remain optimistic about our future.

For more detailed insights, you can access the full report here: Oxford Study Report.

(This conversation was originally part of a social media exchange and has been compiled here to reflect multiple responses with a conversational tone.)


Using Wikipedia as a Tool for Climate Action

Tuesday, 23 July 2024 07:00 UTC

Did you know that Africa, home to a billion-plus people, contributes a mere 8% to global waste? Yet, the continent bears the brunt of climate change’s wrath. To change this narrative, Walewale Wiki Hub and Tamale Wiki Hub (led by Christian Yakubu and Abdul Rahim Ziblim) joined the #WikiForHumanRights 2024 campaign on the theme #KnowledgeforSustainableFuture. As part of the campaign, we hosted a launch event on 15 June 2024 to introduce our communities to the campaign and outline activities. A recording can be found here (Google Drive link).

Ruby Damenshie-Brown, the WikiforHumanRights African coordinator, highlighted a glaring issue: Wikipedia’s underrepresentation of Africa. Our continent’s stories, especially those about our environment, are often overlooked. This knowledge gap is a missed opportunity to raise awareness and inspire action.

Let’s flood Wikipedia with African climate stories! By translating and creating articles, and uploading images, videos, and audio to Wikimedia Commons, we can vividly showcase the impact of climate change on our communities. “Remember, every piece of content on Wikipedia has the potential to reach a global audience of millions in over 300 languages!” – Ruby Damenshie-Brown

Wikimedian Stephen Dakyi emphasized the importance of contributing to our local Wikipedia platforms. By sharing information in our native languages, we can empower communities to understand and address climate challenges. He also took participants through practical editing and showed how to use the translation tool on Wikipedia to translate content into our indigenous languages.

It’s time to rewrite Africa’s climate story. Let’s use Wikipedia to amplify our voices, inspire change, and build a sustainable future.

#ClimateAction #Wikipedia #Africa #EnvironmentalAwareness #WikiForHumanRights2024 #KnowledgeforSustainableFuture

Building a less terrible URL shortener

Tuesday, 23 July 2024 05:33 UTC

The demise of goo.gl is a good opportunity to write about how we built a less terrible URL shortener for Wikimedia projects: w.wiki. (I actually started writing this blog post in 2016 and never got back to it, oops.)

URL shorteners are generally a bad idea for a few main reasons:

  1. They obfuscate the actual link destination, making it harder to figure out where a link will take you.
  2. If they disappear or are shut down, the link is broken, even if the destination is fully functional.
  3. They often collect extra tracking/analytics information.

But there are also legitimate reasons to want to shorten a URL, such as use in printed media, where a shorter URL is easier for people to type, or circumstances with restrictive character limits, like tweets and IRC topics. The latter often affect non-ASCII languages even more when limits are measured in bytes instead of Unicode characters.
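As a quick illustration (not from the original post) of the bytes-versus-characters point, the same short phrase can use up several times more of a byte-based limit when written in a non-Latin script:

```python
# Compare character counts with UTF-8 byte counts for a few phrases.
# The sample strings are illustrative, not taken from the original post.
samples = {
    "English": "Recent changes",
    "Russian": "Свежие правки",
    "Arabic": "أحدث التغييرات",
}

for language, text in samples.items():
    chars = len(text)                     # number of Unicode characters
    nbytes = len(text.encode("utf-8"))    # number of bytes in UTF-8
    print(f"{language}: {chars} characters, {nbytes} bytes")
```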

At the end of the day, there was still considerable demand for a URL shortener, so we figured we could provide one that was, well, less terrible. Following an RfC, we adopted Tim's proposal, and a plan to avoid the aforementioned flaws:

  1. Limit shortening to Wikimedia-controlled domains, so you have a general sense of where you'd end up. (Other generic URL shorteners are banned on Wikimedia sites because they bypass our domain-based spam blocking.)
  2. Proactively provide dumps as a guarantee that if the service ever disappeared, people could still map URLs to their targets. You can find them on dumps.wikimedia.org and they're mirrored to the Internet Archive.
  3. Intentionally avoid any extra tracking and metrics collection. It is still included in Wikimedia's general webrequest logs, but there is no dedicated, extra tracking for short URLs besides what every request gets.

Anyone can create short URLs for any approved domain via a special page or the API, subject to some rate limits and anti-abuse mechanisms.
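For instance, here is a minimal sketch of creating a short URL through the API. It assumes the UrlShortener extension's shortenurl action and url parameter; check the live API help (api.php?action=help) before relying on the exact parameter names.

```python
# Minimal sketch: ask Meta-Wiki's API to shorten a Wikimedia URL.
# Assumes the UrlShortener extension exposes action=shortenurl with a "url"
# parameter; verify against the wiki's API help before using in earnest.
import requests

response = requests.post(
    "https://meta.wikimedia.org/w/api.php",
    data={
        "action": "shortenurl",
        "url": "https://en.wikipedia.org/wiki/Special:RecentChanges",
        "format": "json",
    },
    headers={"User-Agent": "urlshortener-example/0.1 (example@example.org)"},
)
print(response.json())  # the short URL is returned in the JSON response
```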

All of this is open source and usable by any MediaWiki wiki by installing the UrlShortener extension. (Since this launched, additional functionality was added to use multiple character sets and generate QR codes.)

The dumps are nice for other purposes too, I use them to provide basic statistics on how many URLs have been shortened.

I still tend to have a mildly negative opinion about people using our URL shortener, but hey, it could be worse, at least they're not using goo.gl.

Tech/News/2024/30

Tuesday, 23 July 2024 00:08 UTC

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Feature News

  • Stewards can now globally block accounts. Before this change, only IP addresses and IP ranges could be blocked globally. Global account blocks are useful when the blocked user should not be logged out. Global locks (a similar tool that logs the user out of their account) are unaffected by this change. The new global account block feature is related to the Temporary Accounts project, a new type of user account that replaces the IP addresses of unregistered editors, which are no longer made public.
  • Later this week, Wikimedia site users will notice that the interface of FlaggedRevs (also known as “Pending Changes”) is improved and consistent with the rest of the MediaWiki interface and Wikimedia’s design system. The FlaggedRevs interface on mobile and the Minerva skin was previously inconsistent; it has been fixed and ported to Codex by the WMF Growth team and some volunteers. [1]
  • Wikimedia site users can now submit account vanishing requests via GlobalVanishRequest. This feature is used when a contributor wishes to stop editing forever. It helps you hide your past association and edits to protect your privacy. Once processed, the account will be locked and renamed. [2]
  • Have you tried monitoring and addressing vandalism in Wikipedia using your phone? A Diff blog post on Patrolling features in the Mobile App highlights some of the new capabilities of the feature, including swiping through a feed of recent changes and a personal library of user talk messages for use when patrolling from your phone.
  • Wikimedia contributors and GLAM (galleries, libraries, archives, and museums) organisations can now learn about and measure the impact Wikimedia Commons is having on creating quality encyclopedic content, using the Commons Impact Metrics analytics dashboard. The dashboard offers organisations analytics on things like monthly edits in a category, the most viewed files, and which Wikimedia articles are using Commons images. As a result of these new data dumps, GLAM organisations can more reliably measure their return on investment for programs bringing content into the digital Commons. [3]

Project Updates

  • Come share your ideas for improving the wikis on the newly reopened Community Wishlist. The Community Wishlist is Wikimedia’s forum for volunteers to share ideas (called wishes) to improve how the wikis work. The new version of the wishlist is always open, works with both wikitext and Visual Editor, and allows wishes in any language.

Learn more

  • Have you ever wondered how Wikimedia software works across over 300 languages? This is 253 languages more than the Google Chrome interface, and it’s no accident. The Language and Product Localization Team at the Wikimedia Foundation supports your work by making all the tools and interfaces in the MediaWiki software translatable, so that contributors in our movement can translate pages and strings and make the sites available in all languages. Read more about the team and their upcoming work on Diff.
  • How can Wikimedia build innovative and experimental products while maintaining such heavily used websites? A recent blog post by WMF staff Johan Jönsson highlights the work of the WMF Future Audience initiative, where the goal is not to build polished products but test out new ideas, such as a ChatGPT plugin and Add a Fact, to help take Wikimedia into the future.

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe. You can also get other news from the Wikimedia Foundation Bulletin.

On June 22, 2024, the Wikimedia User Group Nigeria hosted a virtual ceremony to celebrate the achievements of Nigerian contributors to the Wiki Loves Africa 2024 and Wiki Loves Folklore 2024 campaigns. This event marked a significant milestone in promoting Nigerian culture and heritage on Wikimedia projects.

The ceremony brought together Wikimedia enthusiasts, photographers, and culture enthusiasts from across Nigeria, showcasing the country’s rich cultural diversity and creative talent. The event highlighted the importance of collaborative efforts in documenting and sharing Nigeria’s cultural heritage with a global audience.

Wiki Loves Africa

Wiki Loves Africa is an annual contest where individuals across Africa can contribute media related to the theme of the year to Wikimedia Commons for use on Wikipedia and other Wikimedia projects. This year’s theme was “Africa Creates,” which aimed to showcase the creative and cultural aspects of Africa. The contest was held from March 1 to April 30, 2024, and received an overwhelming response from Nigeria, with:

  • 3,021 images submitted
  • 182 registered participants
  • 76 new users joining the contest

Jury Process

The Wiki Loves Africa in Nigeria 2024 jury process commenced on May 17, 2024, and spanned two weeks. A panel of 8 professional photographers and experts from Nigeria, renowned for their attention to detail, constituted the jury. The Montage tool was used for the jury process, which consisted of three rounds:

  • Round 1 (Yes/No): Each juror reviewed 290 images, making decisions on which ones to advance. This round focused on removing images that didn’t align with the contest scope.
  • Round 2 (Rating): The second round presented 188 images to each juror for evaluation, rating them from 1 to 5 stars: 5 stars for exceptional images that strongly align with the theme, 4 stars for good images within the theme, and 1 to 3 stars for images rated at the juror’s discretion.
  • Round 3 (Ranking): The jurors assessed the top 20 images and ranked them from 1 to 20. After this round, the jury engaged in deliberations, which led to the final conclusions.

Meet the Jury

Winning Images

Wiki Loves Africa Nigeria 2024 Top 10 Images:

Wiki Loves Africa Nigeria 2024 3rd Place:

Wiki Loves Africa Nigeria 2024 2nd Place:

Wiki Loves Africa Nigeria 2024 1st Place:

Wiki Loves Folklore

Wiki Loves Folklore is an international photographic competition that celebrates cultural diversity worldwide through Wikimedia projects. This year’s contest was held from February 1 to March 31, 2024, and encouraged individuals to share photos showcasing Nigerian culture, folklore, and storytelling. The contest saw:

  • 819 images submitted
  • 55 registered participants
  • 21 new users joining the contest

Jury Process

The Wiki Loves Folklore in Nigeria 2024 jury process commenced on May 17, 2024, and spanned two weeks. A panel of 7 professional photographers from Nigeria, renowned for their attention to detail, constituted the jury. The Montage tool facilitated the entire process, which consisted of three rounds:

  • Round 1 (Yes/No): This initial round focused on removing images that didn’t align with the contest scope. Each juror reviewed 118 images, making decisions on which ones to advance.
  • Round 2 (Rating): In this round, each juror assessed 49 images, rating them from 1 to 5 stars: 5 stars for exceptional images that strongly aligned with the theme, 4 stars for good images within the theme, and 1 to 3 stars for images rated at the juror’s discretion.
  • Round 3 (Ranking): The jurors ranked their top images from 1 to 14. After this round, the jury engaged in deliberations, leading to the final conclusions.

Meet the Jury

Winning Images

Wiki Loves Folklore Nigeria 2024 Top 10 Images:

Wiki Loves Folklore Nigeria 2024 3rd Place:

Wiki Loves Folklore Nigeria 2024 2nd Place:

Wiki Loves Folklore Nigeria 2024 1st Place:

Announcement Ceremony

The Wikimedia User Group Nigeria hosted a virtual ceremony on June 22, 2024, to celebrate the outstanding contributions of Nigerians to the Wiki Loves Africa and Wiki Loves Folklore campaigns. The event was moderated by Barakat Adegboye and featured insightful keynote addresses from Suyash Dwivedi, Wilson Oluoha (represented by Ogali Hillary), and Tiven.

The ceremony commenced with a warm welcome by Ayokanmi Oyeyemi, WUGN Programs Director, who outlined the session’s objectives. Suyash Dwivedi, Chairman of the Wikimedia Commons Photographers User Group, then gave his keynote speech which emphasized the significance of photographer contributions to Wikimedia projects, highlighting their impact on cultural preservation and sharing.

The Wiki Loves Africa award session followed, with Ogali Hillary, representing Wilson Oluoha, International Organizer of Wiki Loves Africa, providing an overview of the campaign’s success and Nigerian contributions. After that, Barakat Adegboye announced the winners, from third to first place, and invited them to share their inspiring stories and remarks. To round up the WLA award session, Oludare Kehinde and Blaize Itodo, jury representatives, shared their experiences and the technicalities of the jury process.

The Wiki Loves Folklore award session began with a keynote speech by Tiven, International Organizer of Wiki Loves Folklore, emphasizing the campaign’s goals and the importance of continued participation. This was followed by the announcement of the Wiki Loves Folklore Nigeria 2024 winners by Barakat Adegboye, who also invited the winners to share their experiences and inspirations. To round up this session, Gaspard Linouenou Koutchika and Enoch Olaonipekun, Wiki Loves Folklore Nigeria 2024 jury representatives, provided valuable insights into the jury process.

Dappasolomon001 receiving his certificates of award, presented by Olusola Olaniyan (WUGN Chairman) and Ayokanmi Oyeyemi (WUGN Programs Director), for emerging as the winner of the Wiki Loves Africa in Nigeria 2024 and Wiki Loves Folklore in Nigeria 2024 contests, at the WUGN’s office in Lagos

After the awards sessions, Rhoda James briefly announced the ongoing Wiki Loves Earth campaign, encouraging participation from photographers and Wikimedians across Nigeria. The announcement ceremony was rounded up by Olusola Olaniyan, WUGN Chairman, who delivered closing remarks, expressing gratitude to all participants and contributors. The ceremony successfully celebrated Nigerian creativity and cultural diversity, inspiring future contributions to Wikimedia projects.

Conclusion

The Wiki Loves Folklore and Wiki Loves Africa campaigns in Nigeria 2024 have been a resounding success, showcasing the country’s rich cultural diversity and creative talent. The impressive participation and engagement demonstrated the power of collaborative efforts in documenting and sharing Nigeria’s cultural heritage with a global audience.

We extend our heartfelt gratitude to all participants, whose contributions have enriched Wikimedia projects with unique perspectives and insights into Nigerian culture. We appreciate the dedication and expertise of the jury members, who tirelessly evaluated entries and provided valuable feedback.

Our sincere thanks also go to the guests, including Suyash Dwivedi, Wilson Oluoha, Ogali Hillary, and Tiven, whose keynote addresses inspired and motivated us. We appreciate the support of the Wikimedia User Group Nigeria, whose efforts made this event possible.

As we celebrate this success, we look forward to future collaborations and continued contributions to Wikimedia projects. We encourage everyone to keep sharing their knowledge, experiences, and creativity, ensuring that Nigerian culture and heritage continue to thrive on the global stage. Together, let’s keep building a repository of knowledge that showcases the best of Nigeria and Africa.

Ever thought about who owns the copyright of a picture taken of you? You may be surprised to learn that the person who takes the picture, regardless of who owns the camera or who is in the picture, owns the copyright. At least this is true in the United States, whose copyright laws govern content on English Wikipedia and Wikimedia Commons. Of course, most Wikipedia users aren’t thinking about such questions, and that leads to a lot of trouble when they upload images to Wikipedia. 

Recently, I, a user researcher at the Wikimedia Foundation, embarked on a research project in partnership with the Structured Content team, to learn how users are interacting with and interpreting the upload interface, and how they understand the rules for uploading images to Wikipedia. The team has been focused on decreasing the number of ineligible files on Wikimedia Commons, and learned that there is a disproportionate number of image deletions originating from Wikipedia as compared to Wikimedia Commons. We hypothesized that the upload interface in the Visual Editor on Wikipedia may either encourage or fail to prevent ineligible file uploads.

Our challenge was finding the right users to interview. Initially, we wanted to speak with users who had uploaded copyright infringements to Wikipedia, but this group was difficult to contact. We decided to contact any user who had uploaded images to English and Arabic Wikipedia in the past six months. We reached out to every user who had EmailUser enabled and interviewed those who responded.

To our surprise, nearly all the users we interviewed had unknowingly uploaded copyright-violating images, according to the strictest interpretations of copyright law. However, most of these images had not been moderated, revealing that the issue of copyright violations is more widespread than our deletion metrics suggest.

Key Findings from the Research

Misunderstanding “Own Work”

One of the most prevalent issues we uncovered was the widespread misunderstanding of the “own work” designation. With two exceptions, all users we interviewed incorrectly labeled images as “own work”, the only choice presented on the upload interface. In our interviews, it became clear that users often believed that “own work” meant they had the right to use the image, even if they didn’t create it themselves. Understandably, this confusion stemmed from users’ lack of understanding of copyright laws, and their experiences with social media platforms where the rules are often more lenient. Those who had doubts about their eligibility to upload the image were dissuaded from pursuing a more lengthy process for uploading someone else’s work.

Copyright Confusion

All of our participants were unaware of the intricate copyright laws that govern image uploads on Wikipedia. For example, an IT professional uploaded their company’s logo, assuming it was permissible since the company owned the logo. However, logos are typically copyrighted by default, and the correct process for releasing the copyright of a logo is first proving ownership through the Volunteer Response Team (VRT) process, a step that is not communicated during the upload process on Wikipedia.

One of the most telling stories from our research came from a participant who was the marketing director for an intellectual property law firm. They uploaded an image of their CEO to the corresponding Wikipedia article, only to discover it had been removed because the photographer’s copyright was embedded in the image metadata. The participant faced significant difficulties understanding and rectifying the situation, leading to feelings of exasperation, despite dealing with U.S. copyright law on a daily basis.

Licensing Confusion

Not only are users confused about copyright, but our research revealed a significant lack of understanding regarding image licensing in general. Many users did not realize that uploading an image entails releasing it under a free license, as Wikimedia Commons does not host unfree media. Additionally, they did not understand the implications of releasing an image under a free license, which means that anyone on the internet can use the image for any purpose. Lacking this understanding, users regularly upload others’ copyrighted images, which produces a massive moderation burden for our volunteer moderators and compromises the ownership status of the image.

Two User Patterns

Another unexpected finding is how neatly all of the users interviewed fell into one of two categories. For simplicity, we have labeled them Self-Promoters and Marketers. Self-Promoters upload images of themselves to an article about themselves, either published, deleted, or still in their sandbox. They are unaware of Wikipedia’s notability requirement and Commons’ rule against personal photos. Additionally, their image is technically the property of the person who took the photo, and in all cases, that was not the person who uploaded the image. Marketers, on the other hand, upload images on behalf of their employers or an entity they represent, often logos or professional images, assuming they have the right to do so because of their work relationship. Both user types highlight a significant gap in understanding and interpreting Wikipedia’s image policies.

Our Takeaways

This research underscored the complexities of uploading images to Wikipedia. The findings emphasize the need for clearer and easily accessible guidelines, better user education, and more intuitive tools to support our users in contributing in accordance with Wiki policies. By addressing these areas, we can create a more user-friendly and compliant environment for contributors, encouraging more productive participation in building the world’s largest free encyclopedia.

For a more detailed report on our findings and recommendations, you can access the full report on Wikimedia Commons.

As we wrap up the financial year 2023-24, we can share that we partnered with 12 new organisations and added 14 new content collections to The Wikipedia Library in the last 12 months. Twelve of the 14 collections are included in the Library Bundle, which gives immediate access to all eligible editors. Seven of the new collections are in a language other than English. This focus on diversifying the language of materials in the Library will continue this year.

Here’s what changed last quarter. 

Illustration of a book with text highlighted in different colors

New partnerships

We partnered with three new organisations in the last three months and they are now available in the library.

  • Institute of Electrical and Electronics Engineers (IEEE) is providing access to IEEE Xplore for Wikimedians. This collection offers quality technical literature in engineering and technology and was first requested by the community in the Suggest page of The Wikipedia Library. This collection is in the Library Bundle so the content is available instantly to all eligible editors. 
  • We partnered with Central European University Press, making their eBooks available to Wikimedians for free. This access is provided through JSTOR, which is already available in the Library Bundle. The press release for this partnership includes a quote from librarian and Wikipedia editor, Claudia Serbanuta: “Through partnerships like CEU Press, the Wikipedia Library provides crucial resources to editors in our region and worldwide who are not affiliated with academic institutions and don’t have access to research libraries. Together, we are building a repository of knowledge that transcends borders, enabling individuals to explore, learn, and contribute to our collective understanding of our shared history.” 
  • We also partnered with l’Informé, an independent media organisation based in Paris. Thanks to Fabien from the French Wikimedia community who helped us secure this partnership!

The Wikipedia Library now has a dedicated page on the Foundation’s website to explain why the world’s top publishers partner with Wikipedia. Please feel welcome to use this page in your own outreach to potential partners. 

Extending access

In the last three months Brill, Al Manhal and World Scientific extended access for another year.

In April, Perlego gave every participant in the EveryBookItsReader campaign access to their eBooks catalogue of more than a million titles. This is the second consecutive year that Perlego has supported this multilingual campaign to improve information about books and authors.

Let’s Connect and community conferences

In April, the Let’s Connect team organised a Learning Clinic introducing The Wikipedia Library and explaining how we build partnerships. You can find the slides and recordings here in English, Arabic, French, Spanish and Portuguese.

We will be presenting a lightning talk about The Wikipedia Library at Wikimania and will also be attending WikiCon North America in October. We look forward to connecting in person soon.

Have you ever tried explaining the free knowledge movement to your cousin? Or maybe you’ve struggled to describe to your neighbor how misinformation is addressed on Wikipedia? Or perhaps all of your friends ask you about the Fundraising messages on Wikipedia?

If any of this sounds familiar, then you should take a look at the A Wiki Minute animated video series created by the Wikimedia Foundation’s Communications department. As of this writing, this series explains the answers to common questions about the Wikimedia movement through 13 engaging, one-minute videos, available in six different languages.

The A Wiki Minute project began in 2023 with 10 “basic” questions, each explaining a core part of the Wikimedia ecosystem, like “How does Wikipedia work?”; “How can you join the Wikimedia free knowledge movement?”; or “What makes a Wikipedia article unique?”. The purpose was to create evergreen content that could support our initiatives and campaigns in a simple and straightforward manner. Earlier this year, we created 3 more videos, to clarify “How does Wikipedia protect readers’ privacy?”; “Who is in charge of content on Wikipedia?”; and “What makes Wikipedia different from social media platforms?”.

But how effective are these videos in building understanding and even affinity for Wikimedia and our projects? In order to measure the “health” of our brand, we run recurring and ad-hoc surveys to assess how Wikimedia brands are resonating with global audiences. One important metric we track is the Net Promoter Score (NPS), which indicates the strength of the Wikipedia user experience and our brand reputation overall.
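For context (this is the standard NPS methodology, not a detail specific to the Foundation’s surveys): respondents answer the “how likely are you to recommend” question on a 0-10 scale, where 9-10 count as promoters and 0-6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch with made-up ratings:

```python
# Standard Net Promoter Score calculation: % promoters (9-10) minus % detractors (0-6).
# The ratings below are invented for illustration, not data from these surveys.
def net_promoter_score(ratings):
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

example_ratings = [10, 9, 8, 7, 9, 6, 10, 3, 9, 8]
print(net_promoter_score(example_ratings))  # 30.0
```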

We applied this question to surveys about A Wiki Minute videos in the US and Nigeria, asking: “How LIKELY is it that you would RECOMMEND Wikipedia to a friend or colleague, after having watched the video?” We were excited by the positive impact of these videos on our NPS scores in both markets. After watching a single one-minute video, NPS scores went up 16 to 38 points above our baseline in the US and 21 to 24 points above our baseline in Nigeria. The evidence is clear: these videos increase affinity for Wikipedia and our other projects, and they work well in educating folks about our movement.

The Foundation has found numerous opportunities to promote these videos, particularly in support of Communications campaigns like Open the Knowledge: Stories, Journalism Awards, Knowledge is Human, and Wikipedia Needs More Women. They’re also being used consistently by our PR and social media teams to increase understanding of the Wikimedia movement and ecosystem. 

But we’ve been excited to hear that many Wikimedians have also started to find ways to incorporate these videos in their work as well. For example, French Wikipedia added a video to their Wikipedia in Brief page, and the Celebrate Women project added videos to their meta page. We’d love to hear more of your stories of how you’ve been able to leverage the A Wiki Minute videos in your work. Share how you’re using these videos on our Meta-wiki talk page, and inspire other community members to share these videos far and wide. 

Tech News issue #30, 2024 (July 22, 2024)

Monday, 22 July 2024 00:00 UTC
2024, week 30 (Monday 22 July 2024)

Tech News: 2024-30

Semantic MediaWiki 4.2.0 released

Sunday, 21 July 2024 15:00 UTC

July 18, 2024

Semantic MediaWiki 4.2.0 (SMW 4.2.0) has been released today as a new version of Semantic MediaWiki.

It is a feature release that brings a faceted search interface (Special:FacetedSearch) and adds the source parameter to the "ask" and "askargs" API modules. Compatibility was added for MediaWiki 1.40.x and 1.41.x as well as PHP 8.2.x. It also contains maintenance and translation updates for system messages. Please refer to the help pages on installing or upgrading Semantic MediaWiki to get detailed instructions on how to do this.
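For readers who script against these modules, here is a minimal sketch of an “ask” API request. The wiki URL, query, and source value are placeholders, and the newly added source parameter depends on which query sources your wiki has configured; check api.php?action=help&modules=ask for the exact details.

```python
# Minimal sketch of querying Semantic MediaWiki's "ask" API module.
# The wiki URL, query, and source value are placeholders; "source" is the
# parameter added in SMW 4.2.0, per the announcement above.
import requests

response = requests.get(
    "https://example.org/w/api.php",
    params={
        "action": "ask",
        "query": "[[Category:City]]|?Population|limit=5",
        "source": "example-source",  # placeholder: a query source configured on your wiki
        "format": "json",
    },
)
print(response.json())
```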

weeklyOSM 730

Sunday, 21 July 2024 10:18 UTC

11/07/2024-17/07/2024

lead picture

The OpenStreetMap Calendar for July 2024 [1] | © thomersch, OSMCAL

Mapping

  • Mirikaaa, from the Mapbox Data Team, posted on the OSM Community forum about their project to improve the representation of Indonesia’s road network in OpenStreetMap, such as adding roads, correcting alignments and missing links, correcting names, ensuring that road classifications are consistent, and other similar issues.
  • The proposal to specify ordering-only phone numbers, SMS-only phone numbers, and related tags is open for voting until Monday 29 July.

Mapping campaigns

  • Mateusz Konieczny has developed a website to cross-reference the AllThePlaces dataset with existing OpenStreetMap data to identify missing or outdated entries and improve the accuracy of locations such as shops and services. He also highlighted the importance of verifying data before importing it to ensure reliability.
  • The UN Mappers blog reported on the completion of a project to fix disconnected roads in Somalia, thanking all the volunteers who participated in the MapRoulette challenge. The article describes the methodology used, the results achieved, and the issues faced during the project.

Community

  • Anne-Karoline Distel blogged about her mapping of unrecorded burial grounds.
  • Brazil Singh and his team ran a workshop at Jahangirnagar University in Bangladesh, providing practical and theoretical training in OpenStreetMap and Mapillary and engaging participants in hands-on mapping activities and discussions on geospatial technologies.
  • Antonin Delpeuch described the experience of contributing to Organic Maps as a novice mobile application developer trying to add a feature to display the smoking status of places. Despite initial support, challenges included setting up the development environment, navigating the codebase, and dealing with project governance and code formatting guidelines. Ultimately, after mixed feedback and potential rejection of the feature, he decided not to continue contributing due to these difficulties.
  • OpenStreetMap is celebrating its 20th anniversary, marking two decades of global, community-driven mapping. The platform has grown from a small UK-based project to a major provider of open source geospatial data, with tens of thousands of contributors worldwide. This website highlights key milestones, encourages participation in local celebrations, and invites contributors to sign a digital birthday card.
  • ManuelB701 blogged about the various faux pas you can commit when mapping pavements.
  • Michael Reichert is on his way from Karlsruhe to the SotM EU 2024 in Łódź by bike. He has shared updates and experiences from his journey on Mastodon.
  • Jiří Eischmann discussed the importance of contributing to OpenStreetMap, highlighting its widespread use by several major platforms such as Apple Maps, TomTom, and Strava. They highlighted the impact of users’ contributions in improving map accuracy and explained how changes to OSM benefit many applications, even if the direct use of OSM isn’t always obvious. The post aims to encourage more people to contribute by outlining the different ways to get involved, from simple edits to more advanced mapping tasks.
  • The UN Mapper of the Month for July is Sami Skhab, a Tunisian cartographer with extensive experience in GIS and remote sensing.
  • Christoph Hormann reflected on twenty years of OpenStreetMap, examining how the project has evolved and diverged from its original ideals of local, community-driven mapping. Chris found that while there are trends towards large-scale data addition and organisational control, the core values of local knowledge sharing and egalitarian collaboration among contributors remain strong. He also discussed the potential for future changes in OSM’s structure and the importance of maintaining respect for its founding principles.
  • Valerie Norton elaborated on mapping trails with the atv tag (for small wheeled vehicles) and how she decided on the tags to use for that.

Events

  • Tobias Jordans has compiled English translations for some of the recent SOTM FR talks, which are now available with English subtitles.
  • At the 16th ‘mapbox/OpenStreetMap Online Meetup’, held on Friday 19 July, ‘Team Anno’, led by Tokyo gubernatorial candidate Anno Takahiro, discussed the use of web maps in elections, in particular their innovative ‘Election Bulletin Board Map’. Hosted by Aoyama Gakuin University’s Furuhashi Laboratory and supported by Mapbox Japan, the event aimed to explore the future role and potential of digital maps in election campaigns and geospatial technologies.

Maps

  • geoObserver discussed the history, current state and future trends of OpenStreetMap map design. They highlighted the importance of effective cartographic design in presenting OSM data, covering aspects such as colour schemes, symbols and interactive web maps. The discussion is based on three in-depth posts from Christoph Hormann’s blog, covering digital cartography, typography and data visualisation within the OSM community.

OSM in action

  • Canadian software company Parallel 42 Systems has created a web app that helps users to visit street art in Windsor, Ontario and Detroit, Michigan. Motor City Murals provides walking directions that allow users to move at their own pace. Using Pytheas, an open source project dedicated to these types of tours, OpenRouteService, and self-collected data, Motor City Murals provides previews of murals as well as information on the artist and surrounding area. While visitors must be within a defined bounding box to receive routing, all of the map contents are available regardless of location.
  • The Lucky Map tool, on the Yakumoin website, helps users determine auspicious directions and locations based on Nine Star Ki astrology, allowing them to search for shrines, temples, and other significant sites, while providing features for registering and customising personal points of interest.

Software

  • [1] Thomas Skowron blogged about OpenStreetMap Calendar, software he started developing five years ago, which weeklyOSM includes in each issue (see below). Thank you and congratulations!
  • Jake Coppinger has explored how urban intersections can be optimised for safety and efficiency through data analysis and innovative design. The project presents preliminary findings on traffic patterns, accident rates, and potential improvements. It also proposes solutions such as improved signal timing and redesigned layouts to reduce congestion and accidents (we reported earlier).
  • David Larlet discussed the upcoming release of uMap 3, which includes key features such as real-time collaboration and remote data importers. These enhancements aim to improve map editing and data integration, and were supported by NLnet sponsorship and community feedback. The update also brings a new user documentation website and various interface improvements.
  • Mapswipe is now available in your browser. You can check out the training deck on Google Presentations.
  • The OSM WordPress plugin, which is currently under temporary review, had previously allowed users to view geotagged posts, create maps, and integrate geospatial data into WordPress sites.

Programming

  • Luuk van der Meer’s presentation at useR! 2024 introduced the sfnetworks package for analysing OpenStreetMap (OSM)-based road networks using R. The package integrates spatial networks and provides tools for advanced spatial analysis.
  • osm4vr, written by ctrlw, allows users to explore the world in virtual reality using OpenStreetMap data. The tool supports static and dynamic loading of OSM data, including building footprints and simple 3D structures, and uses the A-Frame framework for VR experiences. It allows users to fly around VR environments and includes a search function for locating places.

Releases

  • The latest July release of Organic Maps introduced two major features funded by the NGI0 Entrust Fund: improved address lookup in the US using TIGER data and improved text rendering for various scripts including Devanagari, Arabic and Thai. Other updates included new fonts, improved map interaction and fixes for Android and iOS.
  • The OSM Apps Catalog now supports Wikidata and, together with the OSM wiki and taginfo, over 1000 unique apps using OSM data are documented.
  • GraphHopper has introduced a user-friendly update to its mapping service, ‘Route Planning Step-by-Step’, which allows walkers and cyclists to easily create and modify routes by right-clicking or long-pressing on the map to set start and end points and add additional locations. This update improved route customisation directly on desktop and mobile devices, making the planning process more intuitive and flexible.

Did you know …

  • … FieldMaps provides two types of global edge-matched subnational boundaries datasets? The ‘Humanitarian’ dataset uses OCHA Common Operational Datasets and geoBoundaries, integrated with OpenStreetMap for edge matching, for humanitarian use. The ‘Open’ dataset uses only geoBoundaries for clear licensing; it is suitable for academic or commercial use, with the US Geological Survey used for edge matching. Both datasets require attribution and open access to derived works.
  • … osm-api-js is a JavaScript/TypeScript wrapper for the OpenStreetMap API? It provides features such as automatic conversion of OSM XML to JSON, OAuth 2 authentication, and compatibility with both Node.js and browser environments. This library provides various methods to interact with OSM data, including access to features, changesets, user data and more, and aims to simplify the integration of OSM functionality into applications.
  • … OpenCage provides educational content on geocoding, OpenStreetMap, open data and unique geographic facts, hosts monthly geo quizzes and promotes its geocoding API and related services on its Geothreads blog?

OSM in the media

  • Simon Poole highlighted the redirection of Potlatch’s Wikipedia page to the general OpenStreetMap page, removing the article’s detailed content. This change will affect users looking for specific information about Potlatch. In addition, Tim Berners-Lee’s TED Talk highlighted the importance of open data and advocated for its global adoption and innovation potential, which resonates with the open data ethos of OpenStreetMap and its tools.

Other “geo” things

  • Mark Litwintschik described an AI model that extracted over 280 million building footprints from high-resolution imagery across East Asia, using 100 TB of imagery from Google Earth. He explained his setup and analysis process, highlighting the accuracy and challenges of the dataset, and included steps for using Python and DuckDB for data handling.
  • A new machine learning framework developed by IIASA researchers forecasts global rooftop growth from 2020 to 2050, supporting sustainable energy planning and urban development. Using big data from millions of building footprints and other geospatial datasets, the study predicts a significant increase in rooftop area, particularly in emerging economies, highlighting the potential for rooftop solar.
  • The EU’s Next Generation Internet (NGI) programme, which has funded the development of open source software, is at risk of being terminated according to an internal document. This possibility has raised concerns among developers, such as those at Framasoft, who rely on NGI for support. Despite the current uncertainty, the EU may rebrand the initiative as ‘Open Europe Stack’ under a new programme, albeit with reduced funding and increased bureaucracy. The decision will be formalised in 2025.
  • OpenCage’s Mastodon #geoweirdness thread continued by focusing on the Lesser Antilles, having previously covered the Greater Antilles. The series looked at the unique and interesting geographical features of these Caribbean islands.
  • Paul Knightly discussed the problems with Google Maps and other driving apps, following up on a New York Times op-ed that highlighted shortcomings in these apps, such as directing drivers to unsafe or inefficient routes (we reported earlier). The conversation highlighted the need for better map data and app functionality to improve the user experience and safety.
  • Esri has integrated Overture Maps data into ArcGIS. This collaboration aims to improve data accuracy and support a variety of public and private sector applications, providing users with customisable map styles and new data themes.
  • Radar imaging has revealed an accessible cave conduit beneath the Mare Tranquillitatis (Sea of Tranquility) on the Moon. The discovery, detailed in Nature Astronomy and reported by Gizmodo, suggests the presence of a stable lunar lava tube that could provide shelter for future lunar explorers. The radar data suggests that this cave is structurally sound and could provide protection from cosmic radiation, temperature extremes and micrometeorite impacts, making it a promising candidate for future human habitation on the Moon.
  • Insidemap described a collaborative cultural project to document dry-stone dwellings in the Pyrenees region using the Wikipedra database, a cross-border initiative to catalogue these structures. The project uses various methods, including aerial photography and field verification, to map and preserve these historic structures, with plans to expand the data and improve public access through platforms such as uMap.
  • Explore the new QGIS website, which went live on Friday 12 July.
  • Bayerischer Rundfunk highlighted the security risks posed by the trade in location data, showing how detailed movement profiles of individuals, including military and intelligence personnel, can be reconstructed using data from smartphone apps. Their investigation revealed significant vulnerabilities, particularly for sensitive locations such as military bases, and highlighted the need for stricter regulations and awareness to prevent spying through commercially available data.
  • TomTom and Microsoft have partnered to create AI-enabled smart maps, with the aim of improving navigation and geospatial services by integrating Microsoft’s AI technologies with TomTom’s map data to provide more accurate, responsive and intelligent mapping solutions for various applications.
  • Sen Kushida discussed the historical significance of abandoned railway lines in Japan, highlighting their unique features and the renewed importance of rail freight due to the current shortage of truck and bus drivers.

Upcoming Events

Where | What | When
Łódź | State of the Map Europe 2024 | 2024-07-18 – 2024-07-21
Preet Vihar Tehsil | 10th OSM Delhi Mapping Meetup | 2024-07-21
München | Mapathon @ TU Munich | 2024-07-22
Richmond | MapRVA – Bike Lane Surveying & Mapping Meetup | 2024-07-23
Stadtgebiet Bremen | Bremer Mappertreffen | 2024-07-22
San Jose | South Bay Map Night | 2024-07-24
Berlin | OSM-Verkehrswende #61 | 2024-07-23
[Online] | OpenStreetMap Foundation Board of Directors – public video meeting | 2024-07-25
Wien | 72. Wiener OSM-Stammtisch | 2024-07-25
Lübeck | 144. OSM-Stammtisch Lübeck und Umgebung | 2024-07-25
Gambir | Mapping Talks: OpenSource WebGis dengan OpenStreetMap | 2024-07-26
Bengaluru | GeoMeetup Bengaluru | 2024-07-27
Potsdam | Radnetz Brandenburg Mapping Abend #8 | 2024-07-30
Ondres | Panoramax Partie – Pays Basque Sud Landes | 2024-07-31
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen (online) | 2024-07-31
Brazzaville | State of the Map Congo | 2024-08-01 – 2024-08-03
[Online] | OSMF Engineering Working Group meeting | 2024-08-02
中正區 | OpenStreetMap x Wikidata Taipei #67 | 2024-08-05

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events entered there will appear in weeklyOSM.

This weeklyOSM was produced by MatthiasMatthias, PierZen, Raquel Dezidério Souto, SeverinGeo, Strubbl, TheSwavu, barefootstache, derFred, isoipsa, mcliquid, miurahr, rtnf.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Summary: Quechua communities around the world comprise millions of people. Unfortunately, only a few resources are available in the Quechua language, and they are mainly stored in unstructured formats. In this post, we present the idea of building a knowledge base to support under-resourced language communities through wiki projects.

Ideas for engaging under-resourced language speakers

Making interoperable linguistic resources available is increasingly urgent in order to preserve and support under-resourced languages and their communities. Despite existing efforts, not all languages are represented or made accessible in a structured format, and indigenous communities and their resources have received less attention.

To overcome these limitations, we propose a general approach of building knowledge bases for under-resourced languages based on the Wikimedia infrastructure. Our approach has been validated with the Puno Quechua language, and it is described in the following Qichwabase workflow:

Qichwabase workflow to build the Quechua Language and Knowledge Base
  1. Identifying sources: In this step, various data and knowledge sources are retrieved and collected. For instance, dictionaries, vocabularies, multiword expressions, books, etc.
  2. Data modelling: This step describes the structure under which the identified resources will be modelled. For example, for describing lexicographical data, a good approach is to consult the documentation on Wikidata Lexemes.
  3. Hosting infrastructure: In this step, a host for the knowledge base is selected, e.g. Wikibase is the infrastructure that drives Wikidata.
  4. Knowledge ingestion: It describes the ways to validate and import data into the knowledge base. For instance, it can be manual or semi-automatic.
    • Reconciliation (OpenRefine): It is important to avoid introducing duplicates into the knowledge base. The reconciliation task prevents importing duplicate entities, lexemes, etc.
    • WikibaseIntegrator: It allows the ingestion of large amounts of data into the knowledge base (a minimal sketch follows after this list).
    • EntitySchema: It allows the definition of constraints for modelled data in the knowledge base. Furthermore, it is also possible to create forms in order to allow users to enter data that must conform to the constraints.
  5. Deployment infrastructure: It should allow the language community to contribute, improve, and validate the knowledge in the knowledge base. Additionally, it must provide ways for users, researchers, and language learners to retrieve the knowledge.
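
To make the ingestion step more concrete, below is a minimal sketch of creating one dictionary-entry item with WikibaseIntegrator. It is illustrative only: the endpoint URL, bot credentials, and the property number P42 are placeholders, and the calls assume WikibaseIntegrator's 0.12-style Python interface.

    # Minimal sketch of the knowledge ingestion step with WikibaseIntegrator.
    # The wiki URL, credentials, and property number are placeholders.
    from wikibaseintegrator import WikibaseIntegrator, wbi_login, datatypes
    from wikibaseintegrator.wbi_config import config as wbi_config

    wbi_config['MEDIAWIKI_API_URL'] = 'https://qichwa.example.org/w/api.php'
    wbi_config['USER_AGENT'] = 'QichwabaseIngest/0.1 (contact@example.org)'

    login = wbi_login.Login(user='IngestBot', password='bot-password')
    wbi = WikibaseIntegrator(login=login)

    # Create an item for one dictionary entry and attach a single statement.
    item = wbi.item.new()
    item.labels.set(language='qu', value='allqu')
    item.descriptions.set(language='en', value='dictionary entry imported from a Quechua wordlist')
    item.claims.add(datatypes.String(prop_nr='P42', value='noun'))  # P42 is a placeholder property
    item.write()

In practice, a run like this would be preceded by the reconciliation step, so that an existing entry is updated rather than duplicated.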

What were the objectives?

A series of online and offline workshops was organized in Peru to familiarize Quechua speakers with the approach and with how they can help and contribute to the curation of knowledge on Qichwabase. The following objectives were addressed:

  • Preserve the linguistic diversity of Puno Quechua orality with Lingua Libre, which allows the recording of audio for words, phrases, and more.
  • Document current events via Wikimedia Commons, which allows users to contribute with media resources, such as images and recordings.
  • Decentralize demographic contributions with Wikibase, which allows communities to design, create, curate and deploy a mature decentralized knowledge base.
  • Engage educators, students, researchers, and developers to work together for a more inclusive internet for the next generations.

How does it relate to the collaboration of the open?

We wanted to show the efforts we are making regarding the Quechua language and its communities. This could help other communities address similar issues.

  • Collaboration. We developed methods, tools, and guidelines that can be reused across Quechua language communities, and adapted to any community or language.
  • Open. We aim for minority communities in South America to work closely together in order to develop and improve common approaches for preserving knowledge and bringing inclusiveness and diversity to the movement.

How did this journey start?

In 2019, I started working on projects aligned with the Movement Strategy, for instance by reducing the gap for underrepresented communities. I have been presenting my work, first at Wikimania 2019, and then through datathons, courses, and presentations aimed at underrepresented communities such as the Quechua community. Read more: Quechua Language based Knowledge Graph

Reference Work

  • Huaman, E., Lindemann, D., Caruso, V., & Huaman, J. L. (2023). QICHWABASE: A Quechua Language and Knowledge Base for Quechua Communities. arXiv preprint arXiv:2305.06173.
  • Huaman, E., Huaman, J. L., & Huaman, W. (2022, November). Getting Quechua Closer to Final Users Through Knowledge Graphs. In Annual International Conference on Information Management and Big Data (pp. 61-69). Cham: Springer Nature Switzerland.

The Wikimedians of the United Arab Emirates User Group hosted a virtual meetup on May 31, 2024, bringing together Wikimedians for an evening of networking and exciting announcements. The agenda centered around the user group’s strategic roadmap for the next two years, outlining a series of initiatives designed to strengthen Wikipedia’s Arabic content and empower local editors.

The UAE Wikimedians User Group’s roadmap prioritizes the growth and enrichment of Arabic Wikimedia projects. Key areas of focus include:

  • Localize Campaigns to Address Knowledge Gaps

The meetup unveiled two exciting localized campaigns designed to enrich Wikimedia projects with diverse perspectives:

  • SheSaid in MENA: This campaign, a localized adaptation of the global “SheSaid” initiative, will focus on increasing content on Wikiquote related to amplifying the voices of women in the Middle East and North Africa (MENA) region.
  • “One Librarian, One Editor” program, renamed “One Editor, One Reference”: this program provides support to editors, ensuring the accuracy and reliability of Wikipedia articles by fostering a culture of proper citation.
  • Bridge the Gender Gap: fostering initiatives that aim to increase female participation in Wikimedia projects, both as editors and contributors.
  • Emirati Women’s Day Edit-a-Thon: This initiative aims to add more Wiki contributions about Emirati Women, highlighting their achievements and contributions to society.

1 Million Wiki Project: A Groundbreaking Collaboration

The virtual meetup served as a platform to announce a groundbreaking collaboration between the UAE Wikimedians User Group and the Emirates Literature Foundation. The ambitious “1 Million Wiki Project”, which will launch in August, aims to publish a staggering 1 million contributions to Wikipedia over the next two years. This project has the potential to significantly expand the Arabic digital knowledge base and provide financial support to volunteers and contributors from different Wikimedia projects, who have ambitious goals to enrich these platforms but lack the financial means to do so.

A Thriving Wikimedia Community in the UAE

The virtual meetup was filled with the enthusiasm and dedication of the attendees, reflecting the UAE’s thriving Wikimedian community. The strategic direction outlined by the user group demonstrates a commitment to inclusivity, knowledge creation, and empowering local editors. With these initiatives in place, the Wikimedians of UAE User Group is poised to make a significant impact on Wikimedia projects in the years to come.

Selected photos from the Meetup

Wikimedians of The United Arab Emirates Chart Course for the Future at Virtual Meetup.jpg

Flattening papers

Saturday, 20 July 2024 03:24 UTC

Fremantle

· archiving · family history · Cossack · ArchivesWiki · Wikimedia ·

This morning I've been working on another round of flattening HMW232, which is a box full of letters, receipts, telegrams, price lists, cheques, product samples, and other documents mostly dating from around the 1880s and '90s and accumulated by my great-great-grandfather Shakespeare Hall. He (and his brother at times?) ran a general store in Cossack (in Western Australia, not the historical Ukrainian state), and many of these papers appear to be related to that. I'm not really sure, because at the moment I'm just focussing on getting them cleaned, flattened, and stored before starting the scanning process.

My approach at the moment is to clean them of any loose dirt, unfold and flatten them, and add them to manila folders with one to three pages per folder. It seems to work best when there are fewer, so their folds don't interfere with each other. These manila folders are then stacked up in piles of about a dozen, between melamine chipboard boards, in a stack eight boards high. This seems to be about the sensible limit to weight, as well as my patience with this process. It means I do it for a few hours every few months.

After a few months, the papers are taken out, grouped by type, and stored permanently in those white archival folders (I guess we don't call them 'manila' because they're the wrong colour?) and kept in polyprop archive boxes. The scans go on Commons and the archival descriptions on ArchivesWiki.

How Many Languages Does Wikimedia Search Support?

Thursday, 18 July 2024 18:33 UTC

TL;DR: On-wiki search “supports” a lot of “languages”. “Search supports more than 50 language varieties” is a defensible position to take. “Search supports more than 40 languages” is 100% guaranteed! Precise numbers present a philosophical conundrum.

Recently, someone asked the Wikimedia Search Platform Team how many languages we support.

This is a squishy question!

The definition of what qualifies as a language is very squishy. We can try to avoid some of the debate by outsourcing the decision to the language codes we use—different codes equal different languages—though it won’t save us.

Another squishy concept is what we mean by “support”, since the level of language-specific processing provided for each language varies wildly, and even what it means to be “language-specific” is open to interpretation. But before we unrecoverably careen off into the land of philosophy of language, let’s tackle the easier parts of the question.

Full Support

“Full” support for many languages means that we have a stemmer or tokenizer, a stop word list, and we do any necessary language-specific normalization. (See the Anatomy of Search series of blog posts, or the Bare-Bones Basics of Full-Text Search video for technical details on stemmers, tokenizers, stop words, normalization, and more.)
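
As a rough, generic illustration (not the actual CirrusSearch configuration), those ingredients map onto an Elasticsearch custom analyzer roughly like this, written here as a Python settings fragment:

    # Illustrative only: a generic analyzer with a tokenizer, stop words,
    # a stemmer, and lowercasing as a simple normalization step.
    full_support_settings = {
        "analysis": {
            "filter": {
                "demo_stop": {"type": "stop", "stopwords": "_english_"},
                "demo_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "demo_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "demo_stop", "demo_stemmer"],
                }
            },
        }
    }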

CirrusSearch/Elasticsearch/Lucene

The wiki-specific custom component of on-wiki search is called CirrusSearch, which is built on the Elasticsearch search engine, which in turn is built on the Apache Lucene search library.

Out of the box, Elasticsearch 7.10 supports these 33 languages, so CirrusSearch does, too.

  • Arabic, Armenian, Basque, Bengali, Bulgarian, Catalan, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Latvian, Lithuanian, Norwegian, Persian, Portuguese, Romanian, Russian, Sorani (Central Kurdish), Spanish, Swedish, Turkish, and Thai.

Notes:

  • Sorani has language code ckb, and it is often called Central Kurdish in English.
  • Persian and Thai do not have stemmers, but that seems to be because they don’t need them.

Elasticsearch 7.10 also has two other language analyzers:

  • The “Brazilian” analyzer is for Brazilian Portuguese, which is represented by a sub-language code (pt-br). However, the Brazilian analyzer has all separate components, and we do use it for the brwikimedia wiki (“Wiki Movimento Brasil”).
  • The “CJK” (which stands for “Chinese, Japanese, and Korean”) analyzer only normalizes non-standard half-width and full-width characters (ｱ→ア and Ａ→A), breaks up CJK characters into overlapping bigrams (e.g., ウィキペディア is indexed as ウィ, ィキ, キペ, ペデ, ディ, and ィア), and applies some English stop words; a small sketch of the bigram step follows below. That’s not really “full” support, so we won’t count it here. (We also don’t use it for Chinese or Korean.)
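
Here is a rough sketch of the bigram step in isolation (illustrative only, not the analyzer's actual code):

    # Index a run of CJK characters as overlapping two-character grams.
    def cjk_bigrams(text: str) -> list[str]:
        return [text[i:i + 2] for i in range(len(text) - 1)]

    print(cjk_bigrams("ウィキペディア"))
    # ['ウィ', 'ィキ', 'キペ', 'ペデ', 'ディ', 'ィア']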

We will count Brazilian Portuguese as a language that we support, but also keep a running sub-tab of “maybe only sort of distinct” language varieties.

We’ll come back to Chinese, Japanese, Korean, and the CJK analyzer a bit later.

Simplified language tree in Swedish, Public Domain, by Ingwik

Other Open Source Analysis

We have found some open source software that does stemming or other processing for particular languages. Some as Elasticsearch plugins, some as stand-alone Java code, and some in other programming languages. We have used, wrapped, or ported as needed to make the algorithms available for our wikis.

  • We have open-source Serbian, Esperanto, and Slovak stemmers that we ported to Elasticsearch plugins.
    • There are currently no stop word lists for these languages. However, for a typical significantly inflected alphabetic Indo-European language[†], a decent stemmer is the biggest single improvement that can be added to an analysis chain for that language. Stop words are very useful, but general word statistics will discount them even without an explicit stop word list.
    • Having a stemmer (for a language that needs one) can count as the bare minimum for “full” support.

[†] English is weird in that it is not significantly inflected. Non-Indo-European languages can have very different inflection patterns (like Inuit—so much!—or Chinese—so little!), and non-alphabetic writing systems (like Arabic or Chinese) can have significantly different needs beyond stemming to count as “fully” supported.

  • For Chinese (Mandarin) we have something beyond the not-so-smart (but much better than nothing!) CJK analyzer provided by Elasticsearch/Lucene. Chinese doesn’t really need a stemmer, but it does need a good tokenizer to break up strings of text without spaces into words. That’s the most important component for Chinese, and we found an open-source plugin to do that. Our particular instantiation of Chinese comes with additional complexity because we allow both Traditional and Simplified characters, often in the same sentence. We have an additional open-source plugin to convert everything to Simplified characters internally.
  • For Hebrew we found an open-source Elasticsearch plugin that does stemming. It also handles the ambiguity caused by the lack of vowels in Hebrew (by sometimes generating more than one stem).
  • For Korean, we have another open-source plugin that is much better than the very basic processing provided by the CJK analyzer. It does tokenizing and part-of-speech tagging and filtering.
  • For Polish and Ukrainian, we found an open-source plugin for each that provides a stemmer and stop word list. They both needed some tweaking to handle odd cases, but overall both were successes.

Shared Configs

Some languages come in different varieties. As noted before, the distinction between “closely related languages” and “dialects” is partly historical, political, and cultural. Below are some named language varieties with distinct language codes that share language analysis configuration with another language. How you count these is a philosophical question, so we’ll incorporate them into our numerical range.

  • Egyptian Arabic and Moroccan Arabic use the same configuration as Standard Arabic. Originally they had some extra stop words, but it turned out to be better to use those stop words in Standard Arabic, too. Add two languages/language varieties.
  • Serbo-Croatian—also called Serbo-Croat, Serbo-Croat-Bosnian (SCB), Bosnian-Croatian-Serbian (BCS), and Bosnian-Croatian-Montenegrin-Serbian (BCMS)—is a pluricentric language with four mutually intelligible standard varieties, namely Serbian, Croatian, Bosnian, and Montenegrin. For various historical and cultural reasons, we have Serbian, Croatian, and Bosnian (but no Montenegrin) wikis, as well as Serbo-Croatian wikis. The Serbian and Serbo-Croatian Wikipedias support Latin and Cyrillic, while the Croatian and Bosnian Wikipedias are generally in Latin script. The Bosnian, Croatian, and Serbo-Croatian wikis use the same language analyzer as the Serbian wikis. Add three languages/language varieties.
  • Malay is very closely related to Indonesian—close enough that we can use the Elasticsearch Indonesian analyzer for Malay. (Indonesian is a standardized variety of Malay.) Add another language/language variety.

Moderate Language-Specific Processing

These languages have some significant language-specific(ish) processing that improves search, while still lacking some obvious component (like a stemmer or tokenizer).

  • For Japanese, we currently use the CJK analyzer (described above). This is the bare minimum of custom configuration that might be considered “moderate” support. It also stretches the definition of “language-specific”, since bigram tokenizing—which would be useful for many languages without spaces—isn’t really specific to any language, though the decision to apply it is language-specific.
    • There is a “full” support–level Japanese plugin (Kuromoji) that we tested years ago (and have configured in our code, even), but we decided not to use it because of some problems. We have a long-term plan to re-evaluate Kuromoji (and our ability to customize it for our use cases) and see if we could productively enable it for Japanese.
  • The Khmer writing system is very complex and—for Historical Technological Reasons™—there are lots of ways to write the same word that all look the same, but are underlyingly distinct sequences of characters. We developed a very complex system that normalizes most sequences to a canonical order. The ICU Tokenizer breaks up Khmer text (which doesn’t use spaces between words) into orthographic syllables, which are very often smaller than words. It’s somewhat similar to breaking up Chinese into individual characters—many larger “natural” units are lost, but all of their more easily detected sub-units are indexed for searching.
    • This is probably the maximum level of support that counts as “moderate”. It’s tempting to move it to “full” support, but true full support would require tokenizing the Khmer syllables into Khmer words, which requires a dictionary and more complex processing. On the other hand, our support for the wild variety of ways people can (and do!) write Khmer is one place where we currently outshine the big internet search engines.
  • For Mirandese, we were able to work with a community member to set up elision rules (for word-initial l’, d’, etc., as in some other Romance languages) and translate a Portuguese stop word list.
Romance languages diagram, CC BY-SA 4.0, by El Bux

Minimal Language-Specific Processing

Azerbaijani, Crimean Tatar, Gagauz, Kazakh, and Tatar have the smallest possible amount of language-specific processing. Like Turkish, they use the uppercase/lowercase pairs İ/i and I/ı, so they have the Turkish version of lowercasing configured.

However, Tatar is generally written in Cyrillic (at least on-wiki). Kazakh is also generally in Cyrillic on-wiki, and the switch to using İ/i and I/ı in the Kazakh Latin script was only made in 2021, so maybe we should count that as half?
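
For illustration, the casing difference looks roughly like this in Python (the real work happens in the analyzer's Turkish-aware lowercase filter, not in code like this):

    # Turkish-style lowercasing: I maps to ı and İ maps to i,
    # unlike the default mapping of I to i.
    def turkish_lower(text: str) -> str:
        return text.replace('I', 'ı').replace('İ', 'i').lower()

    print("ILICA".lower())         # ilica   (default lowercasing)
    print(turkish_lower("ILICA"))  # ılıca   (Turkish-aware lowercasing)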

(Un)Intentional Specific Generic Support

Well there’s a noun phrase you don’t see every day—what does it even mean?

Sometimes a language-specific (or wiki community–specific) issue gets generalized to the point where there’s no trace of the motivating source. Conversely, a generic improvement can have an outsized impact on a specific language, wiki, or community.

For example, the Nias language uses lots of apostrophes, and some of the people in its Wikipedia community are apparently more comfortable composing articles in word processors, with the text then being copied to the Nias Wikipedia. Some word processors like to “smarten” quotes and apostrophes, automatically replacing them with the curly variants. This kind of variation makes searching hard. When I last looked (some time ago) it also resulted in Nias Wikipedia having article titles that only differ by apostrophe curliness—I assume people couldn’t find the one so they created the other. Once we got the Phab ticket, we added some Nias-specific apostrophe normalization that fixed a lot of their problems.
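
A hedged sketch of what such apostrophe normalization can look like; the real normalization lives in the analysis chain and covers more characters, and the example word below is hypothetical:

    # Map common curly apostrophe look-alikes to the plain ASCII apostrophe
    # so that otherwise-identical words match in search.
    CURLY_APOSTROPHES = {
        '\u2019': "'",  # right single quotation mark
        '\u2018': "'",  # left single quotation mark
        '\u02bc': "'",  # modifier letter apostrophe
    }

    def normalize_apostrophes(text: str) -> str:
        return ''.join(CURLY_APOSTROPHES.get(ch, ch) for ch in text)

    print(normalize_apostrophes('fa\u2019omasi'))  # fa'omasi (hypothetical example word)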

Does Nias-specific apostrophe normalization count as supporting Nias? It might arguably fall into the “minimal” category.

About a year later, we cautiously and deliberately tested similar apostrophe normalization for all wikis, and eventually added it as a default, which removed all Nias-specific config in our code.

Does general normalization inspired by a strong need from the Nias Wiki community (but not really inherent to the Nias language) count as supporting Nias? I don’t even know.

At another time, I extended some general normalization upgrades that remove “non-native” diacritics to a bunch of languages, and an unexpectedly large benefit was that it was super helpful in Basque, because Basque searchers often ignore Spanish diacritics on Spanish words, while editors use the correct diacritics in articles, creating a mismatch.

If I hadn’t bothered to do some analysis after going live, I wouldn’t have known about this specific noticeable improvement. On the other hand, if I’d known about the specific problem and there wasn’t a semi-generic solution, I would’ve wanted to implement something Basque-specific to solve it.

Does a general improvement that turns out to strongly benefit Basque count as supporting Basque? I don’t even know! (In practice, this is a slightly philosophical question, since Basque has a stemmer and stopword list, too, so it’s already otherwise on the “full support” list.)

I can’t think of any other language-specific cases that generalized so well—though Nias wasn’t the first or only case of apostrophe-like characters needing to be normalized.

Of course, general changes that were especially helpful to a particular language are easy to miss, if you don’t go looking for them. Even if you do, they can be subtle. The Basque case was much easier for me, personally, to notice, because I don’t speak Basque, but I know a little Spanish, so the Spanish words really stood out as such when looking at the data.

Vague Categorical Support

It’s easy enough to say that the CJK analyzer supports Japanese (where we are currently using it) and that it would be supporting Chinese and Korean if we were using it for those languages—in small part because it has limited scope, and in large part because it seems specific to Chinese, Japanese, and Korean because of the meaning of “CJK”.

But what about a configuration that is not super specific, but still applied to a subset of languages?

Back in the day, we identified that “spaceless languages” (those whose writing system doesn’t put spaces between words) could benefit from (or be harmed by) specific configurations.

We identified the following languages as “spaceless”. We initially passed on enabling an alternate ranking algorithm (BM25) for them (Phab T152092), but we also deployed the ICU tokenizer for them by default.

  • Tibetan, Dzongkha, Gan, Japanese, Khmer, Lao, Burmese, Thai, Wu, Chinese, Classical Chinese, Cantonese, Buginese, Min Dong, Cree, Hakka, Javanese, and Min Nan.

14 of those are new.

We eventually did enable BM25 for them, but this list has often gotten special consideration and testing to make sure we don’t unexpectedly do bad things to them when we make changes that seem fine for languages with clearer word boundaries (like Phab T266027).

And what about the case where the “category” we are trying to support is “more or less all of them”? Our recent efforts at cross-wiki “harmonization”—making all language processing that is not language-specific as close to the same as possible on all wikis (see Phab T219550)—was a rising language tide that lifted all/most/many language boats. (An easy to understand example is acronym processing, so that NASA and N.A.S.A. can match more easily. However, some languages—because of their writing systems—have few if any native acronyms. Foreign acronyms (like N.A.S.A.) still show up, though.)
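
As a small illustration of the acronym case only (again, not the actual harmonized analysis chain), dotted acronyms can be collapsed so that both spellings produce the same token:

    import re

    # Collapse dotted acronyms such as "N.A.S.A." to "NASA".
    DOTTED_ACRONYM = re.compile(r'\b(?:[A-Z]\.){2,}')

    def normalize_acronyms(text: str) -> str:
        return DOTTED_ACRONYM.sub(lambda m: m.group(0).replace('.', ''), text)

    print(normalize_acronyms("N.A.S.A. and NASA"))  # NASA and NASA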

Family tree of the Indo-European languages, CC BY-SA 4.0, by EnriBrahimaj

Beyond Language Analysis

So far we’ve focussed on the most obviously languagey of the language support in Search, which is language analysis. However, there are other parts of our system that support particular wikis in a language-specific way.

Learning to Rank

Learning to Rank (LTR) is a plugin that uses machine learning—based on textual properties and user behavior data—to re-rank search results to move better results higher in the result list.

It makes use of many ranking signals, including making wiki-specific interpretations of textual properties—like word frequency stats, the number of words in a query or document, the distribution of matching terms, etc.

Arguably some of what the model learns is language-specific. Some is probably wiki-specific (say, because Wikipedia titles are organized differently than Wikisource titles), and some may be community-specific (say, searchers search differently on Wikipedia than they do on Wiktionary).

The results are the same or better than our previously hand-tuned ranking, and the models are regularly retrained, allowing them to keep up with changes to the way searchers behave in those languages on those wikis.

Does that count as minimal language-specific support? Maybe? Probably?

We tested the LTR plugin on 18 wikis:

  • Arabic, Chinese, Dutch, Finnish, French, German, Hebrew, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Swedish, and Vietnamese.

One of those, Vietnamese, is new to the list.

Cross-Language Searching

Years ago we worked on a project on some wikis to do language detection on queries that got very few or no results, to see if we could provide results from another wiki. The process was complicated, so we only deployed it to nine of the largest (by search volume) Wikipedias:

  • Dutch, English, French, German, Italian, Japanese, Portuguese, Spanish, and Russian.

Those are all covered by language analyzers above. However, for each of those wikis, we limited the specific languages that could be identified by the language-ID tool (called TextCat), to maximize accuracy and relevance.

The specific languages allowed to be identified per wiki are listed in a table in a write up about the project.

The consolidated list of those languages is:

  • Afrikaans, Arabic, Armenian, Bengali, Breton, Burmese, Chinese, Croatian, Czech, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Korean, Latin, Latvian, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Tagalog, Telugu, Thai, Ukrainian, Urdu, and Vietnamese.

Nine of those are not covered by the language analyzers, and eight are not covered by the LTR plugin: Afrikaans, Breton, Burmese, Georgian, Icelandic, Latin, Tagalog, Telugu, and Urdu. (Vietnamese is covered both by Learning to Rank and TextCat.)

Does sending queries from the largest wikis to other wikis count as some sort of minimal support? Maybe. Arguably. Perhaps.

Conclusions?

What, if any, specific conclusions can we draw? Let’s look again at the list we have so far (even though it is also right above.)

We have good to great support (“moderate” or “full”) for 44 inarguably distinct languages, though it’s very reasonable to claim 51 named language varieties.

The Search Platform team loves to make improvements to on-wiki search that are relevant to all or almost all languages (like acronym handling) or that help all wikis (like very basic parsing for East Asian languages on any wiki). So, how many on-wiki communities does the Search team support? All of them, of course!

Exactly how many languages is that? I don’t even know.

(This blog post is a snapshot from July 2024. If you are from the future, there may be updated details on MediaWiki.org.)

Problems with our secondary datacenter

Thursday, 18 July 2024 12:50 UTC

Jul 18, 12:50 UTC
Resolved - This incident has been resolved.

Jul 18, 12:49 UTC
Update - Everything is back to normal.

Jul 18, 12:02 UTC
Monitoring - A fix has been implemented and we are monitoring the results.

Jul 18, 11:56 UTC
Investigating - We are currently investigating this issue.

Semantic MediaWiki 4.2.0 released

Thursday, 18 July 2024 00:00 UTC

Discover what is new in Semantic MediaWiki 4.2.

As maintainers of Semantic MediaWiki (SMW), we are responsible for releasing new versions. Today we released version 4.2, the 69th release of SMW. This release follows SMW 4.1.3, which was released on February 17th 2024.

Highlights

Semantic MediaWiki now comes with out-of-the-box faceted search via the new Special:FacetedSearch page.

Chameleon skin

Version 4.2 brings compatibility with MediaWiki 1.40, MediaWiki 1.41, and PHP 8.2. It also improves compatibility with MediaWiki 1.42, which is expected to mostly work.

You can now set the source for queries run via the "ask" or "askargs" API endpoints via the new source parameter.
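
For example, a query could be directed at a specific source roughly like this (the endpoint URL and the source name "elastic" are assumptions; the source has to be defined in the wiki's $smwgQuerySources setting):

    import requests

    # Illustrative only: run an "ask" query against a hypothetical wiki and
    # direct it to a configured query source named "elastic".
    params = {
        "action": "ask",
        "query": "[[Category:City]]|?Population",
        "source": "elastic",   # new in SMW 4.2
        "format": "json",
    }
    response = requests.get("https://wiki.example.org/w/api.php", params=params)
    print(response.json())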

Bug fixes

  • Fixed ask queries with a conjunction of negations failing when using the Elasticsearch backend
  • Fixed property linking for languages with fallback languages
  • Fixed footer sorting on table results
  • Improved handling of logos

Credits

Semantic MediaWiki logo

Over 15 people contributed to this release. We would like to thank all contributors.

Special credits go to Bertrand Gorge, Niklas Laxström, Mark A. Hershberger, Jaider Andrade Ferreira, Youri van den Bogert and alistair3149.

Upgrading

Semantic MediaWiki 4.2 is a minor release. It contains new features, improvements, and bug fixes. Because it is a minor release, it does not contain any breaking changes, does not require running the update.php script, and does not drop support for older versions of MediaWiki.

We recommend that everyone running older versions of SMW upgrade, especially those running SMW 4.0.1 or older, as these versions contain a known security vulnerability.

Get the new version via Composer:

  • Step 1: if you are upgrading from SMW older than 4.0.0, ensure the SMW version in composer.json is ^4.2.0
  • Step 2: run composer in your MediaWiki directory: composer update --no-dev --optimize-autoloader

Get the new version via Git:

This is only for those that have installed SMW via Git.

  • Step 1: do a git pull in the SemanticMediaWiki directory
  • Step 2: run composer update --no-dev --optimize-autoloader in the MediaWiki directory

Professional Semantic MediaWiki Services

At Professional Wiki, we provide Semantic MediaWiki services, including SMW hosting, SMW software development, SMW consulting, and various MediaWiki services.

You can try out Semantic MediaWiki via the free trial on ProWiki.

Some time ago I celebrated a birthday in an Italian restaurant in Haifa, and I saw a pack of pasta of a curious shape on a shelf there. I asked whether they serve it or sell it.

“No”, they told me, “it’s just a display”.

This answer didn’t satisfy me.

I added the pasta’s name, Busiate, to my shopping list.

I searched for it in a bunch of stores. No luck.

I googled for it and found an Israeli importer of this pasta. But that importer only sells in bulk, in crates of at least 12 items. That’s too much.

And of course, I searched Wikipedia, too. There’s an article about Busiate in the English Wikipedia. There is also an article about this pasta in Arabic and in Japanese, but curiously, there’s no article about it in the Wikipedia in the Italian language, nor in the Sicilian language, even though this type of pasta is Sicilian.

So I… did a few things about it.

I improved the article about Busiate in the English Wikipedia: cleaned up references, cleaned up formatting, and updated the links to references.

I also improved the references and the formatting to the article about Pesto alla trapanese, the sauce with which this pasta is traditionally served.

And I cleaned up the Wikidata items associated with the two articles above: Q48852218 (busiate) and Q3900766 (pesto alla trapanese).

And I also translated all the names of the Wikidata properties that are used on these items to Hebrew. I usually do this when I do something with any Wikidata item: I only need to translate these property names once, and after that all the people who use Wikidata in Hebrew will see items in which these properties are used in Hebrew. There are more than 6000 properties, and the number is constantly growing, so it’s difficult to have everything translated, but every little translation makes the experience more completely translated for everyone.

I added references to the Wikidata item about the sauce. Wikidata must have references, too, and not only Wikipedia. I am not enthusiastic about adding random recipe sites that I googled up as references, but luckily, I have The Slow Food Dictionary of Italian Regional Cooking, which I bought in Italy, or more precisely in Esino Lario, where I went for the 2016 Wikimania conference.

Now, a book in Wikidata is not just a book. You need to create an item about the book, and another item about the edition of the book. And since I created those, I also created Wikidata items for the dictionary’s original Italian author Paola Gho, for the English translator John Irving, and for the publishing house, Slow Food.

And here’s where it gets really nerdy: I added each of the sauce’s ingredients as values of the “has part” property, and added the dictionary as a reference for each entry. I initially thought that it was overdone, but you know what? When we have robot cooks, as in the movie I, Robot, busiati col pesto trapanese will be one of the first things that they will know how to prepare. One of the main points of Wikidata is that it’s supposed to be easy to read for both people and machines.

And since I have a soft spot for regional languages, I also added the sauce’s Sicilian name under the “native label” property: pasta cull’àgghia. The aforementioned Slow Food Dictionary of Italian Regional Cooking actually does justice to the regional part in its title, and gives the names of the different food items in the various regional languages of Italy, so I could use it as a reliable source.

And I translated the Wikipedia article into Hebrew: בוזיאטה.

And I also created the “Sicilian cuisine” category in the Hebrew Wikipedia. A surprisingly large number of articles already existed, filed under “Italian cuisine”: Granita, Arancini, Cannoli, and a few others. Now they are organized under Sicilian cuisine. (I hope that some day Wikipedia categories will be managed more automatically with the help of Wikidata, so that I wouldn’t have to create them by hand.)

Finally, I found the particular issue of the Gazzetta Ufficiale of the Italian Republic, in which busiati col pesto trapanese was declared as a traditional agricultural food product, and I added that issue as a reference to the Wikidata item, as well.

And all of this yak shaving happened before I even tasted the damn thing!

So anyway, I couldn’t find this pasta anywhere, and I couldn’t buy it from the importer’s website, but I wanted it really badly, so I called the importer on the phone.

They told me they don’t have any stores in Jerusalem that buy from them, but they suggested checking a butcher shop in Mevaseret Tsiyon, a suburb of Jerusalem. Pasta in a butcher shop… OK.

So I took a bus to Mevaseret, and voilà: I found it there!

And I made Busiate, and I made the sauce! It’s delicious and totally worth the effort.

Of course, I could just eat it without editing Wikipedia and Wikidata on the way, but to me that would be boring.

My wife and my son loved it.

These are the busiate with pesto alla trapanese that I made at home. I uploaded this photo to Wikimedia Commons and added it to the English Wikipedia article as an illustration of how Busiate are prepared. I wonder what Wikipedians from Sicily think of it.

There is a story behind every Wikipedia article, Wikidata item, and Commons image. Millions and millions of stories. I wrote mine—you should write yours!

It sometimes happens in people’s lives that someone tells them something that sounds true and obvious at the time. It turns out that it actually is objectively true, and it is also obvious, or at least sensible, to the person who hears it, but it’s not obvious to other people. But it was obvious to them, so they think that it is obvious to everyone else, even though it isn’t.

It happens to everyone, and we are probably all bad at consistently noticing it, remembering it, and reflecting on it.

This post is an attempt to reflect on one such occurrence in my life; there were many others.

(Comment: This whole post is just my opinion. It doesn’t represent anyone else. In particular, it doesn’t represent other translatewiki.net administrators, MediaWiki developers or localizers, Wikipedia editors, or the Wikimedia Foundation.)


There’s the translatewiki.net website, where the user interface of MediaWiki, the software that powers Wikipedia, as well as of some other Free Software projects, is translated to many languages. This kind of translation is also called “localization”. I mentioned it several times on this blog, most importantly at Amir Aharoni’s Quasi-Pro Tips for Translating the Software That Powers Wikipedia, 2020 Edition.

Siebrand Mazeland used to be the community manager for that website. Now he’s less active there, and, although it’s a bit weird to say it, and it’s not really official, these days I kind of act like one of its community managers.

In 2010 or so, Siebrand heard something about a bug in the support of Wikipedia for a certain language. I don’t remember which language it was or what the bug was. Maybe I myself reported something in the display of Hebrew user interface strings, or maybe it was somebody else complaining about something in another language. But I do remember what happened next. Siebrand examined the bug and, with his typical candor, said: “The fix is to complete the localization”.

What he meant is that one of the causes of that bug, and perhaps the only cause, was that the volunteers who were translating the user interface into that language didn’t translate all the strings for that feature (strings are also known as “messages” in MediaWiki developers’ and localizers’ jargon). So instead of rushing to complain about a bug, they should have completed the localization first.

To generalize it, the functionality of all software depends, among many other things, on the completeness of user interface strings. They are essentially a part of the algorithm. They are more presentation than logic, but the end user doesn’t care about those minor distinctions—the end user wants to get their job done.

Those strings are usually written in one language—often English, but occasionally Japanese, Russian, French, or another one. In some software products, they may be translated into other languages. If the translation is incomplete, then the product may work incorrectly in some ways. On the simplest level, users who want to use that product in one language will see the user interface strings in another language that they possibly can’t read. However, it may go beyond that: writing systems for some languages require special fonts, applying which to letters from another writing system may cause weird appearance; strings that are supposed to be shown from left to right will be shown from right to left or vice versa; text size that is good for one language can be wrong for another; and so forth.

In many cases, simply completing the translation may quietly fix all those bugs. Now, there are reasons why the translation is incomplete: it may be hard to find people who know both English and this language well; the potential translator is a volunteer who is busy with other stuff; the language lacks the necessary technical terminology to make the translations, and while this is not a blocker (new terms can be coined along the way), it may slow things down; a potential translator has good will and wants to volunteer their time, but hasn’t had a chance to use the product and doesn’t understand the messages’ context well enough to make a translation; etc. But in theory, if there is a volunteer who has relevant knowledge and time, then completing the translation, by itself, fixes a lot of bugs.

Of course, it may also happen that the software actually has other bugs that completing the localization won’t fix, but that’s not the kind of bugs I’m talking about in this post. Or, going even further, software developers can go the extra mile and try to make their product work well even if the localization is incomplete. While this is usually commendable, it’s still better for the localizers to complete the localization. After all, it should be done anyway.

That’s one of the main things that motivate me to maintain the localization of MediaWiki and its extensions into Hebrew at 100%. From the perspective of the end users who speak Hebrew, they get a complete user experience in their language. And from my perspective, if there’s a bug in how something works in Wikipedia in Hebrew, then at least I can be sure that the reason for it is not an incomplete translation.


As one of the administrators of translatewiki, I try my best to make complete localization in all languages not just possible, but easy.¹ It directly flows out of Wikimedia’s famous vision statement:

Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.

I love this vision, and I take the words “Every single human being” and “all knowledge” seriously; they implicitly mean “all languages”, not just for the content, but also for the user interface of the software that people use to read and write this content.

If you speak Hindi, for example, and you need to search for something in the Hindi Wikipedia, but the search form works only in English, and you don’t know English, finding what you need will be somewhere between hard and impossible, even if the content is actually written in Hindi somewhere. (Comment #1: If you think that everyone who knows Hindi and uses computers also knows English, you are wrong. Comment #2: Hindi is just one example; the same applies to all languages.)

Granted, it’s not always actually easy to complete the localization. A few paragraphs above, I gave several general examples of why it can be hard in practice. In the particular case of translatewiki.net, there are several additional, specific reasons. For example, translatewiki.net was never properly adapted to mobile screens, and it’s increasingly a big problem. There are other examples, and all of them are, in essence, bugs. I can’t promise to fix them tomorrow, but I acknowledge them, and I hope that some day we’ll find the resources to fix them.


Many years have passed since I heard Siebrand Mazeland saying that the fix is to complete the localization. Soon after I heard it, I started dedicating at least a few minutes every day to living by that principle, but only today I bothered to reflect on it and write this post. The reason I did it today is surprising: I tried to do something about my American health insurance (just a check-up, I’m well, thanks). I logged in to my dental insurance company’s website, and… OMFG:

What you can see here is that some things are in Hebrew, and some aren’t. If you don’t understand the Hebrew parts, that’s OK, because you aren’t supposed to: they are for Hebrew speakers. But you should note that some parts are in English, and they are all supposed to be in Hebrew.

For example, you can see that the exclamation point is at the wrong end of “Welcome, Amir!”. The comma is placed unusually, too. That’s because they oriented the direction of the page from right to left for Hebrew, but didn’t translate the word “Welcome” in the user interface.² If they did translate it, the bug wouldn’t be there: it would correctly appear as “ברוך בואך, Amir!”, and no fixes in the code would be necessary.

You can also see a wrong exclamation point at the end of “Thanks for being a Guardian member!”.

There are also less obvious bugs here. In the word “WIKIMEDIA” under the “Group ID” dropdown, the letter “W” is only partly visible. That’s also a typical RTL bug: the menu may be too narrow for a long string, so the string can be visually truncated, but that should happen at the end of the string and not at the beginning. Because the software here thinks that the end is on the left, the beginning gets truncated instead. This is not exactly an issue that can be fixed just by completing the localization, but if the localization were complete, it would be easier to notice it.

There are other issues that you don’t notice if you don’t know Hebrew. For example, there’s a button with a weird label at the top right. Most Hebrew speakers will understand that label as “a famous website”, which is probably not what it is supposed to say. It’s more likely that it’s supposed to say “published web page”, and the translator made a mistake. Completing the translation correctly would fix this mistake: a thorough translator would review their work, check all the usages of the relevant words, and likely come up with a correct translation. (And maybe the translation is not even made by a human but by machine translation software, in which case it’s the product manager’s mistake. Software should never, ever be released with user interface strings that were machine-translated and not checked by a human.)

Judging by the logo at the top, the dental insurance company used an off-the-shelf IBM product for managing clients’ info. If I ask IBM or the insurance company nicely, will they let me complete the localization of this product, fixing the existing translation mistakes, and filing the rest of the bugs in their bug tracking software, all without asking for anything in return? Maybe I’ll actually try to do it, but I strongly suspect that they will reject this proposal and think that I’m very weird. In case you wonder, I actually tried doing it with some companies, and that’s what happened most of the time.

And this attitude is a bug. It’s not a bug in code, but it is very much a problem in product management and attitude toward business.


If you want to tell me “Amir, why don’t you just switch to English and save yourself the hassle”, then I have two answers for you.

The first answer is described in detail in a blog post I wrote many years ago: The Software Localization Paradox. Briefly: Sure, I can save myself the hassle, but if I don’t notice it and speak about it, then who will?

The second answer is basically the same, but with more pathos. It’s a quote from Avot 1:14, one of the most famous and cited pieces of Jewish literature outside the Bible: If I am not for myself, who is for me? But if I am for my own self, what am I? And if not now, when? I’m sure that many cultures have proverbs that express similar ideas, but this particular proverb is ours.


And if you want to tell me, “Amir, what is wrong with you? Why does it even cross your mind to want to help not one, but two ultramegarich companies for free?”, then you are quite right, idealistically. But pragmatically, it’s more complicated.

Wikimedia understands the importance of localization and lets volunteers translate everything. So do many other Free Software projects. But experience and observation taught me that for-profit corporations don’t prioritize good support for languages unless regulation forces them to do it or they have exceptionally strong reasons to think that it will be good for their income or marketing.

It did happen a few times that corporations that develop non-Free software let volunteers localize it: Facebook, WhatsApp, and Waze are somewhat famous examples; Twitter used to do it (but stopped long ago); and Microsoft occasionally lets people do such things. Also, Quora reached out to me to review the localization before they launched in Hebrew and even incorporated some of my suggestions.³

Usually, however, corporations don’t want to do this at all, and when they do it, they often don’t do it very well. But people who don’t know English want—and often need!—to use their products. And I never get tired of reminding everyone that most people don’t know English.

So for the sake of most humanity, someone has to make all software, including the non-Free products, better localized, and localizable. Of course, it’s not feasible or sustainable that I alone will do it as a volunteer, even for one language. I barely have time to do it for one language in one product (MediaWiki). But that’s why I am thinking of it: I would be not so much helping a rich corporation here as I would be helping people who don’t know English.

Something has to change in the software development world. It would, of course, be nice if all software became Freely-licensed, but if that doesn’t happen, it would be nice if non-Free software were more open to accepting localization from volunteers. I don’t know how this change will happen, but it is necessary.


If you bothered to read until here, thank you. I wanted to finish with two things:

  1. To thank Siebrand Mazeland again for doing so much to lay the foundations of the MediaWiki localization and the translatewiki community, and for saying that the fix is to complete the localization. It may have been an off-hand remark at the time, but it turned out that there was much to elaborate on.
  2. To ask you, the reader: If you know any language other than English, please use all apps, websites, and devices in this language as much as you can, bother to report bugs in its localization to that language, and invest some time and effort into volunteering to complete the localization of this software to your language. Localizing the software that runs Wikipedia would be great. Localizing OpenStreetMap is a good idea, too, and it’s done on the same website. Other projects that are good for humanity and that accept volunteer localization are Mozilla, Signal, WordPress, and BeMyEyes. There are many others.⁴ It’s one of the best things that you can do for the people who speak your language and for humanity in general.

¹ And here’s another acknowledgement and reflection: This sentence is based on the first chapter of one of the most classic books about software development in general and about Free Software in particular: Programming Perl by Larry Wall (with Randal L. Schwartz, Tom Christiansen, and Jon Orwant): “Computer languages differ not so much in what they make possible, but in what they make easy”. The same is true for software localization platforms. The sentence about the end user wanting to get their job done is inspired by that book, too.

² I don’t expect them to have my name translated. While it’s quite desirable, it’s understandably difficult, and there are almost no software products that can store people’s names in multiple languages. Facebook kind of tries, but does not totally succeed. Maybe it will work well some day.

³ Unfortunately, as far as I can tell, Quora abandoned the development of the version in Hebrew and in all other non-English languages in 2022, and in 2023, they abandoned the English version, too.

⁴ But please think twice before volunteering to localize blockchain or AI projects. I heard several times about volunteers who invested their time into such things, and I was sad that they wasted their volunteering time on this pointlessness. Almost all blockchain projects are pointless. With AI projects, it’s much more complicated: some of them are actually useful, but many are not. So I’m not saying “don’t do it”, but I am saying “think twice”.

IA Upload upgraded

Tuesday, 16 July 2024 09:49 UTC

Fremantle

· IA Upload · PHP · upgrades · Wikimedia ·

I shifted IA Upload onto a new server today, where it's running on Debian 12 and PHP 8.2. That means it's time to upgrade the tool's PHP dependencies, and as it's a Slimapp app, the first step seems to be getting simplei18n working with a more modern version of Twig. So it's not going to get done today, it seems…

Redesigned Wikimedia wishlist is open

Tuesday, 16 July 2024 04:40 UTC

Fremantle

· Wikimedia · Community Tech · work ·

The new system for the Community Wishlist was launched yesterday. It replaces the old annual system, in which there was a set period each year for proposing wishes, followed by a few weeks of voting and so on. In the new system, wishes can be submitted at any time and are gathered into focus areas, and those focus areas are what get voted on (again, at any time).

I think it's an improvement. The software for running it certainly is! We've built a data entry form, which reads and writes a wikitext table. There are also other parts that read all the wish templates into a (Toolforge) database and then write out various tables (all wishes, recent ones, etc.) into wiki pages.
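To give a rough idea of that read-the-templates-and-write-a-table pattern, here is an illustrative sketch, not the actual code; the page title, template name, and parameter names below are made-up assumptions:

```python
# A loose sketch of "read wish templates, write out a summary table".
# Not the Community Tech team's actual code; the page title, template name,
# and parameter names are illustrative assumptions.
import requests
import mwparserfromhell

API = "https://meta.wikimedia.org/w/api.php"

def fetch_wikitext(title: str) -> str:
    """Fetch the current wikitext of a page via the MediaWiki Action API."""
    resp = requests.get(API, params={
        "action": "query", "prop": "revisions", "rvprop": "content",
        "rvslots": "main", "titles": title, "format": "json", "formatversion": "2",
    })
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    return page["revisions"][0]["slots"]["main"]["content"]

def wishes_to_table(wikitext: str) -> str:
    """Collect wish templates from a page and render them as a wikitext table."""
    rows = []
    for tpl in mwparserfromhell.parse(wikitext).filter_templates():
        if not tpl.name.matches("Community Wishlist/Wish"):  # assumed template name
            continue
        title = str(tpl.get("title").value).strip() if tpl.has("title") else ""
        area = str(tpl.get("area").value).strip() if tpl.has("area") else ""
        rows.append(f"|-\n| {title} || {area}")
    header = '{| class="wikitable"\n! Wish !! Focus area'
    return "\n".join([header, *rows, "|}"])

if __name__ == "__main__":
    print(wishes_to_table(fetch_wikitext("Community Wishlist")))  # assumed page title
```

Pointed at a page that actually contains such templates, a script like this would print a wikitable summarising the wishes, which a bot account could then write back to a wiki page.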

There's more info about the launch in a Diff post: Share your product needs with the Community Wishlist

In her 1984 classic Reading the Romance, Janice Radway referred to the romance-purchasing customers of a small-town bookstore as a “female community … mediated by the distances of modern mass publishing. Despite the distance, the Smithton women feel personally connected to their favorite authors because they are convinced that these writers know how to make them happy” (Radway 1991, 97).

Reading the Romance is an important work because it gave attention to an otherwise dismissed genre and conceived of the readership as a community, even if only vaguely. Radway partly improved on this in her 1991 edition, admitting her theorization of community was “somewhat anemic in that it fails to specify precisely how membership in the romance-reading community is constituted.” Radway conceded the concept of an “interpretative community” (previously used to refer to critics and scholars of literature) might help, but “it cannot do complete justice to the nature of the connection between social location and the complex process of interpretation” (Radway 1991, 8).

This notion of “interpretive community” was coined in the seven years between her first and second editions. And, as she noted, it wasn’t a great fit. An “interpretive community” is a “collectivity of people who share strategies for interpreting, using, and engaging in communication about a media text or technology” (Lindlof 1988, 2002). Radway’s subjects shared little of this.

Rather, Radway was describing parasocial relationships between readers and authors, in which mass media permit an “illusion of a face-to-face relationship with the performer” (Horton and Wohl 1956, 215), the performer in Radway’s case being the authors.

It’s interesting that while the concept of parasociality had existed for decades, Radway overlooked it and instead reached for the wrong one: interpretive communities.

References

Horton, Donald, and R. Richard Wohl. 1956. “Mass Communication and Para-Social Interaction.” Psychiatry 19 (3): 215–29. http://dx.doi.org/10.1080/00332747.1956.11023049.
Lindlof, Thomas R. 1988. “Media Audiences as Interpretive Communities.” Annals of the International Communication Association 11 (1): 81–107. http://dx.doi.org/10.1080/23808985.1988.11678680.
———. 2002. “Interpretive Community: An Approach to Media and Religion.” Journal of Media and Religion 1 (1): 61–74. http://dx.doi.org/10.1207/S15328415JMR0101_7.
Radway, Janice. 1991. Reading the Romance: Women, Patriarchy, and Popular Literature. Chapel Hill: University of North Carolina Press.

Tech News issue #29, 2024 (July 15, 2024)

Monday, 15 July 2024 00:00 UTC
2024, week 29 (Monday 15 July 2024)

Tech News: 2024-29

weeklyOSM 729

Sunday, 14 July 2024 09:59 UTC

04/07/2024-10/07/2024

lead picture

Gallery of Overpass Ultra map examples [1] | © dschep | map data © OpenStreetMap contributors

Mapping campaigns

  • The humanitarian collaborative mapping campaign in response to the 2024 Rio Grande do Sul Floods (Brazil) is ongoing. The effects of the disaster, which led to landslides, floods, and a dam collapse, persist, and 5,000 people are still homeless in the state. Anyone can contribute to the open projects.

OpenStreetMap Foundation

  • OpenStreetMap experienced a DDoS attack on Thursday 11 July, causing significant access issues and intermittent service disruptions, which the technical team is actively working to resolve.

Events

  • The State of the Map Working Group is happy to announce that ticketing and programme websites for SotM 2024 are now accessible. Early bird tickets are available at a discounted price until Wednesday 31 July.
  • Did you miss the call for general and academic presentations for the State of the Map 2024? You can still showcase your project or map visualisation by submitting a poster before Sunday 25 August. For inspiration take a look at the posters from SotM 2022.
  • The SotM France 2024 videos are now available on PeerTube.
  • The State of the Map US 2024 highlighted some new developments in pedestrian mapping, the integration of AI into mapping processes, and climate and historical data projects, with presentations on accessibility mapping, OpenStreetMap data validation, and participatory GIS for public land management.

Education

  • IVIDES.org ran a hybrid workshop on collaborative mapping with OpenStreetMap and web mapping with uMap for a group of geography students from the Federal University of Ceará (Brazil), Pici campus (Fortaleza), and the general public. Raquel Dezidério Souto wrote about the experience in her diary, and the files and video are available in Portuguese.

OSM research

  • Lasith Niroshan and James D. Carswell introduced DeepMapper, an end-to-end machine learning solution that automates updates to OpenStreetMap using satellite imagery.

Maps

  • [1] TrailStash, ‘the home for #mapping projects by @dschep’, tooted that they have created a gallery of Overpass Ultra map examples.

OSM in action

  • Bristow_69 noted that the Dialogues en Humanités festival is using a nice OpenStreetMap-based map, but unfortunately has not given proper credit to OpenStreetMap.
  • EMODnet’s (European Marine Observation and Data Network) map viewer includes base and feature layers from OpenStreetMap.
  • NYC Street Map represents an ongoing effort to digitise official street records, bring them together with other street information, and make them easily accessible to the public. The app was developed with OpenMapTiles and OSM contributors’ data. Users can find the official mapped width, name, and status of specific streets and how they may relate to specific properties. It is possible to see how the street grid has changed over time in a chosen area.
  • Ola Cabs have replaced Google with OSM in their Ola Maps navigation application. The change aimed to reduce costs and provide faster, more accurate searches and improved routing. This transition is part of Ola’s broader strategy to improve users’ experience and independence of navigation technology, which was first introduced in its electric vehicles with MoveOS 4 earlier this year.
  • UtagawaVTT maintains the web platform Opentraveller, where contributors can register their mountain bike and electric bike travel routes and consult online data.

Software

  • HOT has released the production version of fAIr, an assistant for mapping with AI, to a wider audience of OSM communities. The software has been tested and the production website is now accessible (login with your OSM account).
  • Adam Gąsowski has introduced his OSM Helper UserScript, designed to streamline the use of community-built tools by automatically generating relevant links based on what the user is looking at. Future plans include integrating AI for automated tagging and developing a browser extension for Chrome and Firefox.
  • Gramps Web, the open-source, self-hosted family tree application, has added a historical map layer based on OpenHistoricalMap.
  • The 20.1.0.1 beta release of Vespucci included numerous updates, such as the removal of pre-Android 5 code, improvements to error handling and memory management, enhancements to the property editor, and new features such as GeoJSON label support and layer dragging.

Programming

  • MapBliss is an R package for creating beautiful maps of your adventures with Leaflet. It allows users to create print-quality souvenir maps, plot flight paths, control label positions, and add custom titles and borders. The package integrates several dependencies and is open for contributions and feature requests.
  • Mattia Pezzotti is documenting his progress in integrating Panoramax with OpenStreetMap as part of Google Summer of Code 2024, providing weekly updates on new features and improvements such as viewing 360-degree images, adding filters, and improving the user interface. This ongoing project was previously covered in weeklyOSM 723.
  • JT Archie described how they optimised large-scale OpenStreetMap data by converting it to a SQLite database, using full-text search and compression techniques, in particular the Zstandard seekable format, to handle data efficiently and improve query performance.

Did you know …

  • … the release of Taiwan TOPO v2024.07.04 continues the tradition of weekly updates started in September 2016? Taiwan TOPO provides detailed topographic data for Taiwan.

OSM in the media

  • In an op-ed in The New York Times, Julia Angwin criticised society’s overreliance on turn-by-turn navigation in Google Maps and called for greater investment in OpenStreetMap as a public good.

Other “geo” things

  • The Ammergauer Alpen natural park has implemented a visitor monitoring system using sensors and GPS data to manage and protect natural areas while supporting sustainable tourism.
  • Geomob has tooted about the release of episode #241 of their Geomob podcast, which covers a wide variety of issues, such as the distortion of some electoral maps and the use of drones in agriculture.
  • The Olympic torch relay route can be viewed on the Paris 2024 official website. The uMap Trajet Flamme Olympique 2024, created by @IEN52, shows all 67 stages of the route, including overseas territories. Some other uMaps show the passage of the Olympic Torch in selected cities.
  • The Philippines’s Second Congressional Commission on Education and the Department of Education are partnering to conduct a comprehensive nationwide mapping of private schools starting this July. This initiative aims to inform government policies, optimise resource allocation, and enhance complementarity between the public and private education systems.
  • TomTom and East View Geospatial have partnered to provide Australia’s Department of Defence with global map data, leveraging TomTom’s Orbis Maps for accurate geospatial information critical to national security and disaster response. TomTom’s Orbis Maps is made by conflating open data from Overture and OSM with TomTom partners’ data and TomTom’s proprietary data in a controlled environment.
  • Marcus Lundblad has published his annual ‘Summer Maps’ blog post for 2024, with updates to map visualisations, improvements to search functionality and dialogue interfaces, the addition of a playground icon, support for public transport routing, and the introduction of hill shading to show terrain topography.
  • Researchers at Sun Yat-sen University, in collaboration with international experts, have detailed, in the Journal of Remote Sensing, a framework for building extraction using very high-resolution images in complex urban areas, addressing the limitations of existing datasets for urban planning and management.

Upcoming Events

Where | What | When
Salt Lake City | OSM Utah Monthly Map Night | 2024-07-11
Lorain County | OpenStreetMap Midwest Meetup | 2024-07-11
Amsterdam | Maptime Amsterdam: Summertime Meetup | 2024-07-11
Berlin | DRK Online Road Mapathon | 2024-07-11
Wildau | 193. Berlin-Brandenburg OpenStreetMap Stammtisch | 2024-07-11
Zürich | 165. OSM-Stammtisch Zürich | 2024-07-11
Bochum | Bochumer OSM-Treffen | 2024-07-11
Bangalore East | OSM Bengaluru Mapping Party | 2024-07-13
Portsmouth | Introduction to OpenStreetMap at Port City Makerspace | 2024-07-13 – 2024-07-14
København | OSMmapperCPH | 2024-07-14
Strasbourg | découverte d’OpenStreetMap | 2024-07-15
Richmond | MapRVA – Bike Lane Surveying & Mapping Meetup | 2024-07-16
England | OSM UK Online Chat | 2024-07-15
Online | Missing Maps London: Mid-Month Mapathon | 2024-07-16
Bonn | 177. OSM-Stammtisch Bonn | 2024-07-16
Hannover | OSM-Stammtisch Hannover | 2024-07-17
Łódź | State of the Map Europe 2024 | 2024-07-18 – 2024-07-21
Zürich | Missing Maps Zürich Mapathon | 2024-07-18
Annecy | OSM Annecy Carto-Party | 2024-07-18
Online | OSMF Engineering Working Group meeting | 2024-07-19
Cocody | OSM Africa July Mapathon – Map Ivory Coast | 2024-07-20
München | Mapathon @ TU Munich | 2024-07-22
Stadtgebiet Bremen | Bremer Mappertreffen | 2024-07-22
San Jose | South Bay Map Night | 2024-07-24
Berlin | OSM-Verkehrswende #61 | 2024-07-23
Online | OpenStreetMap Foundation board of Directors – public videomeeting | 2024-07-25
Lübeck | 144. OSM-Stammtisch Lübeck und Umgebung | 2024-07-25
Wien | 72. Wiener OSM-Stammtisch | 2024-07-25

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events that are in the calendar will appear in weeklyOSM.

This weeklyOSM was produced by Aphaia_JP, MatthiasMatthias, PierZen, Raquel Dezidério Souto, Strubbl, TheSwavu, YoViajo, barefootstache, derFred, mcliquid, miurahr, rtnf.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Teaching AI in Schools

Saturday, 13 July 2024 03:30 UTC

Artificial Intelligence (AI) is a hot topic these days, and it’s natural to wonder how it fits into education. In this article, we will explore the best practices, concerns, and recommendations for integrating AI into school curriculums. I will also provide references to useful tools and learning materials.

Importance of AI education at schools

Why is there a growing interest in teaching AI in schools? AI has become deeply integrated into society, creating new applications and possibilities while also introducing ethical concerns.

A number of tools hosted on Toolforge rely on the replicated MediaWiki databases, dubbed "Wiki Replicas".

Every so often these servers have replication lag, which affects the data returned as well as the performance of the queries. And when this happens, users get confused and start reporting bugs that aren't solvable.

This actually used to be way worse during the Toolserver era (sometimes replag would be on the scale of months!), and users were well educated about the potential problems. Most tools would display a banner if there was lag, and there were even bots that would update an on-wiki template every hour.

A lot of these practices have been lost since the move to Toolforge, because replag has been basically zero the whole time. Now that more database maintenance is happening (yay), replag is happening slightly more often.

So to make it easier for tool authors to display replag status to users with a minimal amount of effort, I've developed a new tool: replag-embed.toolforge.org

It provides an iframe that automatically displays a small banner if there's more than 30 seconds of lag and nothing otherwise.
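Conceptually, the server side boils down to "check the replica lag, and only return banner HTML when it crosses the threshold". Here is a rough Python sketch of that idea, not the tool's actual implementation (the real tool is written in Rust, as noted below); the host name and the heartbeat_p.heartbeat view used here are assumptions about how the Wiki Replicas are typically queried from Toolforge:

```python
# Rough sketch only; not replag-embed's actual (Rust) implementation.
# Host naming, credential path, and the heartbeat_p.heartbeat view are assumptions.
import os
import pymysql

LAG_THRESHOLD = 30  # seconds; matches the banner behaviour described above

def replica_lag(host: str, shard: str) -> float:
    """Return the current replication lag (in seconds) for one shard."""
    conn = pymysql.connect(
        host=host,  # e.g. "commonswiki.analytics.db.svc.wikimedia.cloud" (assumed)
        database="heartbeat_p",
        read_default_file=os.path.expanduser("~/replica.my.cnf"),  # Toolforge credentials
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT lag FROM heartbeat WHERE shard = %s", (shard,))
            row = cur.fetchone()
    finally:
        conn.close()
    return float(row[0]) if row else 0.0

def banner_html(host: str, shard: str) -> str:
    """Return banner HTML when lag exceeds the threshold, otherwise an empty string."""
    lag = replica_lag(host, shard)
    if lag <= LAG_THRESHOLD:
        return ""  # nothing to show; the embedding iframe stays empty
    return (
        f'<div class="replag-warning">The replica database ({shard}) is currently '
        f"lagged by {lag:.0f} seconds; you may see outdated results or slowness.</div>"
    )
```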

As an example, as I write this, the current replag for commons.wikimedia.org looks like:

The replica database (s4) is currently lagged by 1762.9987 seconds (00:29:22), you may see outdated results or slowness. See the replag tool for more details.

Of course, you can use CSS to style it differently if you'd like.

I've integrated this into my Wiki streaks tool, where the banner appears/disappears depending on what wiki you select and whether it's lagged. The actual code required to do this was pretty simple.
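For illustration, the embedding side of a hypothetical Flask-based tool could look roughly like this; the iframe URL pattern here is my assumption, so check the tool's documentation for the real format:

```python
# Hypothetical example of embedding the replag-embed iframe from a Flask tool.
# The iframe URL path format is an assumption, not the documented API.
import flask

app = flask.Flask(__name__)

PAGE = """<!doctype html>
<title>Example tool</title>
<iframe src="https://replag-embed.toolforge.org/{wiki}"
        style="border: 0; width: 100%; height: 3em;"></iframe>
<p>… the rest of the tool's output for {wiki} …</p>
"""

@app.route("/")
def index():
    # Pick the wiki from a query parameter, defaulting to Commons.
    wiki = flask.request.args.get("wiki", "commons.wikimedia.org")
    return PAGE.format(wiki=wiki)
```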

replag-embed is written in Rust, of course (source code), and leverages in-memory caching to serve responses quickly.

Currently I'd consider this tool to be beta quality - I think it is promising and ready for other people to give it a try, but I know there are probably some kinks that need to be worked out.

The Phabricator task tracking this work is T321640; comments there would be appreciated if you try it out.

I read in the newspaper that Minister Mohamed Riyas told the Legislative Assembly that AI kiosks will be set up to help tourists so that language is not a barrier. The minister said that the kiosks, powered by artificial intelligence, will reply to tourists in their own language. “AI kiosks will be set up to help tourists so that language is not a barrier” – Deshabhimani newspaper, 12 July 2024.

Some questions

How do tourists currently learn about a tourist destination and get their questions answered? What shortcomings does that have? What facility will these kiosks offer that is not already available on mobile phones with an internet connection? Is there any information that is not available on the internet but can be obtained only from these kiosks?

This Month in GLAM: June 2024

Friday, 12 July 2024 02:31 UTC