Writing NLG Pages for Wikipedia

I have written numerous survey papers about NLG over the years, mostly for academic venues.  Around ten years ago, I was discussing this with a colleague, who commented that if I wanted to write a survey which got read, I should write something for Wikipedia.  I had never considered this before, but I decided to give it a try, so I started working on the Natural Language Generation entry in Wikipedia (which was pretty minimal before I got involved), and also child entries such as Realization (linguistics) (which didn’t exist before I got involved).

Overall, I think my colleague was right that Wikipedia pages get read much more than my academic surveys.  Amongst other things, I think the Wikipedia NLG pages encouraged the use of the simplenlg realiser, because I used it for examples in some of the pages.  This was partially because I was very familiar with it, but also because it was far easier to explain simplenlg to a non-specialist audience than alternative realisers such as KPML.

But there were also some less-than-ideal aspects of Wikipedia; I comment on a few of these below.

Multiple Authors

Perhaps the biggest difference between a Wikipedia page and a conventional survey is lack of control.  When I write a survey paper for an academic conference, journal, edited collection, or encyclopedia, I have complete control over what goes into it.  Of course editors and reviewers usually have to agree to publish what I write, but I have ultimate control; they can’t force me to write something I disagree with, because I can always just walk away.

With Wikipedia, in contrast, anyone can edit a page.  At its best, this is a great way of enhancing the quality of a page, as people with different insights, perspectives, and backgrounds contribute really useful content which I would not have written, and indeed also find mistakes and improve what I have written.  At its worst, though, you get people editing a Wikipedia page purely to promote their research, agenda, or company.  Unfortunately, most editors seem more interested in promotion than in enhancing the quality of the survey.  The lowest point for me was when someone from an NLG company (not Arria) tried to edit the NLG Wikipedia page to attack one of his company’s competitors.

I work part-time for Arria NLG, but I have never tried to promote Arria on the Wikipedia NLG page; that’s not appropriate.  But clearly there are plenty of people out there who think Wikipedia is just another marketing venue where they promote their company, research, etc.

At one point I tried to police this kind of thing, by frequently checking the Wikipedia NLG pages and removing marketing and promotional material.  But these days I just check once in a while and remove stuff that’s really blatant.  It’s a shame; I wish there was some way to stop people treating Wikipedia as just another (free) marketing channel, but I don’t see how to do this.

I should add that I am thinking of adding a link from the Wikipedia NLG page to this blog.  This could certainly be considered self-promotion, but I also genuinely think (and many people have told me) that there is useful and interesting information in this blog for people interested in NLG.

No Refereeing

As an academic, I am used to having referees or editors check anything I write before it gets published; this is an essential part of the “quality assurance” process for academic publications.  But there is no refereeing or editing of Wikipedia before pages are published.  There are mechanisms for resolving disputes if two authors want to do different things to a page, but I’ve never used them.

There are definitely some good points to the lack of refereeing.  It means that people outwith the “research mainstream” can express their views, and also that material can be changed very quickly if necessary.  But there are also some bad points; in particular, people can modify Wikipedia pages to say things that are clearly false or self-serving.  As a reader, I have confidence that any academic publication I read has been checked for mistakes by people who know the field well; I do not have such confidence in Wikipedia.

Academic refereeing usually checks both whether a paper is accurate and whether it is “interesting”.  It would be nice if Wikipedia pages were at least checked for accuracy (so I had confidence in what I read), but it’s hard to see how this could work in the Wikipedia model.

