As I wrote the Japanese crossword puzzle in 4 hours

I lazily look through the list of courses held by schoolchildren recently laid out by colleagues from Sirius ... So, what is this? “Search for combinatorial objects using SAT-solvers”? "We made a sudoku solver, Japanese crosswords and more?"

The thought pops up that the exhaustive NP-problems are reducible to one another, and in particular, are reducible to the search for the feasibility of a Boolean formula . One of the authors of Habr expressed this idea here , and frankly, for me it is like magic.

Of course, as a person who has completed a course in discrete mathematics and the complexity of algorithms, I theoretically know that tasks are reducible to one another. But it’s usually difficult to do this, and I’d better take my word for the professors and other smart people.

But then it is offered to ... SCHOOLCHILDREN! Inside, the awl began to stir in p ... creativity and said: “Well, it’s probably not difficult to fasten it down, once the students are offered. Can't I figure it out ?? Look, they promise that they will use the Python library, but I generally know the python ... ”

And the time was about 9 pm, which somewhat dulled my critical look at the complexity of the problem ... (actually, further chronicles of 4-hour programming)

21:02 The first stage of the quest - I'm trying to install the library in Python. From experience with neural networks, I already know that, in fact, code sometimes takes less time than configuring packets.

Using the classic pip install pycosat is not installed, it produces an error in the spirit of Microsoft Visual C ++ 14.0 is required .

We climb to look for the source of the problem, we get to Stackoverflow , where we find a small link to the required Visual C ++ 2015 Build Tools , apparently including the C ++ compiler.

Download, trying to install ... wow, it requires 4 GB! I actually only need a compiler, but oh well.

While this thing is being downloaded and installed, we follow the links in the program posted from Sirius ... Yes, not a lot - examples of Python projects, and a list of possible tasks for the course. No presentations, no videos to quickly understand how to work with the library.

Okay, let's start by parsing the examples. We go to queens_pycosat.py ... Wow, along the way this is the solution to the familiar problem of placing queens on a chessboard (the task is to arrange 8 queens so that they do not beat each other).

21: 10-21: 30 We understand and try to start.

The general logic begins to be traced:
SAT- () (1, 2, 3),(-1, 2 ..), (True/False).

, , .

( , ); , , 8x8 = 64 , .

(1 2 3… 8) — 1 . — [1,2,3...8]
— (9… 16) ..

— 1 !
[ 1 2] ( — ). [-1,-2],[-1,-3] .

, , . , .

( ) — (1 2… 8) ( 1 2) (...). , — .

Yes, I wasn’t so smart right away, now that I’ve figured it out ...

In general, we copy the teaching code to ourselves. To make sure that we understand something, turn off the check for diagonals - then the task turns into a task about rooks.

We start, we receive: It works! 21:35 Having got a little acquainted with the library documentation, we find out that there are challenges for finding one solution or all solutions. Tell me, my dear friend, how many rook arrangements are there on an 8x8 board? Will you find everything? The machine thought for a while, and I began to google the factorial table. A minute later, the solver replied - "40320 solutions were found." Check - really 8!, Everything is correct.

| | | | | | | |x|
| | | | | | |x| |
| | | | | |x| | |
| | | | |x| | | |
| | | |x| | | | |
| | |x| | | | | |
| |x| | | | | | |
|x| | | | | | | |











I removed the condition that the rooks do not hit each other vertically ... but how much will it be now? Must be 8 ^ 8 = 16777216. Then it dawned on me that he would take a long time to count - we interrupt the process.

Also found a quick way to derive 4 solutions without calculating everything - from everything in the same library. This is done - we find a solution, after which we add it to the exceptions of the Boolean formula ...

How to deduce several solutions in python
for cnter in range(4):
  sol = pycosat.solve(clauses)
  if isinstance(sol, list):
    print(sol)
    clauses.append([-x for x in sol]) #   -      
  else: #    - 
    return
#     ,    ,   .


In general, we get a few more solutions to the “rook problem”: 21:45 I: In general, everything is clear, if anything, now I can fasten it. I’m going to read a book and spa ... CREATIVE START: Well, you understand how everything works. And let's write a simple little crossword puzzle of Japanese crosswords! Take the simplest case - on each row and column there will be no more than one digit (and, accordingly, one block). You simply search through the possible options for the row / column, it’s quick here, and then load it into the solver. Let's do it, huh? Me: ( without feeling an ambush ) Well, come on ... it seems the truth is not for long ... For simplicity, we only take NxN square crosswords - at the same time, you can take the output code ready and think where the horizontal and vertical are not needed ...

| | | | | | | |x|
| | | | | | |x| |
| | | | | |x| | |
| | | | |x| | | |
| | | |x| | | | |
| |x| | | | | | |
| | |x| | | | | |
|x| | | | | | | |











22:10 It doesn’t work: ((
We try for the 3x3 crossword puzzle, in the 1-0-1 format (so far only blocks are horizontal, we don’t take verticals into account). It gives out all empty cells, infection, doesn’t even take into account the filled blocks ...

Okay, in detail We delve into the format of data transfer to the solver, and then something, to put it mildly, unexpected is revealed ...

, «-» ?

/ /, — . , .

[1] 3 , :
[X__],[_X_],[__X]
, ,
[1,-2,-3],[-1,2,-3],[-1,-2,3]

, …
:
[1 -2 -3] [-1, 2...] [...]

, . , . — , !

Ambush. Actually, there are two options - either to come up with the conditions for a crossword puzzle in KNF, or to come up with how to convert DNF to KNF (well, someone should have thought about this, especially since both are Boolean formulas!)

22:50 I finished the snack. I did not think of how to make conditions for KNF (later I found that they were described in the article , but it was a long time to program them, and, frankly, unsportsmanlike).

So we are looking for a converter of DNF to CNF. Formal descriptions of transformations in the academic style are scattered on the Internet ... why do I need them? I’m definitely not going to stick into them now, I would solve the problem. So, we see the only (!) Python library with the appropriate name and functions. Now we will connect ...

23:10It works funny - it makes additional variables, and through them it brings DNF to CNF. Apparently, honestly making KNF is quite difficult ...

So, it also does not work! What's the matter?
We submit the test input of DNF: lstall = [[1,2,3]]. That is, we expect that only [1,2,3] (corresponds to the full line [XXX]) will be a valid solution. The

library gives KNF: cnf = [[-1, -2, -3, 4], [1 , -4], [2, -4], [3, -4]]
It is not clear, but very interesting. Well, let's say I believe you ...

We ask pycosat to find all the solutions for cnf:
[1, 2, 3, 4]
[1, 2, -3, -4]
[1, -2, -3, -4]
[1 , -2, 3, -4]
[-1, -2, -3, -4]
[-1, -2, 3, -4]
[-1, 2, -3, -4]
[-1, 2, 3, -4]

Instead of the expected one, there are eight of them! What to do?

, , 4 True. ?

: cnf2 = [[-1, -2, -3, 4], [1, -4], [2, -4], [3, -4], [4]]
pycosat cnf2:
1: [1, 2, 3, 4]

, , ! , , .
— !

23:10 She decides! Here is a 1-3-1 crossword vertically and horizontally: Here is a double solution of a 1-0-1 crossword vertically and horizontally: Me: So, everything works for me, I'm done. Wow, it's already 11 p.m. Spa ... CREATIVE START: Do you understand that you have very little left to solve any Japanese crosswords? It’s only necessary to write one recursive procedure, which will produce all variants of the string, not for one block, but for N ... Me: Well ... it seems to be easy, yes ... 00:00 At midnight, the head tupit is cooking not so well, but I continue to debug recursion. Terms of the Japanese crossword - between the blocks there must be at least one space, but there may be more than one.

| |x| |
|x|x|x|
| |x| |




1
| | |x|
| | | |
|x| | |
2
|x| | |
| | | |
| | |x|












So under the condition [1,2,1] for 6 cells, the program should generate the only option
[X_XX_X],
and for 7 cells there are already 4 options:
[X_XX_X_],
[X_XX__X],
[X__XX_X],
[_X_XX_X],

All the time in I’m mistakenly calculating the location of cells by one, and to hell with it - it’s easier to start and fix than to think ...

00:30 Does it work? What else to check? So, where to get a machine-oriented description of crosswords ... On this site, no, on this - no ... Okay, to hell with it, I will interrupt the pens. 00:50 It really works O_O Here's a crab with jcrosswords.ru/crossword/65

3x3
rows = [[1,1],[0],[1,1]]
cols = [[1,1],[0],[1,1]]

:
|X| |X|
| | | |
|X| |X|

4x4
rows = [[1,1],[0],[0],[1,1]]
cols = [[1],[1],[1],[1]]

:
1
| |X| |X|
| | | | |
| | | | |
|X| |X| |
2
|X| |X| |
| | | | |
| | | | |
| |X| |X|







decided by my solver. The maximum size tested is 10x10, now I’m very lazy to interrupt more crosswords into the program ... - Me: Wow, it's already one in the morning! Wait, it took only 4 hours from the moment when I did not know anything about the SAT-solver? So, how was it at all? (quick notes are written about what happened ...)

|x|_|x|_|_|_|_|x|_|x|
|x|x|x|_|_|_|_|x|x|x|
|_|x|_|_|_|_|_|_|x|_|
|_|x|_|x|_|_|x|_|x|_|
|_|x|x|x|x|x|x|x|x|_|
|_|_|x|x|x|x|x|x|_|_|
|x|x|x|x|x|x|x|x|x|x|
|_|_|x|x|x|x|x|x|_|_|
|x|x|_|x|x|x|x|_|x|x|
|x|_|_|_|_|_|_|_|_|x|








Afterword.

Morning . On Habr the article about a solver on Rust is found . Downloading from crossword puzzle sites does not work, but after going to the Github article specified in the article, we find on the Webpbn crossword puzzle site a hidden link to export crosswords in a machine-readable format . The option “Format for version 1 of Kyle Keen's Python solver” is suitable for us, it is the same as ours.

When you don’t need to interrupt crosswords with pens, things go faster. It quickly becomes clear that the maximum that a solver is able to solve is crosswords in the region of 30x30. The main problem is the very generation of variants of individual lines, in the case of a heap of blocks of these conditions, VERY much and everything starts to slow down:

1. Firstly, string options take a long time to generate. Generating only all the options for the string [2,1,3,1,4] of length 60 takes already about half a minute ... (there are only 2118760, if anyone is interested)
2. Also the DNF-> CNF conversion creates too many conditions and additional variables ...

But if there are few blocks, then even large crosswords can be solved in a meaningful time.

Cat - everything is ok
20x20
51511 clauses...
2851 last var number
--- 0.05186057090759277 seconds for clauses gen ---
1
| | |X|X| | | | | | | | | | | | | | | | |
| |X|X| | | | | | | | | | | | | | | | | |
| |X| | | | | | | | | | | | | | | | | | |
| |X| | | | | | | | | | | | | | | | | | |
| |X| | | | | |X|X|X| | | | | | | | | | |
|X|X| | | | |X|X|X|X|X| | | | | | | | | |
|X| | | | |X|X|X|X|X|X|X| | | |X| | | |X|
|X| | | | |X|X|X|X|X|X|X|X| | |X|X| |X|X|
|X| | | |X|X|X|X|X|X|X|X|X| | |X|X|X|X|X|
|X|X| | |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|
| |X| |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|
| |X|X|X|X|X|X|X| |X|X|X|X|X|X|X|X|X|X|X|
| | |X|X|X|X|X| | | |X|X|X|X|X| |X|X|X| |
| | |X|X|X|X|X| | | |X|X|X|X| | | | | | |
| | | |X|X|X| | | | | |X|X|X| | | | | | |
| | | |X|X| | | | | | |X|X| | | | | | | |
| | |X|X| | | | | | | | |X| | | | | | | |
| | |X| | | | | | | | | |X| | | | | | | |
| | |X|X| | | | | | | | |X|X| | | | | | |
| | |X|X| | | | | | | | |X|X| | | | | | |
--- 0.02073383331298828 seconds for solve ---


In a desire to test more crosswords, he quickly made alignment - any crossword is reduced to a square when we fill the remaining lines with zeros. Yes, it’s not optimal, but now I’m reluctant to redo the code ...

Horse - 30x38
22
13622799 clauses... -
350737 last var number
--- 14.18206787109375 seconds for clauses gen ---
1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |X|X|X| | | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | |X|X|X|X|X|X|X| | | | |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |X|X|X|X|X|X|X| | | |
| | | | | | | | | | | | | | | | | | | | | | | | | |X|X|X|X|X|X|X|X| |X|X| | |
| | | | | | | | | | | | | | | | | | | | | | | | | | |X|X|X|X|X|X|X|X|X|X|X| |
| | | | | | | | | | | | | | | | | | | | | | | |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|
| | | | | | | | | | | | | | | | | | | | | | | | |X|X|X|X|X|X|X|X| |X|X|X|X|X|
| | | | | | | | | | | | | | | | | | | | | | | | |X|X|X|X|X|X|X| | | | | |X| |
| | | | | | | | | | |X|X|X|X| | | | | | | | | |X|X|X|X|X|X|X|X| | | | | | | |
| | | | | | | | |X|X|X|X|X|X|X|X|X| | | |X|X|X|X|X|X|X|X|X|X| | | | | | | | |
| | | | | | |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| | | | | | | | |
| | | | | |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| | | | | | | | | |
| | | | |X|X|X| |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| | | | | | | | | |
| | | |X|X|X| | |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| | | | | | | | | |
| | |X|X|X| | | |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| | | | |
| |X|X|X| | | |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| | | |
|X|X|X|X| | | |X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X|X| | | |X|X| | | |
| |X|X| | | |X|X|X|X|X| | | | |X|X|X|X|X|X|X|X|X|X|X|X| | | | | | |X|X| | | |
| | |X| | | |X|X|X|X| |X| | | | | | |X|X|X|X|X|X| | | |X| | | | | |X|X| | | |
| | | | | |X|X|X|X| |X|X| | | | | | | | | | | | | |X|X|X| | | | | |X|X| | | |
| | | | |X|X|X|X| |X|X| | | | | | | | | | | | | | |X|X|X| | | | | |X|X| | | |
| | | | |X|X|X| | |X|X| | | | | | | | | | | | | | | |X|X| | | | | |X|X| | | |
| | | | |X|X| | | |X|X| | | | | | | | | | | | | | | |X|X| | | | |X|X|X| | | |
| | | |X|X|X| | | | |X|X| | | | | | | | | | | | | | |X|X| | | | |X|X| | | | |
| | | |X|X| | | | | |X|X| | | | | | | | | | | | | | |X|X| | | | |X| | | | | |
| | | |X|X| | | | | |X|X| | | | | | | | | | | | | | |X|X| | | | | | | | | | |
| | | |X|X| | | | | | |X|X| | | | | | | | | | | | | |X|X| | | | | | | | | | |
| | | |X|X| | | | | | |X|X| | | | | | | | | | | | | |X|X| | | | | | | | | | |
| | | | |X|X| | | | | | |X|X| | | | | | | | | | | | | |X|X| | | | | | | | | |
| | | | |X|X|X| | | | | |X|X|X| | | | | | | | | | | | |X|X|X| | | | | | | | |

--- 17.535892963409424 seconds for solve ---



In the original publication on the SAT-solver , conditions and variables are more honestly generated - therefore, there the solution is faster and probably takes less memory. When deciding a horse, it takes me about 2.7 GB of RAM, and it takes up some complicated things to all 12 installed, and then they start frantically swapping to disk ...

More from the conclusions:

1. It dawned on me why most quickly googled pycosat examples are Sudoku solutions. Sudoku rules are much easier to put on the solver, because you do not need to describe the complex relationship between blocks and cells, basically only the conditions "each digit in each row" and "only one digit in one row".
But I love sudoku much less, so I do not regret the decision I have chosen.

2. I liked how you can mix DNF and CNF. If you don’t want to write the right conditions - don’t; just generate all the options. Of course, this is not optimal, but it works.

Of course, I did not set records for the speed of solving crosswords, but I had a good time. The most important thing is I can do it! :) What I wish you too!

All Articles